Neural Networks for Machine Learning
Lecture 14a: Learning layers of features by stacking RBMs

Training a deep network by stacking RBMs
- First train a layer of features that receive input directly from the pixels.
- Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer.
- Then do it again (see the code sketch below).
- It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of generating the training data.
  - The proof is complicated and only applies to unreal cases. It is based on a neat equivalence between an RBM and an infinitely deep belief net (see Lecture 14e).

Combining two RBMs to make a DBN
- Train the first RBM directly on the data. Then, for each data vector v, copy the binary states of its hidden units and use them as the data for training a second RBM.
- Compose the two RBM models to make a single DBN model.
- The composite model is not a Boltzmann machine: the lower connections become top-down, directed weights, while the top two layers remain an undirected RBM.

The generative model after learning 3 layers
- To generate data:
  1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
  2. Perform a top-down pass to get states for all the other layers.
- The lower-level bottom-up connections are not part of the generative model. They are just used for inference.
- (Layers, from bottom to top: data, h1, h2, h3; the top-level RBM is formed by h2 and h3.)

An aside: averaging factorial distributions
- If you average some factorial distributions, you do NOT get a factorial distribution.
- In an RBM, the posterior over 4 hidden units is factorial for each visible vector:
  - posterior for v1: (0.9, 0.9, 0.1, 0.1)
  - posterior for v2: (0.1, 0.1, 0.9, 0.9)
  - aggregated: (0.5, 0.5, 0.5, 0.5)
- Consider the binary vector (1, 1, 0, 0):
  - in the posterior for v1, p(1,1,0,0) = 0.9^4 ≈ 0.66
  - in the posterior for v2, p(1,1,0,0) = 0.1^4 = 0.0001
  - in the aggregated posterior, p(1,1,0,0) ≈ 0.33
  - If the aggregated posterior were factorial, it would assign p = 0.5^4 ≈ 0.06.

Why does greedy learning work?
- The weights, W, in the bottom-level RBM define many different distributions: p(v|h), p(h|v), p(v,h), p(h), and p(v).
- We can express the RBM model as p(v) = sum_h p(h) p(v|h).
- If we leave p(v|h) alone and improve p(h), we will improve p(v).
- To improve p(h), we need it to be a better model than p(h; W) of the aggregated posterior distribution over hidden vectors produced by applying W^T to the data.

Fine-tuning with a contrastive version of the wake-sleep algorithm
After learning many layers of features, we can fine-tune the features to improve generation:
1. Do a stochastic bottom-up pass. Then adjust the top-down weights of lower layers to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM. Then adjust the weights in the top-level RBM using CD.
3. Do a stochastic top-down pass. Then adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

The DBN used for modeling the joint distribution of MNIST digits and their labels
- Architecture: 28x28 pixel image -> 500 units -> 500 units -> 2000 top-level units, which are also connected to 10 label units.
- The first two hidden layers are learned without using labels.
- The top layer is learned as an RBM that models the labels concatenated with the features in the second hidden layer.
- The weights are then fine-tuned to be a better generative model using contrastive wake-sleep.

Lecture 14b: Discriminative fine-tuning for DBNs

Fine-tuning for discrimination
- First learn one layer at a time by stacking RBMs.
- Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure.
  - Contrastive wake-sleep is a way of fine-tuning the model to be better at generation.
  - Backpropagation can be used to fine-tune the model to be better at discrimination.
- This overcomes many of the limitations of standard backpropagation: it makes it easier to learn deep nets, and it makes the nets generalize better.

Why backpropagation works better with greedy pre-training: the optimization view
- Greedily learning one layer at a time scales well to really big networks, especially if we have locality in each layer.
- We do not start backpropagation until we already have sensible feature detectors that should already be very helpful for the discrimination task.
- So the initial gradients are sensible, and backpropagation only needs to perform a local search from a sensible starting point.
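The greedy stacking procedure described in Lecture 14a can be sketched in a few dozen lines of numpy. This is a minimal illustration rather than the course's reference implementation: the RBM class, the CD-1 update, and all hyperparameters (layer sizes, learning rate, epoch and batch counts) are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Binary-binary RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_update(self, v0, lr=0.05):
        # Positive phase: sample hidden states driven by the data.
        p_h0 = self.hidden_probs(v0)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one reconstruction step (one full step of alternating Gibbs).
        p_v1 = self.visible_probs(h0)
        p_h1 = self.hidden_probs(p_v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_vis += lr * (v0 - p_v1).mean(axis=0)
        self.b_hid += lr * (p_h0 - p_h1).mean(axis=0)

def train_rbm(data, n_hid, epochs=5, batch=100):
    rbm = RBM(data.shape[1], n_hid)
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            rbm.cd1_update(data[i:i + batch])
    return rbm

def stack_rbms(data, layer_sizes):
    """Greedy layer-by-layer training: each RBM is trained on the hidden
    activities of the one below, treated as if they were data."""
    rbms, layer_input = [], data
    for n_hid in layer_sizes:
        rbm = train_rbm(layer_input, n_hid)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)   # features become the next layer's "pixels"
    return rbms

# Toy usage: random binary "images" stand in for the 28x28 MNIST pixels.
toy_data = (rng.random((1000, 784)) < 0.1).astype(float)
stack = stack_rbms(toy_data, [500, 500, 2000])
```

Each trained RBM's hidden probabilities become the input for the next RBM, which is exactly the treat-activations-as-pixels step described in Lecture 14a; the toy layer sizes (500, 500, 2000) only mirror the shape of the MNIST model above.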
Why backpropagation works better with greedy pre-training: the overfitting view
- Most of the information in the final weights comes from modeling the distribution of input vectors.
  - The input vectors generally contain a lot more information than the labels.
  - The precious information in the labels is only used for the fine-tuning.
- The fine-tuning only modifies the features slightly to get the category boundaries right. It does not need to discover new features.
- This type of backpropagation works well even if most of the training data is unlabeled. The unlabeled data is still very useful for discovering good features.
- An objection: surely many of the features will be useless for any particular discriminative task (consider shape versus pose). But the ones that are useful will be much more useful than the raw inputs.

First, model the distribution of digit images
- Architecture: 28x28 pixel image -> 500 units -> 500 units -> 2000 units.
- The network learns a density model for unlabeled digit images. When we generate from the model, we get things that look like real digits of all classes.
- But do the hidden features really help with digit discrimination? Add a 10-way softmax at the top and do backpropagation.
- The top two layers form a restricted Boltzmann machine whose energy landscape should model the low-dimensional manifolds of the digits.

Results on the permutation-invariant MNIST task (error rates)
- Backprop net with one or two hidden layers (Platt; Hinton): 1.6%
- Backprop with L2 constraints on incoming weights: 1.5%
- Support Vector Machines (Decoste and Schoelkopf, 2002): 1.4%
- Generative model of the joint density of images and labels, with generative fine-tuning: 1.25%
- Generative model of unlabelled digits followed by gentle backpropagation (Hinton and Salakhutdinov, 2006): 1.15%, then 1.0%

Unsupervised "pre-training" also helps for models that have more data and better priors
- Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits. They also used convolutional multilayer neural networks.
- Backpropagation alone: 0.49% error.
- Unsupervised layer-by-layer pre-training followed by backprop: 0.39% error (the record at the time).

Phone recognition on the TIMIT benchmark (Mohamed, Dahl and Hinton, 2009-2012)
- After standard post-processing using a bi-phone model, a deep net with 8 layers gets a 20.7% error rate.
- The best previous speaker-independent result on TIMIT was 24.4%, and this required averaging several models.
- Li Deng (at MSR) realised that this result could change the way speech recognition was done. It has.
- Architecture: 15 frames of 40 filterbank outputs together with their temporal derivatives -> 2000 logistic hidden units -> 6 more layers of pre-trained weights -> 2000 logistic hidden units -> 183 HMM-state labels (this top layer is not pre-trained).
- http://www.bbc.co.uk/news/technology-20266427

Lecture 14c: What happens during discriminative fine-tuning?

Learning dynamics of deep nets (the next four slides describe work by Yoshua Bengio's group)
- [Figures from Erhan et al., AISTATS 2009: the effect of unsupervised pre-training and the effect of depth, comparing feature detectors before and after fine-tuning and networks with and without pre-training.]
- Trajectories of the learning in function space (a 2-D visualization produced with t-SNE; Erhan et al., AISTATS 2009):
  - Each point is a model in function space; color indicates the training epoch.
  - Top: trajectories without pre-training. Each trajectory converges to a different local minimum.
  - Bottom: trajectories with pre-training. The two sets of trajectories do not overlap.

Why unsupervised pre-training makes sense
- If image-label pairs were generated directly from the image (image -> label), it would make sense to try to go straight from images to labels. For example: do the pixels have even parity?
- If image-label pairs are generated by some underlying "stuff" that produces the image through a high-bandwidth pathway and the label through a low-bandwidth pathway (stuff -> image, stuff -> label), it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.
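Lectures 14b and 14c describe adding a 10-way softmax on top of the pre-trained stack and fine-tuning the whole net with backpropagation. The sketch below is one minimal way to do that, reusing the hypothetical `RBM` class, `stack`, and `toy_data` from the earlier snippet; the plain gradient-descent loop, learning rate, and random labels are illustrative assumptions, not the setup used in the cited experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

class FineTunedNet:
    """Feed-forward net whose hidden layers are initialized from a stack of
    pre-trained RBMs, with a 10-way softmax readout trained by backpropagation."""
    def __init__(self, rbms, n_classes=10):
        # Recognition weights come straight from the pre-trained RBMs.
        self.Ws = [r.W.copy() for r in rbms]
        self.bs = [r.b_hid.copy() for r in rbms]
        self.W_out = 0.01 * rng.standard_normal((rbms[-1].W.shape[1], n_classes))
        self.b_out = np.zeros(n_classes)

    def forward(self, x):
        acts = [x]
        for W, b in zip(self.Ws, self.bs):
            acts.append(sigmoid(acts[-1] @ W + b))
        return acts, softmax(acts[-1] @ self.W_out + self.b_out)

    def backprop_step(self, x, y_onehot, lr=0.1):
        acts, probs = self.forward(x)
        # Gradient of the mean cross-entropy loss w.r.t. the softmax inputs.
        delta = (probs - y_onehot) / x.shape[0]
        grad_W_out, grad_b_out = acts[-1].T @ delta, delta.sum(axis=0)
        # Backpropagate through the sigmoid layers, top to bottom.
        delta = (delta @ self.W_out.T) * acts[-1] * (1.0 - acts[-1])
        for k in range(len(self.Ws) - 1, -1, -1):
            grad_W, grad_b = acts[k].T @ delta, delta.sum(axis=0)
            if k > 0:
                delta = (delta @ self.Ws[k].T) * acts[k] * (1.0 - acts[k])
            self.Ws[k] -= lr * grad_W
            self.bs[k] -= lr * grad_b
        self.W_out -= lr * grad_W_out
        self.b_out -= lr * grad_b_out

# Toy usage, reusing `stack` and `toy_data` from the earlier stacking sketch
# (the random labels are placeholders; real fine-tuning would use MNIST labels).
net = FineTunedNet(stack)
targets = np.eye(10)[rng.integers(0, 10, size=len(toy_data))]
for _ in range(3):
    net.backprop_step(toy_data, targets)
```

Only the softmax weights are new; the pre-trained weights just get nudged, which is the "fine-tuning only modifies the features slightly" point made above.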
Lecture 14d: Modeling real-valued data with an RBM

Modeling real-valued data
- For images of digits, intermediate intensities can be represented as if they were probabilities by using "mean-field" logistic units. We treat intermediate values as the probability that the pixel is inked.
- This will not work for real images. In a real image, the intensity of a pixel is almost always, almost exactly, the average of the neighboring pixels. Mean-field logistic units cannot represent precise intermediate values.

A standard type of real-valued visible unit
- Model pixels as Gaussian variables. Alternating Gibbs sampling is still easy, though learning needs to be much slower.
- The energy function is
  E(v, h) = sum_{i in vis} (v_i - b_i)^2 / (2 sigma_i^2) - sum_{j in hid} b_j h_j - sum_{i,j} (v_i / sigma_i) h_j w_ij,
  where the first term is a parabolic containment function for each visible unit and the last term gives the energy gradient produced by the total top-down input to a visible unit.

Gaussian-binary RBMs
- Lots of people have failed to get these to work properly. It is extremely hard to learn tight variances for the visible units.
  - It took a long time for us to figure out why it is so hard to learn the visible variances.
- When sigma is small, we need many more hidden units than visible units. This allows small weights to produce big top-down effects.
- When sigma is much less than 1, the bottom-up effects are too big and the top-down effects are too small.

Stepped sigmoid units
- A neat way to implement integer values: make many copies of a stochastic binary unit. All copies have the same weights and the same adaptive bias, b, but they have different fixed offsets to the bias (b - 0.5, b - 1.5, b - 2.5, ...).

Fast approximations
- The expected number of active copies is the sum over n of logistic(x - n + 0.5), which is well approximated by log(1 + e^x).
- Contrastive divergence learning works well for the sum of stochastic logistic units with offset biases. The noise variance is logistic(x), the logistic of the total input.
- It also works for rectified linear units. These are much faster to compute than the sum of many logistic units with different biases.

A nice property of rectified linear units
- If a ReLU has a bias of zero, it exhibits scale equivariance: R(a x) = a R(x) for a > 0. This is a very nice property to have for images.
- It is like the equivariance to translation exhibited by convolutional nets: R(shift(x)) = shift(R(x)).
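The stepped-sigmoid approximation in the "Fast approximations" slide is easy to check numerically. The snippet below is a small standalone illustration; truncating the infinite sum at 1000 copies and modeling the noisy rectified linear unit as max(0, x + Gaussian noise with variance logistic(x)) are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    # The logistic function.
    return 1.0 / (1.0 + np.exp(-x))

def stepped_sigmoid_mean(x, n_copies=1000):
    """Expected number of active copies of a stepped sigmoid unit:
    sum over n of logistic(x - n + 0.5), truncated at n_copies."""
    n = np.arange(1, n_copies + 1)
    return sigmoid(x[:, None] - n[None, :] + 0.5).sum(axis=1)

def nrelu(x):
    """A noisy rectified linear unit: max(0, x + Gaussian noise with variance logistic(x))."""
    return np.maximum(0.0, x + rng.normal(0.0, np.sqrt(sigmoid(x))))

x = np.linspace(-5.0, 5.0, 11)
print(np.round(stepped_sigmoid_mean(x), 3))   # the exact (truncated) sum
print(np.round(np.log1p(np.exp(x)), 3))       # the fast approximation log(1 + e^x)
print(np.round(nrelu(x), 3))                  # one stochastic sample per input
```

The two printed curves agree closely over the whole range, which is why the cheap softplus (and, in turn, the rectified linear unit) can stand in for the sum of many offset logistic units.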
Lecture 14e: RBMs are infinite sigmoid belief nets
(Advanced material: not on quizzes or the final test.)

Another view of why layer-by-layer learning works (Hinton, Osindero and Teh, 2006)
- There is an unexpected equivalence between RBMs and directed networks with many layers that all share the same weight matrix.
  - This equivalence also gives insight into why contrastive divergence learning works.
- An RBM is actually just an infinitely deep sigmoid belief net with a lot of weight sharing.
  - The Markov chain we run when we want to sample from the equilibrium distribution of an RBM can be viewed as a sigmoid belief net.

An infinite sigmoid belief net that is equivalent to an RBM
- The net has alternating layers ..., v2, h1, v1, h0, v0, extending infinitely far upwards, with the same weights W replicated between every pair of adjacent layers.
- The distribution generated by this infinite directed net with replicated weights is the equilibrium distribution for a compatible pair of conditional distributions, p(v|h) and p(h|v), that are both defined by W.
- A top-down pass of the directed net is exactly equivalent to letting a restricted Boltzmann machine settle to equilibrium. So this infinite directed net defines the same distribution as an RBM.

Inference in an infinite sigmoid belief net
- The variables in h0 are conditionally independent given v0, so inference is trivial: just multiply v0 by W^T (and apply the logistic).
- The model above h0 implements a complementary prior. Multiplying v0 by W^T gives the product of the likelihood term and the prior term, and the complementary prior cancels the explaining away.
- Inference in the directed net is exactly equivalent to letting an RBM settle to equilibrium starting at the data.

The learning rule with replicated weights
- The learning rule for a sigmoid belief net is delta w_ji proportional to s_j (s_i - p_i), where s_i is the binary state of unit i and p_i is the probability that its parents would turn it on.
- With replicated weights, the derivative for the tied weight is a sum of such terms over all the layers,
  s_j^0 (s_i^0 - s_i^1) + s_i^1 (s_j^0 - s_j^1) + s_j^1 (s_i^1 - s_i^2) + ... - s_j^inf s_i^inf,
  where s_i^1 is an unbiased sample from p_i^0. The intermediate terms cancel, leaving the Boltzmann machine learning rule, s_j^0 s_i^0 - s_j^inf s_i^inf.

Learning a deep directed network
- First learn with all the weights tied. This is exactly equivalent to learning an RBM.
  - Think of the symmetric connections as a shorthand notation for an infinite directed net with tied weights.
  - We ought to use maximum likelihood learning, but we use CD1 as a shortcut.
- Then freeze the first layer of weights in both directions and learn the remaining weights (still tied together).
  - This is equivalent to learning another RBM, using the aggregated posterior distribution of h0 as the data.

What happens when the weights in higher layers become different from the weights in the first layer?
- The higher layers no longer implement a complementary prior.
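The equivalence described in Lecture 14e can be seen directly in code: a top-down ancestral pass through a (truncated) directed net whose layers all share one weight matrix performs exactly the same alternating updates as Gibbs sampling in an RBM. The standalone sketch below uses random, untrained weights and truncates the infinite net at a finite depth, so it only illustrates the computation; the layer sizes and depth are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Draw binary states from independent Bernoulli probabilities."""
    return (rng.random(p.shape) < p).astype(float)

# One weight matrix, shared by every layer of the "infinite" directed net.
# It plays exactly the role of the single RBM weight matrix W.
n_vis, n_hid = 6, 4
W = 0.5 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

def top_down_pass(n_layers, batch=10):
    """Ancestral sampling down a directed net with tied weights.
    Every hidden-to-visible step uses the same weights (W, stored transposed here),
    and every visible-to-hidden step uses W; each pair of steps is one full step
    of alternating Gibbs sampling in the equivalent RBM."""
    h = sample(0.5 * np.ones((batch, n_hid)))      # start far up the net
    for _ in range(n_layers):
        v = sample(sigmoid(h @ W.T + b_vis))       # hidden layer -> visible layer below
        h = sample(sigmoid(v @ W + b_hid))         # visible layer -> hidden layer below
    return sample(sigmoid(h @ W.T + b_vis))        # the bottom visible layer, v0

# Running the directed net for many layers is the same computation as running the
# RBM's Gibbs chain for many steps, so v0 is (approximately) an equilibrium sample.
print(top_down_pass(n_layers=500).mean(axis=0))
```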
