Neural Networks for Machine Learning, Lecture 14a: Learning layers of features by stacking RBMs

Training a deep network by stacking RBMs
- First train a layer of features that receive input directly from the pixels.
- Then treat the activations of the trained features as if they were pixels and learn features of features in a second hidden layer. Then do it again (the whole procedure is sketched in code after these slides).
- It can be proved that each time we add another layer of features we improve a variational lower bound on the log probability of generating the training data.
  - The proof is complicated and only applies to unreal cases. It is based on a neat equivalence between an RBM and an infinitely deep belief net (see lecture 14e).

Combining two RBMs to make a DBN
- Train the first RBM on the data; then, for each v, copy the binary state of its hidden units and train the second RBM on those states.
- Compose the two RBM models to make a single DBN model. The result is not a Boltzmann machine.

The generative model after learning 3 layers
- To generate data:
  1. Get an equilibrium sample from the top-level RBM (over h2 and h3) by performing alternating Gibbs sampling for a long time.
  2. Perform a top-down pass (h2, then h1, then the data layer) to get states for all the other layers.
- The lower-level bottom-up connections are not part of the generative model. They are just used for inference.

An aside: averaging factorial distributions
- If you average some factorial distributions, you do NOT get a factorial distribution.
  - In an RBM, the posterior over 4 hidden units is factorial for each visible vector.
- Posterior for v1: (0.9, 0.9, 0.1, 0.1). Posterior for v2: (0.1, 0.1, 0.9, 0.9). Aggregated: (0.5, 0.5, 0.5, 0.5).
- Consider the binary vector (1,1,0,0):
  - In the posterior for v1, p(1,1,0,0) = 0.9^4 ≈ 0.66.
  - In the posterior for v2, p(1,1,0,0) = 0.1^4 = 0.0001.
  - In the aggregated posterior, p(1,1,0,0) ≈ 0.33.
  - If the aggregated posterior were factorial, it would have p = 0.5^4 = 0.0625.

Why does greedy learning work?
- The weights, W, in the bottom-level RBM define many different distributions: p(v|h), p(h|v), p(v,h), p(h), and p(v).
- We can express the RBM model as p(v) = \sum_h p(h)\, p(v|h).
- If we leave p(v|h) alone and improve p(h), we will improve p(v).
- To improve p(h), we need it to be a better model than p(h; W) of the aggregated posterior distribution over hidden vectors produced by applying W^T to the data.

Fine-tuning with a contrastive version of the wake-sleep algorithm
After learning many layers of features, we can fine-tune the features to improve generation:
1. Do a stochastic bottom-up pass, then adjust the top-down weights of lower layers to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM, then adjust the weights in the top-level RBM using CD.
3. Do a stochastic top-down pass, then adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

The DBN used for modeling the joint distribution of MNIST digits and their labels
- Architecture: a 28x28 pixel image feeds 500 units, then another 500 units; the top-level RBM has 2000 units connected to both the second hidden layer and the 10 labels.
- The first two hidden layers are learned without using labels.
- The top layer is learned as an RBM for modeling the labels concatenated with the features in the second hidden layer.
- The weights are then fine-tuned to be a better generative model using contrastive wake-sleep.
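The stacking recipe from the first slide is mechanical enough to sketch in code. Below is a minimal NumPy sketch of CD-1 training for binary RBMs plus the greedy stacking loop; the function names (`train_rbm_cd1`, `stack_rbms`), the hyperparameters, and the mini-batch scheme are illustrative assumptions, not taken from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm_cd1(data, n_hidden, epochs=10, lr=0.05, batch=100):
    """Train one binary-binary RBM with CD-1 updates."""
    n_visible = data.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        for start in range(0, len(data), batch):
            v0 = data[start:start + batch]
            # positive phase: hidden probabilities and a binary sample
            p_h0 = sigmoid(v0 @ W + b_hid)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
            # one step of alternating Gibbs sampling (the "reconstruction")
            p_v1 = sigmoid(h0 @ W.T + b_vis)
            p_h1 = sigmoid(p_v1 @ W + b_hid)
            # CD-1 update: <v h>_data - <v h>_reconstruction
            W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
            b_vis += lr * (v0 - p_v1).mean(axis=0)
            b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

def stack_rbms(data, layer_sizes):
    """Greedy layer-wise pre-training: each RBM's hidden activities
    are treated as if they were pixels for the next RBM."""
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = train_rbm_cd1(layer_input, n_hidden)
        rbms.append(rbm)
        W, _, b_hid = rbm
        layer_input = sigmoid(layer_input @ W + b_hid)
    return rbms

# e.g. the 784-500-500-2000 stack from the MNIST slide:
# rbms = stack_rbms(binarized_digit_images, [500, 500, 2000])
```

The commented call at the end shows how the 784-500-500-2000 stack from the last slide would be built; the label units that the lecture's top-level RBM also models are omitted here for brevity.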
Neural Networks for Machine Learning, Lecture 14b: Discriminative fine-tuning for DBNs

Fine-tuning for discrimination
- First learn one layer at a time by stacking RBMs. Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure.
  - Contrastive wake-sleep is a way of fine-tuning the model to be better at generation.
  - Backpropagation can be used to fine-tune the model to be better at discrimination.
- This overcomes many of the limitations of standard backpropagation: it makes it easier to learn deep nets, and it makes the nets generalize better.

Why backpropagation works better with greedy pre-training: the optimization view
- Greedily learning one layer at a time scales well to really big networks, especially if we have locality in each layer.
- We do not start backpropagation until we already have sensible feature detectors that should already be very helpful for the discrimination task.
  - So the initial gradients are sensible, and backpropagation only needs to perform a local search from a sensible starting point.

Why backpropagation works better with greedy pre-training: the overfitting view
- Most of the information in the final weights comes from modeling the distribution of input vectors.
  - The input vectors generally contain a lot more information than the labels.
  - The precious information in the labels is only used for the fine-tuning.
- The fine-tuning only modifies the features slightly to get the category boundaries right. It does not need to discover new features.
- This type of backpropagation works well even if most of the training data is unlabeled. The unlabeled data is still very useful for discovering good features.
- An objection: surely, many of the features will be useless for any particular discriminative task (consider shape versus pose). But the ones that are useful will be much more useful than the raw inputs.

First, model the distribution of digit images
- Architecture: a 28x28 pixel image feeds 500 units, then 500 units, then 2000 units. The top two layers form a restricted Boltzmann machine whose energy landscape should model the low-dimensional manifolds of the digits.
- The network learns a density model for unlabeled digit images. When we generate from the model, we get things that look like real digits of all classes.
- But do the hidden features really help with digit discrimination? Add a 10-way softmax at the top and do backpropagation (sketched in code after these slides).

Results on the permutation-invariant MNIST task (error rates)
- Backprop net with one or two hidden layers (Platt; Hinton): 1.6%
- Backprop with L2 constraints on incoming weights: 1.5%
- Support Vector Machines (Decoste & Schoelkopf, 2002): 1.4%
- Generative model of joint density of images and labels, plus generative fine-tuning: 1.25%
- Generative model of unlabelled digits followed by gentle backpropagation (Hinton & Salakhutdinov, 2006): 1.15%, later 1.0%

Unsupervised "pre-training" also helps for models that have more data and better priors
- Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits. They also used convolutional multilayer neural networks.
- Back-propagation alone: 0.49%. Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (the record at the time).

Phone recognition on the TIMIT benchmark (Mohamed, Dahl & Hinton, 2009-2012)
- Architecture: 15 frames of 40 filterbank outputs plus their temporal derivatives feed 2000 logistic hidden units, then 6 more layers of pre-trained weights, then 2000 logistic hidden units, with 183 HMM-state labels on top (the label layer is not pre-trained).
- After standard post-processing using a bi-phone model, a deep net with 8 layers gets a 20.7% error rate.
- The best previous speaker-independent result on TIMIT was 24.4%, and this required averaging several models.
- Li Deng (at MSR) realised that this result could change the way speech recognition was done. It has: http://www.bbc.co.uk/news/technology-20266427
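To make the discriminative fine-tuning recipe concrete, here is a minimal NumPy sketch that unrolls the pre-trained stack from the earlier sketch into a feed-forward net, adds a 10-way softmax output layer, and fine-tunes all the weights by backpropagation with a cross-entropy loss. The function `fine_tune` and its hyperparameters are hypothetical; only the overall recipe comes from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fine_tune(rbms, images, labels, n_classes=10, epochs=10, lr=0.1, batch=100):
    """Discriminative fine-tuning: initialise from the pre-trained
    recognition weights, add a softmax layer, run backprop.
    `labels` is one-hot with shape (n_cases, n_classes)."""
    Ws = [W.copy() for (W, _, _) in rbms]
    bs = [b_hid.copy() for (_, _, b_hid) in rbms]
    W_out = rng.normal(0.0, 0.01, size=(Ws[-1].shape[1], n_classes))
    b_out = np.zeros(n_classes)
    for _ in range(epochs):
        for start in range(0, len(images), batch):
            x, y = images[start:start + batch], labels[start:start + batch]
            # forward pass through the pre-trained layers
            acts = [x]
            for W, b in zip(Ws, bs):
                acts.append(sigmoid(acts[-1] @ W + b))
            p = softmax(acts[-1] @ W_out + b_out)
            # softmax + cross-entropy gives the output delta (p - y)
            delta = (p - y) / len(x)
            delta_top = (delta @ W_out.T) * acts[-1] * (1.0 - acts[-1])
            W_out -= lr * acts[-1].T @ delta
            b_out -= lr * delta.sum(axis=0)
            delta = delta_top
            # backprop only performs a local search from the sensible
            # starting point that pre-training provided
            for i in reversed(range(len(Ws))):
                grad_W = acts[i].T @ delta
                if i > 0:
                    delta_next = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
                Ws[i] -= lr * grad_W
                bs[i] -= lr * delta.sum(axis=0)
                if i > 0:
                    delta = delta_next
    return Ws, bs, W_out, b_out
```

Note the design choice the lecture emphasises: the pre-trained weights are only nudged slightly to get the category boundaries right, so a small learning rate ("gentle backpropagation") is appropriate.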
Neural Networks for Machine Learning, Lecture 14c: What happens during discriminative fine-tuning?

Learning dynamics of deep nets
The next 4 slides describe work by Yoshua Bengio's group.

Effect of unsupervised pre-training (Erhan et al., AISTATS 2009)
[Figure: test classification error before fine-tuning and after fine-tuning, with and without pre-training.]

Effect of depth
[Figure: test error as a function of network depth, without pre-training and with pre-training.]

Trajectories of the learning in function space
- A 2-D visualization produced with t-SNE. Each point is a model in function space; color = epoch.
- Top: trajectories without pre-training. Each trajectory converges to a different local minimum.
- Bottom: trajectories with pre-training. There is no overlap between the two groups of trajectories. (Erhan et al., AISTATS 2009)

Why unsupervised pre-training makes sense
- If image-label pairs were generated by a direct image-to-label dependency, it would make sense to try to go straight from images to labels. For example: do the pixels have even parity?
- If image-label pairs are generated by unseen "stuff" in the world that causes both the image (through a high-bandwidth pathway) and the label (through a low-bandwidth pathway), it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.

Neural Networks for Machine Learning, Lecture 14d: Modeling real-valued data with an RBM

Modeling real-valued data
- For images of digits, intermediate intensities can be represented as if they were probabilities by using "mean-field" logistic units. We treat intermediate values as the probability that the pixel is inked.
- This will not work for real images. In a real image, the intensity of a pixel is almost always, almost exactly, the average of the neighboring pixels. Mean-field logistic units cannot represent precise intermediate values.

A standard type of real-valued visible unit
- Model pixels as Gaussian variables. Alternating Gibbs sampling is still easy, though learning needs to be much slower.
- The energy function is

  E(v,h) = \sum_{i \in vis} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{j \in hid} b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} h_j w_{ij}

  The first term is a parabolic containment function that keeps each v_i near its bias; the last term produces an energy gradient from the total top-down input to a visible unit.

Gaussian-binary RBMs
- Lots of people have failed to get these to work properly. It is extremely hard to learn tight variances for the visible units, and it took a long time for us to figure out why.
- When sigma is small, we need many more hidden units than visible units. This allows small weights to produce big top-down effects.
- When sigma is much less than 1, the bottom-up effects are too big and the top-down effects are too small (see the Gibbs-step sketch below).

Stepped sigmoid units: a neat way to implement integer values
- Make many copies of a stochastic binary unit. All copies have the same weights and the same adaptive bias, b, but they have different fixed offsets to the bias.

Fast approximations
- The expected activity of the whole set of copies is well approximated by a smooth rectifying function of the total input y:

  \sum_{n=1}^{\infty} \sigma(y - n + 0.5) \approx \log(1 + e^y)

- Contrastive divergence learning works well for the sum of stochastic logistic units with offset biases; the noise variance is \sigma(y).
- It also works for rectified linear units. These are much faster to compute than the sum of many logistic units with different biases.

A nice property of rectified linear units
- If a relu has a bias of zero, it exhibits scale equivariance: R(a\,x) = a\,R(x) for a > 0. This is a very nice property to have for images.
- It is like the equivariance to translation exhibited by convolutional nets: R(shift(x)) = shift(R(x)).
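The asymmetry that makes small variances hard to learn is visible directly in the conditional distributions implied by the energy function above: the bottom-up input to a hidden unit divides each pixel by sigma, while a hidden unit's top-down effect on a visible mean is multiplied by sigma. Here is a minimal sketch of one Gibbs step with sigma fixed by hand (the slides note that learning it is hard), plus the noisy rectified linear approximation; the function names and the choice of a fixed sigma are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step_gaussian_rbm(v, W, b_vis, b_hid, sigma):
    """One alternating Gibbs step for a Gaussian-binary RBM.
    `sigma` holds a fixed standard deviation per visible unit."""
    # bottom-up: pixel values are divided by sigma, so a small sigma
    # makes the bottom-up effects big ...
    p_h = sigmoid((v / sigma) @ W + b_hid)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # ... top-down: the effect on the visible mean is multiplied by
    # sigma, so a small sigma makes the top-down effects small.
    mean_v = b_vis + sigma * (h @ W.T)
    v_new = mean_v + sigma * rng.standard_normal(mean_v.shape)
    return v_new, h

def nrelu_sample(y):
    """Noisy rectified linear unit: a fast approximation to the sum of
    many logistic copies with offset biases (noise variance sigmoid(y))."""
    return np.maximum(0.0, y + np.sqrt(sigmoid(y)) * rng.standard_normal(y.shape))
```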
Neural Networks for Machine Learning, Lecture 14e: RBMs are infinite sigmoid belief nets
(Advanced material: not on quizzes or the final test.)

Another view of why layer-by-layer learning works (Hinton, Osindero & Teh, 2006)
- There is an unexpected equivalence between RBMs and directed networks with many layers that all share the same weight matrix.
  - This equivalence also gives insight into why contrastive divergence learning works.
- An RBM is actually just an infinitely deep sigmoid belief net with a lot of weight sharing.
  - The Markov chain we run when we want to sample from the equilibrium distribution of an RBM can be viewed as a sigmoid belief net.

An infinite sigmoid belief net that is equivalent to an RBM
- The net is an infinite stack of alternating layers ..., v2, h1, v1, h0, v0, with the same weight matrix W replicated at every level (appearing as W^T at alternate levels).
- The distribution generated by this infinite directed net with replicated weights is the equilibrium distribution for a compatible pair of conditional distributions, p(v|h) and p(h|v), that are both defined by W.
- A top-down pass of the directed net is exactly equivalent to letting a Restricted Boltzmann Machine settle to equilibrium. So this infinite directed net defines the same distribution as an RBM (see the sampling sketch below).

Inference in an infinite sigmoid belief net
- The variables in h0 are conditionally independent given v0, so inference is trivial: just multiply v0 by W^T.
- The model above h0 implements a complementary prior, so multiplying v0 by W^T gives the product of the likelihood term and the prior term. The complementary prior cancels the explaining away.
- Inference in the directed net is exactly equivalent to letting an RBM settle to equilibrium starting at the data.

The learning rule with replicated weights
- The learning rule for a sigmoid belief net is \Delta w_{ij} \propto s_j (s_i - p_i), where s_i is the sampled binary state of unit i and p_i is the probability of it being on given its parents.
- With replicated weights, writing s_i^0, s_i^1, s_i^2, ... for the states of unit i at successive layers, the rule becomes a telescoping sum:

  s_j^0 (s_i^0 - s_i^1) + s_i^1 (s_j^0 - s_j^1) + s_j^1 (s_i^1 - s_i^2) + \dots = s_j^0 s_i^0 - s_j^\infty s_i^\infty

  Each s_i^{k+1} is an unbiased sample from p_i^k, so it can stand in for the probability, and all the intermediate terms cancel, leaving the Boltzmann machine learning rule.

Learning a deep directed network
- First learn with all the weights tied. This is exactly equivalent to learning an RBM.
  - Think of the symmetric connections as a shorthand notation for an infinite directed net with tied weights.
  - We ought to use maximum likelihood learning, but we use CD1 as a shortcut.
- Then freeze the first layer of weights in both directions and learn the remaining weights (still tied together).
  - This is equivalent to learning another RBM, using the aggregated posterior distribution of h0 as the data.

What happens when the weights in higher layers become different from the weights in the first layer?
- The higher layers no longer implement a complementary prior, so inference using the frozen weights of the first layer is no longer exact.
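The equivalence is easy to see operationally: ancestral (top-down) sampling in the infinite directed net with tied weights is exactly alternating Gibbs sampling in the RBM, and truncating the descent after a finite number of layers is one way to see why contrastive divergence's truncated chain is a sensible shortcut. Here is a minimal sketch under the same binary-RBM assumptions as the earlier sketches; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli(p):
    return (rng.random(p.shape) < p).astype(float)

def sample_rbm(W, b_vis, b_hid, n_steps=1000, n_cases=1):
    """Sample from an RBM by descending the equivalent infinite directed
    net: each v -> h -> v pair of layers is one Gibbs transition."""
    v = bernoulli(np.full((n_cases, W.shape[0]), 0.5))  # start "infinitely high up"
    for _ in range(n_steps):
        h = bernoulli(sigmoid(v @ W + b_hid))      # one directed layer down
        v = bernoulli(sigmoid(h @ W.T + b_vis))    # next directed layer down
    return v
```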