Neural Networks for Machine Learning — Lecture 14a: Learning layers of features by stacking RBMs

Training a deep network by stacking RBMs
- First train a layer of features that receive input directly from the pixels.
- Then treat the activations of the trained features as if they were pixels, and learn features of features in a second hidden layer. Then do it again.
- It can be proved that each time we add another layer of features, we improve a variational lower bound on the log probability of generating the training data.
  - The proof is complicated and only applies to unreal cases. It is based on a neat equivalence between an RBM and an infinitely deep belief net (see Lecture 14e).

Combining two RBMs to make a DBN
- Train the first RBM on the data; then, for each training vector, copy the binary states of its hidden units and use them as data for training a second RBM.
- Compose the two RBM models to make a single DBN model. (The composite is not a Boltzmann machine.)

The generative model after learning 3 layers
- To generate data:
  1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
  2. Perform a top-down pass to get states for all the other layers.
- The lower-level bottom-up connections are not part of the generative model; they are just used for inference.

An aside: averaging factorial distributions
- If you average some factorial distributions, you do NOT get a factorial distribution.
- In an RBM, the posterior over 4 hidden units is factorial for each visible vector:
  - posterior for v1: (0.9, 0.9, 0.1, 0.1)
  - posterior for v2: (0.1, 0.1, 0.9, 0.9)
  - aggregated: (0.5, 0.5, 0.5, 0.5)
- Consider the binary vector (1,1,0,0):
  - in the posterior for v1, p(1,1,0,0) = 0.9 × 0.9 × 0.9 × 0.9 ≈ 0.66
  - in the posterior for v2, p(1,1,0,0) = 0.1 × 0.1 × 0.1 × 0.1 = 0.0001
  - in the aggregated posterior, p(1,1,0,0) ≈ 0.33
  - If the aggregated posterior were factorial, it would have p = 0.5^4 ≈ 0.06.

Why does greedy learning work?
- The weights, W, in the bottom-level RBM define many different distributions: p(v|h), p(h|v), p(v,h), p(h), p(v).
- We can express the RBM model as p(v) = Σ_h p(h) p(v|h).
- If we leave p(v|h) alone and improve p(h), we will improve p(v). To improve p(h), we need it to be a better model than p(h; W) of the aggregated posterior distribution over hidden vectors produced by applying Wᵀ to the data.

Fine-tuning with a contrastive version of the wake-sleep algorithm
After learning many layers of features, we can fine-tune the features to improve generation:
1. Do a stochastic bottom-up pass, then adjust the top-down weights of lower layers to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM, then adjust the weights of the top-level RBM using CD.
3. Do a stochastic top-down pass, then adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.

The DBN used for modeling the joint distribution of MNIST digits and their labels
- Architecture: 28×28 pixel image → 500 units → 500 units → 2000 units, with 10 label units joined to the top layer.
- The first two hidden layers are learned without using labels. The top layer is learned as an RBM for modeling the labels concatenated with the features in the second hidden layer.
- The weights are then fine-tuned to be a better generative model using contrastive wake-sleep.

Neural Networks for Machine Learning — Lecture 14b: Discriminative fine-tuning for DBNs

Fine-tuning for discrimination
- First learn one layer at a time by stacking RBMs. Treat this as "pre-training" that finds a good initial set of weights, which can then be fine-tuned by a local search procedure.
  - Contrastive wake-sleep is a way of fine-tuning the model to be better at generation.
- Backpropagation can be used to fine-tune the model to be better at discrimination. This overcomes many of the limitations of standard backpropagation: it makes it easier to learn deep nets, and it makes the nets generalize better.

Why backpropagation works better with greedy pre-training: the optimization view
- Greedily learning one layer at a time scales well to really big networks, especially if we have locality in each layer.
- We do not start backpropagation until we already have sensible feature detectors that should already be very helpful for the discrimination task. So the initial gradients are sensible, and backpropagation only needs to perform a local search from a sensible starting point.

Why backpropagation works better with greedy pre-training: the overfitting view
- Most of the information in the final weights comes from modeling the distribution of input vectors. The input vectors generally contain a lot more information than the labels.
- The precious information in the labels is only used for the fine-tuning, which modifies the features slightly to get the category boundaries right; it does not need to discover new features.
- This type of backpropagation works well even if most of the training data is unlabeled: the unlabeled data is still very useful for discovering good features.

An objection
- Surely many of the features will be useless for any particular discriminative task (consider shape versus pose). But the ones that are useful will be much more useful than the raw inputs.

First model the distribution of digit images
- Architecture: 28×28 pixel image → 500 units → 500 units → 2000 units.
- The network learns a density model for unlabeled digit images. When we generate from the model, we get things that look like real digits of all classes. The top two layers form a restricted Boltzmann machine whose energy landscape should model the low-dimensional manifolds of the digits.
- But do the hidden features really help with digit discrimination? Add a 10-way softmax at the top and do backpropagation.

Results on the permutation-invariant MNIST task (error rates)
- Backprop net with one or two hidden layers (Platt; Hinton): 1.6%
- Backprop with L2 constraints on incoming weights: 1.5%
- Support vector machines (Decoste & Schoelkopf, 2002): 1.4%
- Generative model of the joint density of images and labels, with generative fine-tuning: 1.25%
- Generative model of unlabelled digits followed by gentle backpropagation (Hinton & Salakhutdinov, 2006): 1.15% → 1.0%

Unsupervised "pre-training" also helps for models that have more data and better priors
- Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits, and also used convolutional multilayer neural networks.
- Backpropagation alone: 0.49%. Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (a record at the time).

Phone recognition on the TIMIT benchmark (Mohamed, Dahl & Hinton, 2009 & 2012)
- Architecture: 15 frames of 40 filterbank outputs plus their temporal derivatives → 2000 logistic hidden units → 2000 logistic hidden units (plus 6 more layers of pre-trained weights) → 183 HMM state labels (the output layer is not pre-trained).
- After standard post-processing using a bi-phone model, a deep net with 8 layers gets a 20.7% error rate. The best previous speaker-independent result on TIMIT was 24.4%, and that required averaging several models.
- Li Deng (at MSR) realised that this result could change the way speech recognition was done. It has: http://www.bbc.co.uk/news/technology-20266427

Neural Networks for Machine Learning — Lecture 14c: What happens during discriminative fine-tuning

Learning dynamics of deep nets (the next four slides describe work by Yoshua Bengio's group)
- [Figure: filters before and after fine-tuning — effect of unsupervised pre-training (Erhan et al., AISTATS 2009).]
- [Figure: effect of depth, with pre-training versus without pre-training.]
- Trajectories of the learning in function space — a 2-D visualization produced with t-SNE (Erhan et al., AISTATS 2009). Each point is a model in function space, coloured by epoch. Top: trajectories without pre-training; each trajectory converges to a different local minimum. Bottom: trajectories with pre-training. The two sets of trajectories do not overlap.

Why unsupervised pre-training makes sense
- If image–label pairs were generated by a direct pathway from image to label (for example: do the pixels have even parity?), it would make sense to try to go straight from images to labels.
- If image–label pairs are generated by underlying "stuff" that causes the image through a high-bandwidth pathway and the label through a low-bandwidth pathway, it makes sense to first learn to recover the stuff that caused the image by inverting the high-bandwidth pathway.

Neural Networks for Machine Learning — Lecture 14d: Modeling real-valued data with an RBM

Modeling real-valued data
- For images of digits, intermediate intensities can be represented as if they were probabilities by using "mean-field" logistic units: we treat intermediate values as the probability that the pixel is inked.
- This will not work for real images. In a real image, the intensity of a pixel is almost always, almost exactly, the average of the neighbouring pixels. Mean-field logistic units cannot represent precise intermediate values.

A standard type of real-valued visible unit
- Model pixels as Gaussian variables. Alternating Gibbs sampling is still easy, though learning needs to be much slower. The energy function is
  E(v, h) = Σ_i (v_i − b_i)² / (2σ_i²) − Σ_j b_j h_j − Σ_{i,j} (v_i / σ_i) h_j w_ij,
  where the first term is a parabolic containment function and the last term produces an energy gradient from the total input to a visible unit.

Gaussian–binary RBMs
- Lots of people have failed to get these to work properly. It is extremely hard to learn tight variances for the visible units, and it took a long time to figure out why.
- When σ is small, we need many more hidden units than visible units. This allows small weights to produce big top-down effects.
- When σ is much less than 1, the bottom-up effects are too big and the top-down effects are too small.

Stepped sigmoid units: a neat way to implement integer values
- Make many copies of a stochastic binary unit. All copies have the same weights and the same adaptive bias, b, but they have different fixed offsets to the bias (−0.5, −1.5, −2.5, …).

Fast approximations
- The expected total activity of the copies is well approximated by Σ_{n=1}^∞ σ(x + 0.5 − n) ≈ log(1 + eˣ).
- Contrastive divergence learning works well for the sum of stochastic logistic units with offset biases; the noise variance of the sum is σ(y), the logistic of the total input.
- It also works for rectified linear units. These are much faster to compute than the sum of many logistic units with different biases.

A nice property of rectified linear units
- If a relu has a bias of zero, it exhibits scale equivariance: R(a·x) = a·R(x) for a > 0. This is a very nice property to have for images.
- It is like the equivariance to translation exhibited by convolutional nets.

Neural Networks for Machine Learning — Lecture 14e: RBMs are infinite sigmoid belief nets
(ADVANCED MATERIAL: NOT ON QUIZZES OR FINAL TEST)

Another view of why layer-by-layer learning works (Hinton, Osindero & Teh, 2006)
- There is an unexpected equivalence between RBMs and directed networks with many layers that all share the same weight matrix. This equivalence also gives insight into why contrastive divergence learning works.
- An RBM is actually just an infinitely deep sigmoid belief net with a lot of weight sharing: the Markov chain we run when we want to sample from the equilibrium distribution of an RBM can be viewed as a sigmoid belief net.

An infinite sigmoid belief net that is equivalent to an RBM
- The net has layers …, v2, h1, v1, h0, v0, all connected by the same weight matrix W.
- The distribution generated by this infinite directed net with replicated weights is the equilibrium distribution for a compatible pair of conditional distributions, p(v|h) and p(h|v), that are both defined by W.
- A top-down pass of the directed net is exactly equivalent to letting a restricted Boltzmann machine settle to equilibrium, so this infinite directed net defines the same distribution as an RBM.

Inference in an infinite sigmoid belief net
- The variables in h0 are conditionally independent given v0, so inference is trivial: just multiply v0 by Wᵀ.
- The model above h0 implements a complementary prior: multiplying v0 by Wᵀ gives the product of the likelihood term and the prior term. The complementary prior cancels the explaining away.
- Inference in the directed net is exactly equivalent to letting an RBM settle to equilibrium starting at the data.

The learning rule
- The learning rule for a sigmoid belief net is Δw_ji ∝ s_j (s_i − p_i).
- With replicated weights, summing this rule over all the layers gives
  Δw_ij = s_j⁰ (s_i⁰ − s_i¹) + s_i¹ (s_j⁰ − s_j¹) + s_j¹ (s_i¹ − s_i²) + … = s_j⁰ s_i⁰ − s_j^∞ s_i^∞,
  because the intermediate terms telescope away.
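The telescoping of the replicated-weights learning rule can be checked numerically. A minimal sketch (NumPy; the layer activities are arbitrary numbers standing in for unit states, not actual Gibbs samples):

```python
import numpy as np

rng = np.random.default_rng(1)

# Activities of one (i, j) unit pair down the unrolled net:
# v[k] and h[k] are the states at depth k (k = 0 is the data end).
depth = 6
v = rng.random(depth + 1)
h = rng.random(depth + 1)

# Per-layer updates for the single tied weight w_ij:
#   generating v-layer k from h-layer k contributes    h[k] * (v[k] - v[k+1])
#   generating h-layer k from v-layer k+1 contributes  v[k+1] * (h[k] - h[k+1])
total = 0.0
for k in range(depth):
    total += h[k] * (v[k] - v[k + 1])
    total += v[k + 1] * (h[k] - h[k + 1])

# Only the boundary terms survive the telescoping:
boundary = v[0] * h[0] - v[depth] * h[depth]
```

Every intermediate product appears once with each sign, so `total` equals `boundary` exactly; truncating the chain after one step gives the CD1 update.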
- (Here s_i¹ is an unbiased sample from p_i⁰, which is why samples from the layer below can stand in for the probabilities in the rule.)

Learning a deep directed network
- First learn with all the weights tied. This is exactly equivalent to learning an RBM: think of the symmetric connections as a shorthand notation for an infinite directed net with tied weights. We ought to use maximum likelihood learning, but we use CD1 as a shortcut.
- Then freeze the first layer of weights in both directions and learn the remaining weights (still tied together). This is equivalent to learning another RBM, using the aggregated posterior distribution of h0 as the data.
- What happens when the weights in higher layers become different from the weights in the first layer? The higher layers no longer implement a complementary prior, so inference with the frozen first-layer weights is approximate rather than exact.
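The recipe above — learn an RBM with CD1, then freeze it and learn another RBM on its hidden activities — can be sketched in NumPy. This is a minimal illustration with made-up layer sizes, learning rate, and toy data, not the code behind the lectures:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_v, b_h, v0, lr=0.05):
    """One CD-1 update for a binary RBM: <v h>_data - <v h>_reconstruction."""
    ph0 = sigmoid(v0 @ W + b_h)                       # bottom-up probabilities
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # stochastic hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                     # top-down reconstruction
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b_h)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

def recon_error(W, b_v, b_h, v):
    """Mean-field up-then-down pass; mean squared reconstruction error."""
    return np.mean((v - sigmoid(sigmoid(v @ W + b_h) @ W.T + b_v)) ** 2)

# Toy data: two redundant 6-pixel patterns (sizes chosen for illustration only).
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

# First RBM: pixels -> hidden layer 1.
W1 = 0.01 * rng.standard_normal((6, 4))
a1, c1 = np.zeros(6), np.zeros(4)
before = recon_error(W1, a1, c1, data)
for _ in range(2000):
    cd1_step(W1, a1, c1, data)
after = recon_error(W1, a1, c1, data)

# Stacking: freeze the first RBM and treat its hidden activities as
# data (the aggregated posterior) for the next RBM.
h_data = sigmoid(data @ W1 + c1)
W2 = 0.01 * rng.standard_normal((4, 4))
a2, c2 = np.zeros(4), np.zeros(4)
for _ in range(500):
    cd1_step(W2, a2, c2, h_data)
```

After training, `after` is far below the initial reconstruction error `before`, and `h_data` plays the role the pixels played for the first RBM — exactly the greedy layer-wise scheme of Lecture 14a.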