Gradient Descent

Review: Gradient Descent

In step 3 of the learning framework, we have to solve the following optimization problem:

$\theta^* = \arg\min_\theta L(\theta)$

where $L$ is the loss function and $\theta$ is the set of parameters. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Randomly start at $\theta^0 = [\theta_1^0, \theta_2^0]^T$, then iterate:

$\theta^1 = \theta^0 - \eta \nabla L(\theta^0)$, i.e. $\begin{bmatrix}\theta_1^1\\ \theta_2^1\end{bmatrix} = \begin{bmatrix}\theta_1^0\\ \theta_2^0\end{bmatrix} - \eta \begin{bmatrix}\partial L(\theta^0)/\partial \theta_1\\ \partial L(\theta^0)/\partial \theta_2\end{bmatrix}$

$\theta^2 = \theta^1 - \eta \nabla L(\theta^1)$, and so on: start at position $\theta^0$, compute the gradient at $\theta^0$, move to $\theta^1 = \theta^0 - \eta\nabla L(\theta^0)$, compute the gradient at $\theta^1$, move to $\theta^2 = \theta^1 - \eta\nabla L(\theta^1)$, ...

[Figure: contour plot of $L(\theta_1, \theta_2)$ with gradient and movement arrows at $\theta^0, \theta^1, \theta^2, \theta^3$.] The gradient is the normal direction to the contour lines of the loss.

Tip 1: Tuning your learning rates

[Figure: loss vs. number of parameter updates for different learning rates: a very large rate diverges, a large rate oscillates without reaching the minimum, a small rate converges very slowly, and a "just right" rate decreases steadily.]

Set the learning rate carefully. If there are more than three parameters, you cannot visualize the loss surface, but you can always visualize the curve of loss vs. number of parameter updates.

Adaptive Learning Rates

A popular and simple idea: reduce the learning rate by some factor every few epochs.
- At the beginning, we are far from the destination, so we use a larger learning rate.
- After several epochs, we are close to the destination, so we reduce the learning rate.
- E.g. 1/t decay: $\eta^t = \eta / \sqrt{t+1}$.

But the learning rate cannot be one-size-fits-all: we want to give different parameters different learning rates.

Adagrad

Divide the learning rate of each parameter by the root mean square of its previous derivatives. For one parameter $w$, write $g^t = \partial L(\theta^t)/\partial w$.

- Vanilla gradient descent: $w^{t+1} \leftarrow w^t - \eta^t g^t$
- Adagrad: $w^{t+1} \leftarrow w^t - \dfrac{\eta^t}{\sigma^t} g^t$, where $\sigma^t$ is the root mean square of the previous derivatives of $w$; the effective learning rate is parameter dependent.

Step by step:

$w^1 \leftarrow w^0 - \dfrac{\eta^0}{\sigma^0} g^0$, $\quad \sigma^0 = \sqrt{(g^0)^2}$
$w^2 \leftarrow w^1 - \dfrac{\eta^1}{\sigma^1} g^1$, $\quad \sigma^1 = \sqrt{\tfrac{1}{2}\left[(g^0)^2 + (g^1)^2\right]}$
$w^3 \leftarrow w^2 - \dfrac{\eta^2}{\sigma^2} g^2$, $\quad \sigma^2 = \sqrt{\tfrac{1}{3}\left[(g^0)^2 + (g^1)^2 + (g^2)^2\right]}$
...
$w^{t+1} \leftarrow w^t - \dfrac{\eta^t}{\sigma^t} g^t$, $\quad \sigma^t = \sqrt{\dfrac{1}{t+1}\sum_{i=0}^{t} (g^i)^2}$

With 1/t decay, $\eta^t = \eta/\sqrt{t+1}$, the $\sqrt{t+1}$ factors cancel and the update simplifies to

$w^{t+1} \leftarrow w^t - \dfrac{\eta}{\sqrt{\sum_{i=0}^{t} (g^i)^2}}\, g^t$

Contradiction? In vanilla gradient descent, a larger gradient means a larger step. In Adagrad, a larger gradient enlarges the numerator but also the denominator, giving a smaller step.

Intuitive reason: Adagrad measures how surprising a gradient is, creating a contrast effect against the history. A gradient that is especially large relative to the previous ones yields a relatively large step; one that is especially small yields a relatively small step.

Larger gradient, larger steps?

Consider $y = ax^2 + bx + c$, whose minimum is at $x = -\frac{b}{2a}$. At a point $x_0$, the first derivative is $\left|\frac{dy}{dx}\right| = |2ax_0 + b|$, and the best step is the distance to the minimum:

$\left|x_0 + \frac{b}{2a}\right| = \frac{|2ax_0 + b|}{2a}$

So for a single parameter, a larger first derivative does mean we are farther from the minimum, and the best step is proportional to the gradient.

Comparison between different parameters

[Figure: loss contours over two parameters $w_1$ and $w_2$; points a, b lie along the $w_1$ direction (smaller second derivative) and points c, d along the $w_2$ direction (larger second derivative).]

"Larger first derivative means farther from the minimum" does not hold when comparing across parameters: point c has a larger first derivative than point a, yet c is closer to its minimum, because the curvature along $w_2$ is larger. Do not cross parameters.

Second Derivative

Note that the denominator of the best step is the second derivative, $\frac{d^2y}{dx^2} = 2a$:

best step $= \dfrac{|2ax_0 + b|}{2a} = \dfrac{|\text{first derivative}|}{\text{second derivative}}$

The best step should be large when the first derivative is large and small when the second derivative is large. This resolves the cross-parameter comparison: along a direction with a smaller second derivative, take larger steps; along one with a larger second derivative, smaller steps.

Use first derivatives to estimate the second derivative: computing second derivatives exactly costs extra, but if we sample first derivatives at many points, they are mostly small along a flat direction (small second derivative) and mostly large along a sharp direction (large second derivative). That is what Adagrad's denominator $\sqrt{\sum_{i=0}^{t} (g^i)^2}$ captures, at no extra cost.

Tip 2: Stochastic Gradient Descent

Make the training faster. In (batch) gradient descent, the loss is the summation over all training examples:

$L = \sum_n \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2$, $\quad \theta^t = \theta^{t-1} - \eta \nabla L(\theta^{t-1})$

In stochastic gradient descent, pick one example $x^n$ and use the loss for only that example:

$L^n = \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2$, $\quad \theta^t = \theta^{t-1} - \eta \nabla L^n(\theta^{t-1})$. Faster!

[Demo shown in the lecture.] Gradient descent sees all examples and updates once after seeing all of them; stochastic gradient descent sees only one example and updates for each example. If there are 20 examples, SGD makes 20 updates in the time GD makes one, i.e. it is 20 times faster.

Tip 3: Feature Scaling

Make different features have the same scaling. (Source of figure: http://cs231n.github.io/neural-networks-2/)

Take $y = b + w_1 x_1 + w_2 x_2$. If $x_2$ takes much larger values than $x_1$ (e.g. $x_1 \in \{1, 2\}$ while $x_2 \in \{100, 200\}$), then $w_2$ has a much larger influence on the loss, the contours of $L$ in the $(w_1, w_2)$ plane are elongated ellipses, and the gradient does not point toward the minimum. After scaling the features to the same range, the contours are close to circles and gradient descent converges more easily.

For each dimension $i$, compute the mean $m_i$ and the standard deviation $\sigma_i$ over the examples $x^1, x^2, \ldots, x^R$, then standardize:

$x_i^r \leftarrow \dfrac{x_i^r - m_i}{\sigma_i}$

After this, the means of all dimensions are 0 and the variances are all 1.
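As a concrete illustration of the update rule and the 1/t decay above, here is a minimal NumPy sketch on a hypothetical two-parameter quadratic loss; the loss, its coefficients, and the names `loss` and `grad` are illustrative, not from the lecture.

```python
import numpy as np

# Toy loss with two parameters: L(theta) = (theta_1 - 3)^2 + 10*(theta_2 + 1)^2
def loss(theta):
    return (theta[0] - 3.0) ** 2 + 10.0 * (theta[1] + 1.0) ** 2

def grad(theta):
    # Analytic gradient of the toy loss above.
    return np.array([2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)])

theta = np.array([10.0, 20.0])   # starting point theta^0
eta = 0.05                       # base learning rate
for t in range(1000):
    eta_t = eta / np.sqrt(t + 1)         # 1/t decay: eta^t = eta / sqrt(t+1)
    theta = theta - eta_t * grad(theta)  # theta^{t+1} = theta^t - eta^t * grad
print(theta, loss(theta))        # ends near the minimum at (3, -1)
```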
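A matching sketch of the simplified Adagrad update $w^{t+1} \leftarrow w^t - \eta\, g^t / \sqrt{\sum_i (g^i)^2}$ on the same toy loss. The small `eps` term is a common practical addition to avoid division by zero, not part of the slide's formula.

```python
import numpy as np

theta = np.array([10.0, 20.0])
eta, eps = 1.0, 1e-8
sum_g2 = np.zeros_like(theta)    # running sum of (g^i)^2, kept per parameter
for t in range(1000):
    g = np.array([2.0 * (theta[0] - 3.0), 20.0 * (theta[1] + 1.0)])
    sum_g2 += g ** 2
    theta = theta - eta / (np.sqrt(sum_g2) + eps) * g   # Adagrad update
print(theta)                     # approaches the minimum at (3, -1)
```

Note that the very first step has magnitude $\eta$ in every coordinate, since $\sigma^0 = |g^0|$; the per-parameter learning rates then adapt as the gradient histories diverge.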
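To contrast per-example updates with batch updates, here is a stochastic gradient descent sketch on synthetic linear-regression data; the data generation and the true parameters (2, -3, 1) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))               # 20 examples, 2 features
y_hat = X @ np.array([2.0, -3.0]) + 1.0    # targets from true w = (2, -3), b = 1

w, b, eta = np.zeros(2), 0.0, 0.05
for epoch in range(50):
    for n in rng.permutation(len(X)):      # one update per example
        err = y_hat[n] - (b + X[n] @ w)    # residual for this single example
        # gradient of L^n = (y_hat^n - (b + w.x^n))^2 w.r.t. w and b
        w += eta * 2 * err * X[n]
        b += eta * 2 * err
print(w, b)                                # close to (2, -3) and 1
```

Each epoch makes 20 updates, where batch gradient descent would make one.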
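The standardization recipe above in a few lines of NumPy; the `standardize` name and the example matrix are illustrative.

```python
import numpy as np

def standardize(X):
    """Scale each feature dimension to mean 0 and variance 1."""
    m = X.mean(axis=0)        # m_i: mean of dimension i over all examples
    s = X.std(axis=0)         # sigma_i: standard deviation of dimension i
    return (X - m) / s

X = np.array([[1.0, 100.0], [2.0, 200.0], [1.5, 120.0]])
Xs = standardize(X)
print(Xs.mean(axis=0), Xs.std(axis=0))    # ~[0, 0] and [1, 1]
```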
Gradient Descent: Theory

Question: when solving $\theta^* = \arg\min_\theta L(\theta)$ by gradient descent, each time we update the parameters, we obtain $\theta$ that makes $L(\theta)$ smaller:

$L(\theta^0) > L(\theta^1) > L(\theta^2) > \cdots$

Is this statement correct? (Not necessarily, as the derivation below shows.)

Warning of math ahead.

Formal Derivation

Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Given a point, we can easily find the point with the smallest value of $L$ nearby. How?

Taylor Series

Let $h(x)$ be any function infinitely differentiable around $x = x_0$:

$h(x) = \sum_{k=0}^{\infty} \dfrac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \dfrac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$

When $x$ is close to $x_0$: $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$.

E.g. the Taylor series for $h(x) = \sin(x)$ around $x_0 = \pi/4$ is

$\sin(x) = \sin\dfrac{\pi}{4} + \cos\dfrac{\pi}{4}\left(x - \dfrac{\pi}{4}\right) - \dfrac{\sin(\pi/4)}{2!}\left(x - \dfrac{\pi}{4}\right)^2 - \cdots$

The truncated approximation is good around $\pi/4$.

Multivariable Taylor Series

When $x$ and $y$ are close to $x_0$ and $y_0$:

$h(x, y) \approx h(x_0, y_0) + \dfrac{\partial h(x_0, y_0)}{\partial x}(x - x_0) + \dfrac{\partial h(x_0, y_0)}{\partial y}(y - y_0)$ + something related to $(x - x_0)^2$ and $(y - y_0)^2$ + ...

Back to Formal Derivation

Based on the Taylor series: if the red circle (a small neighborhood centered at the current point $(a, b)$) is small enough, then inside the red circle

$L(\theta) \approx s + u(\theta_1 - a) + v(\theta_2 - b)$

where $s = L(a, b)$ is constant, $u = \dfrac{\partial L(a, b)}{\partial \theta_1}$, and $v = \dfrac{\partial L(a, b)}{\partial \theta_2}$.

Find $\theta_1$ and $\theta_2$ in the red circle minimizing $L(\theta)$. The red circle is $(\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2$, with a small radius $d$. Since $s$ is constant, minimizing $L(\theta)$ amounts to minimizing the inner product $u\,\Delta\theta_1 + v\,\Delta\theta_2$, where $\Delta\theta_1 = \theta_1 - a$ and $\Delta\theta_2 = \theta_2 - b$. The minimum is attained by taking $(\Delta\theta_1, \Delta\theta_2)$ opposite to $(u, v)$ and scaling it to the radius of the circle:

$\begin{bmatrix}\Delta\theta_1\\ \Delta\theta_2\end{bmatrix} = -\eta \begin{bmatrix}u\\ v\end{bmatrix}$, i.e. $\begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix} - \eta \begin{bmatrix}u\\ v\end{bmatrix}$

This is exactly gradient descent. Simple, right?

The catch: the derivation holds only if the red circle is small enough for the first-order Taylor approximation to be accurate. Since the learning rate $\eta$ scales with the radius, the condition is not satisfied if the red circle (learning rate) is not small enough, so an update is not guaranteed to decrease the loss.
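The slide states the minimizer inside the circle without proof; assuming the first-order approximation holds, a short worked argument via the Cauchy–Schwarz inequality (left implicit on the slide) confirms the step direction:

```latex
% Minimize the first-order approximation over the circle of radius d:
%   min  u\,\Delta\theta_1 + v\,\Delta\theta_2
%   s.t. \Delta\theta_1^2 + \Delta\theta_2^2 \le d^2.
\begin{align*}
u\,\Delta\theta_1 + v\,\Delta\theta_2
  &\ge -\sqrt{u^2 + v^2}\,\sqrt{\Delta\theta_1^2 + \Delta\theta_2^2}
      && \text{(Cauchy--Schwarz)} \\
  &\ge -d\,\sqrt{u^2 + v^2},
\end{align*}
% with equality exactly when
\[
(\Delta\theta_1, \Delta\theta_2) = -\eta\,(u, v), \qquad
\eta = \frac{d}{\sqrt{u^2 + v^2}} > 0,
\]
% i.e. a step of length d in the direction opposite to the gradient (u, v),
% matching the gradient descent update derived above.
```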