Gradient Descent

Review: Gradient Descent

In step 3 of the learning framework, we have to solve the following optimization problem:

$\theta^* = \arg\min_\theta L(\theta)$

where $L$ is the loss function and $\theta$ are the parameters. Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Randomly start at $\theta^0 = [\theta_1^0, \theta_2^0]^T$, then repeat:

$\theta^1 = \theta^0 - \eta \nabla L(\theta^0)$
$\theta^2 = \theta^1 - \eta \nabla L(\theta^1)$
$\cdots$

where $\nabla L(\theta) = [\partial L/\partial\theta_1, \partial L/\partial\theta_2]^T$. Geometrically, the gradient at a point is the normal direction of the contour line of the loss through that point, so each update moves opposite to the gradient.

Tip 1: Tuning your learning rates

Set the learning rate $\eta$ carefully. If $\eta$ is too small, the loss decreases very slowly; if it is large, the loss may oscillate without settling; if it is very large, the loss may even explode. If there are more than three parameters you cannot visualize the loss surface, but you can always visualize the loss as a function of the number of parameter updates, and you should.

Adaptive Learning Rates

A popular and simple idea: reduce the learning rate by some factor every few epochs. At the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce it. E.g. $1/t$ decay: $\eta^t = \eta/\sqrt{t+1}$. But the learning rate cannot be one-size-fits-all; it is better to give different parameters different learning rates.

Adagrad

Divide the learning rate of each parameter by the root mean square of its previous derivatives. For one parameter $w$, write $g^t = \partial L(\theta^t)/\partial w$.

Vanilla gradient descent: $w^{t+1} \leftarrow w^t - \eta^t g^t$
Adagrad: $w^{t+1} \leftarrow w^t - \dfrac{\eta^t}{\sigma^t} g^t$

where $\sigma^t$ is the root mean square of the previous derivatives of parameter $w$, so it is parameter dependent:

$\sigma^0 = \sqrt{(g^0)^2}$
$\sigma^1 = \sqrt{\tfrac{1}{2}\left[(g^0)^2 + (g^1)^2\right]}$
$\sigma^2 = \sqrt{\tfrac{1}{3}\left[(g^0)^2 + (g^1)^2 + (g^2)^2\right]}$
$\sigma^t = \sqrt{\tfrac{1}{t+1}\sum_{i=0}^{t}(g^i)^2}$

Combined with the $1/t$ decay $\eta^t = \eta/\sqrt{t+1}$, the $\sqrt{t+1}$ factors cancel and the update simplifies to:

$w^{t+1} \leftarrow w^t - \dfrac{\eta}{\sqrt{\sum_{i=0}^{t}(g^i)^2}}\, g^t$

Contradiction? In vanilla gradient descent, a larger gradient gives a larger step. In Adagrad, a larger gradient enlarges the numerator but a history of large gradients enlarges the denominator, shrinking the step.

Intuitive reason: the Adagrad step measures how surprising the current gradient is. Dividing by $\sqrt{\sum_i (g^i)^2}$ creates a contrast effect between the current derivative and its history: a derivative that is especially large compared with its predecessors yields a comparatively large step, while one that is especially small yields a comparatively small step.

Larger gradient, larger steps?

Consider $y = ax^2 + bx + c$, whose minimum is at $x = -b/(2a)$, and $|\partial y/\partial x| = |2ax + b|$. Starting from $x_0$, the best step is the distance to the minimum:

$\left|x_0 + \dfrac{b}{2a}\right| = \dfrac{|2a x_0 + b|}{2a}$

Since the numerator is exactly the magnitude of the first derivative at $x_0$, a larger first-order derivative means we are farther from the minimum, for a single parameter.

Comparison between different parameters: the rule does not hold across parameters. On a loss surface over $\theta_1$ and $\theta_2$, compare points $a$, $b$ on the $\theta_1$ cross-section with points $c$, $d$ on the $\theta_2$ cross-section: $c$ can have a larger gradient than $a$ and yet be closer to its minimum, because the $\theta_2$ direction has a larger second derivative. Do not cross parameters.

Second derivative: the best step $|2ax_0 + b|/(2a)$ has the form (first derivative)/(second derivative), since $\partial^2 y/\partial x^2 = 2a$. The best step is proportional to the first derivative and inversely proportional to the second: the direction with the larger second derivative needs the smaller step. Computing second derivatives is expensive, so Adagrad uses the first derivatives to estimate the second: over many samples, a direction with a larger second derivative tends to produce larger first derivatives, so the denominator $\sqrt{\sum_i (g^i)^2}$ acts as a cheap estimate of the second derivative.

Tip 2: Stochastic Gradient Descent

Make the training faster. The loss is a summation over all training examples:

$L = \sum_n \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2$

Gradient descent sees all examples and updates after seeing all of them: $\theta^{t+1} = \theta^t - \eta \nabla L(\theta^t)$.

Stochastic gradient descent picks one example $x^n$, uses the loss for only that example,

$L^n = \left(\hat{y}^n - \left(b + \sum_i w_i x_i^n\right)\right)^2, \qquad \theta^{t+1} = \theta^t - \eta \nabla L^n(\theta^t)$

and updates for each example. Faster: if there are 20 examples, SGD makes 20 updates in the time gradient descent makes one.

Tip 3: Feature Scaling

Make different features have the same scaling (source of figure: http://cs231n.github.io/neural-networks-2/). For $y = b + w_1 x_1 + w_2 x_2$, if $x_2$ takes values like 100, 200 while $x_1$ takes values like 1, 2, then changes in $w_2$ affect the loss much more than changes in $w_1$: the loss contours are elongated ellipses, and the gradient does not point toward the minimum. After scaling, the contours are close to circles and the updates head directly for the minimum.

A standard recipe: given examples $x^1, x^2, x^3, \ldots$, for each dimension $i$ compute the mean $m_i$ and the standard deviation $\sigma_i$, and set

$x_i^r \leftarrow \dfrac{x_i^r - m_i}{\sigma_i}$

Then the means of all dimensions are 0 and the variances are all 1.

Gradient Descent: Theory

Question: when solving $\theta^* = \arg\min_\theta L(\theta)$ by gradient descent, each time we update the parameters, we obtain $\theta$ that makes $L(\theta)$ smaller: $L(\theta^0) > L(\theta^1) > L(\theta^2) > \cdots$. Is this statement correct? (Not necessarily; warning of math ahead.)

Formal Derivation

Suppose $\theta$ has two variables $\{\theta_1, \theta_2\}$. Given a point, we can easily find the point with the smallest value of $L(\theta)$ nearby. How?

Taylor Series: let $h(x)$ be any function infinitely differentiable around $x = x_0$:

$h(x) = \sum_{k=0}^{\infty} \dfrac{h^{(k)}(x_0)}{k!}(x - x_0)^k = h(x_0) + h'(x_0)(x - x_0) + \dfrac{h''(x_0)}{2!}(x - x_0)^2 + \cdots$

When $x$ is close to $x_0$, $h(x) \approx h(x_0) + h'(x_0)(x - x_0)$. E.g. for the Taylor series of $h(x) = \sin(x)$ around $x_0 = \pi/4$, the approximation is good around $\pi/4$.

Multivariable Taylor Series: when $x$ and $y$ are close to $x_0$ and $y_0$,

$h(x, y) \approx h(x_0, y_0) + \dfrac{\partial h}{\partial x}(x_0, y_0)(x - x_0) + \dfrac{\partial h}{\partial y}(x_0, y_0)(y - y_0)$

plus something related to $(x - x_0)^2$ and $(y - y_0)^2$ and higher-order terms.

Back to Formal Derivation

Based on the Taylor series, if the red circle centred at $(a, b)$ is small enough, inside the circle

$L(\theta) \approx s + u(\theta_1 - a) + v(\theta_2 - b)$

where $s = L(a, b)$, $u = \partial L(a, b)/\partial\theta_1$, and $v = \partial L(a, b)/\partial\theta_2$ are constants. Find $\theta_1$ and $\theta_2$ in the red circle $(\theta_1 - a)^2 + (\theta_2 - b)^2 \le d^2$ minimizing $L(\theta)$: since $s$ is constant, we minimize the inner product of $(\Delta\theta_1, \Delta\theta_2) = (\theta_1 - a, \theta_2 - b)$ with $(u, v)$, so we choose $(\Delta\theta_1, \Delta\theta_2)$ opposite to $(u, v)$ and scale it to the radius of the circle (if the radius is small):

$\begin{bmatrix}\theta_1\\ \theta_2\end{bmatrix} = \begin{bmatrix}a\\ b\end{bmatrix} - \eta \begin{bmatrix}u\\ v\end{bmatrix}$

This is gradient descent. Simple, right? But the derivation holds only when the first-order Taylor approximation is accurate, i.e. only when the red circle is small enough; the condition is not satisfied if the red circle (the learning rate) is not small enough. This is why the statement "each update makes the loss smaller" can fail when the learning rate is too large.
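The simplified Adagrad update above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the toy quadratic loss, starting point, and hyperparameter values are made up for the demo, and an epsilon term is added to avoid division by zero on the first step.

```python
import numpy as np

def adagrad(grad_fn, theta0, eta=1.0, steps=500, eps=1e-8):
    """Adagrad: divide each parameter's learning rate by the root
    of the sum of its squared past derivatives."""
    theta = np.asarray(theta0, dtype=float)
    sum_sq = np.zeros_like(theta)  # running sum of (g^i)^2, per parameter
    for _ in range(steps):
        g = grad_fn(theta)
        sum_sq += g ** 2
        theta -= eta / (np.sqrt(sum_sq) + eps) * g
    return theta

# Hypothetical toy loss L(theta) = theta_1^2 + 100 * theta_2^2,
# chosen so the two directions have very different second derivatives.
grad = lambda th: np.array([2.0 * th[0], 200.0 * th[1]])

theta = adagrad(grad, [3.0, 3.0])
```

On this separable quadratic, both coordinates follow (almost) the same trajectory even though their curvatures differ by a factor of 100: the per-parameter denominator normalizes away the scale of the gradient, which is exactly the "do not cross parameters" point made above.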