Understand Deep Learning in One Day (一天理解深度学习)
Hung-yi Lee

Lecture I: Introduction of Deep Learning

Machine Learning ≈ Looking for a Function
- Speech recognition: f(audio) = "How are you"
- Image recognition: f(image) = "cat"
- Playing Go: f(board position) = next move (e.g. "5-5")
- Dialogue system: f(what the user said, e.g. "Hello") = system response, e.g. "Hi"

Framework
- Step 1: define a set of functions, i.e. a model. For image recognition, each candidate function maps an image to a label; different candidates may output "cat", "dog", "money", or "snake" for the same cat image.
- Step 2: measure the goodness of a function f on training data. Training data consists of function input/output pairs, e.g. (image of a monkey, "monkey"), (image of a cat, "cat"), (image of a dog, "dog"). This is supervised learning.
- Step 3: pick the best function f*. At testing time, apply f* to new inputs, e.g. f*(new image) = "cat". Steps 1-3 are training; using f* is testing.

Three Steps for Deep Learning
- In deep learning, the set of functions in Step 1 is a neural network.

Neural Network
- A neuron is a simple function: it computes a weighted sum of its inputs plus a bias, z = a1·w1 + ... + aK·wK + b, and passes z through an activation function σ to produce the output a = σ(z). The weights and the bias are the parameters.
- Example: inputs (1, -1) with weights (1, -2) and bias 1 give z = 1·1 + (-1)·(-2) + 1 = 4, and with a sigmoid activation σ(4) ≈ 0.98.
- Different connections lead to different network structures. Weights and biases are the network parameters; different neurons have different values of weights and biases.

Fully Connected Feedforward Network
- Worked example with two inputs, two hidden layers of two neurons each, and two outputs: input (1, -1) produces first-layer outputs (0.98, 0.12), second-layer outputs (0.86, 0.11), and final outputs (0.62, 0.83). The same network maps input (0, 0) to (0.51, 0.85).
- Given the parameters, the network defines a function: an input vector is mapped to an output vector. Given only the network structure, we define a function set (one function for every choice of parameters).
- Layers: input layer → hidden layers → output layer. "Deep" means many hidden layers.

Why Deep?
- Universality theorem: any continuous function f can be realized by a network with one hidden layer, given enough hidden neurons. So why prefer a "deep" neural network over a "fat" (wide but shallow) one?
- Analogy: logic circuits consist of gates, and two layers of logic gates can represent any Boolean function, yet building some functions with multiple layers of gates is much simpler. Likewise, a neural network consists of neurons, and a single hidden layer can represent any continuous function, yet representing some functions with multiple layers of neurons is much simpler and needs less data.
- Deep = many hidden layers (error rates): AlexNet (2012), 8 layers, 16.4%; VGG (2014), 19 layers, 7.3%; GoogleNet (2014), 22 layers, 6.7%; Residual Net (2015), 152 layers with a special structure, 3.57%. (Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)

Output Layer
- With an ordinary output layer, the output of the network can be any value, which may not be easy to interpret.
- Using a softmax layer as the output layer turns the scores into a probability-like output: e.g. inputs (3, 1, -3) give (e^3, e^1, e^-3) ≈ (20, 2.7, 0.05), which normalize to approximately (0.88, 0.12, 0). Each output lies between 0 and 1 and the outputs sum to 1.

Example Application: Handwriting Digit Recognition
- Input: a 16×16 image, i.e. a 256-dimensional vector (ink = 1, no ink = 0).
- Output: a 10-dimensional vector; each dimension represents the confidence that the image is a particular digit (y1 = "is 1", y2 = "is 2", ..., y10 = "is 0"). If the output is (0.1, 0.7, ..., 0.2), the image is recognized as "2".
- What is needed is a function with a 256-dimensional input vector and a 10-dimensional output vector. A neural network is a function set containing the candidates for handwriting digit recognition; you need to decide the network structure so that a good function is in your function set.

FAQ
- Q: How many layers? How many neurons for each layer?
- Q: Can the structure be automatically determined? Yes, but this is not widely studied yet.
- Q: Can we design the network structure? Yes — the Convolutional Neural Network (CNN) in the next lecture is one example.
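To make the "network as a function" idea concrete, here is a minimal numpy sketch of a tiny fully connected feedforward network with sigmoid hidden units and a softmax output. The structure (weighted sum, bias, activation, softmax normalization) follows the slides, but the specific weight values are illustrative assumptions rather than the ones on the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

# Illustrative parameters for a tiny 2-2-2 network (values chosen for the example).
W1, b1 = np.array([[1.0, -2.0], [-1.0, 1.0]]), np.array([1.0, 0.0])
W2, b2 = np.array([[2.0, -1.0], [-2.0, -1.0]]), np.array([0.0, 1.0])

def network(x):
    """Given fixed parameters, the network is just a function: vector in, vector out."""
    a1 = sigmoid(W1 @ x + b1)    # hidden layer
    z2 = W2 @ a1 + b2            # output scores
    return softmax(z2)           # probabilities that sum to 1

print(network(np.array([1.0, -1.0])))        # the (1, -1) style input from the slides
print(softmax(np.array([3.0, 1.0, -3.0])))   # ≈ [0.88, 0.12, 0.00], as in the softmax example
```

Changing the parameters W1, b1, W2, b2 gives a different function; the set of all such functions for a fixed structure is the function set of Step 1.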
Highway Network and Residual Network
- Residual Network: "Deep Residual Learning for Image Recognition", http://arxiv.org/abs/1512.03385
- Highway Network: "Training Very Deep Networks", https://arxiv.org/pdf/1507.06228v2.pdf
- Both add a "copy" (shortcut) path from a layer's input to its output; the Highway Network adds a gate controller that decides how much to copy and how much to transform. In this way the Highway Network automatically determines the layers needed between the input layer and the output layer.

Three Steps for Deep Learning — Step 2: Goodness of a Function

Training Data
- Prepare training data: images and their labels (e.g. handwritten digits labeled 5, 0, 4, 1, 3, 1, 2, 9, ...).
- The learning target is defined on the training data.

Learning Target
- The input is the 16×16 image as a 256-dimensional vector (ink = 1, no ink = 0); the outputs y1, ..., y10 go through softmax.
- The learning target: if the input is a "1", y1 should have the maximum value; if the input is a "2", y2 should have the maximum value; and so on.

Loss
- The loss measures how far the network output (y1, ..., y10) is from the target (the one-hot vector for the correct digit); they should be as close as possible.
- The loss can be the square error or the cross entropy between the network output and the target.
- Given a set of parameters, a good function should make the loss over all examples as small as possible.

Total Loss
- Over all training data, the total loss is L = Σ_r l_r, the sum of the losses of the individual examples. It should be as small as possible.
- Find the network parameters θ* that minimize the total loss L; equivalently, find the function in the function set that minimizes L.

Three Steps for Deep Learning — Step 3: Pick the Best Function

How to pick the best function
- Find the network parameters θ = {w1, w2, w3, ..., b1, b2, b3, ...} that minimize the total loss L.
- Enumerate all possible values? Infeasible: e.g. in speech recognition a network with 8 layers and 1000 neurons per layer already has about 10^6 weights between two adjacent 1000-neuron layers.

Gradient Descent
- Pick an initial value for each weight w (random initialization, or RBM pre-training; random is usually good enough).
- Compute ∂L/∂w. If it is negative, increase w; if it is positive, decrease w.
- Update: w ← w − η ∂L/∂w, where η is called the learning rate.
- Repeat until ∂L/∂w is approximately zero, i.e. the update is tiny.
- In two dimensions (color = value of the total loss L): randomly pick a starting point, repeatedly compute the gradient with respect to (w1, w2) and move against it; hopefully we reach a minimum.

Local Minima
- The loss surface can be very flat at a plateau (learning is very slow), and the gradient is zero at saddle points and at local minima (learning gets stuck).
- Gradient descent never guarantees reaching the global minimum. Different initial points can reach different minima, and hence give different results.
- This is the "learning" of machines in deep learning; even AlphaGo uses this approach. I hope you are not too disappointed :p

Backpropagation
- Backpropagation is an efficient way to compute the gradients ∂L/∂w in a neural network.

Three Steps for Deep Learning
- Deep learning is so simple. If you want to find a function and you have lots of function input/output pairs as training data, you can use deep learning.

For example, you can do
- Image recognition: network(image) = scores for monkey / cat / dog.
- Spam filtering: network(e-mail) = yes (1) / no (0), e.g. from features such as whether "free" appears in the e-mail or "Talk" appears in the e-mail.
- Document classification: network(document) = category such as politics (政治), sports (體育), economics (經濟), or finance (財經), e.g. "president" in the document suggests politics, "stock" in the document suggests finance.

Keras
- Keras is an interface to TensorFlow or Theano. TensorFlow and Theano are very flexible but need some effort to learn; Keras is easy to learn and use while still keeping some flexibility, and you can modify it if you can write TensorFlow or Theano.
- If you want to learn Theano, see the lecture videos ("Theano DNN" and "RNN training (v6)") at http://speech.ee.ntu.edu.tw/~tlkagk/
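Before turning to Keras, the gradient-descent update from Step 3 can be written out directly. This is a minimal numpy sketch on a toy one-parameter loss; the loss function and learning rate are made up for illustration, but the update rule w ← w − η ∂L/∂w is the one described above.

```python
import numpy as np

def loss(w):
    """A toy total loss with a single parameter (made up for illustration)."""
    return (w - 3.0) ** 2 + 1.0

def grad(w):
    """dL/dw for the toy loss above (in a real network, backpropagation computes this)."""
    return 2.0 * (w - 3.0)

eta = 0.1                      # learning rate
w = np.random.randn()          # pick an initial value for w
for step in range(100):        # repeat
    g = grad(w)
    w = w - eta * g            # negative gradient -> increase w, positive -> decrease w
    if abs(g) < 1e-6:          # stop when the update is tiny
        break

print(w, loss(w))              # w converges near 3.0, the minimum of the toy loss
```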
Keras
- François Chollet is the author of Keras. He currently works for Google as a deep learning engineer and researcher. "Keras" means horn in Greek.
- Documentation: http://keras.io
- (Slide on experiences of using Keras; thanks to 沈昇勳 for providing the figures.)

Example Application: Handwriting Digit Recognition
- This is the "Hello world" of deep learning.
- MNIST data: Keras provides a dataset-loading function (http://keras.io/datasets); each image is 28×28.

Keras — Step 1: define the network
- Input: 28×28 = 784 dimensions; two hidden layers with 500 neurons each; output: y1, ..., y10 with softmax.

Keras — training
- Step 3.1: configuration (loss and optimizer).
- Step 3.2: find the optimal network parameters from the training data: images as a numpy array of shape (number of training examples, 784), since 28×28 = 784, and labels (the digits) as a numpy array of shape (number of training examples, 10). See https://www.tensorflow.org/versions/r0.8/tutorials/mnist/beginners/index.html

Keras — using and saving the model
- Save and load models: http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
- How to use the neural network for testing: case 1, evaluate on a labeled test set; case 2, predict the outputs for new inputs.

Keras — using a GPU to speed up training
- Way 1: THEANO_FLAGS=device=gpu0 python YourCode.py
- Way 2: in your code: import os; os.environ["THEANO_FLAGS"] = "device=gpu0"
- Demo.

Three Steps for Deep Learning — deep learning is so simple.

Recipe of Deep Learning
- After training a neural network, first ask: do we get good results on the training data? Only then ask: do we get good results on the testing data? If training results are good but testing results are not, the problem is overfitting.
- Do not always blame overfitting: in the comparison from "Deep Residual Learning for Image Recognition" (http://arxiv.org/abs/1512.03385), the deeper network is worse on the testing data, but it is also worse on the training data, so it is simply not well trained rather than overfitting.
- Different approaches target different problems: e.g. dropout is for getting good results on the testing data, not on the training data.

Good Results on Training Data — Choosing a Proper Loss
- With a softmax output layer and a one-hot target (1, 0, ..., 0), the loss can be the square error Σ_i (y_i − ŷ_i)² or the cross entropy −Σ_i ŷ_i ln y_i. Which one is better?
- Demo: square error vs cross entropy. Several alternatives are listed at https://keras.io/objectives
- When using a softmax output layer, choose cross entropy: on the total loss surface over (w1, w2), cross entropy gives a much larger gradient far from the minimum than square error, so training makes progress. (See http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)

Good Results on Training Data — Mini-batch
- Randomly initialize the network parameters.
- Pick the 1st mini-batch (e.g. examples x1, x31, ...), compute its loss, and update the parameters once; pick the 2nd mini-batch (e.g. x2, x16, ...) and update the parameters once; continue until all mini-batches have been picked. That is one epoch. Then repeat the whole process.
- With mini-batches we do not really minimize the total loss at each update. Example setting: 100 examples in a mini-batch, repeated for 20 epochs.
- Original gradient descent vs mini-batch: the mini-batch trajectory is unstable (the colors represent the total loss), but each update is much cheaper.
- Mini-batch is faster: in one epoch, original gradient descent updates the parameters once after seeing all examples, while with 20 mini-batches we update 20 times in the same epoch, seeing only one batch per update. With parallel computing this is not always true — for data sets that are not super large, one full-batch pass can take about the same time as one mini-batch pass — but mini-batch training still has better performance.
- Shuffle the training examples for each epoch, so the mini-batches differ from epoch to epoch. Don't worry: this is the default in Keras.

Good Results on Training Data — it is hard to get the power of "deep"
- Deeper usually does not imply better, even when looking only at the results on the training data.
- Demo.
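Putting the pieces above together (the 784-500-500-10 softmax network, the cross-entropy loss, and mini-batches of 100 examples for 20 epochs), here is a minimal sketch. The slides use an older Keras/Theano-era API; this sketch assumes the modern tf.keras API instead, and the choice of the "adam" optimizer is an assumption, so the exact calls differ from those on the slides.

```python
import numpy as np
from tensorflow import keras

# Load MNIST with Keras' dataset-loading function; flatten 28x28 images to 784-dim vectors.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)   # labels as 10-dim one-hot vectors
y_test = keras.utils.to_categorical(y_test, 10)

# Step 1: define the function set (network structure): 784 -> 500 -> 500 -> 10 softmax.
model = keras.Sequential([
    keras.layers.Dense(500, activation="sigmoid", input_shape=(784,)),
    keras.layers.Dense(500, activation="sigmoid"),
    keras.layers.Dense(10, activation="softmax"),
])

# Step 2 + Step 3.1 (configuration): cross-entropy loss, as recommended with softmax outputs.
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Step 3.2: find the optimal parameters with mini-batch training (100 per batch, 20 epochs).
model.fit(x_train, y_train, batch_size=100, epochs=20)

# Case 1: evaluate on the test set.  Case 2: predict the output for a new input.
print(model.evaluate(x_test, y_test))
print(model.predict(x_test[:1]).argmax())
```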
Good Results on Training Data — Vanishing Gradient Problem
- In a deep sigmoid network, the layers close to the input have smaller gradients and learn very slowly, so they are still almost random when the layers close to the output, which have larger gradients and learn very fast, have already converged (based on those nearly random earlier layers).
- An intuitive way to compute the derivatives: a change in a weight near the input is attenuated by each sigmoid it passes through on the way to the output, so its effect on the loss, and hence its gradient, is smaller.

Hard to get the power of "deep"
- In 2006, people used RBM pre-training to address this. In 2015, people use ReLU.

ReLU (Rectified Linear Unit)
- The activation outputs a = z for z > 0 and a = 0 for z ≤ 0.
- Reasons: 1. fast to compute; 2. biological reason; 3. it behaves like an infinite number of sigmoids with different biases; 4. it alleviates the vanishing gradient problem. [Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
- Neurons whose output is 0 can be removed, leaving a thinner linear network in which the gradients are not made smaller layer by layer.
- ReLU variants (e.g. with a small slope in the negative region) can also be learned by gradient descent.
- Demo.

Maxout — a learnable activation function [Ian J. Goodfellow, ICML'13]
- Neurons are grouped, and each group outputs the maximum of its elements (e.g. max(7, 1) = 7 and max(2, 4) = 4 in the slide's example).
- ReLU is a special case of Maxout, and a group can have more than 2 elements.
- The activation function in a maxout network can be any piecewise linear convex function; the number of pieces depends on how many elements are in a group (2 elements give 2 pieces, 3 elements give 3 pieces).

Good Results on Training Data — Learning Rates
- If the learning rate is too large, the total loss may not decrease after each update. If it is too small, training is too slow. Set the learning rate carefully.
- A popular and simple idea: reduce the learning rate by some factor every few epochs. At the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close to the destination, so we reduce it, e.g. 1/t decay: η_t = η / √(t + 1).
- One learning rate cannot be one-size-fits-all: give different parameters different learning rates.

Adagrad
- Parameter-dependent learning rate: w ← w − η_w ∂L/∂w, where η_w = η / √(Σ_i g_i²), η is the original constant learning rate, and g_i is the derivative ∂L/∂w obtained at the i-th update. The denominator is the summation of the squares of all previous derivatives.
- Observations: 1. the learning rate gets smaller and smaller for every parameter; 2. parameters with smaller derivatives get larger learning rates, and vice versa.
- Why? If the past derivatives of one parameter are 0.1 and 0.2 while those of another are 20 and 10, the effective learning rates become η / √(0.1² + 0.2²) and η / √(20² + 10²): the parameter with the smaller derivatives ends up with the larger learning rate. Intuitively, small derivatives suggest a gentle slope where larger steps are safe, while large derivatives suggest a steep slope where smaller steps avoid overshooting.
- This is not the whole story: Adagrad [John Duchi, JMLR'11], RMSprop, and related methods refine the idea.

Good Results on Training Data — hard to find the optimal network parameters
- The total loss can be very slow to improve on a plateau, and training can get stuck at a saddle point (∂L/∂w = 0) or at a local minimum (∂L/∂w = 0).

Momentum
- In the physical world, a ball rolling down the loss surface has momentum that can carry it across plateaus and out of small dips. How about putting this phenomenon into gradient descent?
- Movement = negative of the gradient + momentum (a fraction of the previous movement). This still does not guarantee reaching the global minimum, but it gives some hope.
- Adam ≈ RMSprop (an advanced version of Adagrad) + momentum.
- Demo.

Good Results on Testing Data — Panacea for Overfitting
- Have more training data, or create more training data. E.g. in handwriting recognition, shifting the original training images slightly (by 15 in the slide's example) creates additional labeled training data.

Good Results on Testing Data — Dropout (training)
- Each time before updating the parameters, each neuron has probability p% of being dropped out. Training then uses the new, thinner network: the structure of the network is changed.
- For each mini-batch, we resample the dropout neurons.
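As a sketch of the training-time behaviour just described: for every mini-batch a fresh dropout mask is sampled, and the dropped neurons are removed from the forward pass. This is a hand-rolled numpy illustration (the layer sizes and the dropout rate are made-up assumptions), not the Keras Dropout layer used later in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                      # dropout rate: each neuron is dropped with probability p

def hidden_layer(x, W, b, dropout=True):
    a = np.maximum(0.0, W @ x + b)          # ReLU activations of the hidden layer
    if dropout:
        mask = rng.random(a.shape) >= p     # keep each neuron with probability 1 - p
        a = a * mask                        # dropped neurons output 0 -> a thinner network
    return a

# Toy parameters (made up): a layer with 4 inputs and 6 hidden neurons.
W, b = rng.standard_normal((6, 4)), np.zeros(6)

for batch in range(3):                       # for each mini-batch ...
    x = rng.standard_normal(4)               # stand-in for a training example
    a = hidden_layer(x, W, b, dropout=True)  # ... the dropout neurons are resampled
    print(f"mini-batch {batch}: active neurons = {np.count_nonzero(a)}")
```

At testing time no neurons are dropped; instead the weights are scaled, as described next.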
Dropout (testing)
- There is no dropout at testing time. If the dropout rate at training is p%, all the weights are multiplied by (1 − p)%. For example, with a dropout rate of 50%, a weight learned as w = 1 during training is set to 0.5 for testing.

Dropout — intuitive reason
- Training with dropout is like practising with weights tied to your legs; when the weights are removed at testing time, you become much stronger.
- Why multiply the weights by (1 − p)% when testing? With a 50% dropout rate, each neuron sees on average only half of its inputs during training; at testing time all inputs are present, so without halving the weights the weighted sum z would roughly double and no longer match what training produced. Scaling the weights by (1 − p)% keeps the two regimes comparable.

Dropout is a kind of ensemble
- Ensemble: train a bunch of networks with different structures on different training sets (Set 1 → Network 1, ..., Set 4 → Network 4); at testing time, feed the testing data x to all of them and average the outputs y1, ..., y4.
- Training with dropout: each mini-batch trains one of the "thinned" networks, and the parameters of all these networks are shared. With M neurons there are 2^M possible thinned networks.
- Testing with dropout: ideally we would average the outputs y1, y2, y3, ... of all the thinned networks on the testing data x; multiplying all the weights by (1 − p) approximates this average.

More about dropout
- More references: [Nitish Srivastava, JMLR'14], [Pierre Baldi, NIPS'13], [Geoffrey E. Hinton, arXiv'12].
- Dropout works better with Maxout [Ian J. Goodfellow, ICML'13].
- Dropconnect [Li Wan, ICML'13]: dropout deletes neurons, while dropconnect deletes the connections between neurons.
- Annealed dropout [S. J. Rennie, SLT'14]: the dropout rate decreases over epochs.
- Standout [J. Ba, NIPS'13]: each neuron has a different dropout rate.
- Demo: the same 500-500-10 softmax network, adding model.add(Dropout(0.8)) after each hidden layer.

Good Results on Testing Data — Network Structure
- CNN is a very good example (next lecture).

Concluding Remarks
- Recipe of deep learning: after training a neural network, first check for good results on the training data, and only then for good results on the testing data; each "NO" points to a different set of remedies.

Lecture II: Variants of Neural Networks

Convolutional Neural Network (CNN)
- Widely used in image processing.

Why CNN for images?
- Can the network be simplified by considering the properties of images? Images are represented as pixels; the first layer learns the most basic classifiers, the second layer uses the first layer as modules to build larger classifiers, and so on [Zeiler, M. D., ECCV 2014].
- Property 1: some patterns are much smaller than the whole image. A neuron does not have to see the whole image to discover the pattern (e.g. a "beak" detector), so it can connect to a small region with fewer parameters.
- Property 2: the same patterns appear in different regions. An upper-left beak detector and a middle beak detector do almost the same thing, so they can use the same set of parameters.
- Property 3: subsampling the pixels does not change the object (a subsampled bird is still a bird). We can subsample the pixels to make the image smaller, so the network needs fewer parameters to process it.

Three Steps for Deep Learning with a CNN
- In Step 1, the function set is a Convolutional Neural Network. Deep learning is so simple.

The whole CNN
- Convolution → Max Pooling → Convolution → Max Pooling (this pair can repeat many times) → Flatten → a fully connected feedforward network → output (e.g. cat / dog).
- Convolution exploits Property 1 (small patterns) and Property 2 (the same patterns in different regions); Max Pooling exploits Property 3 (subsampling).

CNN — Convolution
- Example: a 6×6 image and two 3×3 filters, Filter 1 and Filter 2. The filter values are the network parameters to be learned, and each filter detects a small pattern (Property 1).
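Before working through the convolution arithmetic, here is how the whole CNN pipeline above might look in code. This is a sketch assuming the modern tf.keras API; the number of filters, filter sizes, and the dense layer width are illustrative choices, not values taken from the slides.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Convolution -> Max Pooling -> Convolution -> Max Pooling -> Flatten -> Fully Connected.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(25, (3, 3), activation="relu"),   # convolution: 3x3 learned filters
    layers.MaxPooling2D((2, 2)),                    # max pooling: subsample the feature maps
    layers.Conv2D(50, (3, 3), activation="relu"),   # the conv/pool pair can repeat
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                               # flatten feature maps into a vector
    layers.Dense(100, activation="relu"),           # fully connected feedforward part
    layers.Dense(10, activation="softmax"),
])
model.summary()
```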
CNN — Convolution (continued)
- Slide Filter 1 across the 6×6 image and compute the inner product between the filter and each 3×3 patch it covers. With stride 1, the first two positions in the slide's example give the values 3 and −1; if the stride were 2, the filter would jump two pixels at a time (giving 3 and −3 for the first row). We set stride = 1 below.
- Sliding Filter 1 over the whole image with stride 1 yields a 4×4 map of values. The same small set of filter parameters is used at every position of the image, which is exactly Property 2 (the same pattern detected in different regions).
- Do the same with Filter 2 to obtain another 4×4 map of values.
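To make the sliding-filter computation concrete, here is a small numpy sketch of a single convolution: a 3×3 filter slides over a 6×6 image with a configurable stride, and the inner product is taken at each position. The image and filter values are illustrative assumptions, chosen so that the first output value is 3 as in the walk-through above.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` and take the inner product at each position (no padding)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kH, j * stride:j * stride + kW]
            out[i, j] = np.sum(patch * kernel)   # same filter parameters at every position
    return out

# Illustrative 6x6 binary image and a 3x3 filter.
image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filter1 = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

print(convolve2d(image, filter1, stride=1))   # a 4x4 map of values when stride = 1
print(convolve2d(image, filter1, stride=2))   # a 2x2 map of values when stride = 2
```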
