Random Matrix Theory Models of Deep Learning (v0.1)

Deep Learning Theory: A Review

Multilayer neural networks
- A neural network wires many single neurons together: the output of one neuron serves as the input of another.
- A multilayer network can be understood as a "nesting" of many nonlinear functions.
- Layers can be stacked without limit, so the model class has, in principle, unlimited capacity and can fit arbitrary functions.

[Figure: distributed clustering; communication cost and K-means cost for Distributed PCA vs. Combine PCA.]

Forward propagation
- Common activation functions: Sigmoid, Tanh, Rectified Linear Units (ReLU).

Depth over the years
- The number of layers has increased year by year while the error has dropped; today networks exceed 1000 layers.

Why does deep learning perform so well?
- Features are learned rather than hand-crafted.
- More layers capture more invariances.
- More data is available to train deeper networks.
- More computing power (GPUs).
- Better regularization: Dropout.
- New nonlinearities: max pooling, Rectified Linear Units (ReLU).
- Yet the theoretical understanding of deep networks remains shallow.

[1] Razavian, Azizpour, Sullivan, Carlsson. CNN Features off-the-shelf: an Astounding Baseline for Recognition. CVPRW 2014.
[2] Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.
[3] Ioffe, S. and Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML, pages 448-456, 2015.

Insights from neuroscience
- Experimental neuroscience uncovered: the neural architecture of the retina/LGN/V1/V2/V3/etc.; the existence of neurons with weights and activation functions (simple cells); pooling neurons (complex cells).
- All of these features are somehow present in deep learning systems.

Neuroscience        | Deep network
Simple cells        | First layer
Complex cells       | Pooling layer
Grandmother cells   | Last layer

Olshausen and Field's work (Nature, 1996)
- Olshausen and Field demonstrated that receptive fields can be learned from image patches.
- They showed that an optimization process can drive the learning of image representations.

Harmonic analysis
- The Olshausen-Field representations bear a strong resemblance to well-defined mathematical objects from harmonic analysis: wavelets, ridgelets, curvelets.
- Harmonic analysis has a long history of developing optimal representations via optimization.
- Research in the 1990s: wavelets etc. are optimal sparsifying transforms for certain classes of images.

Approximation theory and the curse of dimensionality
- A class prediction rule can be viewed as a function f(x) of a high-dimensional argument.
- Curse of dimensionality: the traditional theoretical obstacle to high-dimensional approximation. Functions of a high-dimensional x can wiggle in too many dimensions to be learned from finite datasets.

Early theoretical results on deep learning
- Approximation theory: perceptrons and multilayer feedforward networks are universal approximators (Cybenko '89; Hornik '89; Hornik '91; Barron '93).
- Optimization theory: no spurious local optima for linear networks (Baldi & Hornik '89); backpropagation can get stuck in local minima (Brady '89); stuck in local minima, but with convergence guarantees for linearly separable data (Gori & Tesi '92); manifolds of spurious local optima (Frasconi '97).

[1] Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303-314, 1989.
[2] Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 2(3):359-366, 1989.
[3] Hornik, K. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251-257, 1991.
[4] Barron, A. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3):930-945, 1993.
[5] Baldi, P. and Hornik, K. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 1989.
[6] Brady, M., Raghavan, R., and Slawny, J. Backpropagation fails to separate where perceptrons succeed. IEEE Transactions on Circuits and Systems, 36(5):665-674, 1989.
[7] Gori, M. and Tesi, A. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):76-86, 1992.
[8] Frasconi, P., Gori, M., and Tesi, A. Successes and failures of backpropagation: A theoretical investigation. Progress in Neural Networks: Architecture, 5:205, 1997.

Recent theoretical results on deep learning
- Invariance, stability, and learning theory: scattering networks (Bruna '11, '13; Mallat '13); deformation stability for Lipschitz nonlinearities (Wiatowski '15); distance- and margin-preserving embeddings (Giryes '15; Sokolic '16); geometry, generalization bounds, and depth efficiency (Montufar '15; Neyshabur '15; Shashua '14-'16); ...

[1] Bruna, J. and Mallat, S. Classification with scattering operators. CVPR 2011; Invariant scattering convolution networks. arXiv 2012; Mallat, S. and Waldspurger, I. Deep Learning by Scattering. arXiv 2013.
[2] Wiatowski, T. and Bölcskei, H. A mathematical theory of deep convolutional neural networks for feature extraction. arXiv 2015.
[3] Giryes, R., Sapiro, G., and Bronstein, A. Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy? arXiv:1504.08291.
[4] Sokolic, J. Margin Preservation of Deep Neural Networks. 2015.
[5] Montufar, G. Geometric and Combinatorial Perspectives on Deep Neural Networks. 2015.
[6] Neyshabur, B. The Geometry of Optimization and Generalization in Neural Networks: A Path-based Approach. 2015.

- Optimization theory and algorithms: learning low-degree polynomials from random initialization (Andoni '14); characterizing the loss surface and attacking the saddle-point problem (Dauphin '14; Choromanska '15; Chaudhari '15); global optimality in neural network training (Haeffele '15); non-convex optimization (Dauphin '14); training neural networks with tensor methods (Janzamin '15); ...

[7] Andoni, A., Panigrahy, R., Valiant, G., and Zhang, L. Learning Polynomials with Neural Networks. ICML 2014.
[8] Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., and Bengio, Y. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NIPS 2014.
[9] Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. The Loss Surfaces of Multilayer Networks. AISTATS 2015.
[10] Chaudhari, P. and Soatto, S. The Effect of Gradient Noise on the Energy Landscape of Deep Networks. arXiv 2015.
[11] Haeffele, B. and Vidal, R. Global Optimality in Tensor Factorization, Deep Learning and Beyond. arXiv 2015.
[12] Janzamin, M., Sedghi, H., and Anandkumar, A. Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods. arXiv 2015.

RMT of deep learning

Classical statistics (Pearson, Fisher, Neyman; 1900-1940s)
- Correlation of infinite vectors (Karl Pearson, 1905); correlation of finite vectors (Fisher, 1924).
- Low-dimensional problems: the random-variable dimension is N = 2-10.

High-dimensional hypothesis testing
- Gene testing: N = 6033 genes, n = 102 subjects.
- Power-grid monitoring: N = 3000-10000 PMUs, n sampled observations.

What is Random Matrix Theory (RMT)?
- A. N. Kolmogorov's asymptotic theory (1970-1974): for high-dimensional covariance matrices, a new statistical regime in which N → ∞ and n → ∞ jointly; the classical central limit theorem no longer applies!
- Random matrix theory: E. Wigner (1955); Marchenko-Pastur (1967).
- Estimation errors accumulate, while the bias remains finite. (A small simulation of this regime follows below.)
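The deck gives no numerics here, so the following is a minimal sketch of the high-dimensional regime (dimensions N and n are arbitrary choices, not from the slides): for pure noise, the sample covariance eigenvalues do not concentrate at the population value 1 but spread over the Marchenko-Pastur bulk.

```python
import numpy as np

# High-dimensional regime: N/n -> c stays fixed as both grow. The population
# covariance is the identity, yet the sample eigenvalues do not concentrate
# at 1: they spread over the Marchenko-Pastur bulk [(1-sqrt(c))^2, (1+sqrt(c))^2].
rng = np.random.default_rng(0)
N, n = 1000, 4000                    # dimension and sample size, c = 0.25
c = N / n

X = rng.standard_normal((N, n))      # pure noise, true covariance = I
S = X @ X.T / n                      # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

lam_minus, lam_plus = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
print(f"empirical eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Marchenko-Pastur support:   [{lam_minus:.3f}, {lam_plus:.3f}]")
```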

The ring law
- The eigenvalues of a non-Hermitian random matrix follow a "ring" law; the unit circles are predicted by free probability theory.
- Product of non-Hermitian random matrices, noise only: the slides show the spectra for L = 1 and L = 5 factors. (A simulation follows below.)

Stieltjes transform and R transform
- The Stieltjes transform G of a probability distribution \rho is
  G(z) = \int \frac{\rho(t)}{z - t} \, dt.
- The distribution can be recovered using the inversion formula
  \rho(\lambda) = -\frac{1}{\pi} \lim_{\epsilon \to 0^+} \operatorname{Im} G(\lambda + i\epsilon).
- Given the Stieltjes transform G, the R transform is defined as the solution to the functional equation
  G\left(R(z) + \frac{1}{z}\right) = z.
- The benefit of the R transform: if A and B are freely independent, then
  R_{A+B}(z) = R_A(z) + R_B(z).

Gradient descent and Newton's method
- Gradient-descent-type algorithms use first-order information and, under sufficient regularity conditions, converge to a critical point.
- Newton-type methods use second-order information (curvature), encoded in the Hessian matrix.
[Figure: gradient descent (green) and Newton's method (red) for minimizing a function.]

Hessian decomposition
- LeCun '98: the Hessian of the loss splits into a term built from G, the sample covariance matrix of the gradients of the model outputs, and a term built from H, the Hessian of the model outputs.
- Generalized Gauss-Newton decomposition of the Hessian (Sagun '17; written here for the squared-error case): with model function f(x; \theta) and loss L(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (f(x_i; \theta) - y_i)^2, the gradient of the loss for a fixed sample is (f(x_i; \theta) - y_i) \, \nabla_\theta f(x_i; \theta), and the Hessian of the loss can be written as
  \nabla^2 L = \frac{1}{m} \sum_i \nabla f(x_i) \nabla f(x_i)^\top + \frac{1}{m} \sum_i (f(x_i) - y_i) \, \nabla^2 f(x_i) \equiv H_0 + H_1.

[1] LeCun, Y., Bottou, L., Orr, G. B., and Müller, K.-R. Efficient backprop. Lecture Notes in Computer Science, pages 9-50, 1998.
[2] Sagun, L., Evci, U., Guney, V. U., Dauphin, Y., and Bottou, L. Empirical analysis of the Hessian of over-parametrized neural networks. 2017.

Empirical results (Sagun '17)
- Increasing the number of neurons grows the bulk "proportionally", but the outliers are determined by the data.

Geometry of loss surfaces via RMT
- Model the Hessian as H = Wishart + Wigner: H_0 is positive semi-definite, while H_1 comes from the second derivatives and contains all of the explicit dependence on the residuals. (A spectrum sketch follows below.)
- Under a very weak assumption, applying the R transform shows that the Stieltjes transform G is the solution of a cubic equation.
- Index: the number of negative eigenvalues of the Hessian at a critical point; at the critical value the index vanishes (no negative eigenvalues remain).

[1] Pennington, J. and Bahri, Y. Geometry of Neural Network Loss Surfaces via Random Matrix Theory. ICML 2017.

Spectral distribution: bulk and outliers
- The eigenvalue distribution of the Hessian intuitively splits into two parts: the bulk, concentrated around zero, and the outliers, scattered away from zero.
- The bulk reflects the redundancy of the network parameters; the outliers reflect the complexity of the input data.

Singularity of the Hessian in deep learning
- The bulk of the eigenvalues depends on the architecture; the top discrete eigenvalues depend on the data.

[1] Sagun, L., Bottou, L., and LeCun, Y. Singularity of the Hessian in Deep Learning. arXiv preprint arXiv:1611.07476, 2016.
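A hedged simulation of the ring-law slide above. The construction assumes the singular-value-equivalent formulation used in the power-grid RMT literature (each rectangular Gaussian factor X is replaced by sqrt(X X^H) times a Haar unitary); under that reading, the product of L normalized factors has eigenvalues in a ring with outer radius 1 and inner radius (1 - c)^(L/2), where c = N/n.

```python
import numpy as np

# Ring law sketch: eigenvalues of a product of L 'singular value equivalent'
# non-Hermitian random matrices (noise only) fall in an annulus whose outer
# edge is the unit circle, as predicted by free probability.
rng = np.random.default_rng(1)
N, n = 400, 1600
c = N / n

def haar_unitary(N):
    Z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))          # phase fix -> Haar distribution

def sv_equivalent(X):
    # Hermitian square root of X X^H times a Haar unitary:
    # same singular values as X, but now a square non-Hermitian matrix.
    w, V = np.linalg.eigh(X @ X.conj().T)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T @ haar_unitary(N)

for L in (1, 5):
    Z = np.eye(N, dtype=complex)
    for _ in range(L):
        X = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))
        Z = Z @ (sv_equivalent(X) / np.sqrt(2 * n))   # unit-variance scaling
    moduli = np.abs(np.linalg.eigvals(Z))
    print(f"L={L}: |lambda| in [{moduli.min():.3f}, {moduli.max():.3f}], "
          f"predicted ring [{(1 - c) ** (L / 2):.3f}, 1.000]")
```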

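A sketch of the Wishart-plus-Wigner Hessian model above. Coupling the Wigner scale to the loss via sqrt(eps) is an illustrative assumption, not the paper's exact parametrization; the point is that the index (fraction of negative eigenvalues) grows with the loss at a critical point.

```python
import numpy as np

# H = H0 + H1: H0 ~ Wishart (PSD, Gauss-Newton part), H1 ~ Wigner (indefinite
# part carrying the explicit dependence on the residuals, scale set by eps).
rng = np.random.default_rng(2)
p, m = 1000, 2000            # parameters and samples

def hessian_model(eps):
    J = rng.standard_normal((p, m)) / np.sqrt(m)
    H0 = J @ J.T                                    # Wishart part (PSD)
    A = rng.standard_normal((p, p))
    H1 = np.sqrt(eps) * (A + A.T) / np.sqrt(2 * p)  # Wigner part
    return H0 + H1

for eps in (0.0, 0.1, 1.0):
    eigs = np.linalg.eigvalsh(hessian_model(eps))
    print(f"eps={eps:.1f}: index (fraction of negative eigenvalues) = "
          f"{np.mean(eigs < 0):.3f}")
# At eps = 0 the Hessian is PSD (index 0); as the loss grows, the Wigner part
# pushes some of the bulk below zero, so high-loss critical points are saddles.
```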
Errors of shallow networks
Using RMT and exact solutions in linear models, Advani and Saxe derive the generalization error and the training dynamics of learning (a numerical sketch of these dynamics appears below):
- Zero eigenvalues have no learning dynamics; the initialization then directly affects generalization performance.
- Non-zero but small eigenvalues make learning very slow and cause severe overfitting.
- To prevent overfitting, the worst case is when the number of parameters is comparable to the number of samples; early stopping is then essential.
- For very large networks (many parameters), overtraining has little effect.
- Decreasing the initial weight scale helps reduce the generalization error, i.e., choose small initial values.

[1] Advani, M. S. and Saxe, A. M. High-dimensional dynamics of generalization error in neural networks. arXiv preprint arXiv:1710.03667, 2017.

Nonlinear random matrix theory for deep learning

Notation and assumptions
- The Gram matrix is M = \frac{1}{m} Y^\top Y with Y = f(WX), where W is a random weight matrix, X is a random data matrix, and f is a pointwise nonlinear activation function.
- Assumption: the entries of W and X are i.i.d. zero-mean Gaussians (the precise variance scalings follow Pennington and Worah).

The moment method (a simulation of this setup appears below)
- For large |z|, the Stieltjes transform of the limiting spectral distribution (LSD) expands as the moment generating series
  G(z) = \sum_{k \ge 0} \frac{m_k}{z^{k+1}}, \qquad m_k = \mathbb{E}\left[\frac{1}{m} \operatorname{tr} M^k\right],
  where m_k is the k-th moment of the LSD.
- The idea behind the moment method is to compute the k-th moment by expanding the powers of M inside the trace,
  \mathbb{E} \operatorname{tr} M^k = \sum_{i_1, \dots, i_k} \mathbb{E}\left[M_{i_1 i_2} M_{i_2 i_3} \cdots M_{i_k i_1}\right],
  and identifying which index patterns survive in the high-dimensional limit.
- The Stieltjes transform of the spectral density of M then satisfies a closed self-consistent equation whose coefficients involve the shape parameters of the problem and the Gaussian moments of the nonlinearity f (denoted \eta and \zeta in Pennington and Worah) [explicit equation omitted].
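A quick numerical check of the Advani-Saxe picture above, in the simplest linear setting (dimensions, learning rate, and initialization scale are arbitrary choices): gradient descent never moves the weight component in the null space of the data, so the initialization survives into the trained model and leaks into the generalization error.

```python
import numpy as np

# Linear model trained by full-batch gradient descent: the dynamics decouple
# along eigenvectors of X^T X. Modes with zero eigenvalue never learn; small
# non-zero eigenvalues learn slowly, which is why early stopping matters most
# when the number of parameters is comparable to the number of samples.
rng = np.random.default_rng(3)
p, n = 200, 100                       # over-parametrized: p - n frozen modes
X = rng.standard_normal((n, p)) / np.sqrt(p)
y = X @ rng.standard_normal(p) + 0.1 * rng.standard_normal(n)

w0 = 0.5 * rng.standard_normal(p)     # non-zero initial weights
w, lr = w0.copy(), 0.3
for _ in range(2000):                 # gradient descent on ||Xw - y||^2 / 2
    w -= lr * X.T @ (X @ w - y)

lam = np.linalg.eigvalsh(X.T @ X)
P_null = np.eye(p) - np.linalg.pinv(X) @ X        # projector onto null(X)
print(f"zero eigenvalues of X^T X: {np.sum(lam < 1e-8)} of {p}")
print(f"null-space weight change after training: "
      f"{np.linalg.norm(P_null @ (w - w0)):.2e}")  # ~0: frozen modes never move
```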

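A simulation of the nonlinear Gram-matrix setup above (dimensions arbitrary; sigma_w = sigma_x = 1 assumed). It verifies the lowest moment identity directly: the first moment of the empirical spectral distribution of M equals (n1/m) times eta, with eta = E[f(Z)^2] for standard Gaussian Z.

```python
import numpy as np

# M = (1/m) Y^T Y with Y = f(WX), W and X Gaussian; the low-order moments of
# the spectrum are controlled by Gaussian moments of the nonlinearity f.
rng = np.random.default_rng(4)
n0, n1, m = 500, 500, 1000
f = np.tanh

W = rng.standard_normal((n1, n0)) / np.sqrt(n0)   # sigma_w = 1
X = rng.standard_normal((n0, m))                  # sigma_x = 1
Y = f(W @ X)                                      # post-activations
M = Y.T @ Y / m                                   # m x m Gram matrix

eigs = np.linalg.eigvalsh(M)
m1 = eigs.mean()                                  # first moment of the ESD

# eta = E[f(Z)^2] by Monte Carlo; pre-activations (WX) are approximately N(0,1)
eta = np.mean(f(rng.standard_normal(10**6)) ** 2)
print(f"first ESD moment: {m1:.4f}, (n1/m) * eta: {n1 / m * eta:.4f}")
```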
Limiting cases
- In the appropriate limit of the parameters, the self-consistent equation reduces to precisely the equation satisfied by the Stieltjes transform of the Marchenko-Pastur distribution with the corresponding shape parameter.

How to calculate the moments
- Each term in the trace expansion above corresponds to a closed cycle of indices; in the high-dimensional limit only certain cycle shapes contribute at leading order, which is what makes the moments computable.

Dynamical isometry

Background
- Weight initialization in deep networks can have a dramatic impact on learning speed.
- Ensuring that the mean squared singular value of a network's input-output Jacobian is O(1) is essential for avoiding exponentially vanishing or exploding gradients.
- In deep linear networks, ensuring that all singular values of the Jacobian are concentrated near 1 can yield a dramatic additional speed-up in learning; this property is known as dynamical isometry.
- It is unclear how to achieve dynamical isometry in nonlinear deep networks.

[1] Saxe, A. M., McClelland, J. L., and Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120, 2013.

Results (see the sketch below)
- ReLU networks are incapable of dynamical isometry.
- Sigmoidal networks can achieve dynamical isometry, but only with orthogonal weight initialization.
- Controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.

[1] Pennington, J., Schoenholz, S., and Ganguli, S. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. NIPS 2017.
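A minimal sketch of the contrast above, for a deep tanh network at initialization (width, depth, and gain are arbitrary choices; this is not the paper's exact experiment): Gaussian initialization spreads the Jacobian singular values over orders of magnitude, while orthogonal initialization keeps the spectrum far tighter.

```python
import numpy as np

# Singular values of the input-output Jacobian of a deep tanh network at
# initialization, comparing Gaussian and random orthogonal weights.
rng = np.random.default_rng(5)
N, depth, gain = 500, 20, 1.0

def jacobian_svals(init):
    x = rng.standard_normal(N)
    J = np.eye(N)
    for _ in range(depth):
        if init == "gaussian":
            W = gain * rng.standard_normal((N, N)) / np.sqrt(N)
        else:                                   # random orthogonal via QR
            Q, R = np.linalg.qr(rng.standard_normal((N, N)))
            W = gain * Q * np.sign(np.diag(R))
        h = W @ x
        x = np.tanh(h)
        J = np.diag(1 - np.tanh(h) ** 2) @ W @ J   # chain rule per layer
    return np.linalg.svd(J, compute_uv=False)

for init in ("gaussian", "orthogonal"):
    s = jacobian_svals(init)
    print(f"{init:>10}: mean {s.mean():.3e}, max {s.max():.3e}, "
          f"min {s.min():.3e}")
# Gaussian products develop a heavy spread of singular values with depth;
# orthogonal initialization keeps them concentrated (tune the gain to sit
# at criticality for true isometry).
```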

Deep learning on medical data: EEG analysis of psychiatric disorders
- Problem: analyze the EEG data of subjects with known status (healthy vs. patient) and derive an effective discriminative criterion for mental illness.
- Data: 5 minutes of EEG per subject, of size 64 x 304760; conventional statistical methods struggle to extract valuable information at this scale.
- Method 1: an LES (linear eigenvalue statistics) indicator discriminates the subjects well.
- Method 2: deep learning also works remarkably well.

Raw data
- 40 high-risk individuals (CHR), 40 healthy controls (HC), 40 first-episode patients (FES).
- Data format: 64 x (1000 x 60 x 5) = 64 x 300000; sampling frequency 1000 Hz.
- Setup: a 7-layer deep network; 75% of each class for training and 25% for testing, with cross-validation.

Random matrices vs. deep learning
- Deep learning accuracy: HC 0.949, FES 0.895, CHR 0.729; average 85.7%.
- The random-matrix (LES) approach reaches 98.1%.

MRI analysis of methamphetamine addiction
- Raw data (resting state): 30 methamphetamine users (MA), 29 healthy controls (HC); sampling time 8 minutes; sampling frequency 0.5 Hz; data size (64 x 64 x 31) x 240.
- Each MRI file is split into 31 images of size 64 x 64; 31 corresponding CNN models are built for classification; the final decision is a majority vote over the 31 classification results (a sketch of the voting rule follows below).
- Training set: 46 subjects (MA: 24, HC: 22); test set: 12 subjects (MA: 6, HC: 6).
- Deep learning results: HC 100%, MA 70.7%; average 85.33%.

Brain CT recognition: hemorrhage vs. infarction
- Three classes (normal, infarction, hemorrhage), 90 samples in total; inputs are 256 x 256 JPEG images.
- The CNN has 7 convolutional layers; 80% of each class is used for training and 20% for testing.
- Results: normal 100%, infarction 100%, hemorrhage/contusion 50%; average 83.33%.

Deep learning on microwave images
- Target detection and recognition in microwave remote-sensing images; millimeter-wave body scanners producing microwave images of dangerous/suspicious items.
- Valid data: samples in which the detected object is clearly visible.
- Ceramic knife: 20 groups, 70 valid samples; water bottle: 30 groups, 153 valid samples; gun: 30 groups, 145 valid samples; metal knife: 30 groups, valid samples [count truncated in the source].
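A sketch of the majority-vote fusion step in the MRI pipeline above. The per-slice accuracy of 0.70 and the independence of slice votes are illustrative assumptions, not numbers from the slides; the point is how voting over 31 weak slice classifiers sharpens the subject-level decision.

```python
import numpy as np

# Each of the 31 slice-level CNNs votes MA (1) or HC (0) for a subject;
# the final label is the majority vote over the 31 votes.
rng = np.random.default_rng(6)
n_slices, n_subjects, acc_slice = 31, 1000, 0.70

true_labels = rng.integers(0, 2, size=n_subjects)
# Simulate each slice classifier being right with probability acc_slice:
correct = rng.random((n_slices, n_subjects)) < acc_slice
votes = np.where(correct, true_labels, 1 - true_labels)

majority = (votes.sum(axis=0) > n_slices // 2).astype(int)
print(f"per-slice accuracy: {acc_slice:.2f}, "
      f"voted accuracy: {np.mean(majority == true_labels):.3f}")
# With 31 independent votes at 70% each, the majority is right ~99% of the
# time; in practice slice errors are correlated, so the real gain is smaller.
```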
