Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture I)

Outline
- A brief overview of Machine Learning
- Graphical Models: Representation, Inference, Learning

Definition of Machine Learning: learning from experience.
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell

"Classical" Machine Learning Tasks
- Classification: spam filters, face recognition
- Regression: Hooke's law, Kepler's laws
- Ranking: search engines
- Probability (distribution) estimation

"Classical" Machine Learning Algorithms
- Classification: SVM, Boosting, Random Forest, Bagging, (Deep) Neural Networks
- Regression: Lasso, Boosting

Support Vector Machines (SVMs)
- SVM: the large-margin classifier
- SVM: hinge-loss minimization + regularization

Boosting
- Boosting: an (implicit) large-margin classifier
- Boosting: exp-loss minimization (+ regularization)

"Classical" Machine Learning Theories
- VC theory: capacity of the hypothesis space
- PAC theory
- Margin theory: confidence
- Empirical processes: capacity
- PAC-Bayes theory: PAC in a Bayes framework
- Regularization: capacity, smoothness

ML theories: a quantification of Occam's razor. [Figure: force vs. length measurements fitted by Hooke's law.]

Comparison of "classical" machine learning theories
- Regularization: Bayesian optimality; only asymptotic guarantees (convergence, rates, non-uniform).
- VC/PAC, margin, PAC-Bayes: relative optimality (optimal within a hypothesis space); non-asymptotic (finite-sample bounds).

Limitations of "classical" ML: representation
- Euclidean representation for the input; simple representation for the output.
- How do we represent STRUCTURE in data?

Chapter I: Representation

Probabilistic Graphical Models: What and Why
- PGMs: a model of the joint probability distribution over random variables.
- They represent dependencies and independencies between the random variables.
- Why are probability distributions important? Genes and diseases, and everything else.
- Why were PGMs invented by computer scientists, and not by statisticians?

Two types of PGMs
- Directed graphs: Bayesian networks (BNs).
- Undirected graphs: Markov random fields (MRFs).

Bayesian Networks (BNs)

(Intuitively) how BNs represent joint pdfs
- Example 1: given B, C and A are independent (B sits between A and C, e.g. a chain A → B → C). Note: dependency vs. causality.
- Example 2: given A, B and C are independent (a common parent, B ← A → C).
- Example 3: B and C are independent, but given A, B and C are NOT independent (a v-structure, B → A ← C).
- Example 4: learning: find a factorization rule that matches the previous examples.

Factorization: $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$.

A BN must be a DAG: the graph must be acyclic!

Definition (factorization according to a DAG): a probability distribution P is said to factorize according to a directed acyclic graph G if $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_G(X_i))$.

Definition (Bayesian network): a Bayesian network is a pair (P, G) of a probability distribution and a DAG, where P factorizes according to G, and all the "local" conditional probability distributions are given.

Given the factorization, which variables are independent of C, given C's parents A and B?

Question: let C be a node (random variable) in a BN. Which nodes (random variables) are independent of C, given C's parents?

Theorem (local Markov property for BNs): for any node C (random variable) in a BN, all nodes that are not descendants of C are independent of C, given C's parents.

Sparse vs. dense. [Figure: a sparse BN vs. a fully connected BN.] What is the joint pdf of the right (fully connected) BN? Is there any independence in it? (A fully connected DAG factorizes by the plain chain rule and encodes no independencies.)

Question: given a BN = (P, G), can you determine, for three arbitrary sets of random variables X, Y, and Z, whether the conditional independence $X \perp Y \mid Z$ holds?

Definition (active trail in a BN): let x, y be two nodes and Z a set of nodes. A path between x and y is said to be an active trail given Z if the following hold:
- whenever the path contains a "v-structure" $a \to m \leftarrow b$, the middle node m or one of its descendants is in Z;
- no other node along the path is in Z.

Definition (d-separation in a BN): let X, Y, Z be three sets of nodes. X and Y are said to be d-separated by Z if for every node x in X, every node y in Y, and every path between x and y, the path is not an active trail given Z.

Theorem (informal): the independencies in a BN are exactly those characterized by d-separation.
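The two definitions above translate directly into code. Below is a minimal brute-force sketch, not from the slides: the toy DAG and all function names are illustrative. It enumerates simple paths and applies the two active-trail conditions, reproducing the "explaining away" behavior of Example 3.

```python
# DAG stored as child -> set of parents. Toy graph: the v-structure
# B -> A <- C from Example 3, plus a descendant A -> D.
parents = {"A": {"B", "C"}, "B": set(), "C": set(), "D": {"A"}}

def children(node):
    return {c for c, ps in parents.items() if node in ps}

def descendants(node):
    out, stack = set(), [node]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def simple_paths(x, y, path=None):
    """Yield all simple paths from x to y in the undirected skeleton."""
    path = path or [x]
    if path[-1] == y:
        yield list(path)
        return
    for n in parents[path[-1]] | children(path[-1]):
        if n not in path:
            yield from simple_paths(x, y, path + [n])

def is_active(path, Z):
    """Apply the two active-trail conditions to one path, given Z."""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[mid] and nxt in parents[mid]:  # v-structure
            if not (({mid} | descendants(mid)) & Z):      # Z must activate it
                return False
        elif mid in Z:                                    # non-collider blocked
            return False
    return True

def d_separated(x, y, Z):
    return not any(is_active(p, Z) for p in simple_paths(x, y))

print(d_separated("B", "C", set()))   # True:  B and C marginally independent
print(d_separated("B", "C", {"A"}))   # False: observing A couples B and C
print(d_separated("B", "C", {"D"}))   # False: a descendant of A also does
```

Path enumeration is exponential in general; practical implementations use a linear-time reachability procedure (e.g. the Bayes-ball style algorithm) instead.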
Theorem: for any BN (P, G) and arbitrary sets of nodes X, Y, Z: if X and Y are d-separated by Z in G, then $X \perp Y \mid Z$ holds in P.

Theorem: for any DAG G and any sets of nodes X, Y, Z: if X and Y are not d-separated by Z in G, then there must exist a probability distribution P that factorizes according to G but in which X and Y are not independent given Z.

The representation limit of BNs
- Is there a BN that has precisely these independencies? [Figure lost; the standard example is a distribution whose only independencies are $A \perp C \mid \{B, D\}$ and $B \perp D \mid \{A, C\}$, i.e. those of a four-node cycle.]
- Not every distribution can be represented by a BN satisfying exactly all of its independencies!

To sum up: a BN represents a joint probability distribution that factorizes as $P(X_1,\dots,X_n) = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$.
- The local independencies in a BN are characterized by parents and non-descendants.
- The global independencies in a BN are characterized by d-separation.

Markov Random Fields (MRFs)

How MRFs represent joint pdfs: $P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{c} \phi_c(\mathbf{x}_c)$, where the $\phi_c$ are the factors and Z is the partition function (the normalizing constant).
- Factors correspond to maximal cliques.
- The joint distribution is the product of all factors, normalized by the partition function.

Formal definition (MRFs): a Markov network is a pair (P, G), where G is an undirected graph and P factorizes according to G, i.e., P has the form $P(x) = \frac{1}{Z} \prod_c \phi_c(\mathbf{x}_c)$, where the scope of each factor $\phi_c$ is a (maximal) clique in G.

The independence in MRFs. Easy to see: if every path between two nodes passes through the conditioning set, the two nodes are conditionally independent.

Question: given an MRF = (P, G), can you determine, for three arbitrary sets of random variables X, Y, and Z, whether the conditional independence $X \perp Y \mid Z$ holds?

Definition (separation): let X, Y, Z be three sets of nodes in an undirected graph G. X and Y are said to be separated by Z if for every node x in X, every node y in Y, and every path between x and y, there is a node on the path that belongs to Z.

Theorem (informal): all independencies in an MRF are characterized by separation.

Theorem: for any MRF (P, G) and arbitrary sets of nodes X, Y, Z: if X and Y are separated by Z in G, then $X \perp Y \mid Z$ holds in P.

Theorem: for any undirected graph G and any sets of nodes X, Y, Z: if X and Y are not separated by Z in G, then there must exist a probability distribution P that factorizes according to G but in which X and Y are not independent given Z.

Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture II)

Outline
- A brief overview of Machine Learning
- Graphical Models: Representation, Learning, Inference

Chapter II: Learning

Learning graphical models. Recall the definition of a Bayesian network: a pair (P, G) of a probability distribution and a DAG, where P factorizes according to G and all the "local" conditional probability distributions are given.

Learning Bayesian networks: BN = structure (graph) + local conditional distributions.
- How to learn the distributions' parameters from data: relatively simple, standard parameter estimation.
- How to learn the structure from data: very difficult! Why?

Structure learning
- Structure (graph): the edges between nodes.
- Is the structure of a BN learnable?
- Note: whether the edge A → C exists corresponds to whether a conditional-independence equation such as $P(C \mid A, B) = P(C \mid B)$ holds strictly.

Useful structure-learning methods
- Constraint-based structure learning: use hypothesis tests to obtain independencies; construct the graph.
- Score-based structure learning, penalizing "dense" graphs: likelihood scores, BIC, MDL, AIC.

Open problems: robust learning of structures?

Outline
- Graphical Models: Representation, Learning, Inference

Chapter III: Inference

- What is inference in GMs
- The hardness of inference in GMs
- Exact inference algorithms
- Approximate inference algorithms
- Future research directions

What is inference in GMs
- Input: a graph and the local conditional distributions.
- Goal: two types of inference: conditional probability queries, e.g. $P(Y \mid E = e)$, and MAP inference, $\arg\max_{x} P(x \mid E = e)$.
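As a concrete toy illustration of a conditional-probability query, the sketch below (not from the slides; the CPT numbers are made up) computes P(B = 1 | A = 1) on the v-structure B → A ← C by brute-force enumeration of the joint factorization. The exponential cost of such enumeration is exactly what the hardness results below formalize.

```python
from itertools import product

# Hypothetical CPTs for binary variables: P(b), P(c), and P(a=1 | b, c).
p_b = {0: 0.7, 1: 0.3}
p_c = {0: 0.6, 1: 0.4}
p_a1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}

def joint(a, b, c):
    """P(a, b, c) = P(b) P(c) P(a | b, c): the BN factorization."""
    pa = p_a1[(b, c)] if a == 1 else 1.0 - p_a1[(b, c)]
    return p_b[b] * p_c[c] * pa

# Conditional-probability query P(B = 1 | A = 1), summing out C.
num = sum(joint(1, 1, c) for c in (0, 1))
den = sum(joint(1, b, c) for b, c in product((0, 1), repeat=2))
print(num / den)                      # ~0.527

# "Explaining away": B and C are marginally independent, yet dependent
# once A is observed, exactly as Example 3 and d-separation predict.
pb1_given_a1c1 = joint(1, 1, 1) / sum(joint(1, b, 1) for b in (0, 1))
print(pb1_given_a1c1)                 # ~0.391, differs from 0.527
```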
The hardness of inference in GMs
- Exact inference in GMs is hard: the decision version of exact inference is NP-complete; exact inference is #P-complete.
- Approximate inference in GMs is hard: relative $\varepsilon$-approximate inference is NP-hard for every $\varepsilon$.

Thm 1: deciding whether $P(X = x) > 0$ is NP-complete. Proof: reduction from 3-SAT.
Thm 2: computing $P(X = x)$ is #P-complete. Proof: use the above reduction, now from #3-SAT; it is a Levin reduction, and certificates are mapped one-to-one.
Thm 3: for every $\varepsilon$, computing an $\varepsilon$-approximation of $P(X = x)$ is NP-hard.
Proof: $\hat{\rho}$ being a relative $\varepsilon$-approximation of $\rho$ means that $|\hat{\rho} - \rho| \le \varepsilon \rho$. Clearly, if one has an $\varepsilon$-approximation of $P(X = x)$, one can decide whether $P(X = x) > 0$, and hence solve the NP-complete problem.
Remark: for absolute approximation it is still NP-hard.
Thm 4: exact and approximate MAP inference are likewise hard.
Conclusion: the worst-case complexity of inference, both exact and approximate, is NP-hard.

Exact Inference Algorithms

The relation between BNs and MRFs. From BN to MRF (factors):
- BN: $P = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$.
- MRF: $P = \frac{1}{Z} \prod_c \phi_c(\mathbf{x}_c)$.
- BN → MRF: take each conditional $P(X_i \mid \mathrm{Pa}(X_i))$ as a factor with scope $\{X_i\} \cup \mathrm{Pa}(X_i)$; then $Z = 1$.

From BN to MRF (graphs): the moral graph. Delete the directions of the edges; connect ("marry") the parents of each node.

Exact inference algorithms
- The variable elimination algorithm.
- Belief propagation: the sum-product algorithm.
- Belief update.
Why study exact inference algorithms? To gain intuition and insight for developing useful approximate algorithms.

The variable elimination algorithm. Query: a marginal such as $P(A)$. Distribution: a product of factors. Solve: $P(A) = \sum_{\text{other variables}} \prod_i \phi_i$, pushing each sum as far inside the product as possible.

Variable elimination is dynamic programming. [Worked example; the graph figure is lost. Eliminating one variable per step produces intermediate factors with scopes (g, e), (a, e, f), (a, c, d, e), (a, c, d), (a, b, c), and (a, b); the cost is driven by the largest intermediate scope.]

Message passing: the SP algorithm. Variable elimination induces a clique tree.
- Running intersection property!
- Family preserving property: every factor's scope is contained in some clique.
- General message passing from clique $C_i$ to clique $C_j$: $\delta_{i \to j} = \sum_{C_i \setminus S_{ij}} \psi_i \prod_{k \in \mathrm{nbr}(i) \setminus \{j\}} \delta_{k \to i}$, where $\psi_i$ is the product of the original factors (the potential) assigned to $C_i$ and $S_{ij}$ is the sepset.
- Variable elimination is message passing on the clique tree.

The sum-product belief propagation algorithm
1. Construct a clique tree that satisfies the family preserving and running intersection properties.
2. Choose the clique containing the query variable as the root.
3. Pass messages from the leaves.
4. After all messages arrive at the root, sum over all variables in the root clique other than the query variable.

Thm: the sum-product algorithm always gives the correct answer on any clique tree!

What if we want to compute k queries? Message passing k times? No! Messages can be reused: for each edge in the clique tree, passing twice is enough, once in each direction (up and down).

After message passing on all edges in both directions, each clique has a belief (a joint probability): $\beta_i = \psi_i \prod_k \delta_{k \to i}$. The system must satisfy the calibration property: $\sum_{C_i \setminus S_{ij}} \beta_i = \sum_{C_j \setminus S_{ij}} \beta_j$.

Why does calibration hold? Neighboring cliques agree on the marginal probability of their sepset! Beliefs are marginal probabilities on the cliques!

Belief update algorithm. Message passing in terms of beliefs: store a sepset belief $\mu_{ij}$; compute $\sigma_{i \to j} = \sum_{C_i \setminus S_{ij}} \beta_i$, then update $\beta_j \leftarrow \beta_j \cdot \sigma_{i \to j} / \mu_{ij}$ and $\mu_{ij} \leftarrow \sigma_{i \to j}$. So propagating beliefs is equivalent to sum-product message passing.
- Construct the clique tree.
- Initialize: $\beta_i = \psi_i$, $\mu_{ij} = 1$.
- Update until calibrated.

How to construct a clique tree? [Figure: a graph containing a chordless 4-cycle.] Does it admit a clique tree? No! The graph cannot have a chordless 4-loop (or larger). Solution: triangulation and chordal graphs.

What if the graph is an Ising model (a grid)? Triangulation and chordal graphs then induce clique trees of large width (an n × n grid has treewidth of order n), and the computational complexity is very high.

Construction of clique trees
- Run variable elimination; or
- BN → moral graph → chordal graph (triangulation) → find all the maximal cliques → use the maximum spanning tree algorithm (edge weight = sepset size) to construct the edges of the clique tree.

Reflection on the sum-product belief propagation algorithm: the algorithm itself does not require the clique tree to be a "tree". What if we construct a general "cluster graph" instead of a clique tree and run the SP algorithm?

[Figures: a clique tree vs. a cluster graph; a cluster graph for the Ising model.] What is the result of running sum-product belief propagation on the cluster graph? Loops? Convergence? Correctness? This is the "Loopy Belief Propagation" algorithm!
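Before turning to approximate methods, here is a minimal sketch of the variable elimination algorithm described above. It is not the slides' code: the Factor class is illustrative, all variables are assumed binary, and it reuses the earlier toy CPTs to recompute P(A) for the v-structure.

```python
from itertools import product

class Factor:
    def __init__(self, scope, table):
        self.scope = tuple(scope)   # ordered variable names
        self.table = table          # dict: tuple of values -> factor value

    def value(self, assign):        # assign: dict variable -> 0/1
        return self.table[tuple(assign[v] for v in self.scope)]

def multiply(factors):
    """Product of factors; the scope is the union of the input scopes."""
    scope = sorted({v for f in factors for v in f.scope})
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        assign = dict(zip(scope, vals))
        p = 1.0
        for f in factors:
            p *= f.value(assign)
        table[vals] = p
    return Factor(scope, table)

def sum_out(factor, var):
    """Marginalize one variable out of a factor."""
    scope = [v for v in factor.scope if v != var]
    table = {}
    for vals, p in factor.table.items():
        assign = dict(zip(factor.scope, vals))
        key = tuple(assign[v] for v in scope)
        table[key] = table.get(key, 0.0) + p
    return Factor(scope, table)

def eliminate(factors, var):
    """One VE step: multiply the factors touching var, then sum var out."""
    touch = [f for f in factors if var in f.scope]
    rest = [f for f in factors if var not in f.scope]
    return rest + [sum_out(multiply(touch), var)]

# P(a, b, c) = P(b) P(c) P(a | b, c); query P(A) by eliminating b, then c.
p_a1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}
fB = Factor(["b"], {(0,): 0.7, (1,): 0.3})
fC = Factor(["c"], {(0,): 0.6, (1,): 0.4})
fA = Factor(["a", "b", "c"],
            {(a, b, c): p_a1[(b, c)] if a else 1 - p_a1[(b, c)]
             for a, b, c in product((0, 1), repeat=3)})

factors = eliminate(eliminate([fB, fC, fA], "b"), "c")
print(multiply(factors).table)  # {(0,): P(A=0)=0.556, (1,): P(A=1)=0.444}
```

The largest intermediate scope created by the chosen elimination order is what the clique-tree width measures; a bad order (or a grid-like graph) blows this up.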
Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture III)

- What is inference in GMs
- The hardness of inference in GMs
- Exact inference algorithms
- Approximate inference algorithms
- Future research directions

Approximate inference methods
- Variational methods: (loopy) belief propagation; the mean field method.
- Sampling-based methods: Metropolis-Hastings; Gibbs sampling; importance sampling.

Variational Methods

(Loopy) belief propagation [figure: a cluster graph]: just let the beliefs propagate.
- On clique trees, belief propagation converges after propagating on all edges (in both directions).
- On general cluster graphs, it is not guaranteed to converge.
- Even if it converges, it can converge to a wrong answer.

Variational explanation of (loopy) belief propagation. Exact inference: Proposition: the following optimization problem gives the same result as exact inference: find a set of beliefs that minimizes the Helmholtz free energy (or, equivalently, the relative entropy) to the factored distribution on a clique tree, under the constraints of calibration (local agreement). The fixed-point characterization of the solution of this optimization problem is precisely the sum-product message-passing update.

Approximate inference: find a set of beliefs that minimizes the factored free energy to the factored distribution on a cluster graph, under the constraints of calibration. The fixed-point characterization of the solution of this optimization problem is the (loopy) belief propagation formula.

Mean field method: use a fully factored, first-order approximation of the target distribution, $Q(x) = \prod_i Q_i(x_i)$. The fixed-point equation and the updating rule have simple forms.

Sampling-Based Methods

Sampling-based (Monte Carlo) algorithms. Idea: use sample-based frequencies to approximate probabilities.
- Probability is expectation: $P(X \in A) = \mathbb{E}[\mathbf{1}\{X \in A\}]$.
- The expectation of a function can be approximated by a sample average: $\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{t=1}^{N} f(x^{(t)})$.

Monte Carlo is useful in many problems, e.g. high-dimensional numerical integration. But how do we generate samples when the target probability distribution is difficult to compute? Markov chain Monte Carlo (MCMC). Key idea: generate data by a Markov chain whose stationary distribution is the target pdf.

A brief review of finite-state Markov chains
- Reducible vs. irreducible.
- Periodic vs. aperiodic.
- Ergodic: no matter what the initial state is, the process converges to a unique stationary distribution.

Regular MC: 1) for every pair of states x, x′, the probability that x reaches x′ in k steps is positive for some finite k; 2) for every state x, there is positive probability that the chain stays at x in the next step.
Theorem (a sufficient condition for an ergodic MC): if a finite-state MC is regular, then it is ergodic. (Easy to understand via the graph view.)

Regular Markov chains are good for our purpose: we run the Markov chain, wait for it to converge to the stationary distribution, and then the samples can be used for approximate calculation. But how long will it take for the MC to converge?

Gibbs Sampling

Gibbs sampling: one of the simplest sampling algorithms.
- Assume that for each i, the conditional $P(x_i \mid \mathbf{x}_{-i})$ is easy to sample.
- Easy to implement for graphical models.
- Proposed by Geman and Geman.

Algorithm:
- Draw $\mathbf{x}^{(0)}$ from some initial distribution.
- For t = 1, 2, ...: for each i, sample $x_i^{(t)}$ according to $P(x_i \mid x_1^{(t)}, \dots, x_{i-1}^{(t)}, x_{i+1}^{(t-1)}, \dots, x_n^{(t-1)})$.
- Return the sequence of samples.

Why is Gibbs sampling easy for PGMs? When sampling $x_i$, only the factors that involve $x_i$ are left!

Generally, for both Bayesian networks and Markov networks, the conditional probability used in Gibbs sampling involves only the factors in which the resampled variable appears. The method trivially generalizes to the case where there is evidence: draw a sequence of examples whose target distribution is $P(\mathbf{X} \mid e)$.

Theorem: the Gibbs sampling process has a unique stationary distribution, namely $P$ (or $P(\cdot \mid e)$).
Disadvantage of Gibbs sampling for PGMs: slow convergence to the stationary distribution.
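A minimal sketch of Gibbs sampling, not from the slides: it assumes a 3 × 3 Ising-style MRF with coupling J = 0.5, all values made up. Note how the conditional for one site uses only the adjacent factors, as remarked above.

```python
import math
import random

random.seed(0)
N, J = 3, 0.5    # grid size and coupling; P(s) proportional to exp(J * sum of s_i s_j)
spins = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

def neighbors(i, j):
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= i + di < N and 0 <= j + dj < N:
            yield i + di, j + dj

def gibbs_step():
    for i in range(N):
        for j in range(N):
            # P(s_ij = +1 | rest) depends only on the factors touching (i, j):
            # the conditional odds are exp(2 * J * sum of neighboring spins).
            field = sum(spins[a][b] for a, b in neighbors(i, j))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * J * field))
            spins[i][j] = 1 if random.random() < p_plus else -1

# Burn in, then estimate E[mean spin] from the (correlated) samples.
for _ in range(500):
    gibbs_step()
est = 0.0
for _ in range(2000):
    gibbs_step()
    est += sum(map(sum, spins)) / (N * N)
print(est / 2000)
```

The burn-in length is a guess; as the slides note, how long the chain actually needs to approach stationarity is the hard question.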
Metropolis-Hastings Algorithm

For Gibbs sampling we assume that it is easy to generate a sample from $P(x_i \mid \mathbf{x}_{-i})$, but sometimes this is difficult. More generally, for a target distribution P it may be very difficult to generate samples directly according to P. Does MCMC help? The idea of Metropolis-Hastings: use a proposal distribution, i.e., a transition model.

An important result for Markov chains: detailed balance (reversible Markov chains).
Definition: a finite-state MC with transition probability matrix T is said to be reversible if there exists a unique distribution P such that for all x, x′: $P(x)\, T(x \to x') = P(x')\, T(x' \to x)$. This equation is called detailed balance.
Reversible: for any sequence of states, the probability that it occurs in the stationary process is the same as the probability that the reversed sequence occurs.

Theorem: if the transition matrix T defines a regular Markov chain, and T satisfies detailed balance with respect to P, then P is the unique stationary distribution of the chain.
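A minimal sketch of Metropolis-Hastings, not from the slides: the discrete target weights and the random-walk proposal are made up. The accept/reject rule is chosen exactly so that detailed balance with respect to the target holds, which by the theorem above makes the target the unique stationary distribution.

```python
import random

random.seed(0)
target = [0.1, 0.2, 0.4, 0.2, 0.1]   # unnormalized weights also work
n = len(target)

def propose(x):
    """Symmetric proposal: step left or right, wrapping around."""
    return (x + random.choice((-1, 1))) % n

def mh_step(x):
    y = propose(x)
    # Accept with min(1, P(y)/P(x)); the Hastings correction q(y,x)/q(x,y)
    # is 1 here because the proposal is symmetric. This makes
    # P(x) T(x -> y) = P(y) T(y -> x), i.e. detailed balance.
    if random.random() < min(1.0, target[y] / target[x]):
        return y
    return x

x, counts = 0, [0] * n
for t in range(100_000):
    x = mh_step(x)
    if t >= 1000:                     # discard burn-in
        counts[x] += 1
total = sum(counts)
print([round(c / total, 3) for c in counts])  # ~ [0.1, 0.2, 0.4, 0.2, 0.1]
```

Rejection creates the self-loops that make the chain regular, so the empirical frequencies converge to the target distribution.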