Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture I)

Outline
- A brief overview of Machine Learning
- Graphical Models: Representation, Inference, Learning

Definition of Machine Learning: learning from experience.
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." - Tom Mitchell

"Classical" Machine Learning Tasks
- Classification: spam filters, face recognition
- Regression: Hooke's law, Kepler's laws
- Ranking: search engines
- Probability (distribution) estimation

"Classical" Machine Learning Algorithms
- Classification: SVM, Boosting, Random Forest, Bagging, (Deep) Neural Networks
- Regression: Lasso, Boosting

Support Vector Machines (SVMs)
- SVM: the large-margin classifier
- SVM: hinge-loss minimization + regularization

Boosting
- Boosting: an (implicit) large-margin classifier
- Boosting: exp-loss minimization (+ regularization)

"Classical" Machine Learning Theories
- VC theory: capacity of the hypothesis space
- PAC theory
- Margin theory: confidence
- Empirical processes: capacity
- PAC-Bayes theory: PAC in a Bayes framework
- Regularization: capacity, smoothness

ML theories: a quantification of Occam's razor. [Figure: force vs. length measurements fitted by Hooke's law.]

Comparison of "classical" machine learning theories
- Regularization: Bayesian optimality; only asymptotic guarantees (convergence, rates, non-uniform).
- VC/PAC, margin, PAC-Bayes: relative optimality (optimal within a hypothesis space); non-asymptotic (finite-sample bounds).

Limitations of "classical" ML: representation
- Euclidean representation for the input; simple representation for the output.
- How do we represent STRUCTURE in data?

Chapter I: Representation

Probabilistic Graphical Models: What and Why
- PGMs: a model of the joint probability distribution over random variables.
- They represent dependencies and independencies between the random variables.
- Why are probability distributions important? Genes and diseases, and everything else.
- Why were PGMs invented by computer scientists, and not by statisticians?

Two types of PGMs
- Directed graphs: Bayesian networks (BNs).
- Undirected graphs: Markov random fields (MRFs).

Bayesian Networks (BNs)

(Intuitively) how BNs represent joint pdfs
- Example 1: given B, C and A are independent (B sits between A and C, e.g. a chain A → B → C). Note: dependency vs. causality.
- Example 2: given A, B and C are independent (a common parent, B ← A → C).
- Example 3: B and C are independent, but given A, B and C are NOT independent (a v-structure, B → A ← C).
- Example 4: learning: find a factorization rule that matches the previous examples.

Factorization: $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}(X_i))$.

A BN must be a DAG: the graph must be acyclic!

Definition (factorization according to a DAG): a probability distribution P is said to factorize according to a directed acyclic graph G if $P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_G(X_i))$.

Definition (Bayesian network): a Bayesian network is a pair (P, G) of a probability distribution and a DAG, where P factorizes according to G, and all the "local" conditional probability distributions are given.

Given the factorization, which variables are independent of C, given C's parents A and B?

Question: let C be a node (random variable) in a BN. Which nodes (random variables) are independent of C, given C's parents?

Theorem (local Markov property for BNs): for any node C (random variable) in a BN, all nodes that are not descendants of C are independent of C, given C's parents.

Sparse vs. dense. [Figure: a sparse BN vs. a fully connected BN.] What is the joint pdf of the right (fully connected) BN? Is there any independence in it? (A fully connected DAG factorizes by the plain chain rule and encodes no independencies.)

Question: given a BN = (P, G), can you determine, for three arbitrary sets of random variables X, Y, and Z, whether the conditional independence $X \perp Y \mid Z$ holds?

Definition (active trail in a BN): let x, y be two nodes and Z a set of nodes. A path between x and y is said to be an active trail given Z if the following hold:
- whenever the path contains a "v-structure" $a \to m \leftarrow b$, the middle node m or one of its descendants is in Z;
- no other node along the path is in Z.

Definition (d-separation in a BN): let X, Y, Z be three sets of nodes. X and Y are said to be d-separated by Z if for every node x in X, every node y in Y, and every path between x and y, the path is not an active trail given Z.

Theorem (informal): the independencies in a BN are exactly those characterized by d-separation.
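The two definitions above translate directly into code. Below is a minimal brute-force sketch, not from the slides: the toy DAG and all function names are illustrative. It enumerates simple paths and applies the two active-trail conditions, reproducing the "explaining away" behavior of Example 3.

```python
# DAG stored as child -> set of parents. Toy graph: the v-structure
# B -> A <- C from Example 3, plus a descendant A -> D.
parents = {"A": {"B", "C"}, "B": set(), "C": set(), "D": {"A"}}

def children(node):
    return {c for c, ps in parents.items() if node in ps}

def descendants(node):
    out, stack = set(), [node]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def simple_paths(x, y, path=None):
    """Yield all simple paths from x to y in the undirected skeleton."""
    path = path or [x]
    if path[-1] == y:
        yield list(path)
        return
    for n in parents[path[-1]] | children(path[-1]):
        if n not in path:
            yield from simple_paths(x, y, path + [n])

def is_active(path, Z):
    """Apply the two active-trail conditions to one path, given Z."""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if prev in parents[mid] and nxt in parents[mid]:  # v-structure
            if not (({mid} | descendants(mid)) & Z):      # Z must activate it
                return False
        elif mid in Z:                                    # non-collider blocked
            return False
    return True

def d_separated(x, y, Z):
    return not any(is_active(p, Z) for p in simple_paths(x, y))

print(d_separated("B", "C", set()))   # True:  B and C marginally independent
print(d_separated("B", "C", {"A"}))   # False: observing A couples B and C
print(d_separated("B", "C", {"D"}))   # False: a descendant of A also does
```

Path enumeration is exponential in general; practical implementations use a linear-time reachability procedure (e.g. the Bayes-ball style algorithm) instead.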
Theorem: for any BN (P, G) and arbitrary sets of nodes X, Y, Z: if X and Y are d-separated by Z in G, then $X \perp Y \mid Z$ holds in P.

Theorem: for any DAG G and any sets of nodes X, Y, Z: if X and Y are not d-separated by Z in G, then there must exist a probability distribution P that factorizes according to G but in which X and Y are not independent given Z.

The representation limit of BNs
- Is there a BN that has precisely these independencies? [Figure lost; the standard example is a distribution whose only independencies are $A \perp C \mid \{B, D\}$ and $B \perp D \mid \{A, C\}$, i.e. those of a four-node cycle.]
- Not every distribution can be represented by a BN satisfying exactly all of its independencies!

To sum up: a BN represents a joint probability distribution that factorizes as $P(X_1,\dots,X_n) = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$.
- The local independencies in a BN are characterized by parents and non-descendants.
- The global independencies in a BN are characterized by d-separation.

Markov Random Fields (MRFs)

How MRFs represent joint pdfs: $P(x_1,\dots,x_n) = \frac{1}{Z} \prod_{c} \phi_c(\mathbf{x}_c)$, where the $\phi_c$ are the factors and Z is the partition function (the normalizing constant).
- Factors correspond to maximal cliques.
- The joint distribution is the product of all factors, normalized by the partition function.

Formal definition (MRFs): a Markov network is a pair (P, G), where G is an undirected graph and P factorizes according to G, i.e., P has the form $P(x) = \frac{1}{Z} \prod_c \phi_c(\mathbf{x}_c)$, where the scope of each factor $\phi_c$ is a (maximal) clique in G.

The independence in MRFs. Easy to see: if every path between two nodes passes through the conditioning set, the two nodes are conditionally independent.

Question: given an MRF = (P, G), can you determine, for three arbitrary sets of random variables X, Y, and Z, whether the conditional independence $X \perp Y \mid Z$ holds?

Definition (separation): let X, Y, Z be three sets of nodes in an undirected graph G. X and Y are said to be separated by Z if for every node x in X, every node y in Y, and every path between x and y, there is a node on the path that belongs to Z.

Theorem (informal): all independencies in an MRF are characterized by separation.

Theorem: for any MRF (P, G) and arbitrary sets of nodes X, Y, Z: if X and Y are separated by Z in G, then $X \perp Y \mid Z$ holds in P.

Theorem: for any undirected graph G and any sets of nodes X, Y, Z: if X and Y are not separated by Z in G, then there must exist a probability distribution P that factorizes according to G but in which X and Y are not independent given Z.

Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture II)

Outline
- A brief overview of Machine Learning
- Graphical Models: Representation, Learning, Inference

Chapter II: Learning

Learning graphical models. Recall the definition of a Bayesian network: a pair (P, G) of a probability distribution and a DAG, where P factorizes according to G and all the "local" conditional probability distributions are given.

Learning Bayesian networks: BN = structure (graph) + local conditional distributions.
- How to learn the distributions' parameters from data: relatively simple, standard parameter estimation.
- How to learn the structure from data: very difficult! Why?

Structure learning
- Structure (graph): the edges between nodes.
- Is the structure of a BN learnable?
- Note: whether the edge A → C exists corresponds to whether a conditional-independence equation such as $P(C \mid A, B) = P(C \mid B)$ holds strictly.

Useful structure-learning methods
- Constraint-based structure learning: use hypothesis tests to obtain independencies; construct the graph.
- Score-based structure learning, penalizing "dense" graphs: likelihood scores, BIC, MDL, AIC.

Open problems: robust learning of structures?

Outline
- Graphical Models: Representation, Learning, Inference

Chapter III: Inference

- What is inference in GMs
- The hardness of inference in GMs
- Exact inference algorithms
- Approximate inference algorithms
- Future research directions

What is inference in GMs
- Input: a graph and the local conditional distributions.
- Goal: two types of inference: conditional probability queries, e.g. $P(Y \mid E = e)$, and MAP inference, $\arg\max_{x} P(x \mid E = e)$.
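As a concrete toy illustration of a conditional-probability query, the sketch below (not from the slides; the CPT numbers are made up) computes P(B = 1 | A = 1) on the v-structure B → A ← C by brute-force enumeration of the joint factorization. The exponential cost of such enumeration is exactly what the hardness results below formalize.

```python
from itertools import product

# Hypothetical CPTs for binary variables: P(b), P(c), and P(a=1 | b, c).
p_b = {0: 0.7, 1: 0.3}
p_c = {0: 0.6, 1: 0.4}
p_a1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}

def joint(a, b, c):
    """P(a, b, c) = P(b) P(c) P(a | b, c): the BN factorization."""
    pa = p_a1[(b, c)] if a == 1 else 1.0 - p_a1[(b, c)]
    return p_b[b] * p_c[c] * pa

# Conditional-probability query P(B = 1 | A = 1), summing out C.
num = sum(joint(1, 1, c) for c in (0, 1))
den = sum(joint(1, b, c) for b, c in product((0, 1), repeat=2))
print(num / den)                      # ~0.527

# "Explaining away": B and C are marginally independent, yet dependent
# once A is observed, exactly as Example 3 and d-separation predict.
pb1_given_a1c1 = joint(1, 1, 1) / sum(joint(1, b, 1) for b in (0, 1))
print(pb1_given_a1c1)                 # ~0.391, differs from 0.527
```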
The hardness of inference in GMs
- Exact inference in GMs is hard: the decision version of exact inference is NP-complete; exact inference is #P-complete.
- Approximate inference in GMs is hard: relative $\varepsilon$-approximate inference is NP-hard for every $\varepsilon$.

Thm 1: deciding whether $P(X = x) > 0$ is NP-complete. Proof: reduction from 3-SAT.
Thm 2: computing $P(X = x)$ is #P-complete. Proof: use the above reduction, now from #3-SAT; it is a Levin reduction, and certificates are mapped one-to-one.
Thm 3: for every $\varepsilon$, computing an $\varepsilon$-approximation of $P(X = x)$ is NP-hard.
Proof: $\hat{\rho}$ being a relative $\varepsilon$-approximation of $\rho$ means that $|\hat{\rho} - \rho| \le \varepsilon \rho$. Clearly, if one has an $\varepsilon$-approximation of $P(X = x)$, one can decide whether $P(X = x) > 0$, and hence solve the NP-complete problem.
Remark: for absolute approximation it is still NP-hard.
Thm 4: exact and approximate MAP inference are likewise hard.
Conclusion: the worst-case complexity of inference, both exact and approximate, is NP-hard.

Exact Inference Algorithms

The relation between BNs and MRFs. From BN to MRF (factors):
- BN: $P = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$.
- MRF: $P = \frac{1}{Z} \prod_c \phi_c(\mathbf{x}_c)$.
- BN → MRF: take each conditional $P(X_i \mid \mathrm{Pa}(X_i))$ as a factor with scope $\{X_i\} \cup \mathrm{Pa}(X_i)$; then $Z = 1$.

From BN to MRF (graphs): the moral graph. Delete the directions of the edges; connect ("marry") the parents of each node.

Exact inference algorithms
- The variable elimination algorithm.
- Belief propagation: the sum-product algorithm.
- Belief update.
Why study exact inference algorithms? To gain intuition and insight for developing useful approximate algorithms.

The variable elimination algorithm. Query: a marginal such as $P(A)$. Distribution: a product of factors. Solve: $P(A) = \sum_{\text{other variables}} \prod_i \phi_i$, pushing each sum as far inside the product as possible.

Variable elimination is dynamic programming. [Worked example; the graph figure is lost. Eliminating one variable per step produces intermediate factors with scopes (g, e), (a, e, f), (a, c, d, e), (a, c, d), (a, b, c), and (a, b); the cost is driven by the largest intermediate scope.]

Message passing: the SP algorithm. Variable elimination induces a clique tree.
- Running intersection property!
- Family preserving property: every factor's scope is contained in some clique.
- General message passing from clique $C_i$ to clique $C_j$: $\delta_{i \to j} = \sum_{C_i \setminus S_{ij}} \psi_i \prod_{k \in \mathrm{nbr}(i) \setminus \{j\}} \delta_{k \to i}$, where $\psi_i$ is the product of the original factors (the potential) assigned to $C_i$ and $S_{ij}$ is the sepset.
- Variable elimination is message passing on the clique tree.

The sum-product belief propagation algorithm
1. Construct a clique tree that satisfies the family preserving and running intersection properties.
2. Choose the clique containing the query variable as the root.
3. Pass messages from the leaves.
4. After all messages arrive at the root, sum over all variables in the root clique other than the query variable.

Thm: the sum-product algorithm always gives the correct answer on any clique tree!

What if we want to compute k queries? Message passing k times? No! Messages can be reused: for each edge in the clique tree, passing twice is enough, once in each direction (up and down).

After message passing on all edges in both directions, each clique has a belief (a joint probability): $\beta_i = \psi_i \prod_k \delta_{k \to i}$. The system must satisfy the calibration property: $\sum_{C_i \setminus S_{ij}} \beta_i = \sum_{C_j \setminus S_{ij}} \beta_j$.

Why does calibration hold? Neighboring cliques agree on the marginal probability of their sepset! Beliefs are marginal probabilities on the cliques!

Belief update algorithm. Message passing in terms of beliefs: store a sepset belief $\mu_{ij}$; compute $\sigma_{i \to j} = \sum_{C_i \setminus S_{ij}} \beta_i$, then update $\beta_j \leftarrow \beta_j \cdot \sigma_{i \to j} / \mu_{ij}$ and $\mu_{ij} \leftarrow \sigma_{i \to j}$. So propagating beliefs is equivalent to sum-product message passing.
- Construct the clique tree.
- Initialize: $\beta_i = \psi_i$, $\mu_{ij} = 1$.
- Update until calibrated.

How to construct a clique tree? [Figure: a graph containing a chordless 4-cycle.] Does it admit a clique tree? No! The graph cannot have a chordless 4-loop (or larger). Solution: triangulation and chordal graphs.

What if the graph is an Ising model (a grid)? Triangulation and chordal graphs then induce clique trees of large width (an n × n grid has treewidth of order n), and the computational complexity is very high.

Construction of clique trees
- Run variable elimination; or
- BN → moral graph → chordal graph (triangulation) → find all the maximal cliques → use the maximum spanning tree algorithm (edge weight = sepset size) to construct the edges of the clique tree.

Reflection on the sum-product belief propagation algorithm: the algorithm itself does not require the clique tree to be a "tree". What if we construct a general "cluster graph" instead of a clique tree and run the SP algorithm?

[Figures: a clique tree vs. a cluster graph; a cluster graph for the Ising model.] What is the result of running sum-product belief propagation on the cluster graph? Loops? Convergence? Correctness? This is the "Loopy Belief Propagation" algorithm!
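Before turning to approximate methods, here is a minimal sketch of the variable elimination algorithm described above. It is not the slides' code: the Factor class is illustrative, all variables are assumed binary, and it reuses the earlier toy CPTs to recompute P(A) for the v-structure.

```python
from itertools import product

class Factor:
    def __init__(self, scope, table):
        self.scope = tuple(scope)   # ordered variable names
        self.table = table          # dict: tuple of values -> factor value

    def value(self, assign):        # assign: dict variable -> 0/1
        return self.table[tuple(assign[v] for v in self.scope)]

def multiply(factors):
    """Product of factors; the scope is the union of the input scopes."""
    scope = sorted({v for f in factors for v in f.scope})
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        assign = dict(zip(scope, vals))
        p = 1.0
        for f in factors:
            p *= f.value(assign)
        table[vals] = p
    return Factor(scope, table)

def sum_out(factor, var):
    """Marginalize one variable out of a factor."""
    scope = [v for v in factor.scope if v != var]
    table = {}
    for vals, p in factor.table.items():
        assign = dict(zip(factor.scope, vals))
        key = tuple(assign[v] for v in scope)
        table[key] = table.get(key, 0.0) + p
    return Factor(scope, table)

def eliminate(factors, var):
    """One VE step: multiply the factors touching var, then sum var out."""
    touch = [f for f in factors if var in f.scope]
    rest = [f for f in factors if var not in f.scope]
    return rest + [sum_out(multiply(touch), var)]

# P(a, b, c) = P(b) P(c) P(a | b, c); query P(A) by eliminating b, then c.
p_a1 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}
fB = Factor(["b"], {(0,): 0.7, (1,): 0.3})
fC = Factor(["c"], {(0,): 0.6, (1,): 0.4})
fA = Factor(["a", "b", "c"],
            {(a, b, c): p_a1[(b, c)] if a else 1 - p_a1[(b, c)]
             for a, b, c in product((0, 1), repeat=3)})

factors = eliminate(eliminate([fB, fC, fA], "b"), "c")
print(multiply(factors).table)  # {(0,): P(A=0)=0.556, (1,): P(A=1)=0.444}
```

The largest intermediate scope created by the chosen elimination order is what the clique-tree width measures; a bad order (or a grid-like graph) blows this up.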
Machine Learning and Graphical Models
Wang Liwei (wanglw), School of Electronics Engineering and Computer Science, Peking University
(Lecture III)

- What is inference in GMs
- The hardness of inference in GMs
- Exact inference algorithms
- Approximate inference algorithms
- Future research directions

Approximate inference methods
- Variational methods: (loopy) belief propagation; the mean field method.
- Sampling-based methods: Metropolis-Hastings; Gibbs sampling; importance sampling.

Variational Methods

(Loopy) belief propagation [figure: a cluster graph]: just let the beliefs propagate.
- On clique trees, belief propagation converges after propagating on all edges (in both directions).
- On general cluster graphs, it is not guaranteed to converge.
- Even if it converges, it can converge to a wrong answer.

Variational explanation of (loopy) belief propagation. Exact inference: Proposition: the following optimization problem gives the same result as exact inference: find a set of beliefs that minimizes the Helmholtz free energy (or, equivalently, the relative entropy) to the factored distribution on a clique tree, under the constraints of calibration (local agreement). The fixed-point characterization of the solution of this optimization problem is precisely the sum-product message-passing update.

Approximate inference: find a set of beliefs that minimizes the factored free energy to the factored distribution on a cluster graph, under the constraints of calibration. The fixed-point characterization of the solution of this optimization problem is the (loopy) belief propagation formula.

Mean field method: use a fully factored, first-order approximation of the target distribution, $Q(x) = \prod_i Q_i(x_i)$. The fixed-point equation and the updating rule have simple forms.

Sampling-Based Methods

Sampling-based (Monte Carlo) algorithms. Idea: use sample-based frequencies to approximate probabilities.
- Probability is expectation: $P(X \in A) = \mathbb{E}[\mathbf{1}\{X \in A\}]$.
- The expectation of a function can be approximated by a sample average: $\mathbb{E}[f(X)] \approx \frac{1}{N} \sum_{t=1}^{N} f(x^{(t)})$.

Monte Carlo is useful in many problems, e.g. high-dimensional numerical integration. But how do we generate samples when the target probability distribution is difficult to compute? Markov chain Monte Carlo (MCMC). Key idea: generate data by a Markov chain whose stationary distribution is the target pdf.

A brief review of finite-state Markov chains
- Reducible vs. irreducible.
- Periodic vs. aperiodic.
- Ergodic: no matter what the initial state is, the process converges to a unique stationary distribution.

Regular MC: 1) for every pair of states x, x′, the probability that x reaches x′ in k steps is positive for some finite k; 2) for every state x, there is positive probability that the chain stays at x in the next step.
Theorem (a sufficient condition for an ergodic MC): if a finite-state MC is regular, then it is ergodic. (Easy to understand via the graph view.)

Regular Markov chains are good for our purpose: we run the Markov chain, wait for it to converge to the stationary distribution, and then the samples can be used for approximate calculation. But how long will it take for the MC to converge?

Gibbs Sampling

Gibbs sampling: one of the simplest sampling algorithms.
- Assume that for each i, the conditional $P(x_i \mid \mathbf{x}_{-i})$ is easy to sample.
- Easy to implement for graphical models.
- Proposed by Geman and Geman.

Algorithm:
- Draw $\mathbf{x}^{(0)}$ from some initial distribution.
- For t = 1, 2, ...: for each i, sample $x_i^{(t)}$ according to $P(x_i \mid x_1^{(t)}, \dots, x_{i-1}^{(t)}, x_{i+1}^{(t-1)}, \dots, x_n^{(t-1)})$.
- Return the sequence of samples.

Why is Gibbs sampling easy for PGMs? When sampling $x_i$, only the factors that involve $x_i$ are left!

Generally, for both Bayesian networks and Markov networks, the conditional probability used in Gibbs sampling involves only the factors in which the resampled variable appears. The method trivially generalizes to the case where there is evidence: draw a sequence of examples whose target distribution is $P(\mathbf{X} \mid e)$.

Theorem: the Gibbs sampling process has a unique stationary distribution, namely $P$ (or $P(\cdot \mid e)$).
Disadvantage of Gibbs sampling for PGMs: slow convergence to the stationary distribution.
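A minimal sketch of Gibbs sampling, not from the slides: it assumes a 3 × 3 Ising-style MRF with coupling J = 0.5, all values made up. Note how the conditional for one site uses only the adjacent factors, as remarked above.

```python
import math
import random

random.seed(0)
N, J = 3, 0.5    # grid size and coupling; P(s) proportional to exp(J * sum of s_i s_j)
spins = [[random.choice((-1, 1)) for _ in range(N)] for _ in range(N)]

def neighbors(i, j):
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= i + di < N and 0 <= j + dj < N:
            yield i + di, j + dj

def gibbs_step():
    for i in range(N):
        for j in range(N):
            # P(s_ij = +1 | rest) depends only on the factors touching (i, j):
            # the conditional odds are exp(2 * J * sum of neighboring spins).
            field = sum(spins[a][b] for a, b in neighbors(i, j))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * J * field))
            spins[i][j] = 1 if random.random() < p_plus else -1

# Burn in, then estimate E[mean spin] from the (correlated) samples.
for _ in range(500):
    gibbs_step()
est = 0.0
for _ in range(2000):
    gibbs_step()
    est += sum(map(sum, spins)) / (N * N)
print(est / 2000)
```

The burn-in length is a guess; as the slides note, how long the chain actually needs to approach stationarity is the hard question.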
Metropolis-Hastings Algorithm

For Gibbs sampling we assume that it is easy to generate a sample from $P(x_i \mid \mathbf{x}_{-i})$, but sometimes this is difficult. More generally, for a target distribution P it may be very difficult to generate samples directly according to P. Does MCMC help? The idea of Metropolis-Hastings: use a proposal distribution, i.e., a transition model.

An important result for Markov chains: detailed balance (reversible Markov chains).
Definition: a finite-state MC with transition probability matrix T is said to be reversible if there exists a unique distribution P such that for all x, x′: $P(x)\, T(x \to x') = P(x')\, T(x' \to x)$. This equation is called detailed balance.
Reversible: for any sequence of states, the probability that it occurs in the stationary process is the same as the probability that the reversed sequence occurs.

Theorem: if the transition matrix T defines a regular Markov chain, and T satisfies detailed balance with respect to P, then P is the unique stationary distribution of the chain.
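A minimal sketch of Metropolis-Hastings, not from the slides: the discrete target weights and the random-walk proposal are made up. The accept/reject rule is chosen exactly so that detailed balance with respect to the target holds, which by the theorem above makes the target the unique stationary distribution.

```python
import random

random.seed(0)
target = [0.1, 0.2, 0.4, 0.2, 0.1]   # unnormalized weights also work
n = len(target)

def propose(x):
    """Symmetric proposal: step left or right, wrapping around."""
    return (x + random.choice((-1, 1))) % n

def mh_step(x):
    y = propose(x)
    # Accept with min(1, P(y)/P(x)); the Hastings correction q(y,x)/q(x,y)
    # is 1 here because the proposal is symmetric. This makes
    # P(x) T(x -> y) = P(y) T(y -> x), i.e. detailed balance.
    if random.random() < min(1.0, target[y] / target[x]):
        return y
    return x

x, counts = 0, [0] * n
for t in range(100_000):
    x = mh_step(x)
    if t >= 1000:                     # discard burn-in
        counts[x] += 1
total = sum(counts)
print([round(c / total, 3) for c in counts])  # ~ [0.1, 0.2, 0.4, 0.2, 0.1]
```

Rejection creates the self-loops that make the chain regular, so the empirical frequencies converge to the target distribution.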