Overview of Bayesian Learning

Uploaded by 2*** (IP location: Hubei) on 2021-12-16. Format: PPT, 89 pages, 1.61 MB.



Document summary

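Machine Learning and Artificial Neural Networks
Du Youtian, Department of Automation, School of Electronic and Information Engineering, Xi'an Jiaotong University

Chapter 3: Bayesian Learning Methods
3.1 Maximum likelihood estimation
3.2 Bayesian learning
3.3 Naive Bayes
3.4 Bayesian networks
  3.4.1 Bayesian network concepts
  3.4.2 Bayesian network inference

3.1 Maximum likelihood parameter estimation
The maximum likelihood method is the parameter estimation method that chooses the parameter values under which the observed event is most probable. It consists of two steps: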

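1) Construct the likelihood function, which contains the parameters to be estimated.
2) Using the experimental data, find the parameter estimator (or estimate) at which the likelihood function attains its maximum.

Maximum likelihood estimation (MLE)
For discrete random variables, the likelihood function is the product of the probability functions of several independent events. Its value is a probability, and it is regarded as a function of the population parameters. For example, a large bag contains red, white, and black balls. Drawing 50 times with replacement yields 12 red, 24 white, and 14 black balls. By multinomial theory, the likelihood function is

$$L(p_1, p_2) = \frac{50!}{12!\,24!\,14!}\, p_1^{12}\, p_2^{24}\, p_3^{14},$$

where $p_1$, $p_2$, $p_3$ are the probabilities of drawing a red, white, and black ball ($p_3 = 1 - p_1 - p_2$); these are the quantities to be estimated.

For continuous random variables, the likelihood function is the product of the probability density functions of the independent observations:

$$L(\theta) = L(y_1, \dots, y_n; \theta_1, \dots, \theta_k) = f(y_1; \theta)\, f(y_2; \theta) \cdots f(y_n; \theta).$$

If $y_i \sim N(\mu, \sigma^2)$, this becomes

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - \mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right).$$

For computational convenience, one usually takes the logarithm of the likelihood, called the log-likelihood, because the logarithm turns the product into a sum:

$$\ln L(\theta; y_1, \dots, y_n) = \sum_{i=1}^{n} \ln f(y_i; \theta), \qquad \theta = (\theta_1, \dots, \theta_k).$$

The maximum likelihood estimator is obtained by setting the partial derivatives of the log-likelihood with respect to the population parameters to zero:

$$\frac{\partial \ln L}{\partial \theta_l} = 0, \qquad l = 1, \dots, k.$$

Solving these equations gives the maximum likelihood estimators of the population parameters.

Example: let $y_1, \dots, y_n$ be a random sample from a normal population $N(\mu, \sigma^2)$. Find the maximum likelihood estimators of $\mu$ and $\sigma^2$.

$$L(\mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right)$$
$$\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2$$
$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0$$
$$\hat\mu = \bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2$$

In general, use a parameterized pdf $p(\mathbf{x}|\theta)$, e.g., a Gaussian or a mixture of Gaussians. The criterion for parameter estimation is maximum likelihood:

$$\hat\theta = \arg\max_\theta p(D|\theta), \qquad p(D|\theta) = p(\mathbf{x}_1, \dots, \mathbf{x}_n|\theta) = \prod_{k=1}^{n} p(\mathbf{x}_k|\theta).$$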
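The closed-form results above can be checked numerically. The following is a minimal Python sketch (the function names are hypothetical choices of ours): it relies on the standard result that maximizing the multinomial log-likelihood subject to $p_1 + p_2 + p_3 = 1$ gives $\hat p_j = n_j / n$, and it implements the normal-sample estimators derived above, including the $1/n$ (not $1/(n-1)$) variance.

```python
import math

def multinomial_mle(counts):
    """ML estimate of category probabilities: p_j = n_j / n."""
    n = sum(counts)
    return [c / n for c in counts]

def multinomial_log_likelihood(p, counts):
    """Log-likelihood up to the constant ln(n! / prod_j n_j!), which does not depend on p."""
    return sum(c * math.log(pj) for c, pj in zip(counts, p))

def normal_mle(ys):
    """ML estimates (mu_hat, sigma2_hat); note the 1/n variance, not 1/(n-1)."""
    n = len(ys)
    mu = sum(ys) / n
    var = sum((y - mu) ** 2 for y in ys) / n
    return mu, var

counts = [12, 24, 14]  # red, white, black draws from the bag example
p_hat = multinomial_mle(counts)
print(p_hat)  # [0.24, 0.48, 0.28]

# Any other probability vector gives a lower log-likelihood, e.g. (0.3, 0.4, 0.3):
print(multinomial_log_likelihood(p_hat, counts) > multinomial_log_likelihood([0.3, 0.4, 0.3], counts))

print(normal_mle([2.0, 4.0, 6.0]))  # hypothetical sample
```

The multinomial coefficient $\frac{50!}{12!\,24!\,14!}$ is dropped because it does not depend on the parameters and therefore does not move the maximum.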

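Gaussian case: for an unknown mean, or an unknown mean and covariance matrix, differentiate with respect to the means and variances, set the derivatives to zero, and solve.
Gaussian mixture case:

$$p(\mathbf{x}|\theta) = \sum_{l=1}^{M} \alpha_l\, N(\mathbf{x}|\mu_l, \Sigma_l), \qquad L(\theta|D) = \sum_{i=1}^{N} \ln \sum_{l=1}^{M} \alpha_l\, N(\mathbf{x}_i|\mu_l, \Sigma_l).$$

Differentiating with respect to the weights, means, and covariances leads to an iterative algorithm.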

Expectation Maximization (EM)
Assume there are hidden or missing data $z$. Instead of maximizing the original incomplete likelihood $p(X|\theta) = p(\mathbf{x}_1, \dots, \mathbf{x}_n|\theta)$, consider the complete likelihood $p(X, z|\theta)$.

Let $L(\theta) = \ln P(X|\theta)$ and let $\theta_i$ be the current estimate. Then

$$L(\theta) - L(\theta_i) = \ln P(X|\theta) - \ln P(X|\theta_i) = \ln \sum_z P(X|z, \theta)\, P(z|\theta) - \ln P(X|\theta_i).$$

Jensen's inequality: $\ln \sum_j \lambda_j y_j \ge \sum_j \lambda_j \ln y_j$ if $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$. Taking $\lambda_z = P(z|X, \theta_i)$,

$$L(\theta) - L(\theta_i) = \ln \sum_z P(z|X, \theta_i)\, \frac{P(X|z, \theta)\, P(z|\theta)}{P(z|X, \theta_i)} - \ln P(X|\theta_i) \ge \sum_z P(z|X, \theta_i) \ln \frac{P(X|z, \theta)\, P(z|\theta)}{P(z|X, \theta_i)\, P(X|\theta_i)} \triangleq \Delta(\theta|\theta_i).$$

Hence $L(\theta) \ge l(\theta|\theta_i) \triangleq L(\theta_i) + \Delta(\theta|\theta_i)$, with equality at $\theta = \theta_i$: $l(\theta_i|\theta_i) = L(\theta_i)$.

The next estimate maximizes this lower bound. Dropping the terms that do not depend on $\theta$,

$$\theta_{i+1} = \arg\max_\theta l(\theta|\theta_i) = \arg\max_\theta \sum_z P(z|X, \theta_i) \ln P(X|z, \theta)\, P(z|\theta) = \arg\max_\theta E_{z|X, \theta_i}\big[\ln P(X, z|\theta)\big].$$

Since $\theta_{i+1}$ maximizes $l(\theta|\theta_i)$, we have $L(\theta_{i+1}) \ge l(\theta_{i+1}|\theta_i) \ge l(\theta_i|\theta_i) = L(\theta_i)$. Hence, at each iteration, $L(\theta)$ cannot decrease. When EM reaches a fixed point $\theta^*$ (so that $\theta^*$ maximizes $l(\theta|\theta^*)$), and provided $L$ and $l$ are differentiable there, $\theta^*$ must also be a stationary point of $L$, because $L(\theta) - l(\theta|\theta^*) \ge 0$ with equality at $\theta^*$, so the two gradients agree at $\theta^*$. It is not necessarily a local maximum.

(Figures: PDF estimation.)

Each EM iteration maximizes the expected complete-data log-likelihood and, by the argument above, never decreases the incomplete likelihood $L(\theta)$:

E-step:
$$Q(\theta, \theta_i) = E\big[\ln p(X, z|\theta) \,\big|\, X, \theta_i\big] = \int \ln p(X, z|\theta)\, p(z|X, \theta_i)\, dz$$
M-step:
$$\theta_{i+1} = \arg\max_\theta Q(\theta, \theta_i)$$

EM for a Gaussian mixture: introduce $y_i$, where $y_i = k$ if sample $\mathbf{x}_i$ was generated by the $k$-th component. Here $\Theta^g$ denotes the given (current) parameters, and $\alpha_j$ can be thought of as the prior probability of the $j$-th component, that is, $\alpha_j = P(y_i = j)$.
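To see the non-decreasing likelihood in action, here is a minimal Python sketch of EM for a one-dimensional Gaussian mixture. It is an illustration under our own assumptions: the toy data, initial values, and function names are hypothetical, and the M-step uses the standard closed-form mixture updates (weighted counts, means, and variances), which are not derived on the surviving slides. The sketch records $L(\theta)$ after every iteration so that monotonicity can be checked.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_likelihood(xs, alphas, mus, vars_):
    """Incomplete-data log-likelihood L = sum_i ln sum_l alpha_l N(x_i | mu_l, var_l)."""
    return sum(math.log(sum(a * normal_pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_)))
               for x in xs)

def em_gmm_1d(xs, alphas, mus, vars_, n_iter=50):
    n, k = len(xs), len(alphas)
    history = [log_likelihood(xs, alphas, mus, vars_)]
    for _ in range(n_iter):
        # E-step: responsibilities r[i][l] = P(y_i = l | x_i, current parameters)
        r = []
        for x in xs:
            w = [a * normal_pdf(x, m, v) for a, m, v in zip(alphas, mus, vars_)]
            total = sum(w)
            r.append([wl / total for wl in w])
        # M-step: closed-form maximizers of Q for a Gaussian mixture
        nk = [sum(r[i][l] for i in range(n)) for l in range(k)]
        alphas = [nk[l] / n for l in range(k)]
        mus = [sum(r[i][l] * xs[i] for i in range(n)) / nk[l] for l in range(k)]
        # small floor guards against a collapsing component (not active on this data)
        vars_ = [max(sum(r[i][l] * (xs[i] - mus[l]) ** 2 for i in range(n)) / nk[l], 1e-6)
                 for l in range(k)]
        history.append(log_likelihood(xs, alphas, mus, vars_))
    return alphas, mus, vars_, history

xs = [-2.1, -1.9, -2.0, -2.2, -1.8, 3.0, 3.1, 2.9, 3.2, 2.8]  # hypothetical toy data
alphas, mus, vars_, history = em_gmm_1d(xs, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print([round(m, 2) for m in mus], [round(a, 2) for a in alphas])
```

The list `history` should be non-decreasing up to floating-point noise. Like any EM run, the result depends on the initialization and may be only a stationary point of $L$.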

3.2 Bayesian learning
3.2.1 Bayesian decision
Goal: to determine the most probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H.
Prior probability of h, P(h): reflects any background knowledge we have about the chance that h is a correct hypothesis (before observing the data).
Prior probability of D, P(D): reflects the probability that training data D will be observed given no knowledge about which hypothesis h holds.
Conditional probability of the observation D, P(D|h): denotes the probability of observing data D given that hypothesis h holds.
Posterior probability of h, P(h|D): represents the probability that h holds given the observed training data D. It reflects our confidence that h holds after we have seen the training data D, and it is the quantity that machine learning researchers are interested in.
Bayes' theorem allows us to compute P(h|D):

$$P(h|D) = \frac{P(D|h)\, P(h)}{P(D)}$$

Goal: to find the most probable hypothesis h from a set of candidate hypotheses H, given the observed data D.
MAP hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} \frac{P(D|h)\, P(h)}{P(D)} = \arg\max_{h \in H} P(D|h)\, P(h)$$

If every hypothesis in H is equally probable a priori, we only need to consider the likelihood of the data D given h, P(D|h). Then $h_{MAP}$ becomes the maximum likelihood hypothesis:

$$h_{ML} = \arg\max_{h \in H} P(D|h)$$

Classification. Given: m classes, training samples, and unknown data. Goal: assign a class label to each input. Two stages: modeling/learning, which learns classification rules from the training samples; and classification/testing, which applies these rules to the input data.
(Figure: density P(f1) of feature f1 for pebbles and straws, and a decision boundary separating pebbles from straws in the (f1, f2) feature plane.)

What is the optimal classifier? We have the class-conditional probability density function $p(x|C_i)$, which describes the probability of occurrence of the features in category $C_i$. We want the posterior probability $p(C_i|x)$: make the decision that maximizes the conditional probability of the class given certain feature measurements. This is also called the posterior probability function.
(Figure: class-conditional densities p(x|ω1), p(x|ω2) and the corresponding posteriors p(ω1|x), p(ω2|x).)

MAP decision: use the posterior probability as the discriminant function, $g_i(x) = p(C_i|x)$, and choose the category/class with the maximum value:

$$x \in C_k \iff k = \arg\max_i p(C_i|x)$$

This produces the optimal performance, the minimum probability of error

$$P(e) = \int P(e, x)\, dx = \int P(e|x)\, p(x)\, dx.$$

A classifier that achieves this optimal performance is called a Bayesian classifier.
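To make the difference between $h_{MAP}$ and $h_{ML}$ concrete, here is a small Python sketch. The numbers are illustrative assumptions borrowed from the well-known cancer-test example in Mitchell's Machine Learning textbook, not from these slides: P(cancer) = 0.008, P(+|cancer) = 0.98, P(+|no cancer) = 0.03, and the data D is a single positive test result. The helper names are hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(h|D) = P(D|h) P(h) / P(D), with P(D) = sum_h P(D|h) P(h)."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    p_d = sum(joint.values())
    return {h: joint[h] / p_d for h in joint}

def h_map(priors, likelihoods):
    # P(D) is the same for every h, so it can be dropped from the argmax
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

def h_ml(likelihoods):
    # equivalent to h_MAP under a uniform prior
    return max(likelihoods, key=likelihoods.get)

# Illustrative textbook numbers (assumed, not from the slides)
priors = {"cancer": 0.008, "no_cancer": 0.992}
likelihoods = {"cancer": 0.98, "no_cancer": 0.03}  # P(test = + | h)

post = posterior(priors, likelihoods)
print(round(post["cancer"], 3))                    # 0.209
print(h_map(priors, likelihoods), h_ml(likelihoods))  # no_cancer cancer
```

Because the prior strongly favors no_cancer, the MAP hypothesis differs from the ML hypothesis; with a uniform prior the two would coincide. Replacing hypotheses by classes $C_i$ and D by a feature vector x, the same argmax is the MAP decision rule.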

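Decision risk: a decision must take into account the loss it may cause. Consider a doctor who decides from a person's white-blood-cell concentration whether the person has a blood disease. If a healthy person (ω1) is diagnosed as ill (ω2), further tests can still be done, so the loss is small; if an ill person (ω2) is diagnosed as healthy (ω1), the loss is severe.
Decision risk table: let $\lambda(C_i, C_j)$ be the loss incurred by deciding class $C_i$ when the true class is $C_j$. The risk of the decision that classifies $x$ into class $C_i$ is

$$R(C_i|x) = E[\lambda(C_i, C_j)] = \sum_{j=1}^{c} \lambda(C_i, C_j)\, p(C_j|x).$$

Decision rule:

$$x \in C_k \iff k = \arg\min_i R(C_i|x)$$

Optimal classifier based on the Bayes decision
The Bayes decision rests on three premises: the number of classes is known; the prior probability $P(C_i)$ of each class is known; and the class-conditional density $p(x|C_i)$ of each class is known.
Reformulating the problem: either estimate $P(C_i)$ and $p(x|C_i)$ from samples, or determine the discriminant functions directly from samples. Either way, it becomes a learning problem.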
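The minimum-risk rule takes only a few lines of Python. The loss values and posteriors below are hypothetical, chosen to mirror the blood-disease example: missing an ill patient costs 10, while a false alarm (which leads to further tests) costs 1.

```python
def conditional_risk(loss, post):
    """R(C_i|x) = sum_j loss[i][j] * p(C_j|x), where loss[i][j] = lambda(C_i, C_j)."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]

def min_risk_decision(loss, post):
    """Index k minimizing R(C_k|x)."""
    risks = conditional_risk(loss, post)
    return min(range(len(risks)), key=lambda i: risks[i])

# Hypothetical values. Class 0 = healthy (omega_1), class 1 = ill (omega_2).
loss = [[0.0, 10.0],   # decide "healthy": no loss if healthy, large loss if actually ill
        [1.0, 0.0]]    # decide "ill": small loss (further tests) if actually healthy
post = [0.8, 0.2]      # p(healthy | x), p(ill | x)

print(conditional_risk(loss, post))   # [2.0, 0.8]
print(min_risk_decision(loss, post))  # 1: decide "ill", although the MAP decision is class 0
```

With the 0-1 loss $\lambda(C_i, C_j) = 1 - \delta_{ij}$, the risk is $R(C_i|x) = 1 - p(C_i|x)$, so minimizing risk reduces to the MAP decision.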

Estimating the class priors $P(C_i)$: use the frequency of each class in the training data, or rely on experience.
There are two main approaches to estimating the class-conditional density $p(x|C_i)$:
Parametric estimation: the form of the density function is known but its parameters are unknown and are estimated from the training data, e.g., by maximum likelihood estimation or maximum a posteriori estimation.
Nonparametric estimation: the form of the density function is unknown and no assumption is made about it; the density is estimated directly from the training data, e.g., by the Parzen window method.

3.2.2 Bayesian estimation and prediction
Bayesian estimation
Bayesian learning considers $\theta$ (the parameter vector to be estimated) to be a random variable. Before we observe the data, the parameters are described by a prior, which is typically very broad. Once we have observed the data, we can use Bayes' formula to find the posterior. Since some values of the parameters are more consistent with the data than others, the posterior is typically narrower than the prior. This is Bayesian learning.

Suppose we know the distribution of possible values of $\theta$, that is, a prior $p_0(\theta)$. Suppose we also have a loss function $\lambda(\hat\theta, \theta)$ that measures the penalty for estimating $\hat\theta$ when the actual value is $\theta$. Then we may formulate the estimation problem as Bayesian decision making: choose the value of $\hat\theta$ that minimizes the risk

$$R(\hat\theta|X^{(n)}) = \int p(\theta|X^{(n)})\, \lambda(\hat\theta, \theta)\, d\theta.$$

Note that the loss function is usually continuous.

Example 1: $\lambda(\hat\theta, \theta) = |\hat\theta - \theta|$ ($\theta$ is one-dimensional). The total Bayesian risk here is

$$R(\hat\theta|X^{(n)}) = \int_{-\infty}^{\infty} |\hat\theta - \theta|\, p(\theta|X^{(n)})\, d\theta = \int_{-\infty}^{\hat\theta} (\hat\theta - \theta)\, p(\theta|X^{(n)})\, d\theta + \int_{\hat\theta}^{\infty} (\theta - \hat\theta)\, p(\theta|X^{(n)})\, d\theta.$$

We seek its minimum:

$$\frac{dR(\hat\theta|X^{(n)})}{d\hat\theta} = \int_{-\infty}^{\hat\theta} p(\theta|X^{(n)})\, d\theta - \int_{\hat\theta}^{\infty} p(\theta|X^{(n)})\, d\theta = 0.$$

At the $\hat\theta$ that solves this equation we have

$$\int_{-\infty}^{\hat\theta} p(\theta|X^{(n)})\, d\theta = \int_{\hat\theta}^{\infty} p(\theta|X^{(n)})\, d\theta.$$

That is, for the loss $\lambda(\hat\theta, \theta) = |\hat\theta - \theta|$, the optimal Bayesian estimator of the parameter is the median of the posterior distribution $p(\theta|X^{(n)})$.
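Here is a quick numerical check of Example 1, written as a Python sketch under assumptions of our own: the posterior is a hypothetical skewed density $p(\theta|X^{(n)}) \propto \theta^2 e^{-\theta}$ discretized on a grid over [0, 10], and the risk integral is replaced by a sum over the grid. Minimizing the discretized risk by brute force should land on the grid median.

```python
import math

def normalize(weights):
    z = sum(weights)
    return [w / z for w in weights]

def abs_loss_risk(t, thetas, post):
    """Discretized R(t | X) = sum_k |t - theta_k| p(theta_k | X)."""
    return sum(abs(t - th) * p for th, p in zip(thetas, post))

def posterior_median(thetas, post):
    """Smallest grid point whose cumulative posterior mass reaches 1/2."""
    cum = 0.0
    for th, p in zip(thetas, post):
        cum += p
        if cum >= 0.5:
            return th
    return thetas[-1]

# Hypothetical skewed posterior on a grid: p(theta | X) proportional to theta^2 exp(-theta)
thetas = [i * 0.02 for i in range(501)]  # theta in [0, 10]
post = normalize([th ** 2 * math.exp(-th) for th in thetas])

best = min(thetas, key=lambda t: abs_loss_risk(t, thetas, post))  # brute-force risk minimizer
print(best, posterior_median(thetas, post))
```

Because the density is right-skewed, its median lies to the left of its mean, so the absolute-error estimate and the squared-error estimate (the posterior mean) differ noticeably here.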

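Example 2: $\lambda(\hat\theta, \theta) = (\hat\theta - \theta)^2$ (squared error). The total Bayesian risk is

$$R(\hat\theta|X^{(n)}) = \int (\hat\theta - \theta)^2\, p(\theta|X^{(n)})\, d\theta.$$

Again, to find the minimum, set the derivative equal to 0:

$$\frac{dR(\hat\theta|X^{(n)})}{d\hat\theta} = 2\int (\hat\theta - \theta)\, p(\theta|X^{(n)})\, d\theta = 2\hat\theta - 2\int \theta\, p(\theta|X^{(n)})\, d\theta = 2\hat\theta - 2E[\theta|X^{(n)}] = 0.$$

The optimal estimator here is the conditional expectation of $\theta$ given the data $X^{(n)}$: $\hat\theta = E[\theta|X^{(n)}]$.

Maximum a posteriori (MAP) estimation

$$\theta^* = \arg\max_\theta p(\theta|X^{(n)}) = \arg\max_\theta \frac{p_0(\theta)\, p(X^{(n)}|\theta)}{p(X^{(n)})} = \arg\max_\theta p_0(\theta) \prod_{i=1}^{n} p(x_i|\theta),$$

where $p(X^{(n)}) = \int p_0(\theta)\, p(X^{(n)}|\theta)\, d\theta$ does not depend on $\theta$.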
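As an illustration of Example 2 and of MAP estimation, consider a conjugate model swapped in for this purpose (it is not on the slides): a Bernoulli likelihood with a Beta(a, b) prior. After k successes in n trials the posterior is Beta(a + k, b + n - k), so the squared-error-optimal estimate (posterior mean), the MAP estimate (posterior mode), and the ML estimate all have closed forms. The Python sketch below computes them and cross-checks the MAP formula with a brute-force argmax of $p_0(\theta)\prod_i p(x_i|\theta)$. The data (7 successes in 10 trials) and the Beta(2, 2) prior are hypothetical.

```python
def beta_bernoulli_estimates(a, b, k, n):
    """Prior Beta(a, b); data: k successes in n Bernoulli trials.
    The posterior is Beta(a + k, b + n - k)."""
    post_mean = (a + k) / (a + b + n)            # optimal under squared-error loss
    post_mode = (a + k - 1) / (a + b + n - 2)    # MAP estimate (requires a + k > 1 and b + n - k > 1)
    ml = k / n                                   # maximum likelihood, ignores the prior
    return post_mean, post_mode, ml

def map_by_grid(a, b, k, n, steps=10000):
    """Brute-force argmax over (0, 1) of p0(theta) * prod_i p(x_i | theta), up to constants."""
    def unnormalized_posterior(th):
        return th ** (a - 1) * (1 - th) ** (b - 1) * th ** k * (1 - th) ** (n - k)
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=unnormalized_posterior)

mean, mode, ml = beta_bernoulli_estimates(2, 2, 7, 10)  # hypothetical prior and data
print(round(mean, 3), round(mode, 3), ml)  # 0.643 0.667 0.7
print(map_by_grid(2, 2, 7, 10))
```

The prior pulls both Bayesian estimates toward 1/2, the prior mean; the posterior mean is pulled further than the mode here.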

Since log is monotonically increasing, the $\theta$ we are looking for is

$$\theta^* = \arg\max_\theta p_0(\theta) \prod_{i=1}^{n} p(x_i|\theta) = \arg\max_\theta \log\Big(p_0(\theta) \prod_{i=1}^{n} p(x_i|\theta)\Big) = \arg\max_\theta \Big(\underbrace{\log p_0(\theta)}_{\text{regularization}} + \underbrace{\sum_{i=1}^{n} \log p(x_i|\theta)}_{\text{likelihood}}\Big).$$

In the MAP estimator, the larger n (the size of the data), the less important the prior term $\log p_0(\theta)$ is, because the likelihood term grows with n while the prior term does not. This can motivate us to omit the prior.

Example 3 (0-1 loss):

$$\lambda(\hat\theta, \theta) = \begin{cases} 0, & |\hat\theta - \theta| \le \Delta \\ 1, & |\hat\theta - \theta| > \Delta \end{cases}$$
$$R(\hat\theta|X^{(n)}) = \int p(\theta|X^{(n)})\, \lambda(\hat\theta, \theta)\, d\theta = 1 - \int_{|\theta - \hat\theta| \le \Delta} p(\theta|X^{(n)})\, d\theta \approx 1 - p(\hat\theta|X^{(n)})\, V(\Delta),$$

where $V(\Delta)$ is the volume of the $\Delta$-neighborhood (for small $\Delta$). Hence $\min R(\hat\theta|X^{(n)}) \Leftrightarrow \max p(\hat\theta|X^{(n)})$, so

$$\hat\theta_{MAP} = \arg\max_\theta p(\theta|X^{(n)}).$$

Bayesian learning
Density function for $\mathbf{x}$, given the training data set $X^{(n)} = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$:

$$p(\mathbf{x}|X^{(n)}) = \int p(\mathbf{x}, \theta|X^{(n)})\, d\theta.$$

From the definition of conditional probability densities,

$$p(\mathbf{x}, \theta|X^{(n)}) = p(\mathbf{x}|\theta, X^{(n)})\, p(\theta|X^{(n)}).$$

The first factor is independent of $X^{(n)}$, since it is just our assumed form for the parameterized density: $p(\mathbf{x}|\theta, X^{(n)}) = p(\mathbf{x}|\theta)$. Therefore

$$p(\mathbf{x}|X^{(n)}) = \int p(\mathbf{x}|\theta)\, p(\theta|X^{(n)})\, d\theta.$$
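The claim that the prior matters less as n grows can be seen in a Gaussian model swapped in for illustration (an assumption, not from the slides): $x_i \sim N(\mu, \sigma^2)$ with known $\sigma^2$ and prior $\mu \sim N(\mu_0, \sigma_0^2)$. Setting the derivative of $\log p_0(\mu) + \sum_i \log p(x_i|\mu)$ to zero gives $\hat\mu_{MAP} = (\sigma_0^2 \sum_i x_i + \sigma^2 \mu_0)/(n\sigma_0^2 + \sigma^2)$. The Python sketch uses hypothetical variances and toy data whose ML estimate is exactly 5.

```python
def gauss_log_prior(mu, mu0, var0):
    return -(mu - mu0) ** 2 / (2 * var0)             # log p0(mu), up to an additive constant

def gauss_log_lik(mu, xs, var):
    return -sum((x - mu) ** 2 for x in xs) / (2 * var)  # sum_i log p(x_i | mu), up to a constant

def map_mean(xs, var, mu0, var0):
    """Closed-form argmax of log p0(mu) + sum_i log p(x_i | mu) for a Gaussian prior."""
    n = len(xs)
    return (var0 * sum(xs) + var * mu0) / (n * var0 + var)

var, mu0, var0 = 4.0, 0.0, 1.0  # hypothetical noise variance and prior N(0, 1)
for n in (2, 20, 200):
    xs = [4.0, 6.0] * (n // 2)   # hypothetical toy data whose ML estimate is exactly 5.0
    ml = sum(xs) / n
    print(n, ml, round(map_mean(xs, var, mu0, var0), 3))
# prints: 2 5.0 1.667 / 20 5.0 4.167 / 200 5.0 4.902
```

The prior's pull toward $\mu_0 = 0$ (the regularization term) fades as the likelihood term accumulates more samples.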

Here the posterior is given by Bayes' formula:

$$p(\theta|X^{(n)}) = \frac{p_0(\theta)\, p(X^{(n)}|\theta)}{p(X^{(n)})}.$$

Instead of choosing a specific value for $\theta$, the Bayesian approach performs a weighted average over all values of $\theta$. If the weighting factor $p(\theta|X^{(n)})$, which is the posterior of $\theta$, peaks very sharply about some value $\hat\theta$, we obtain $p(\mathbf{x}|X^{(n)}) \approx p(\mathbf{x}|\hat\theta)$.
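The weighted average $p(\mathbf{x}|X^{(n)}) = \int p(\mathbf{x}|\theta)\, p(\theta|X^{(n)})\, d\theta$ can be evaluated numerically. The Python sketch below assumes a Bernoulli model with a Beta(a, b) prior (our choice, not from the slides), for which the integral is known in closed form, $P(x = 1|X^{(n)}) = (a + k)/(a + b + n)$ after k successes in n trials. It compares a midpoint-rule integral with that formula and with the ML plug-in $k/n$ for a small and a large hypothetical sample.

```python
def predictive_numeric(a, b, k, n, steps=20000):
    """P(x=1 | X) = integral of p(x=1 | theta) p(theta | X) dtheta, by the midpoint rule."""
    h = 1.0 / steps
    num = den = 0.0
    for i in range(steps):
        th = (i + 0.5) * h
        w = th ** (a + k - 1) * (1 - th) ** (b + n - k - 1)  # unnormalized Beta posterior
        num += th * w      # p(x=1 | theta) = theta, weighted by the posterior
        den += w
    return num / den

def predictive_closed_form(a, b, k, n):
    return (a + k) / (a + b + n)

for k, n in [(3, 4), (300, 400)]:  # hypothetical data, uniform prior a = b = 1
    print(n, k / n,
          round(predictive_closed_form(1, 1, k, n), 4),
          round(predictive_numeric(1, 1, k, n), 4))
```

With a uniform prior and 3 successes in 4 trials, the predictive probability is 4/6, noticeably below the plug-in value 0.75; with 300 successes in 400 trials the posterior is sharply peaked and the two nearly agree, as described above.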

(Figure: prediction based on three estimations.)

3.3 Naive Bayes
The naive Bayes (NB) learning model represents a training instance as an attribute (feature) vector A together with a decision class variable C. It assumes that the components of the feature vector are conditionally independent given the class variable, that is, each component acts on the class variable independently:

$$P(A|C) = \prod_{k=1}^{n} P(a_k|C).$$

This assumption reduces the complexity of learning, and in many domains NB has proved quite robust and efficient.
Characteristics of NB: a simple structure with only two layers (the class node C with arcs to the attribute nodes $a_1, a_2, \dots, a_{n-1}, a_n$); the complexity of inference is linear in the number of network nodes.

Naive Bayes assumption: represent sample A as an attribute vector. If the attributes $a_k$ are independent given the class, then $P(A|C_i)$ factors into a product of per-attribute terms, giving simple Bayesian classification (SBC):

$$P(C_i|A) = \frac{P(C_i)}{P(A)} \prod_{k=1}^{n} P(a_k|C_i).$$

It is generally held that SBC achieves optimal classification accuracy only when the independence assumption holds, and near-optimal accuracy when the correlations among the attributes are small.

Naive Bayes classifier: an example
R: saw "Return of the King" more than once
Z: lives in zipcode 15213
C: brought a coat to the classroom
J: person is a Junior
What parameters are stored in the CPTs of this Bayes net? They are P(J), P(C|J), P(C|¬J), P(Z|J), P(Z|¬J), P(R|J), and P(R|¬J).
Suppose we have a database of 20 people who attended a lecture. How could we use it to estimate the values in these CPTs? By relative frequencies, for example:
P(J) = #Juniors / #people in the database
P(C|J) = #Juniors who brought a coat / #Juniors
P(C|¬J) = #non-Juniors who brought a coat / #non-Juniors
and similarly for the Z and R entries.

With Z redefined as "lives in zipcode 710000": a new person shows up at class wearing an "I live in Baoji city, where I saw all the Lord of the Rings movies every night" overcoat. What is the probability that they are a Junior? Here C, Z, and R are the input attributes and J is the output attribute.

$$P(J|C, Z, R) = \frac{P(J, C, Z, R)}{P(C, Z, R)}, \qquad P(J, C, Z, R) = P(C|J)\, P(Z|J)\, P(R|J)\, P(J),$$
$$P(C, Z, R) = P(J, C, Z, R) + P(\neg J, C, Z, R),$$

so

$$P(J|C, Z, R) = \frac{P(C|J)\, P(Z|J)\, P(R|J)\, P(J)}{P(C|J)\, P(Z|J)\, P(R|J)\, P(J) + P(C|\neg J)\, P(Z|\neg J)\, P(R|\neg J)\, P(\neg J)}.$$

The general case
1. Estimate P(Y=v) as the fraction of records with Y=v.
2. Estimate P(X_i=u | Y=v) as the fraction of "Y=v" records that also have X_i=u.
3. To predict the Y value given observations of all the X_i values, compute

$$Y^{predict} = \arg\max_v P(Y=v|X_1=u_1, \dots, X_m=u_m).$$

Expanding with Bayes' rule and the naive Bayes assumption:

$$Y^{predict} = \arg\max_v \frac{P(Y=v, X_1=u_1, \dots, X_m=u_m)}{P(X_1=u_1, \dots, X_m=u_m)} = \arg\max_v P(X_1=u_1, \dots, X_m=u_m|Y=v)\, P(Y=v) = \arg\max_v P(Y=v) \prod_{j=1}^{m} P(X_j=u_j|Y=v).$$

3.4 Bayesian networks
(Figure: example graphs with numbered nodes.)
Informally, a graph consists of points and the lines connecting them (nodes and edges). Graph theory belongs mainly to discrete mathematics and is widely applied in computer science; graphs describe the relationships among the components of a complex system. A graph G consists of two sets: a nonempty set of nodes V and a finite set of edges E. If nodes $v_i$ and $v_j$ in V form a pair $(v_i, v_j) \in E$, that is, an edge of the graph, the two nodes are called adjacent; otherwise they are non-adjacent.

Directed graph: given a nonempty finite set V and $E \subseteq V \times V$, G = (V, E) is called a directed graph. The elements of V are called vertices, and V is the vertex set. An element $(v_1, v_2) \in E$ has a direction and is called a directed edge, or arc, from vertex $v_1$ to vertex $v_2$; E is the set of directed edges. $v_1$ is called the predecessor and $v_2$ the successor.
Example: V = {1, 2, 3, 4, 5, 6, 7}, |V| = 7; E = {(1,2), (2,2), (2,4), (4,5), (4,1), (5,4), (6,3)}, |E| = 7. The edge (2,2) is a self-loop, and vertex 7 is an isolated vertex.

Undirected graph: given a nonempty finite set V and a set E of unordered pairs of elements of V, G = (V, E) is called an undirected graph. The elements of V are called vertices, and V is the vertex set. An element $\{v_1, v_2\} \in E$ has no direction and is called an undirected edge; E is the set of undirected edges, and $\{v_1, v_2\}$ and $\{v_2, v_1\}$ denote the same edge.
Example: V = {A, B, C, D, E, F}, |V| = 6; E = {{A,B}, {A,E}, {B,E}, {C,F}}, |E| = 4.

Bayesian network
A Bayesian network is a directed acyclic graph:
Nodes are variables (discrete or continuous).
Arcs indicate dependence between variables.
Each node carries a conditional probability distribution (local distribution).
Missing arcs imply conditional independence.
Independencies + local distributions = joint distribution. For example, with arcs X1 → X2, X1 → X3, and X2 → X3:

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2|x_1)\, p(x_3|x_1, x_2).$$
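Steps 1 to 3 of the naive Bayes general case translate directly into counting code. The Python sketch below is an illustration built on assumptions: the six records are a made-up stand-in for the 20-person lecture database (which is not given), the attribute names follow the J/C/Z/R example, the helper names are hypothetical, and no smoothing is applied, so an attribute value never seen with a class gets probability 0.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, target):
    """Steps 1 and 2: relative-frequency estimates of P(Y=v) and P(X_i=u | Y=v).
    records: list of dicts mapping attribute name -> value (no smoothing)."""
    class_counts = Counter(r[target] for r in records)
    cond_counts = defaultdict(Counter)  # (attribute, class value) -> Counter of attribute values
    for r in records:
        for attr, u in r.items():
            if attr != target:
                cond_counts[(attr, r[target])][u] += 1
    n = len(records)
    prior = {v: c / n for v, c in class_counts.items()}

    def cond(attr, u, v):
        return cond_counts[(attr, v)][u] / class_counts[v]

    return prior, cond

def predict(prior, cond, observation):
    """Step 3: argmax_v P(Y=v) prod_j P(X_j=u_j | Y=v), plus normalized posteriors."""
    scores = {}
    for v, p_v in prior.items():
        s = p_v
        for attr, u in observation.items():
            s *= cond(attr, u, v)
        scores[v] = s
    total = sum(scores.values())
    post = {v: s / total for v, s in scores.items()} if total > 0 else scores
    return max(scores, key=scores.get), post

# Hypothetical toy records (J = Junior, C = coat, Z = zipcode, R = saw the movie more than once)
records = [
    {"J": True,  "C": True,  "Z": True,  "R": True},
    {"J": True,  "C": True,  "Z": False, "R": True},
    {"J": True,  "C": False, "Z": True,  "R": False},
    {"J": False, "C": True,  "Z": False, "R": False},
    {"J": False, "C": False, "Z": False, "R": True},
    {"J": False, "C": False, "Z": True,  "R": False},
]
prior, cond = train_naive_bayes(records, "J")
label, post = predict(prior, cond, {"C": True, "Z": True, "R": True})
print(label, round(post[True], 3))  # True 0.889
```

With these toy records, P(J) = 1/2, each of P(C|J), P(Z|J), P(R|J) is 2/3 and each non-Junior counterpart is 1/3, so the formula for P(J|C, Z, R) gives (1/2)(2/3)^3 / [(1/2)(2/3)^3 + (1/2)(1/3)^3] = 8/9.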

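Bayesian network (example)
(Figure: the sprinkler network, with Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass.)
Variables: cloudy (C), rain (R), sprinkler (S), wet grass (W). A full joint probability table:

C  R  S  W  Prob.
F  F  F  F  0.01
F  F  F  T  0.04
F  F  T  F  0.05
F  F  T  T  0.01
F  T  F  F  0.02
F  T  F  T  0.07
F  T  T  F  0.2
F  T  T  T  0.1
T  F  F  F  0.01
T  F  F  T  0.07
T  F  T  F  0.13
T  F  T  T  0.04
T  T  F  F  0.06
T  T  F  T  0.05
T  T  T  F  0.1
T  T  T  T  0.05

Specifying the full joint distribution directly requires 2^4 - 1 = 15 independent parameters.
We model whether the grass is wet. The grass is wet if it rains or if the sprinkler is on. Whether it rains depends on whether it is cloudy, and if it is cloudy the sprinkler is unlikely to be on.

Bayesian network (example): the same four variables with conditional probability tables.

P(C):
C=F  C=T
0.5  0.5

P(S|C):
C  S=F  S=T
F  0.5  0.5
T  0.9  0.1

P(R|C):
C  R=F  R=T
F  0.8  0.2
T  0.2  0.8

P(W|S,R):
S  R  W=F   W=T
F  F  1     0
F  T  0.1   0.9
T  F  0.1   0.9
T  T  0.01  0.99

In total, 1 + 2 + 2 + 4 = 9 independent parameters.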
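The factorization can be checked against the CPTs above. The Python sketch below (names hypothetical) encodes only the 9 CPT parameters, builds the joint $P(C, S, R, W) = P(C)\,P(S|C)\,P(R|C)\,P(W|S, R)$, confirms that it sums to 1, and answers queries by brute-force enumeration, which is fine for 4 binary variables but exponential in general. Note that it uses the CPTs, not the 16-entry joint table, which is a separate illustration that does not follow from these CPTs.

```python
from itertools import product

# CPTs from the slide (True = T, False = F)
P_C = {True: 0.5, False: 0.5}
P_S_given_C = {False: 0.5, True: 0.1}   # P(S=T | C)
P_R_given_C = {False: 0.2, True: 0.8}   # P(R=T | C)
P_W_given_SR = {(False, False): 0.0, (False, True): 0.9,
                (True, False): 0.9, (True, True): 0.99}  # P(W=T | S, R)

def bern(p_true, value):
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) P(S|C) P(R|C) P(W|S,R), the network factorization."""
    return (P_C[c] * bern(P_S_given_C[c], s) * bern(P_R_given_C[c], r)
            * bern(P_W_given_SR[(s, r)], w))

def query(var_index, evidence):
    """P(variable = True | evidence) by enumerating all 16 assignments of (C, S, R, W).
    evidence: dict mapping variable index -> observed value."""
    num = den = 0.0
    for assign in product([False, True], repeat=4):
        if all(assign[i] == v for i, v in evidence.items()):
            p = joint(*assign)
            den += p
            if assign[var_index]:
                num += p
    return num / den

total = sum(joint(*a) for a in product([False, True], repeat=4))
print(round(total, 10))               # 1.0
print(round(query(2, {3: True}), 4))  # P(R=T | W=T): 0.7079
print(round(query(1, {3: True}), 4))  # P(S=T | W=T): 0.4298
```

Observing wet grass makes rain the more likely explanation here because, under these CPTs, rain is more probable a priori than the sprinkler (0.5 versus 0.3).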

Bayesian network (example)
You have a new burglar alarm installed. It is reliable at detecting burglaries, but it also responds to minor earthquakes. Two neighbors, John and Mary, promise to call you at work when they hear the alarm. John always calls when he hears the alarm, but he sometimes confuses the alarm with the phone ringing (and calls then too). Mary likes loud music and sometimes misses the alarm. Given evidence about who has and hasn't called, estimate the probability of a burglary.

I'm at work, John calls to say my alarm is ringing, but Mary doesn't call. Is there a burglary?
There are 5 variables (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls), and the network topology reflects causal knowledge.

In order for a Bayesian network to model a probability distribution, the following must be true by definition: each variable is conditionally independent of all its non-descendants in the graph, given the values of all its parents. This implies

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i|\mathrm{Parents}(X_i)).$$

Global and local semantics
Global semantics defines the full joint distribution as the product of the local conditional distributions. To define this product, a linear ordering $X_1, \dots, X_n$ of the nodes of the network has to be given, one in which every node comes after its parents:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i|\mathrm{Parents}(X_i)).$$

Ordering in the example: B, E, A, J, M.

$$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(\neg B)\, P(\neg E)\, P(A|\neg B, \neg E)\, P(J|A)\, P(M|A)$$

Local semantics defines a series of statements of conditional independence: each node is conditionally independent of its nondescendants given its parents.
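The global semantics can be applied directly once CPT values are fixed. The surviving slides do not give them, so the Python sketch below assumes the values from the standard textbook version of this network (Russell and Norvig, Artificial Intelligence: A Modern Approach). It evaluates the product for the ordering B, E, A, J, M and answers the question posed above, P(Burglary | John calls, Mary does not), by enumerating the hidden variables. Function names are hypothetical.

```python
from itertools import product

# CPT values ASSUMED from the standard AIMA version of this example; not on the slides.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                     # P(M=T | A)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Global semantics with ordering B, E, A, J, M:
    P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def p_burglary(j, m):
    """P(B=T | J=j, M=m) by summing out the hidden variables E and A."""
    hidden = list(product([False, True], repeat=2))
    num = sum(joint(True, e, a, j, m) for e, a in hidden)
    den = num + sum(joint(False, e, a, j, m) for e, a in hidden)
    return num / den

print(round(joint(False, False, True, True, True), 6))  # P(j, m, a, not b, not e): 0.000628
print(round(p_burglary(True, False), 4))                # John calls, Mary does not
```

Enumeration like this is exact but exponential in the number of hidden variables; it is meant only to make the product formula concrete.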
