
Hammersley-Clifford theorem

The Hammersley-Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a positive probability distribution can be represented as a Markov network (also known as a Markov random field). It states that a probability distribution with a positive mass or density satisfies one of the Markov properties with respect to an undirected graph G if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.

The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin [1] and Frank Spitzer [2] in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971 [3][4]. Simpler proofs using the inclusion-exclusion principle were given independently by Geoffrey Grimmett [5], Preston [6] and Sherman [7] in 1973, with a further proof by Julian Besag in 1974 [8].

Notes
[1] Dobrushin, R. L. (1968), "The Description of a Random Field by Means of Conditional Probabilities and Conditions of Its Regularity", Theory of Probability and its Applications 13 (2): 197-224, doi:10.1137/1113026.
[2] Spitzer, Frank (1971), "Markov Random Fields and Gibbs Ensembles", The American Mathematical Monthly 78 (2): 142-154, doi:10.2307/2317621, JSTOR 2317621.
[3] Hammersley, J. M.; Clifford, P. (1971), "Markov fields on finite graphs and lattices", unpublished.
[4] Clifford, P. (1990), "Markov random fields in statistics", in Grimmett, G. R.; Welsh, D. J. A. (eds.), Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, Oxford University Press, pp. 19-32, ISBN 0198532156, MR 1064553.
[5] Grimmett, G. R. (1973), "A theorem about random fields", Bulletin of the London Mathematical Society 5 (1): 81-84, doi:10.1112/blms/5.1.81, MR 0329039.
[6] Preston, C. J. (1973), "Generalized Gibbs states and Markov random fields", Advances in Applied Probability 5 (2): 242-261, doi:10.2307/1426035, JSTOR 1426035, MR 0405645.
[7] Sherman, S. (1973), "Markov random fields and Gibbs random fields", Israel Journal of Mathematics 14 (1): 92-103, doi:10.1007/BF02761538, MR 0321185.
[8] Besag, J. (1974), "Spatial interaction and the statistical analysis of lattice systems", Journal of the Royal Statistical Society, Series B (Methodological) 36 (2): 192-236, JSTOR 2984812, MR 0373208.

Further reading
Bilmes, Jeff (Spring 2006), "Handout 2: Hammersley-Clifford", course notes from a University of Washington course.
Grimmett, Geoffrey, Probability on Graphs, Chapter 7.
Helge, "The Hammersley-Clifford Theorem and its Impact on Modern Statistics".
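To make the factorization concrete, here is a small numerical sketch (not part of the article itself): a strictly positive distribution on the chain graph 1 - 2 - 3, built as a product of clique potentials, for which the Markov property predicted by the theorem can be checked exhaustively. The potential φ and its values are arbitrary illustrative choices.

```python
import itertools

# A minimal numerical sketch: a strictly positive distribution on the chain
# graph 1 - 2 - 3, defined by the Gibbs factorisation
#   P(x) ∝ φ(x1, x2) φ(x2, x3)
# over the two edge cliques.  The theorem pairs this factorisation with the
# Markov property, so x1 ⊥ x3 | x2 should hold — we check it exhaustively.

def phi(a, b):
    return 2.0 if a == b else 0.5   # an arbitrary strictly positive potential

states = list(itertools.product([0, 1], repeat=3))
weights = {s: phi(s[0], s[1]) * phi(s[1], s[2]) for s in states}
Z = sum(weights.values())
P = {s: w / Z for s, w in weights.items()}

def marginal(fixed):
    # total probability of all states consistent with {index: value}
    return sum(p for s, p in P.items()
               if all(s[i] == v for i, v in fixed.items()))

for x1, x2, x3 in states:
    joint = marginal({0: x1, 1: x2, 2: x3}) / marginal({1: x2})
    prod = (marginal({0: x1, 1: x2}) / marginal({1: x2})) * \
           (marginal({1: x2, 2: x3}) / marginal({1: x2}))
    assert abs(joint - prod) < 1e-12
print("x1 and x3 are conditionally independent given x2")
```

This illustrates one direction of the theorem (factorization implies the Markov property); positivity of φ is what the other direction requires.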

The first afternoon of the memorial session for Julian Besag in Bristol was an intense and at times emotional moment, where friends and colleagues of Julian shared memories and stories. This collection of tributes showed what a larger-than-life character he was, from his long-term and wide-ranging impact on statistics, to his very high expectations, both for himself and for others, leading to a total and uncompromising research ethic, to his passion for extreme sports and the outdoors. (The stories during and after dinner were of a more personal nature, but at least as enjoyable!) The talks on the second day showed how much and how deeply Julian had contributed to spatial statistics and agricultural experiments, to pseudo-likelihood, to Markov random fields and image analysis, and to MCMC methodology and practice. I hope I did not botch my presentation on the history of MCMC too much, while I found reading through the 1974, 1986 and 1993 Read Papers and their discussions an immensely rewarding experience (one I wish I had undertaken before completing our Statistical Science paper, though that was bound to be incomplete by nature!). Some interesting links made by the audience were the prior publication of proofs of the Hammersley-Clifford theorem in 1973 (by Grimmett, Preston, and Sherman, respectively), as well as the proposal of a Gibbs sampler by Brian Ripley as early as 1977 (even though Hastings did use Gibbs steps in one of his examples). Christophe Andrieu also pointed out to me a very early Monte Carlo review by John Halton in the 1970 SIAM Review, a review I will read (and comment on) as soon as possible. Overall, I am quite glad I could take part in this memorial, and I am grateful to both Peters for organising it as a fitting tribute to Julian.

Markov Chain Monte Carlo (MCMC) methods are currently a very active field of research. MCMC methods are sampling methods based on Markov chains which are ergodic with respect to the target probability measure. The principle of adaptive methods is to optimise on the fly some design parameters of the algorithm with respect to a given criterion reflecting the sampler's performance (optimise the acceptance rate, optimise an importance sampling function, etc.). A postdoctoral position is open to work on the numerical analysis of adaptive MCMC methods: convergence, numerical efficiency, development and analysis of new algorithms. A particular emphasis will be given to applications in statistics and molecular dynamics. (Detailed description.) The position is funded by the French National Research Agency (ANR) through the 2009-2012 project ANR-08-BLAN-0218, and will benefit from an interdisciplinary environment involving numerical analysts, statisticians and probabilists, and from strong interactions between the partners of the project.
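The adaptive principle described above can be sketched in a few lines (illustrative only, not the project's algorithms): random-walk Metropolis targeting a standard normal, with the proposal scale tuned on the fly so the acceptance rate approaches 0.44, a commonly used one-dimensional target; the Robbins-Monro step sizes are an assumption of this sketch.

```python
import math
import random

# A minimal sketch of adaptive MCMC: random-walk Metropolis targeting N(0, 1),
# with the proposal scale tuned on the fly by a Robbins-Monro rule steering
# the acceptance rate toward 0.44.  Target, scales and step sizes are
# illustrative choices, not the project's algorithms.
random.seed(1)

def log_target(x):
    return -0.5 * x * x              # unnormalised log-density of N(0, 1)

x, scale, accepts, n = 0.0, 5.0, 0, 20000
samples = []
for t in range(1, n + 1):
    prop = x + random.gauss(0.0, scale)
    accepted = math.log(random.random()) < log_target(prop) - log_target(x)
    if accepted:
        x, accepts = prop, accepts + 1
    # diminishing-step adaptation (a standard condition for valid adaptive MCMC)
    scale *= math.exp(t ** -0.6 * ((1.0 if accepted else 0.0) - 0.44))
    samples.append(x)

rate = accepts / n
mean = sum(samples[n // 2:]) / (n - n // 2)
print(f"acceptance rate {rate:.2f}, sample mean {mean:.2f}")
```

The diminishing step t^(-0.6) is what makes the adaptation vanish asymptotically, one of the sufficient conditions studied in the convergence analysis the ad refers to.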

In the most recent issue of Statistical Science, the special topic is Celebrating the EM Algorithm's Quandunciacentennial. It contains an historical survey by Martin Tanner and Wing Wong on the emergence of MCMC Bayesian computation in the 1980s. This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors' analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings's (1970) and Besag's (1974) papers). They dismiss our concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986, and a single mean square error curve evaluation for a James-Stein type estimator would then take close to a weekend!) and Markov innumeracy, attributing the reluctance instead to a lack of confidence in the method. This perspective remains debatable: apart from Tony O'Hagan, who was then fighting against Monte Carlo methods as being un-Bayesian (1987, JRSS D), I do not remember any negative attitude at the time about simulation, and the immediate spread of MCMC methods after Alan Gelfand's and Adrian Smith's presentations of their 1990 paper shows, on the contrary, that the Bayesian community was ready for the move. Another interesting point made in this historical survey is that Metropolis and other Markov chain methods were first presented outside the simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981) and Ripley (1987), perpetuating the impression that such methods were mostly optimisation or niche-specific methods. This is also why Besag's earlier works (not mentioned in this survey) did not get wider recognition until later. Something I was not aware of is the appearance of iterative adaptive importance sampling (i.e. population Monte Carlo) in the Bayesian literature of the 1980s, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computational contents of Valencia 3 (which I sadly missed for being in the Army!) is also quite informative about the perception of computational Bayesian statistics at that time. A missing connection in this survey is Gilles Celeux and Jean Diebolt's stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM for mixtures where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. So this was the first half of the Gibbs sampler for mixtures we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce a randomness that would escape the attraction of local modes rather than targeting the posterior distribution, and did not analyse the Markovian nature of their algorithm until later, with the simulated annealing EM algorithm.
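The SEM idea described above, drawing the latent allocations from their conditional distribution instead of taking expectations, can be sketched for a two-component Gaussian mixture (the data and all settings below are invented for illustration):

```python
import math
import random

# A sketch of stochastic EM (SEM) as described above, for a two-component
# Gaussian mixture with known unit variances and equal weights; only the
# component means are estimated, and the data are synthetic.
random.seed(0)
data = [random.gauss(-3.0, 1.0) for _ in range(300)] + \
       [random.gauss(3.0, 1.0) for _ in range(300)]

def density(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2)

mu = [-1.0, 1.0]                     # crude starting values
for _ in range(50):
    # S-step: draw each allocation z_i from its conditional distribution,
    # instead of keeping the expected (soft) responsibilities as EM does.
    groups = ([], [])
    for x in data:
        p0, p1 = density(x, mu[0]), density(x, mu[1])
        k = 0 if random.random() < p0 / (p0 + p1) else 1
        groups[k].append(x)
    # M-step: maximise given the simulated allocations.
    for k in (0, 1):
        if groups[k]:
            mu[k] = sum(groups[k]) / len(groups[k])

print("estimated means:", [round(m, 2) for m in sorted(mu)])
```

The S-step is exactly the "first half" of the mixture Gibbs sampler mentioned in the text; completing it with a draw of the parameters from their conditional posterior would give the full sampler.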

Probabilistic graphical models divide into directed and undirected models. Directed models mainly include Bayesian networks and hidden Markov models; undirected models mainly include Markov random field (MRF) models and conditional random field (CRF) models. In 2001, John Lafferty of Carnegie Mellon University, together with Andrew McCallum and Fernando Pereira, proposed the CRF model for sequence data ("Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data"). The model works directly on the posterior probability, which neatly avoids two problems MRF models face when using many features: the need for complicated likelihood modelling, and the inability to exploit contextual information in the observed image. In 2003, Kumar extended the CRF model to 2-D lattice structures and introduced it into image analysis, attracting considerable attention in the research community.

Given an observed image, the task is to estimate the corresponding label image (y: observed image; x: unknown label image).

1. If the posterior probability is modelled directly (the first term of the formula), one obtains a discriminative probabilistic framework. In particular, if the posterior is modelled directly by a Gibbs distribution, (x, y) is called a CRF and the resulting model is a discriminative CRF model.
2. Modelling (x, y) jointly (the second term of the formula) gives a joint probabilistic framework. In particular, if the pair field (x, y) is Markovian, i.e. the second term is a Gibbs distribution, then (x, y) is called a pairwise MRF (PMRF).
3. If the posterior is modelled through p(x) and p(y|x) as in the formula, where p(y|x) is the model generating the observed image, the framework is called generative. In particular, if the prior p(x) follows a Gibbs distribution, x is called an MRF and the resulting model is a generative MRF model. [From "Research on random field models for image labeling"]

By the Hammersley-Clifford theorem, the posterior probability of the label field follows a Gibbs distribution

  p(x|y) = (1/Z(y, θ)) exp( -Σ_c ψ_c(x_c, y; θ) )

where Z(y, θ) is the normalising function and ψ_c is the potential function, with parameters θ, defined on clique c. A key problem in CRF modelling is defining suitable potential functions, so developing different forms of extended CRF models is currently a main research direction. There are two concrete routes.

Extending the potential functions, i.e. introducing more complex potentials so as to exploit more features and more contextual information: (1) logistic regression (LR); (2) support vector machines (SVM); (3) kernel functions; (4) boosting; (5) probit.

Extending the model structure, i.e. introducing more complex structures so as to exploit higher-level and more varied contextual information:
(1) Dynamic CRF: a dynamic CRF (DCRF) performs several labelling tasks simultaneously on the given observations, so as to exploit the correlations between the different label types.
(2) Hidden CRF: another structural extension inserts a layer of hidden variables h between the observed image and the label image, giving the hidden CRF (HCRF). The hidden layer gives the model richer expressive power and lets it model sub-structures; the hidden variables may be abstract or carry a definite physical meaning.
(3) Tree-structured CRF: in the standard CRF graph structure, correlations between labels are represented by the edges of a lattice.
(4) Mixture CRF models.

1. Markov assumptions: limited history, and stationarity. Limited history means the current state depends only on a bounded part of the past; stationarity means the relation between two states does not depend on time.
2. HMM: given an observation sequence O1, O2, O3, ..., the observations Oi correspond to a hidden state sequence S1, S2, ..., Sn. An HMM answers three questions:
  1. Computing the probability of the observation sequence: the forward algorithm.
  2. Given the observation sequence, finding the most probable hidden state sequence: the Viterbi algorithm, with O(N²T) complexity.
  3. Given observation sequences and the state set, estimating the parameters A (state transition matrix) and B (emission probabilities): the EM / forward-backward algorithm.
Problem 2 is essentially the sequence labelling problem.
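Problem 2 above (the most probable state sequence) can be sketched with a small Viterbi implementation; the two-state HMM below, with its start, transition and emission tables, is invented purely for illustration.

```python
# A toy Viterbi decoder for problem 2 above.  The two-state HMM (its states,
# start, transition and emission tables) is invented for illustration.
states = ("Rainy", "Sunny")
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    # delta[s]: probability of the best path ending in s; backpointers in psi.
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    psi = []
    for o in obs[1:]:
        step, back = {}, {}
        for s in states:
            prev = max(states, key=lambda r: delta[r] * trans[r][s])
            step[s] = delta[prev] * trans[prev][s] * emit[s][o]
            back[s] = prev
        delta = step
        psi.append(back)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for back in reversed(psi):      # trace the backpointers
        path.append(back[path[-1]])
    return list(reversed(path)), delta[last]

path, p = viterbi(("walk", "shop", "clean"))
print(path, round(p, 5))            # ['Sunny', 'Rainy', 'Rainy'] 0.01344
```

Each time step does N comparisons for each of N states, over T steps, which is the O(N²T) cost quoted above.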

For problem 2:

  P(O|S) = p(O1|S1) · p(O2|S2) ⋯ p(On|Sn)
  P(S) = p(S1|start) · p(S2|S1) ⋯ p(Sn|Sn-1)
  argmax_S P(S|O) = argmax_S P(O|S) · P(S) = argmax_S ∏_i p(Oi|Si) · p(Si|Si-1)

ME: a classifier assigning a given observation O to a class. ME needs to extract the relevant features from O and compute the corresponding weights w. Note: it mainly solves the problem of classifying an observation O, such as text classification P(C = c | O).

MEMM: for sequence labelling; it combines ME and HMM, supports richer features, and outperforms HMM by taking into account the observations and states around time t:

  argmax_S P(S|O) = argmax_S ∏_i p(Si | O, Si-1)

where O can be Oi, Oi-1, or other observations.

Maximum Entropy model. Start from a simple example: Washington and Virginia can both be person names or place names, and from the corpus we only know p(person name) = 0.6. What, then, is a good value for p(Washington = person name)? An intuitive answer is p(Washington = person name) = 0.3. Why? Because we make no assumption beyond the available evidence, i.e. we maximise the entropy; this is the principle of the maximum entropy model.

Now the model definition. First, the goal: given a context x, estimate p(y|x). From the training corpus we obtain a sequence of labelled samples (x_i, y_i), where x_i is the context and y_i ∈ Y is the class.
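The 0.3 intuition above can be checked numerically. Over the four joint outcomes (word, label) with words Washington/Virginia and labels person/place, we maximise the entropy subject only to p(label = person) = 0.6; a simple grid search (a brute-force stand-in for the constrained optimisation) should land on the evenly split distribution.

```python
import math

# Numerical check of the 0.3 intuition: maximise entropy over the four joint
# outcomes (Washington/Virginia × person/place) subject only to
# p(label = person) = 0.6, via brute-force grid search.

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

best, best_h = None, -1.0
steps = 200
for i in range(1, steps):        # a = p(Washington, person), 0 < a < 0.6
    a = 0.6 * i / steps
    for j in range(1, steps):    # b = p(Washington, place), 0 < b < 0.4
        b = 0.4 * j / steps
        ps = (a, 0.6 - a, b, 0.4 - b)
        h = entropy(ps)
        if h > best_h:
            best, best_h = ps, h

print("max-entropy joint distribution:", [round(p, 3) for p in best])
# → [0.3, 0.3, 0.2, 0.2]
```

The maximiser splits the 0.6 of "person" evenly between the two words, giving p(Washington = person) = 0.3, exactly the intuitive answer.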

Then construct feature functions: f(x, y) = 1 if (x, y) satisfies some condition (for example, x matches 记者* and y = person name), and 0 otherwise. Note that x is a context, a vector, not a single word. (The notion of "feature" in maximum entropy models differs from the one in pattern recognition: here a feature is a feature function, usually binary-valued, though some work defines real-valued ones, e.g. jeon-sigir06 defines f directly as a KDE distance; the benefit of that definition is not obvious to me.)

The model constraint is that, for every feature function, the model expectation equals the empirical expectation:

  E_p(f) = E_p̃(f)

where

  E_p(f) = Σ_{x,y} p(x,y) f(x,y) = Σ_{x,y} p(x) p(y|x) f(x,y) ≈ Σ_{x,y} p̃(x) p(y|x) f(x,y)
  E_p̃(f) = Σ_{x,y} p̃(x,y) f(x,y)

and, for every x, Σ_y p(y|x) = 1. The entropy of the model is

  H(p) = -Σ_{x,y} p̃(x) p(y|x) log p(y|x)

Maximising the entropy subject to the constraints, the problem becomes

  p* = argmax_{p ∈ P} -Σ_{x,y} p̃(x) p(y|x) log p(y|x)

  where P = { p(y|x) : for all f_i, Σ_{x,y} p̃(x) p(y|x) f_i(x,y) = Σ_{x,y} p̃(x,y) f_i(x,y); and for all x, Σ_y p(y|x) = 1 }

It can be shown that the optimal solution has the form

  p(y|x) = exp( Σ_i λ_i f_i(x,y) ) / Z(x),  where Z(x) = Σ_y exp( Σ_i λ_i f_i(x,y) )

(for the detailed proof, see qxred's post).
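The solution form above can be checked on a tiny dataset: the log-likelihood gradient with respect to each λ_i is exactly E_p̃[f_i] − E_p[f_i], so gradient ascent drives the model feature expectations toward the empirical ones, which is the constraint stated above. The contexts, labels and feature functions below are all invented for illustration.

```python
import math

# A tiny maximum-entropy model on invented data ("mr"/"cap" contexts with
# person/place labels).  Gradient ascent on the log-likelihood pushes
# E_p[f_i] toward E_p̃[f_i], the constraint from the text.
data = [("cap", "place"), ("cap", "person"), ("mr", "person"),
        ("mr", "person"), ("cap", "place"), ("mr", "person")]
labels = ("person", "place")
feats = [lambda x, y: 1.0 if x == "mr" and y == "person" else 0.0,
         lambda x, y: 1.0 if x == "cap" and y == "place" else 0.0]
lam = [0.0, 0.0]

def p_y_given_x(x):
    scores = {y: math.exp(sum(l * f(x, y) for l, f in zip(lam, feats)))
              for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

n = len(data)
emp = [sum(f(x, y) for x, y in data) / n for f in feats]

def model_expectations():
    return [sum(p_y_given_x(x)[y] * f(x, y)
                for x, _ in data for y in labels) / n for f in feats]

for _ in range(500):
    model = model_expectations()
    lam = [l + 0.5 * (e - m) for l, e, m in zip(lam, emp, model)]

model = model_expectations()
print("empirical:", [round(e, 3) for e in emp])
print("model:    ", [round(m, 3) for m in model])
```

Note that the first feature perfectly predicts its label in this toy corpus, so its λ grows without bound and the constraint is only met in the limit; the second feature has an interior optimum (λ = log 2) and converges exactly.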

Hidden Markov Model. A Markov model is essentially a finite state machine with transition probabilities between pairs of states. In a hidden Markov model the states are invisible: we only see the output sequence, i.e. each state transition emits an observation. Having observed the sequence, we want to find the best state sequence. Let O be the observations and X the hidden variables; the model looks for the hidden states maximising P(O|X)P(X), where

  P(O) = Σ_X P(O|X) P(X)

Now

  P(X) = p(x_1) p(x_2…n | x_1) = p(x_1) p(x_2 | x_1) p(x_3…n | x_1, x_2) = …

and the assumption that x_i depends only on x_{i-1} gives

  P(X) = p(x_1) p(x_2|x_1) p(x_3|x_2) ⋯

Similarly,

  P(O|X) = p(o_1 | x_1…n) p(o_2…n | o_1, x_1…n) = p(o_1 | x_1…n) p(o_2 | o_1, x_1…n) p(o_3…n | o_1, o_2, x_1…n) = …

and the assumption that o_i depends only on x_i gives

  P(O|X) = p(o_1|x_1) p(o_2|x_2) ⋯

Together:

  P(O) = Σ_X p(x_1) p(o_1|x_1) p(x_2|x_1) p(o_2|x_2) p(x_3|x_2) ⋯

Define the forward variable α_j(t) as the total probability of the observations up to time t with the path ending in state S_j:

  α_j(t) = [ Σ_{i=1}^{N} α_i(t-1) p(x_t = j | x_{t-1} = i) ] · p(o_t | x_t = j)

Define the backward variable β_i(t) as the probability of the remaining part of the observation sequence given state S_i at time t:

  β_i(t) = Σ_{j=1}^{N} p(x_{t+1} = j | x_t = i) p(o_{t+1} | x_{t+1} = j) β_j(t+1)

Then the probability of the observation sequence with x_t = i is

  P(O, x_t = i) = α_i(t) β_i(t)

The best state sequence is obtained by dynamic programming, and the model parameters by the EM algorithm (again, see qxred's post for details).

Maximum Entropy Markov Model. The drawback of the HMM is that it decides the state sequence from the observation sequence through a joint model, i.e. it uses a joint model to solve a conditional problem; moreover, it is practically impossible to enumerate all possible observation sequences. MEMM addresses these problems. First, MEMM differs fundamentally from MM and HMM: MEMM estimates P(S|O), whereas MM estimates P(S) and HMM estimates P(O).

  P(S|O) = P(s_1|O) P(s_2…n | s_1, O) = P(s_1|O) P(s_2 | s_1, O) P(s_3…n | s_1, s_2, O) = …

With the independence assumptions this becomes

  P(S|O) = P(s_1|o_1) P(s_2 | s_1, o_2) P(s_3 | s_2, o_3) ⋯

Redefine the feature functions: for a = ⟨b, r⟩, where b is an indicator on the current observation and r is a state value,

  f_a(o_t, s_t) = 1 if b(o_t) is true and s_t = r

The constraint becomes

  E_a = (1/m_s) Σ_{k=1}^{m_s} Σ_{s' ∈ S} P(s' | s, o_k) f_a(o_k, s') = (1/m_s) Σ_{k=1}^{m_s} f_a(o_k, s_k) = F_a

This objective function is essentially the same as ME's, so the solution has the form

  P(s | s', o) = exp( Σ_a λ_a f_a(o, s) ) / Z(o, s')

The forward and backward variables from the HMM are then reused to find the best sequence; the sequence actually obtained comes from computing P(s|o) = P(s_0) P(s_1 | s_0, o_0) P(s_2 | s_1, o_1) ⋯

Conditional Random Fields. MEMM in fact uses
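The α/β recursions above can be checked numerically: P(O) computed by brute-force path enumeration, by the forward pass, and via Σ_i α_i(t) β_i(t) at a fixed interior t must all agree. The two-state HMM parameters below are invented for the check.

```python
import itertools

# Numerical check of the forward/backward recursions on an invented
# two-state HMM: three ways of computing P(O) must coincide.
states = (0, 1)
start = (0.6, 0.4)
trans = ((0.7, 0.3), (0.4, 0.6))     # trans[i][j] = p(x_t = j | x_{t-1} = i)
emit = ((0.5, 0.5), (0.1, 0.9))      # emit[j][o]  = p(o_t = o | x_t = j)
obs = (0, 1, 1, 0)
T = len(obs)

def path_prob(path):
    # joint probability of one full state path and the observations
    p = start[path[0]] * emit[path[0]][obs[0]]
    for t in range(1, T):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    return p

brute = sum(path_prob(s) for s in itertools.product(states, repeat=T))

# Forward pass: alpha[t][j] = P(o_1..o_t, x_t = j).
alpha = [[start[j] * emit[j][obs[0]] for j in states]]
for t in range(1, T):
    alpha.append([sum(alpha[-1][i] * trans[i][j] for i in states)
                  * emit[j][obs[t]] for j in states])

# Backward pass: beta[t][i] = P(o_{t+1}..o_T | x_t = i), beta[T-1] = 1.
beta = [[1.0, 1.0]]
for t in range(T - 2, -1, -1):
    beta.insert(0, [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[0][j]
                        for j in states) for i in states])

forward = sum(alpha[-1])
mid = sum(alpha[1][i] * beta[1][i] for i in states)
print(forward, mid, brute)
```

The agreement of `mid` (at t = 1) with the other two values is exactly the identity P(O) = Σ_i P(O, x_t = i) = Σ_i α_i(t) β_i(t) stated above.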
