Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM)
Samudravijaya K
Tata Institute of Fundamental Research, Mumbai
chief@tifr.res.in
09-JAN-2009

Pattern Recognition
[Block diagram in the original slide: the input signal passes through signal processing to yield a pattern; during training the patterns are used to generate a model, and during testing the patterns are matched against the model to produce the output.]
GMM: static patterns. HMM: sequential patterns.

Basic Probability
Joint and conditional probability:
p(A, B) = p(A|B) p(B) = p(B|A) p(A)
Bayes rule:
p(A|B) = \frac{p(B|A)\, p(A)}{p(B)}
If the A_i are mutually exclusive events,
p(B) = \sum_i p(B|A_i)\, p(A_i), \qquad \text{so} \qquad p(A|B) = \frac{p(B|A)\, p(A)}{\sum_i p(B|A_i)\, p(A_i)}

Normal Distribution
Many phenomena are described by the Gaussian pdf
p(x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)    (1)
The pdf is parameterised by θ = (μ, σ²), where the mean is μ and the variance is σ².
It is a convenient pdf: second-order statistics are sufficient.
Example: heights of Pygmies ~ a Gaussian pdf with μ = 4 ft and std-dev σ = 1 ft.
OR: heights of Bushmen ~ a Gaussian pdf with μ = 6 ft and std-dev σ = 1 ft.
Question: if we arbitrarily pick a person from a population, what is the probability of the height being a particular value?

If I arbitrarily pick a Pygmy, say x, then
Pr(\text{height of } x = 4'1'') = \frac{1}{\sqrt{2\pi \cdot 1}} \exp\left( -\frac{(4'1'' - 4')^2}{2 \cdot 1} \right)    (2)
Note: here the mean and variance are fixed; only the observation x changes.
Also see: Pr(x = 4'1'') ≥ Pr(x = 5') and Pr(x = 4'1'') ≥ Pr(x = 3').

Conversely: given that a person's height is 4'1'', the person is more likely to be a Pygmy than a Bushman.
If we observe the heights of many persons, say 3'6'', 4'1'', 3'8'', 4'5'', 4'7'', 4', 6'5'', and all are from the same population (i.e. either Pygmy or Bushman), then we become more certain that the population is Pygmy.
The more the observations, the better our decision will be.

Likelihood Function
x_0, x_1, …, x_{N−1}: a set of independent observations from a pdf parameterised by θ.
In the previous example, x_0, …, x_{N−1} are the observed heights and θ is the mean of the density, which is unknown (σ² is assumed known).
L(X; \theta) = p(x_0 \ldots x_{N-1}; \theta) = \prod_{i=0}^{N-1} p(x_i; \theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=0}^{N-1} (x_i - \mu)^2 \right)    (3)
L(X; θ) is a function of θ and is called the likelihood function.
Given x_0 … x_{N−1}, what can we say about the value of θ, i.e. the best estimate of θ?

Maximum Likelihood Estimation
Example: we know the height of a person, x_0 = 4'4''.
It is most likely to have come from which pdf: μ = 3'6'', 4'6'' or 6'?
Comparing L(x_0; μ = 3'6''), L(x_0; μ = 4'6'') and L(x_0; μ = 6'), the maximum is obtained for μ = 4'6'', so we choose μ = 4'6''.
If θ is just a parameter, we choose arg max_θ L(x_0; θ).

Maximum Likelihood Estimator
Given x_0, x_1, …, x_{N−1} and a pdf parameterised by θ, we form the likelihood function
L(X; \theta) = \prod_{i=0}^{N-1} p(x_i; \theta)
\theta_{MLE} = \arg\max_{\theta} L(X; \theta)
For the height problem one can show that
(\mu)_{MLE} = \frac{1}{N} \sum_i x_i
i.e. the estimate of the mean of the Gaussian is the sample mean of the measured heights.

Bayesian Estimation
MLE: θ is assumed unknown but deterministic.
Bayesian approach: θ is assumed random with pdf p(θ) → prior knowledge.
p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \propto p(x \mid \theta)\, p(\theta)    (posterior ∝ likelihood × prior)
Height problem: the unknown mean μ is random with a Gaussian pdf N(ν, β²):
p(\mu) = \frac{1}{\sqrt{2\pi\beta^2}} \exp\left( -\frac{(\mu - \nu)^2}{2\beta^2} \right)
Then
(\mu)_{Bayesian} = \frac{\sigma^2 \nu + n\beta^2 \bar{x}}{\sigma^2 + n\beta^2}
→ a weighted average of the sample mean and the prior mean.

Gaussian Mixture Model
p(x) = w_1\, p(x \mid N(\mu_1; \sigma_1)) + (1 - w_1)\, p(x \mid N(\mu_2; \sigma_2))
More generally,
p(x) = \sum_{m=1}^{M} w_m\, p(x \mid N(\mu_m; \sigma_m)), \qquad \sum_i w_i = 1
Characteristics of GMMs: just as ANNs are universal approximators of functions, GMMs are universal approximators of densities (provided a sufficient number of mixtures is used); this is true for diagonal-covariance GMMs as well.

General Assumptions in GMM
Assume that there are M components.
Each component generates data from a Gaussian with mean μ_m and covariance matrix Σ_m.
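To make the mixture density above concrete, here is a minimal sketch in Python/NumPy (not part of the original slides; the three-component weights, means and variances are made-up example values) that evaluates p(x) = Σ_m w_m N(x; μ_m, σ_m²) and draws samples by first picking a component and then sampling from its Gaussian:

```python
import numpy as np

def gmm_pdf(x, weights, means, variances):
    """Evaluate p(x) = sum_m w_m N(x; mu_m, sigma_m^2) for scalar or 1-D array x."""
    x = np.asarray(x, dtype=float)[..., None]            # (..., 1)
    w, mu, var = map(np.asarray, (weights, means, variances))
    comp = np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return comp @ w                                       # weighted sum over the M components

def gmm_sample(n, weights, means, variances, rng=None):
    """Draw n samples: pick a component (the 'ball colour'), then draw from its Gaussian."""
    rng = np.random.default_rng(0) if rng is None else rng
    comps = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(np.asarray(means)[comps], np.sqrt(np.asarray(variances)[comps]))

# Hypothetical 3-component parameters, in the spirit of the "blue" pdf discussed below.
w, mu, var = [0.3, 0.5, 0.2], [-2.0, 0.0, 3.0], [0.5, 1.0, 0.8]
print(gmm_pdf([0.0, 1.5], w, mu, var))
print(gmm_sample(5, w, mu, var))
```

The two-step sampling (pick a component, then draw from its Gaussian) is exactly the urn experiment described in the next section.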
GMM
Consider the probability density function shown in solid blue in the original slide. It is useful to parameterise or "model" this seemingly arbitrary "blue" pdf.

Gaussian Mixture Model (contd.)
Actually the pdf is a mixture of 3 Gaussians, i.e.
p(x) = c_1 N(x; \mu_1, \sigma_1) + c_2 N(x; \mu_2, \sigma_2) + c_3 N(x; \mu_3, \sigma_3), \qquad \sum_i c_i = 1    (4)
pdf parameters: c_1, c_2, c_3, μ_1, μ_2, μ_3, σ_1, σ_2, σ_3.

Observations from a GMM
Experiment: an urn contains balls of 3 different colours: red, blue or green. Behind a curtain, a person picks a ball from the urn.
If a red ball → generate x_i from N(x; μ_1, σ_1).
If a blue ball → generate x_i from N(x; μ_2, σ_2).
If a green ball → generate x_i from N(x; μ_3, σ_3).
We have access only to the observations x_0, x_1, …, x_{N−1}. Therefore
p(x_i; \Theta) = c_1 N(x; \mu_1, \sigma_1) + c_2 N(x; \mu_2, \sigma_2) + c_3 N(x; \mu_3, \sigma_3)
but we do not know which urn (component) each x_i comes from!
Can we estimate Θ = [c_1 c_2 c_3 μ_1 μ_2 μ_3 σ_1 σ_2 σ_3]^T from the observations?
\arg\max_{\Theta} p(X; \Theta) = \arg\max_{\Theta} \prod_{i=1}^{N} p(x_i; \Theta)    (5)

Estimation of Parameters of a GMM
Easier problem: we know the component for each observation.

Obs:    x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
Comp.:   1   2   2   1   3   1   3   3   2   2    3    3    3

X_1 = {x_0, x_3, x_5} belong to p_1(x; μ_1, σ_1),
X_2 = {x_1, x_2, x_8, x_9} belong to p_2(x; μ_2, σ_2),
X_3 = {x_4, x_6, x_7, x_10, x_11, x_12} belong to p_3(x; μ_3, σ_3).
From X_1 = {x_0, x_3, x_5}:
c_1 = \frac{3}{13}, \qquad \mu_1 = \frac{x_0 + x_3 + x_5}{3}, \qquad \sigma_1^2 = \frac{(x_0 - \mu_1)^2 + (x_3 - \mu_1)^2 + (x_5 - \mu_1)^2}{3}
In practice we do not know which observation comes from which pdf. How do we solve for arg max_Θ p(X; Θ)?

Incomplete & Complete Data
x_0, x_1, …, x_{N−1} → incomplete data.
Introduce another set of variables y_0, y_1, …, y_{N−1} such that y_i = 1 if x_i ∈ p_1, y_i = 2 if x_i ∈ p_2 and y_i = 3 if x_i ∈ p_3.

Obs:    x0  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  x12
Comp.:   1   2   2   1   3   1   3   3   2   2    3    3    3
Miss.:  y0  y1  y2  y3  y4  y5  y6  y7  y8  y9  y10  y11  y12

y_i = missing (unobserved) data → information about the component.
z = (x; y) is the complete data → the observations and which density they come from.
p(z; \Theta) = p(x, y; \Theta), \qquad \Theta_{MLE} = \arg\max_{\Theta} p(z; \Theta)
Question: but how do we find which observation belongs to which density?

Given an observation x_0 and a parameter guess Θ^g, what is the probability of x_0 coming from the first distribution?
p(y_0 = 1 \mid x_0; \Theta^g) = \frac{p(y_0 = 1, x_0; \Theta^g)}{p(x_0; \Theta^g)}
 = \frac{p(x_0 \mid y_0 = 1; \mu_1^g, \sigma_1^g)\, p(y_0 = 1)}{\sum_{j=1}^{3} p(x_0 \mid y_0 = j; \Theta^g)\, p(y_0 = j)}
 = \frac{p(x_0 \mid y_0 = 1; \mu_1^g, \sigma_1^g)\, c_1^g}{p(x_0 \mid y_0 = 1; \mu_1^g, \sigma_1^g)\, c_1^g + p(x_0 \mid y_0 = 2; \mu_2^g, \sigma_2^g)\, c_2^g + \ldots}
All the parameters are known → we can calculate p(y_0 = 1 | x_0; Θ^g).
Similarly calculate p(y_0 = 2 | x_0; Θ^g) and p(y_0 = 3 | x_0; Θ^g).
Which density? y_0 = arg max_i p(y_0 = i | x_0; Θ^g) → hard allocation.

Parameter Estimation for Hard Allocation
[Table in the original slide: p(y_j = 1 | x_j; Θ^g), p(y_j = 2 | x_j; Θ^g) and p(y_j = 3 | x_j; Θ^g) for samples x_0 … x_6, with each sample hard-assigned to its most probable component; the individual probability values are not fully legible in this copy.]
Updated parameters: c_1 = 2/7, c_2 = 4/7, c_3 = 1/7 (different from the initial guess!).
Similarly (for a Gaussian) find μ_i, σ_i for the i-th pdf.

Parameter Estimation for Soft Assignment
[Table in the original slide: the same probabilities p(y_j = i | x_j; Θ^g) for samples x_0 … x_6, now used as fractional counts.]
Example: probability of each sample belonging to component 1: p(y_0 = 1 | x_0; Θ^g), p(y_1 = 1 | x_1; Θ^g), p(y_2 = 1 | x_2; Θ^g), …
The average probability that a sample belongs to component 1 is
c_1^{new} = \frac{1}{N} \sum_{i=1}^{N} p(y_i = 1 \mid x_i; \Theta^g) = \frac{0.5 + 0.6 + 0.2 + 0.1 + 0.2 + 0.4 + 0.2}{7} = \frac{2.2}{7}

Soft Assignment: Estimation of Means & Variances
Recall: the probability of sample j belonging to component i is p(y_j = i | x_j; Θ^g).
Soft assignment: parameters are estimated by taking weighted averages!
\mu_1^{new} = \frac{\sum_{i=1}^{N} x_i\, p(y_i = 1 \mid x_i; \Theta^g)}{\sum_{i=1}^{N} p(y_i = 1 \mid x_i; \Theta^g)}
(\sigma_1^2)^{new} = \frac{\sum_{i=1}^{N} (x_i - \mu_1)^2\, p(y_i = 1 \mid x_i; \Theta^g)}{\sum_{i=1}^{N} p(y_i = 1 \mid x_i; \Theta^g)}
These are the updated parameters, starting from the initial guess Θ^g.
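The soft-assignment updates above can be written as one short routine. The sketch below (not from the slides; the initial guess and the observations are hypothetical values) computes the responsibilities p(y_i = j | x_i; Θ^g) for every sample of a 1-D, 3-component GMM and then re-estimates the weights, means and variances as the weighted averages just given; iterating it is the EM procedure summarised in the next section.

```python
import numpy as np

def gaussian(x, mu, var):
    """N(x; mu, sigma^2), vectorised over x and the component parameters."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def soft_assignment_update(x, c, mu, var):
    """E-step: responsibilities r[i, j] = p(y_i = j | x_i; Theta^g).
    M-step: re-estimate weights, means and variances as the weighted averages above."""
    x = np.asarray(x, dtype=float)
    c, mu, var = map(np.asarray, (c, mu, var))
    num = c * gaussian(x[:, None], mu, var)                # (N, M): c_j N(x_i; mu_j, var_j)
    r = num / num.sum(axis=1, keepdims=True)               # normalise over components
    c_new = r.mean(axis=0)                                  # c_j^new = (1/N) sum_i r[i, j]
    mu_new = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)   # weighted mean
    var_new = (r * (x[:, None] - mu_new) ** 2).sum(axis=0) / r.sum(axis=0)  # weighted variance
    return c_new, mu_new, var_new

# Hypothetical initial guess Theta^g and 1-D observations.
x = [0.2, 0.4, 2.1, 2.3, 5.0, 5.2, 4.9]
print(soft_assignment_update(x, [1/3, 1/3, 1/3], [0.0, 2.0, 5.0], [1.0, 1.0, 1.0]))
```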
Maximum Likelihood Estimation of Parameters of a GMM (EM Algorithm)
1. Make an initial guess of the parameters: Θ^g = {c_1^g, c_2^g, c_3^g, μ_1^g, μ_2^g, μ_3^g, σ_1^g, σ_2^g, σ_3^g}.
2. Knowing the parameters Θ^g, find the probability of sample x_i belonging to the j-th component,
   p(y_i = j \mid x_i; \Theta^g), \qquad i = 1, 2, \ldots, N \ (\text{no. of observations}), \quad j = 1, 2, \ldots, M \ (\text{no. of components})
3. c_j^{new} = \frac{1}{N} \sum_{i=1}^{N} p(y_i = j \mid x_i; \Theta^g)
4. \mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, p(y_i = j \mid x_i; \Theta^g)}{\sum_{i=1}^{N} p(y_i = j \mid x_i; \Theta^g)}
5. (\sigma_j^2)^{new} = \frac{\sum_{i=1}^{N} (x_i - \mu_j)^2\, p(y_i = j \mid x_i; \Theta^g)}{\sum_{i=1}^{N} p(y_i = j \mid x_i; \Theta^g)}
6. Go back to (2) and repeat until convergence.

Live demonstration: http://www.neurosci.aist.go.jp/~akaho/MixtureEM.html

Practical Issues
EM can get stuck in local minima.
EM is very sensitive to initial conditions; a good initial guess helps. The k-means algorithm is often used prior to application of the EM algorithm.

Size of a GMM
The Bayesian Information Criterion (BIC) value of a GMM can be defined as follows:
BIC(G \mid X) = \log p(X \mid G) - \frac{d}{2} \log N
where
G represents the GMM with the ML parameter configuration,
d represents the number of parameters in G,
N is the size of the dataset;
the first term is the log-likelihood term, and
the second term is the model-complexity penalty term.
BIC selects the best GMM, corresponding to the largest BIC value, by trading off these two terms.
Source: "Boosting GMM and Its Two Applications", F. Wang, C. Zhang and N. Lu, in N. C. Oza et al. (Eds.), LNCS 3541, pp. 12-21, 2005.
The BIC criterion can discover the true GMM size effectively, as shown in the figure in the original slide.

Maximum A Posteriori (MAP)
Sometimes it is difficult to get a sufficient number of examples for robust estimation of parameters.
However, one may have access to a large number of similar examples which can be utilized.
Adapt the target distribution from such a distribution. For example, adapt a speaker-independent model to a new speaker using a small amount of adaptation data.

MAP Adaptation
ML estimation:
\theta_{MLE} = \arg\max_{\theta} p(X \mid \theta)
MAP estimation:
\theta_{MAP} = \arg\max_{\theta} p(\theta \mid X) = \arg\max_{\theta} \frac{p(X \mid \theta)\, p(\theta)}{p(X)} = \arg\max_{\theta} p(X \mid \theta)\, p(\theta)
p(θ) is the a priori distribution of the parameters θ.
A conjugate prior is chosen such that the corresponding posterior belongs to the same functional family as the prior.

Simple Implementation of MAP for GMMs
Source: Statistical Machine Learning from Data: GMM; Samy Bengio.
Train a prior model θ^p with a large amount of available data (say, from multiple speakers). Adapt the parameters to a new speaker using some adaptation data X.
Let α ∈ [0, 1] be a parameter that describes the faith in the prior model.
Adapted weight of the j-th mixture:
w_j = \gamma \left[ \alpha\, w_j^p + (1 - \alpha) \sum_i p(j \mid x_i) \right]
Here γ is a normalization factor such that Σ_j w_j = 1.

Simple Implementation (contd.)
Means:
\mu_j = \alpha\, \mu_j^p + (1 - \alpha)\, \frac{\sum_i p(j \mid x_i)\, x_i}{\sum_i p(j \mid x_i)}
→ a weighted average of the adaptation-data mean and the prior mean.
Variances:
\sigma_j^2 = \alpha \left( \sigma_j^{p\,2} + \mu_j^{p\,2} \right) + (1 - \alpha)\, \frac{\sum_i p(j \mid x_i)\, x_i^2}{\sum_i p(j \mid x_i)} - \mu_j^2

HMM
The primary role of the speech signal is to carry a message; a sequence of sounds (phonemes) encodes a sequence of words.
The acoustic manifestation of a phoneme is mostly determined by:
- the configuration of the articulators (jaw, tongue, lips),
- the physiology and emotional state of the speaker,
- the phonetic context.
An HMM models sequential patterns, and speech is a sequential pattern.
Most text-dependent speaker recognition systems use HMMs.
Text verification involves verification/recognition of phonemes.

Phoneme Recognition
Consider two phoneme classes, /aa/ and /iy/.
Problem: determine to which class a given sound belongs.
Processing of the speech signal results in a sequence of feature (observation) vectors o_1, …, o_T (say, MFCC vectors).
We say the speech is /aa/ if p(aa | O) ≥ p(iy | O).
Using Bayes rule, this amounts to comparing
\frac{p(O \mid aa)\, p(aa)}{p(O)} \quad \text{vs.} \quad \frac{p(O \mid iy)\, p(iy)}{p(O)}
where p(O | ·) is the acoustic model and p(aa), p(iy) are the prior probabilities.
Given p(O | aa), p(aa), p(O | iy) and p(iy), which is more probable?
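As an illustration of this comparison, the following sketch (not from the slides; the model parameters, priors and feature values are hypothetical) scores an observation sequence O against two diagonal-covariance GMM acoustic models and picks the class with the larger log p(O | class) + log p(class):

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """log p(O | class) for a (T, D) frame sequence under a diagonal-covariance GMM,
    assuming frames are independent: sum_t log sum_m w_m N(o_t; mu_m, var_m)."""
    frames, means, variances = map(np.atleast_2d, (frames, means, variances))
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_comp = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_weighted = np.log(weights) + log_comp                           # (T, M)
    m = log_weighted.max(axis=1, keepdims=True)                         # log-sum-exp over components
    return np.sum(m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1)))

def classify(frames, models, priors):
    """Pick the class maximising log p(O | class) + log p(class)."""
    scores = {c: gmm_log_likelihood(frames, *models[c]) + np.log(priors[c]) for c in models}
    return max(scores, key=scores.get), scores

# Hypothetical 2-D "MFCC" features and 2-component acoustic models for /aa/ and /iy/.
models = {
    "aa": ([0.6, 0.4], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "iy": ([0.5, 0.5], [[3.0, 3.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
priors = {"aa": 0.5, "iy": 0.5}
O = np.array([[0.1, -0.2], [0.3, 0.5], [0.0, 0.1]])
print(classify(O, models, priors))
```

Summing per-frame log-likelihoods treats the frames as independent given the class, which is the implicit assumption when a plain (static-pattern) GMM, rather than an HMM, is used as the acoustic model.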
Parameter Estimation of the Acoustic Model
How do we find the density functions p_aa(·) and p_iy(·)?
We assume a parametric model: p_aa(·) parameterised by θ_aa and p_iy(·) parameterised by θ_iy.
Training phase:
- Collect many examples of /aa/ being said.
- Compute the corresponding observations o_1, …, o_{T_aa}.
- Use the maximum likelihood principle (a sketch of one possible realization follows below):
  \theta_{aa} = \arg\max_{\theta_{aa}} p(O; \theta_{aa})
Recall: if the pdf is modelled as a Gaussian Mixture Model, then we use the EM algorithm.

Modelling of a Phoneme
To enunciate /aa/ in a word, our articulators move from the configuration of the previous phoneme to that of /aa/, and then proceed to move to the configuration of the next phoneme.
One can think of 3 distinct time periods:
- transition from the previous phoneme,
- steady state,
- transition to the next phoneme.
Features for 3 "t
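Stepping back to the training phase described above, one possible realization (a sketch only; librosa and scikit-learn are assumptions, not named in the slides, and the file names are hypothetical) is to compute MFCC observation vectors for the collected /aa/ and /iy/ examples and fit a GMM per phoneme by EM:

```python
import numpy as np
import librosa                                   # assumption: feature extraction library, not named in the slides
from sklearn.mixture import GaussianMixture      # assumption: EM-based GMM fitting from scikit-learn

def mfcc_frames(wav_path, n_mfcc=13):
    """Turn one recorded example into a (T, n_mfcc) sequence of observation vectors o_1..o_T."""
    signal, sr = librosa.load(wav_path, sr=None)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def train_phoneme_gmm(wav_paths, n_components=8):
    """theta = argmax p(O; theta): pool frames from all examples of one phoneme, fit a GMM by EM."""
    frames = np.vstack([mfcc_frames(p) for p in wav_paths])
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=100)
    return gmm.fit(frames)

# Hypothetical file names; in practice these would be many recorded examples of each phoneme.
gmm_aa = train_phoneme_gmm(["aa_001.wav", "aa_002.wav"])
gmm_iy = train_phoneme_gmm(["iy_001.wav", "iy_002.wav"])
# gmm_aa.score(O) gives the average per-frame log-likelihood, usable in the Bayes comparison sketched earlier.
```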
