Variational Bayesian Independent Component Analysis

Neil D. Lawrence
7 Kingsfold Close, Billingshurst, West Sussex RH14 9HG, U.K.
neil.lawrence@cl.cam.ac.uk

Christopher M. Bishop
Microsoft Research, St. George House, 1 Guildhall Street, Cambridge CB2 3NH, U.K.

Abstract

Blind separation of signals through the info-max algorithm may be viewed as maximum likelihood learning in a latent variable model. In this paper we present an alternative approach to maximum likelihood learning in these models, namely Bayesian inference. It has already been shown how Bayesian inference can be applied to determine latent dimensionality in principal component analysis models (Bishop, 1999a). In this paper we derive a similar approach for removing unnecessary source dimensions in an independent component analysis model. We present results on a toy data-set and on some artificially mixed images.

1 Introduction

The objective of independent component analysis is to find a representation for a data-set which is based on probabilistically independent components. One way to achieve such a representation is to fit the data to a latent variable model where the latent variables are constrained to be independent.

We consider a model, $\mathcal{M}$, in which there are $q$ latent dimensions, $d$ observed dimensions and our data set contains $N$ samples. In the ICA literature the latent dimensions are often referred to as sources. Since we are seeking representations for our data in which the latent variables, $\mathbf{x}$, are generated independently, we take

$$p(\mathbf{x}_n) = \prod_{i=1}^{q} p(x_{ni}) \qquad (1)$$

for any particular data point $n$. Assuming Gaussian noise, the probability of each instantiation of the observed variables, $\mathbf{t}_n$, may then be taken to be

$$p(\mathbf{t}_n \mid \mathbf{x}_n, \mathbf{W}, \beta, \boldsymbol{\mu}) = \left(\frac{\beta}{2\pi}\right)^{d/2} \exp\left\{-\frac{\beta}{2}\,\left\|\mathbf{t}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu}\right\|^2\right\}, \qquad (2)$$

where $\mathbf{W}$ is a $d \times q$ matrix of parameters, $\beta$ represents an inverse noise variance and $\boldsymbol{\mu}$ a vector of means.

1.1 The Source Distribution

As is well known in independent component analysis, the choice of the latent distribution is important. In particular it must be non-Gaussian. Non-Gaussian source distributions may be split into two classes, those with positive kurtosis or heavy tails and those with negative kurtosis or light tails. The former are known as super-Gaussian distributions and the latter as sub-Gaussian. If our true source distributions belong in either of these two classes we may attempt to separate them. For our ICA model, we follow Attias (1998) who chooses a flexible source distribution which may take a super- or sub-Gaussian form. The resulting model will then be applicable for both eventualities. Attias selected a factorised distribution in which each factor is a mixture of $M$ Gaussians,

$$p(\mathbf{x}_n) = \prod_{i=1}^{q} \left\{ \sum_{m=1}^{M} \pi_m \, \mathcal{N}\!\left(x_{ni} \mid m_m, \sigma_m^2\right) \right\}, \qquad (3)$$

where $\{\pi_m\}$ are the mixing coefficients and each component is governed by a mean $m_m$ and a variance $\sigma_m^2$. Attias referred to the model as independent factor analysis. We may now write down a likelihood which is a function of the parameters $\mathbf{W}$, $\beta$ and $\boldsymbol{\mu}$,

$$p(\mathbf{t} \mid \mathbf{W}, \beta, \boldsymbol{\mu}) = \prod_{n=1}^{N} \int p(\mathbf{t}_n \mid \mathbf{x}_n, \mathbf{W}, \beta, \boldsymbol{\mu})\, p(\mathbf{x}_n)\, d\mathbf{x}_n. \qquad (4)$$
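To make the generative model of Eqns (1)-(4) concrete, the following minimal sketch samples data from it. The sizes, mixture settings and noise level are illustrative values of our own choosing, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not the paper's): q sources, d observed dims, N samples.
q, d, N = 2, 3, 200
M = 2                                # Gaussians per source factor, Eqn (3)
pi = np.array([0.5, 0.5])            # mixing coefficients pi_m
m = np.array([0.0, 0.0])             # component means m_m
var = np.array([1.0, 50.0])          # component variances sigma_m^2

W = rng.standard_normal((d, q))      # d x q mixing matrix
mu = np.zeros(d)                     # observation means
beta = 1e2                           # inverse noise variance

# Sample sources: each x_{ni} drawn independently from the mixture of Gaussians.
comp = rng.choice(M, size=(N, q), p=pi)          # component indicators
x = rng.normal(m[comp], np.sqrt(var[comp]))      # N x q source matrix

# Observations t_n = W x_n + mu + Gaussian noise with precision beta, Eqn (2).
t = x @ W.T + mu + rng.normal(scale=beta ** -0.5, size=(N, d))
print(t.shape)  # (200, 3)
```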
This function could now be maximised with respect to the parameters in order to determine the independent components. Traditionally this optimisation is performed in the limit as the noise variance $\beta^{-1}$ tends to zero. This approach to blind source separation was introduced by Bell and Sejnowski (1995) as an information maximisation algorithm. The relationship with maximum likelihood was pointed out by various authors including Cardoso (1997) and MacKay (1996).

2 A Bayesian Formalism of ICA

In this paper we propose, rather than learning the parameters through maximum likelihood, to follow the Bayesian approach of inferring the parameterisation of the model. This requires us to place priors over the model parameters.

We aim to show how, through a particular selection of our prior distribution $p(\mathbf{W})$, we may automatically determine the number of sources which have produced our data. We are inspired in our approach by the Bayesian PCA of Bishop (1999a), which aims to determine the dimensionality of the principal sub-space automatically.

We choose to treat the noise precision, $\beta$, with a gamma prior

$$p(\beta) = \mathrm{Gam}(\beta \mid a_\beta, b_\beta), \qquad (5)$$

where we define the gamma distribution as

$$\mathrm{Gam}(s \mid a, b) = \frac{b^{a}}{\Gamma(a)}\, s^{a-1} \exp(-b s). \qquad (6)$$

For the mixing matrix, $\mathbf{W}$, we consider a Gaussian prior. In particular the relevance of each input may be determined through the use of the automatic relevance determination (ARD) prior (Neal, 1996; MacKay, 1995b)

$$p(\mathbf{W} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{q} \prod_{p=1}^{d} \mathcal{N}\!\left(w_{ip} \mid 0, \alpha_i^{-1}\right), \qquad (7)$$

where the prior is governed by a vector of hyper-parameters, $\boldsymbol{\alpha}$, of length $q$. The parameters associated with each input of the network are governed by an element of the vector which determines its relevance.

The hyper-parameters $\boldsymbol{\alpha}$ may be inferred through a hierarchical Bayesian framework. We therefore place gamma distributed hyper-priors across these parameters,

$$p(\boldsymbol{\alpha}) = \prod_{i=1}^{q} \mathrm{Gam}\!\left(\alpha_i \mid a_{\alpha_i}, b_{\alpha_i}\right). \qquad (8)$$

Finally we place a Gaussian prior over the means $\boldsymbol{\mu}$,

$$p(\boldsymbol{\mu}) = \prod_{p=1}^{d} \mathcal{N}\!\left(\mu_p \mid 0, \epsilon^{-1}\right), \qquad (9)$$

where $\epsilon$ represents the inverse variance of the prior.

We may now define our model likelihood

$$p(\mathbf{t} \mid \mathcal{M}) = \int p(\mathbf{t} \mid \mathbf{W}, \beta, \mathbf{x}, \boldsymbol{\mu})\, p(\mathbf{x})\, p(\mathbf{W} \mid \boldsymbol{\alpha})\, p(\boldsymbol{\alpha})\, p(\beta)\, p(\boldsymbol{\mu})\; d\mathbf{x}\, d\mathbf{W}\, d\boldsymbol{\alpha}\, d\beta\, d\boldsymbol{\mu}. \qquad (10)$$
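The sketch below evaluates the log of the hierarchical prior of Eqns (5)-(9) with scipy, mainly to make the ARD structure of Eqn (7) explicit: a large $\alpha_i$ makes the prior on the $i$-th column of $\mathbf{W}$ sharply peaked at zero, which is how a source dimension gets switched off. The hyper-parameter values and the demonstration loop are illustrative assumptions of our own, not values from the paper.

```python
import numpy as np
from scipy.stats import gamma, norm

def log_prior(W, alpha, beta, mu, a_alpha=1e-3, b_alpha=1e-3,
              a_beta=1e-3, b_beta=1e-3, eps=1e-3):
    """Sum of Eqns (5)-(9); the broad hyper-parameter defaults are our own choice."""
    lp = 0.0
    # Eqn (7): ARD prior, column i of W has precision alpha_i.
    for i, a_i in enumerate(alpha):
        lp += norm.logpdf(W[:, i], loc=0.0, scale=a_i ** -0.5).sum()
    # Eqn (8): gamma hyper-priors on alpha (scipy parameterises scale = 1/rate).
    lp += gamma.logpdf(alpha, a_alpha, scale=1.0 / b_alpha).sum()
    # Eqn (5): gamma prior on the noise precision beta.
    lp += gamma.logpdf(beta, a_beta, scale=1.0 / b_beta)
    # Eqn (9): Gaussian prior on the means, with inverse variance eps.
    lp += norm.logpdf(mu, loc=0.0, scale=eps ** -0.5).sum()
    return lp

rng = np.random.default_rng(1)
d, q = 3, 2
W = 0.01 * rng.standard_normal((d, q))   # a mixing matrix with near-zero entries
mu, beta = np.zeros(d), 100.0

# Evaluate the log prior for a few settings of alpha_2; larger alpha_i means
# the conditional prior on column i of W is more tightly concentrated at zero.
for a2 in (1.0, 1e3, 1e6):
    print(a2, log_prior(W, np.array([1.0, a2]), beta, mu))
```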
3 The Variational Approach

In Bayesian inference we aim to infer posterior distributions for our parameters. Integrals of the type shown in Eqn 10 are important in this process. Unfortunately, for Bayesian ICA as we have described it, this integral is intractable and we must look to approximations to make progress. We choose to take a variational approach (Jordan et al., 1998; Lawrence, 2000). The variational approach involves developing an approximation, $Q(H)$, to the true posterior distribution, $p(H \mid V)$, of the latent variables, $H$, given the observed variables, $V$. Variational inference can provide a rigorous lower bound on marginalised log-likelihoods of the form

$$\ln \int p(H, V)\, dH \;\geq\; \int Q(H) \ln \frac{p(H, V)}{Q(H)}\, dH. \qquad (11)$$

The difference between this bound and the true marginalised likelihood can be shown to be the Kullback-Leibler (KL) divergence between the true posterior distribution and the approximation,

$$\mathrm{KL}(Q \,\|\, p) = -\int Q(H) \ln \frac{p(H \mid V)}{Q(H)}\, dH. \qquad (12)$$

If we were to utilise an unrestricted approximation, $Q(H)$, and perform a free-form maximisation of bound 11 with respect to $Q(H)$, we would recover $Q(H) = p(H \mid V)$ and the bound would become exact. This approach is the expectation step of an expectation-maximisation algorithm. However in our model, if such a choice were made, the intractabilities would not be resolved. Instead, by placing restrictions on the form of the approximating distribution, we hope to make minimisation of the KL-divergence achievable.

The choice of the variational $Q$-distribution is important. We seek a choice which is simple enough to make our computations tractable, but one which gives enough flexibility to make the bound 11 tight. There are various approaches to determining a useful approximation to the posterior. Lappalainen (1999), for example, imposed specific parameterised functional forms on his variational distributions and then minimised the KL-divergence by gradient based optimisation of their parameters. In this paper we prefer to consider free-form optimisations of our approximating distributions.

As we have already mentioned, if we were to allow completely unconstrained free-form optimisation of the posterior approximation, we would merely recover the true posterior distribution and the intractabilities would not have been circumvented. We therefore must impose constraints on the form of the approximation. Consider a model where the latent variables, $H$, are split into exclusive sub-sets $H_i$. If we impose separability constraints across these sub-sets on our approximation to the posterior,

$$Q(H) = \prod_i Q(H_i), \qquad (13)$$

it is straightforward to show (MacKay, 1995a; Lawrence, 2000) that the optimum form for each component of the posterior distribution is

$$Q(H_j) \;\propto\; \exp \left\langle \ln p(H, V) \right\rangle_{\prod_{i \neq j} Q(H_i)}. \qquad (14)$$

Here we have used the notation $\langle \cdot \rangle_Q$ to denote an expectation under the distribution $Q$. By utilising an approximation that factorises across the parameters of the model,

$$Q(\mathbf{x}, \mathbf{W}, \boldsymbol{\alpha}, \beta, \boldsymbol{\mu}) = Q_x(\mathbf{x})\, Q_W(\mathbf{W})\, Q_\alpha(\boldsymbol{\alpha})\, Q_\beta(\beta)\, Q_\mu(\boldsymbol{\mu}), \qquad (15)$$

we may obtain a lower bound on the model log-likelihood of the form

$$\begin{aligned} \ln p(\mathbf{t} \mid \mathcal{M}) \;\geq\;& \left\langle \ln p(\mathbf{t} \mid \mathbf{W}, \beta, \mathbf{x}, \boldsymbol{\mu}) \right\rangle_{Q_x Q_W Q_\beta Q_\mu} + \left\langle \ln p(\mathbf{x}) \right\rangle_{Q_x} + \left\langle \ln p(\mathbf{W} \mid \boldsymbol{\alpha}) \right\rangle_{Q_W Q_\alpha} \\ &+ \left\langle \ln p(\boldsymbol{\alpha}) \right\rangle_{Q_\alpha} + \left\langle \ln p(\beta) \right\rangle_{Q_\beta} + \left\langle \ln p(\boldsymbol{\mu}) \right\rangle_{Q_\mu} \\ &+ \mathcal{H}(Q_x) + \mathcal{H}(Q_W) + \mathcal{H}(Q_\alpha) + \mathcal{H}(Q_\beta) + \mathcal{H}(Q_\mu), \end{aligned} \qquad (16)$$

where $\mathcal{H}(p(\cdot))$ is the entropy of the distribution $p(\cdot)$. The difference between this bound and $\ln p(\mathbf{t} \mid \mathcal{M})$ is the KL-divergence between the true and approximating posterior. For the Bayesian ICA model, as we have described it, all the necessary expectations in Eqn 14 may be performed analytically, giving the following results:

$$Q_x = \prod_{n=1}^{N} \sum_{m_1=1}^{M} \cdots \sum_{m_q=1}^{M} \tilde{\pi}^{(n)}_{\mathbf{m}}\, c_n\, \mathcal{N}\!\left(\mathbf{x}_n \mid \mathbf{u}^{(n)}_{\mathbf{m}}, \boldsymbol{\Sigma}_{\mathbf{m}}\right), \qquad (17)$$

$$Q_\mu = \mathcal{N}\!\left(\boldsymbol{\mu} \mid \mathbf{m}_\mu, \boldsymbol{\Sigma}_\mu\right), \qquad (18)$$

$$Q_W = \prod_{p=1}^{d} \mathcal{N}\!\left(\mathbf{w}_p \mid \mathbf{m}^{(p)}_w, \boldsymbol{\Sigma}_w\right), \qquad (19)$$

$$Q_\alpha = \prod_{i=1}^{q} \mathrm{Gam}\!\left(\alpha_i \mid \tilde{a}_\alpha, \tilde{b}^{(i)}_\alpha\right), \qquad (20)$$

$$Q_\beta = \mathrm{Gam}\!\left(\beta \mid \tilde{a}_\beta, \tilde{b}_\beta\right), \qquad (21)$$

where $\mathbf{m}$ denotes the combination of component indices $(m_1, \ldots, m_q)$, and the parameters of the latent distribution $Q_x$, namely $\boldsymbol{\Sigma}_{\mathbf{m}}$, $\mathbf{u}^{(n)}_{\mathbf{m}}$, $\tilde{\pi}^{(n)}_{\mathbf{m}}$ and $c_n$, are given in Section A. The other parameters are straightforward to derive and may be found in Bishop (1999b).

The solution for the optimal factors of the $Q$-distribution is an implicit one. Each distribution depends on moments of the other solutions. A solution may be determined numerically by starting with a suitable initial guess and cycling through each distribution using the update equations given in Section A and Bishop (1999b).
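The cycling of Eqn (14) is easiest to see on a model simpler than Bayesian ICA. The sketch below is our own illustration, not part of the paper: it applies the same free-form factorised updates to a single Gaussian with unknown mean and precision under a Normal-Gamma prior, where each factor has a closed-form update and the two factors are iterated to a fixed point exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.5, 2.0, size=500)           # data from N(1.5, 2^2)
N, xbar = x.size, x.mean()

# Priors: p(mu | tau) = N(mu | mu0, (lam0*tau)^-1), p(tau) = Gam(tau | a0, b0).
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

# Factorised approximation Q(mu, tau) = Q(mu) Q(tau), each updated via Eqn (14).
E_tau = 1.0                                  # initial guess for <tau>
for _ in range(50):
    # Q(mu) = N(mu | mu_N, lam_N^-1)
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Q(tau) = Gam(tau | a_N, b_N), using <(x_n - mu)^2> = (x_n - mu_N)^2 + 1/lam_N
    a_N = a0 + 0.5 * (N + 1)
    b_N = b0 + 0.5 * (np.sum((x - mu_N) ** 2) + N / lam_N
                      + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
    E_tau = a_N / b_N

print("posterior mean of mu :", mu_N)        # close to 1.5
print("posterior mean of tau:", E_tau)       # close to 1/4
```

In the ICA model the same scheme is run over the five factors of Eqn (15), with the moments of each factor feeding the updates of the others.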
Figure 1: Scatter plots of samples from the model. The discovered embedded dimensions are shown in the centre of the plots as solid lines while the true embedded dimensions are shown to the upper right of each plot as dashed lines.

4 Results

4.1 Toy Problem

For an evaluation of the ability of our model to determine the latent dimensionality, we sampled a toy data-set $\{\mathbf{t}_n\}$ from a randomly parameterised model. The two source distributions were chosen to contain two components, both with mean zero. The variances of each component, $\sigma_m^2$, were taken to be 1 and 50. The number of observed data dimensions was then taken to be three, and the inverse variance of the noise was set at $1 \times 10^{-4}$. The values of the mixing matrix $\mathbf{W}$ were randomly sampled from a Gaussian with unit variance. We obtained 200 samples from the model and then sought to determine the number of sources and the noise variance by inference in a model with three source dimensions. The sources were of identical type to those in the generating model. The values of $\mathbf{W}$ were initialised with a PCA solution. Optimisation of the distributions then proceeded by first updating the moments of $\boldsymbol{\mu}$, then the moments of $\mathbf{x}$ and finally the moments of $\mathbf{W}$. These moments were updated until convergence or for a maximum of 100 times. The moments of $\boldsymbol{\alpha}$ and $\beta$ were then updated. This whole process was repeated fifty times.

The selected independent components are shown in Figure 1. Note that the true number of source dimensions has been correctly determined. Shown in Figure 2 is the evolution of each $\log_{10} \alpha_i$ during learning. Note that very quickly one of the hyper-parameters becomes three orders of magnitude larger than the others, effectively switching off one of the source dimensions.

Figure 2: Evolution of the base 10 logarithm of the hyper-parameters during learning.

4.2 Real Data

As a further test of our model we analysed two images. The images are from The Corel Gallery 1,000,000 Collection and were averaged across their red, green and blue channels to obtain grey-scale images. The kurtosis of both images was negative, showing that both images are sub-Gaussian. We sub-sampled 1000 data-points from the images and then mixed them with the same matrix as for the toy problem. Gaussian noise was then added with a variance which was 7.8% of the source signal size. We then implemented a model containing three source dimensions. The factors of the source distributions contained two components, both with unit variance and with means of 1 and $-1$. The updates of the variational distributions were carried out in a similar manner to the model trained on the toy data.

Figure 3 shows the results for the model. This model did not remove the third source automatically. However we have shown the image with the third output dimension (the one associated with the highest $\alpha_i$) manually removed.

Figure 3: Results when the model is left to determine the latent dimensionality. The recovered sources with the phantom latent dimension manually removed. Note that there is a stronger ghost of the puppy on the image of the Colosseum.

Figure 4 shows scatter plots of the sub-sampled data projected along the three possible pairings of the output dimensions. Also shown, in the upper right corners of each plot, are the true directions of the mixing matrix and, in the centre of the plot, the expected value of the mixing matrix determined by the algorithm.

Figure 4: Scatter plots of sub-samples of the observed images for the experiment attempting to determine the true number of sources. The discovered embedded dimensions are shown in the centre of the plots as solid lines while the true embedded dimensions are shown to the upper right of each plot as dashed lines.
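The pre-processing described in Section 4.2, namely grey-scale conversion, a kurtosis check, sub-sampling, mixing and adding noise at a fixed fraction of the signal power, can be sketched as follows. Synthetic uniform-noise "images" stand in for the Corel images, and the 7.8% figure is applied to the empirical variance of the mixed signal; both are our own assumptions about details the text leaves open.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)

# Stand-ins for the two grey-scale images (uniform pixels are sub-Gaussian).
img1 = rng.uniform(0.0, 1.0, size=(256, 256))
img2 = rng.uniform(0.0, 1.0, size=(256, 256))

# Excess kurtosis < 0 indicates a sub-Gaussian source.
print(kurtosis(img1.ravel()), kurtosis(img2.ravel()))

# Sub-sample 1000 pixel positions and stack the two sources as columns.
idx = rng.choice(img1.size, size=1000, replace=False)
s = np.column_stack([img1.ravel()[idx], img2.ravel()[idx]])
s = s - s.mean(axis=0)                       # centre the sources

# Mix into three observed dimensions and add Gaussian noise whose variance
# is 7.8% of the mean variance of the mixed signal.
W = rng.standard_normal((3, 2))
clean = s @ W.T
noise_var = 0.078 * clean.var(axis=0).mean()
t = clean + rng.normal(scale=np.sqrt(noise_var), size=clean.shape)
print(t.shape)   # (1000, 3)
```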
5 Discussion

The suggested model performed well when trained on toy data which had been sampled from similar source distributions. However, in experiments on images the independent component analysis model was unable to determine the true number of sources for the real data. This problem perhaps arises because the assumed forms of the latent distributions are not matched to the true source distributions.¹ As a result the model may always better explain the data by adding phantom source distributions. This is because the embedded observation space may obtain more and more complex forms through the addition of more latent dimensions. This is in contrast to the situation for Bayesian PCA. In Bayesian PCA the latent distributions are taken to be Gaussian and any sub-space embedded within a Gaussian distribution will also be Gaussian. As a result it does not exhibit these problems.

We might take the approach of removing the source associated with the highest hyper-parameter and checking whether the lower bound on the model likelihood increases. Unfortunately, although we are maximising a lower bound on the model likelihood for Bayesian ICA, we are unable to determine the value of this bound. This is a consequence of the posterior approximations for the latent variables being a mixture of Gaussians. To calculate the bound 16 we are required to evaluate the entropy of the mixture distribution. There is some hope for progress because, while exact evaluation of this entropy is intractable, it may be lower bounded (Lawrence & Azzouzi, 1999).

¹We are not referring to the issue of sub-Gaussian versus super-Gaussian distributions; we have assumed that the sign of the kurtosis of the sources was known in these experiments and have selected our latent distributions accordingly.
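To illustrate why the entropy term in bound (16) remains approachable even though it cannot be computed exactly, the sketch below evaluates a generic Jensen-type lower bound on the entropy of a one-dimensional Gaussian mixture and compares it with a Monte Carlo estimate. This is a standard bound chosen purely for illustration; we are not claiming it is the bound of Lawrence & Azzouzi (1999).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# A one-dimensional mixture q(x) = sum_m pi_m N(x | mu_m, var_m).
pi = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.0])
var = np.array([1.0, 0.5])

# Jensen lower bound: H(q) >= -sum_m pi_m ln sum_k pi_k N(mu_m | mu_k, var_m + var_k).
z = norm.pdf(mu[:, None], loc=mu[None, :], scale=np.sqrt(var[:, None] + var[None, :]))
lower_bound = -np.sum(pi * np.log(z @ pi))

# Monte Carlo estimate of the true entropy H(q) = -E_q[ln q(x)].
comp = rng.choice(len(pi), size=200_000, p=pi)
x = rng.normal(mu[comp], np.sqrt(var[comp]))
logq = np.log(norm.pdf(x[:, None], loc=mu[None, :], scale=np.sqrt(var)[None, :]) @ pi)
mc_entropy = -logq.mean()

print(lower_bound, mc_entropy)   # lower_bound <= mc_entropy (up to MC error)
```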
Another solution, and the one we would propose, would be to adaptively modify the latent distributions as part of the optimisation process. This could be achieved through modification of the latent variable distributions as shown by, for example, Attias (1998).

A final alternative is to use a simpler model to determine the latent dimensionality. We could apply Bayesian PCA to determine the principal sub-space and the associated latent dimensionality. Then a simple maximum likelihood ICA model could be applied within that embedded space. We prefer our suggested solution, though, as it would be able to handle problems where there are more sources than observed dimensions and is therefore more general. Another and very different approach to determining the sources has been suggested by Hyvarinen and Oja (1997). This approach is not based on a likelihood model and is related to projection pursuit.

We note that the model we have described should also be able to handle more source dimensions than there are data dimensions. This has been considered a difficult problem in the independent component analysis literature.

An additional problem with our approach is the exponential growth in the number of components as a function of the latent space dimensionality. This makes the approach impractical for models with large numbers of sources.

A Source Q-distribution for Bayesian ICA

First of all we note that the joint distribution factorises across the data-points and we may therefore take

$$Q_x(\mathbf{X}) = \prod_{n=1}^{N} Q^{(n)}_x(\mathbf{x}_n). \qquad (22)$$

In Eqn 14 we need only take expectations of the components of $\ln p(\mathbf{t}, \boldsymbol{\theta})$ which contain $\mathbf{x}_n$, as we will look to take care of the other terms, which are constant in $\mathbf{x}_n$, through the normalising constant:

$$\ln Q^{(n)}_x(\mathbf{x}_n) = \sum_{i=1}^{q} \ln p(x_{ni}) - \left\langle \frac{\beta}{2} \left\| \mathbf{t}_n - \mathbf{W}\mathbf{x}_n - \boldsymbol{\mu} \right\|^2 \right\rangle_{\boldsymbol{\theta} \neq \mathbf{x}_n} + \mathrm{const}, \qquad (23)$$

where $p(\mathbf{x})$ is the latent variable model, which we took to be a factorised distribution in which each factor is a mixture of $M$ Gaussians,

$$p(\mathbf{x}_n) = \prod_{i=1}^{q} \sum_{m=1}^{M} \pi_m \mathcal{N}_{im}, \qquad (24)$$

where the notation $\mathcal{N}_{im}$ indicates the Gaussian distribution $\mathcal{N}(x_{ni} \mid m_m, \sigma_m^2)$. The second term of Eqn 23 may be expanded and rewritten in terms of the expectations of each parameter,

$$-\frac{\langle \beta \rangle}{2}\, \mathbf{x}_n^{\mathrm{T}} \langle \mathbf{W}^{\mathrm{T}} \mathbf{W} \rangle\, \mathbf{x}_n + \langle \beta \rangle\, \mathbf{x}_n^{\mathrm{T}} \langle \mathbf{W} \rangle^{\mathrm{T}} \left( \mathbf{t}_n - \langle \boldsymbol{\mu} \rangle \right) \;\triangleq\; E(\mathbf{x}_n). \qquad (25)$$

Exponentiating both sides we obtain

$$Q^{(n)}_x(\mathbf{x}_n) \;\propto\; \exp\left[ E(\mathbf{x}_n) \right] \prod_{i=1}^{q} \sum_{m=1}^{M} \pi_m \mathcal{N}_{im}. \qquad (26)$$

The product of the sums of Gaussian
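The preview of the appendix ends mid-sentence above. Eqn (17) already indicates where the expansion of Eqn (26) leads: a product of $q$ sums of $M$ Gaussians is itself a mixture over all $M^q$ combinations of component indices, which is the exponential growth noted in the Discussion. The sketch below is our own illustration of that identity, not the paper's continuation, and checks it numerically for the factorised part of Eqn (26).

```python
import itertools
import numpy as np
from scipy.stats import norm

# Source mixture of Eqn (3)/(24): M components shared across the q factors.
pi = np.array([0.4, 0.6])
m = np.array([0.0, 0.0])
sd = np.array([1.0, np.sqrt(50.0)])
q, M = 3, len(pi)

x = np.array([0.3, -1.2, 2.5])        # an arbitrary test point with q entries

# Left-hand side: product over factors of a sum over M components.
prod_of_sums = np.prod([np.sum(pi * norm.pdf(x[i], m, sd)) for i in range(q)])

# Right-hand side: a single mixture over all M**q multi-indices (m_1, ..., m_q),
# with weight prod_i pi_{m_i} and a fully factorised Gaussian for each index.
mixture = 0.0
for combo in itertools.product(range(M), repeat=q):
    combo = np.array(combo)
    weight = np.prod(pi[combo])
    mixture += weight * np.prod(norm.pdf(x, m[combo], sd[combo]))

print(prod_of_sums, mixture)          # identical up to rounding
print("number of mixture components:", M ** q)
```

With $q$ sources and $M$ components per source, the posterior $Q^{(n)}_x$ therefore has $M^q$ Gaussian components, which is why both its entropy and its direct manipulation become impractical as $q$ grows.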
