A New Local PCA SOM Algorithm

Dong Huang, Zhang Yi and Xiaorong Pu

Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
E-mail: donnyhuang, zhangyi.

This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.
Preprint submitted to Elsevier Science, 28 September 2007.

Abstract

This paper proposes a local PCA-SOM algorithm. The new competition measure is computationally efficient and implicitly incorporates both the Mahalanobis distance and the reconstruction error. Unlike previous models, no matrix inversion or PCA decomposition is needed for each data input. Moreover, the local data distribution is stored entirely in the covariance matrix instead of a pre-defined number of principal components, so no a priori information about the optimal principal subspace is required. Experiments on both synthetic data and a pattern learning task are carried out to show the performance of the proposed method.

Key words: neural networks; unsupervised learning; local principal component analysis; self-organizing mapping

1 Introduction

Principal component analysis (PCA) has been widely used in dimension reduction of multivariate data, data compression, pattern recognition and statistical analysis. A PCA algorithm approximates the original high-dimensional pattern space by a low-dimensional subspace spanned by the principal eigenvectors of the data covariance matrix. In this way, the data distribution can be represented and reconstructed by the principal eigenvectors and their corresponding eigenvalues. However, PCA is globally linear and inefficient for data distributions with nonlinear dependencies [1]. Several extensions have been suggested to overcome this problem. They fall into two main categories: global nonlinear approaches and locally linear descriptions. The former includes principal curves [2] and kernel PCA [3]. We focus on the latter, in which the data distribution is represented by a collection of local PCA units [7,8].

When describing a data distribution with a mixture model, the question arises what kind of units the mixture should contain. In Gaussian mixture models [7], the local iso-density surface of a uniform Gaussian unit is a sphere, whereas a local PCA unit corresponds to a multivariate Gaussian with an ellipsoidal iso-density surface. Despite its greater complexity, a local PCA unit is preferable to a spherical unit for the following reasons. An ellipsoid can describe a local structure for which many spheres would be needed. Furthermore, data distributions are usually constrained locally to subspaces with fewer dimensions than the space of the training data, so there are directions in which the distribution has locally zero variance (or almost zero variance because of noise). An ellipsoidal unit can extend its minor components into these additional noise dimensions, but the computational cost increases over-proportionally with the number of principal components needed.

In the recent work NGAS-PCA [4], the local PCA model is based on the soft competition scheme of the neural gas algorithm and the RRLSA [10] online PCA learning rule in each local unit.
The novelty of NGAS-PCA is that its distance measure for neuron competition combines a normalized Mahalanobis distance with the squared reconstruction error. Note that storing the principal basis vectors for every unit involves rather complicated updating procedures, as in NGAS-PCA and ASSOM [9]. Another recent model, PCA-SOM [5], proposes an alternative in which the local information is stored in the covariance matrix. This is derived directly from statistical theory and has great advantages over the ASSOM model in both computational burden and reliability of the result. However, PCA-SOM uses only the reconstruction error in the principal subspace for its neuron competition. For this reason, PCA-SOM still needs a PCA decomposition, or an update with a predefined number of principal components, every time a training datum is presented.

In this paper, we propose a new computationally efficient local PCA algorithm that combines the advantages of NGAS-PCA and PCA-SOM. Each unit is associated with its mean vector and covariance matrix. The new competition measure implicitly incorporates the reconstruction error and the distance between the input data and the unit center. In the proposed algorithm, the extra updating step for the principal subspaces is eliminated from the data distribution learning process. In addition, the local distribution of the input data is stored entirely in the covariance matrix instead of a predefined number of principal components. One potential application of the proposed model is nonlinear pattern learning and recalling [11]. In most models of associative memory, memories are stored as attractive fixed points at discrete locations in state space. Discrete attractors may not be appropriate for patterns with continuous variability [13], for example the images of a three-dimensional object seen from different viewpoints [12]. After the training process, the data distribution is represented by a collection of locally linear units, and no a priori information about the optimal principal subspace is needed for the pattern representation.

The rest of this paper is organized as follows. In Section 2, the proposed competition measure and neuron updating method are formulated. Section 3 presents the simulation results and discussions. Section 4 briefly reviews the NGAS-PCA and PCA-SOM algorithms and discusses their relations to the proposed method in detail. Finally, the paper is concluded in Section 5.

2 The local PCA model

In this section, we present the detailed formulation of the local PCA model. The following notation is used throughout the paper. The input data set is $X = \{x_i \mid x_i \in R^d,\ i = 1, \ldots, M\}$, where $d$ is the data dimension. Each local unit $j$ ($j = 1, \ldots, N$) has a center $c_j$ and a covariance matrix $C_j$. The principal subspace of $C_j$ is characterized by $W^{(p)} = [w_1, \ldots, w_p]$ and $\Lambda^{(p)} = \operatorname{diag}(\lambda_1, \ldots, \lambda_p)$, where $w_i$ is an eigenvector and $\lambda_i$ the corresponding eigenvalue ($i = 1, \ldots, p$). Throughout the paper the eigenvalues are arranged in descending order, so $p$ is the number of underlying principal components and the remaining $(d - p)$ eigenvectors span the underlying minor subspace.

2.1 The proposed competition measure

We seek to formulate a local PCA algorithm that simplifies both the neuron competition and the neuron updating. As in PCA-SOM, each unit in our model is associated with its center and covariance matrix, so the data distribution is represented by a mixture of ellipsoidal local units.
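To make the notation concrete, a local unit can be represented by its center and covariance matrix, with the principal subspace read off by an eigendecomposition. The sketch below is an added illustration only (Python/NumPy assumed; the class and method names are not from the paper):

```python
import numpy as np

class LocalUnit:
    """One ellipsoidal local PCA unit: a center c_j and a covariance C_j.
    Illustrative helper, not code from the paper."""

    def __init__(self, center, covariance):
        self.c = np.asarray(center, dtype=float)      # c_j in R^d
        self.C = np.asarray(covariance, dtype=float)  # C_j, d x d symmetric

    def principal_subspace(self, p):
        """Return W^(p) (d x p eigenvectors) and Lambda^(p) (p eigenvalues),
        sorted in descending order as assumed in the paper."""
        lam, W = np.linalg.eigh(self.C)               # ascending for symmetric C
        order = np.argsort(lam)[::-1]                 # reorder descending
        return W[:, order[:p]], lam[order[:p]]
```

Note that in the proposed algorithm this decomposition is never required during training; it is only needed if one wants to inspect the learned subspaces afterwards.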
This representation is closely related to anisotropic k-means clustering and kernel density estimation, where one seeks to minimize the Mahalanobis distance

\[
\mathrm{Mahalanobis}(x) = x^\top C^{-1} x,
\]

or, equivalently, to maximize the normalized Gaussian density function

\[
\varphi(x) = \frac{1}{(2\pi)^{d/2}\,|C|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, x^\top C^{-1} x\right),
\qquad \int_{-\infty}^{+\infty} \varphi(x)\,dx = 1, \tag{1}
\]

where $|C| \neq 0$. However, because of the matrix inversion $C^{-1}$, the computational complexity of the Mahalanobis distance is $O(d!)$. We consider an alternative neuron competition measure:

\[
NR(x, C) = \frac{x^\top C\, x}{\operatorname{tr}(C)\,(x^\top x)^2}, \tag{2}
\]

where the size of the hyper-ellipsoid is normalized by $\operatorname{tr}(C) = \sum_{i=1}^{d} \lambda_i$. The new measure is called the normalized Rayleigh ratio $NR(x, C)$; it is the normalized scattering intensity of the data points. The computational burden of $NR(x, C)$ is reduced to $O(d^2)$. The definition of $NR(x, C)$ is based on the assumption that $x^\top x \neq 0$.

The following analysis starts from the Rayleigh quotient. Given a Hermitian matrix $C$ and a nonzero vector $x$, the Rayleigh quotient $R(x, C)$ is defined by

\[
R(x, C) = \frac{x^\top C\, x}{x^\top x}.
\]

Rayleigh-Ritz theorem: if the eigenvalues of a Hermitian (symmetric) matrix $C$ are ordered as $\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{d-1} \ge \lambda_d = \lambda_{\min}$, then

\[
\lambda_{\min} \le \frac{x^\top C\, x}{x^\top x} \le \lambda_{\max}, \tag{3}
\]

where the equalities hold when $x$ is an eigenvector of $C$. The gradient and Hessian matrix of $R(x, C)$ can be computed as

\[
\frac{\partial R}{\partial x} = \frac{2}{x^\top x}\,\big(C - R(x)\, I\big)\, x, \tag{4}
\]

\[
H_R(x) = \frac{\partial^2 R}{\partial x\,\partial x^\top}
= \frac{2}{x^\top x}\left[ C - x \left(\frac{\partial R}{\partial x}\right)^{\!\top} - \frac{\partial R}{\partial x}\, x^\top - R(x)\, I \right], \tag{5}
\]

where $I$ is the identity matrix.

For the $j$-th local PCA neuron unit with center $c_j$ and covariance matrix $C_j$, the normalized Rayleigh ratio $NR(\cdot, C_j)$ is written as

\[
NR(\xi, C_j) = \frac{\xi^\top C_j\, \xi}{\operatorname{tr}(C_j)\,(\xi^\top \xi)^2}, \tag{6}
\]

where $\xi = x - c_j$. If $NR(\cdot, C_j)$ is used as the competition measure in the probabilistic SOM training process, the winner is $v = \arg\max_j NR(x - c_j(t), C_j(t))$. The final result of the training process is obtained when $NR(x - c_j(t), C_j)$ $(j = 1, \ldots, N)$ is maximized for $x \in \{x \mid E(x) = c_j,\ E(\xi\xi^\top) = C_j\}$. To analyze the maximum property of $NR$, the neuron index $j$ is omitted below.

It is easy to see from (4) and (5) that $\partial R(\xi)/\partial \xi = 0$ only at the critical points $\xi = w_i$ $(i = 1, \ldots, d)$, and at these critical points

\[
H_R(w_i)\, w_j =
\begin{cases}
0, & j = i,\\[2pt]
(\lambda_j - \lambda_i)\, w_j, & j \neq i.
\end{cases}
\]

In real applications the covariance matrix $C$ is positive definite, i.e. $\operatorname{rank}(C) = d$. Note that for $x \neq 0$ there are no local minima or maxima except those on the edge of the domain, i.e. the maxima at $\xi = w_1$ and the minima at $\xi = w_d$. This shows that the Rayleigh ratio $\xi^\top C \xi / \xi^\top \xi$ is polarized only by the directions of the input vectors.

Taking the logarithm of $NR(\xi, C)$ and computing the gradient,

\[
\frac{\partial \ln NR(\xi, C)}{\partial \xi}
= \frac{\partial}{\partial \xi}\Big[\ln(\xi^\top C \xi) - 2\ln(\xi^\top \xi)\Big]
= \frac{2\,C\xi}{\xi^\top C\, \xi} - \frac{4\,\xi}{\xi^\top \xi}. \tag{7}
\]

Without loss of generality, assume $\xi = x - c$ lies in the subspace spanned by the eigenvectors of the covariance matrix $C$ and write

\[
\frac{\xi}{\|\xi\|} = \alpha_1 w_1 + \cdots + \alpha_d w_d,
\qquad \sum_{i=1}^{d} \alpha_i^2 = 1.
\]

Substituting this expansion into (7), the gradient becomes, up to a positive factor,

\[
\sum_{i=1}^{d} \alpha_i \Big[\lambda_i - 2 \sum_{j=1}^{d} \alpha_j^2 \lambda_j \Big]\, w_i.
\]

Since $\{w_i,\ i = 1, \ldots, d\}$ is a basis of $R^d$, $\partial \ln NR/\partial \xi = 0$ only if

\[
\alpha_i \Big[\lambda_i - 2 \sum_{j=1}^{d} \alpha_j^2 \lambda_j \Big] = 0
\]

for all $i = 1, \ldots, d$. However, this cannot hold for a positive definite $C$ (note $\|\xi\| \neq 0$). Thus $NR(\xi, C)$ has no local maxima except on the edge of the domain. From the Rayleigh-Ritz theorem (3), we have

\[
\frac{1}{\lambda_{\max}}\, \xi^\top \xi \;\le\; \xi^\top C^{-1} \xi \;\le\; \frac{1}{\lambda_{\min}}\, \xi^\top \xi,
\]

and

\[
\frac{\lambda_{\min}}{\operatorname{tr}(C)\, \xi^\top \xi}
\;\le\; \frac{\xi^\top C\, \xi}{\operatorname{tr}(C)\,(\xi^\top \xi)^2}
\;\le\; \frac{\lambda_{\max}}{\operatorname{tr}(C)\, \xi^\top \xi}.
\]
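As a concrete illustration of the competition step, the ratio in (6) and the winner search over the units can be sketched as follows (Python/NumPy assumed; the function names are illustrative, and the small constant eps anticipates the regularization introduced in Section 2.2):

```python
import numpy as np

def normalized_rayleigh_ratio(x, c, C, eps=0.0):
    """NR(xi, C) = (xi^T C xi) / (tr(C) * (xi^T xi)^2) with xi = x - c;
    a small eps regularizes the case ||xi|| = 0 (cf. Section 2.2)."""
    xi = x - c
    return (xi @ C @ xi + eps) / (np.trace(C) * (xi @ xi) ** 2 + eps)

def winner(x, centers, covariances, eps=1e-8):
    """Index v of the unit maximizing NR(x - c_j, C_j)."""
    scores = [normalized_rayleigh_ratio(x, c, C, eps)
              for c, C in zip(centers, covariances)]
    return int(np.argmax(scores))
```

Each evaluation costs one matrix-vector product per unit, i.e. $O(d^2)$, and no matrix inversion or eigendecomposition of $C_j$ is needed.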
In the probabilistic framework of SOM training, the iterative algorithm uses the input data ($x \in \{x \mid E(x) = c,\ E(\xi\xi^\top) = C\}$) over multiple passes. We consider the property of the training result in the sense of mathematical expectation:

\[
E\!\left[\min_{\|\xi\|\neq 0} \frac{(\xi^\top \xi)^2}{\xi^\top C\, \xi}\right]
= E\!\left[\min_{\|\xi\|\neq 0} \xi^\top C^{-1} \xi\right]
= \frac{E(\xi^\top \xi)}{\lambda_{\max}}
= \frac{\operatorname{tr}(C)}{\lambda_{\max}}
= \frac{\sum_{i=1}^{d} \lambda_i}{\lambda_{\max}}. \tag{8}
\]

It can be observed from (8) that maximizing the normalized Rayleigh ratio (2) through the SOM training process is equivalent to minimizing the Mahalanobis distance $x^\top C^{-1} x$ with a constraint on the size of the hyper-ellipsoid, $\operatorname{tr}(C) = \sum_{i=1}^{d} \lambda_i$.

2.2 Updating of the neurons

The neuron centers and covariance matrices are updated similarly to PCA-SOM. In practical situations, the normalized Rayleigh ratio (6) is regularized as

\[
NR(\xi, C_j) = \frac{\xi^\top C_j\, \xi + \epsilon}{\operatorname{tr}(C_j)\,(\xi^\top \xi)^2 + \epsilon},
\]

in case $\|\xi\| = 0$ occurs, where $\xi = x - c_j(t)$ and $\epsilon > 0$ is a small constant. When an input data point $x$ is randomly selected from the data set $X$, the winning neuron $v$ is chosen as $v = \arg\max_j NR(x - c_j(t), C_j(t))$ $(j = 1, \ldots, N)$. Then all the units $(j = 1, \ldots, N)$ are updated as follows:

\[
c_j(t+1) = c_j(t) + \alpha(t)\, h_{v,j}(t)\, \big(x - c_j(t)\big), \tag{9}
\]
\[
\hat{C}_j(t+1) = \big(x - c_j(t+1)\big)\big(x - c_j(t+1)\big)^\top, \tag{10}
\]
\[
C_j(t+1) = C_j(t) + \alpha(t)\, h_{v,j}(t)\, \Big[\hat{C}_j(t+1) - C_j(t)\Big], \tag{11}
\]

where $\alpha(t)$ is the learning rate and $h_{v,j}(t)$ is the neighborhood function of the $j$-th neuron given the winning neuron $v$. The training process stops when the changes of $c_j$ and $C_j$ can be neglected. In the proposed method, the amplitude of the neuron updates (9)-(11) does not depend on the iteration number, but on how well the neural network fits the input data [6]. To achieve this, the learning rate $\alpha(t)$ is calculated as

\[
\alpha(t) = \frac{\|x(t) - c_v(t)\|^2}{r(t)},
\]

where

\[
r(t) = \max\big\{\|x(t) - c_v(t)\|^2,\ r(t-1)\big\},
\qquad r(0) = \|x(0) - c_v(0)\|^2.
\]

The neighborhood function depends on $NR(c_j(t) - c_v(t), C_v(t))$ and is calculated as

\[
h_{v,j}(t) = \exp\!\left(\frac{NR\big(c_j(t) - c_v(t),\, C_v(t)\big)}{\sigma}\right)\alpha(t), \tag{12}
\]

where $\sigma$ is a constant that scales the size of the neighborhood. In our experiments in Section 3 it is estimated as $1/N$ of the total variance of the data distribution.

The proposed method is an iterative deterministic algorithm similar to the neural networks for global PCA and SOM. It updates the neurons and converges iteratively using the training data presented sequentially over multiple passes. New data can be incorporated by adding them to the previous training set and updating the outputs. The iterations continue until convergence, which is reached when the changes of the neuron centers and covariance matrices can be neglected.
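A minimal sketch of one training step, following (9)-(11) with the data-driven learning rate and the neighborhood factor of (12), is given below (Python/NumPy assumed; the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def train_step(x, centers, covs, r_prev, sigma, eps=1e-8):
    """One update step following (9)-(11) for a single input x.
    centers: list of c_j vectors; covs: list of C_j matrices;
    r_prev: r(t-1); sigma: neighborhood scale. Sketch only."""

    def nr(xi, C):
        # Regularized normalized Rayleigh ratio NR(xi, C_j).
        return (xi @ C @ xi + eps) / (np.trace(C) * (xi @ xi) ** 2 + eps)

    # Winner selection: v = argmax_j NR(x - c_j, C_j).
    v = int(np.argmax([nr(x - c, C) for c, C in zip(centers, covs)]))

    # Data-driven learning rate: alpha(t) = ||x - c_v||^2 / r(t),
    # with r(t) = max(||x - c_v||^2, r(t-1)).
    dist2 = float(np.sum((x - centers[v]) ** 2))
    r_t = max(dist2, r_prev)
    alpha = dist2 / r_t if r_t > 0 else 0.0

    new_centers, new_covs = [], []
    for c, C in zip(centers, covs):
        h = np.exp(nr(c - centers[v], covs[v]) / sigma) * alpha   # (12)
        c_new = c + alpha * h * (x - c)                           # (9)
        C_hat = np.outer(x - c_new, x - c_new)                    # (10)
        new_centers.append(c_new)
        new_covs.append(C + alpha * h * (C_hat - C))              # (11)
    return new_centers, new_covs, r_t
```

The loop over the data set would call this step repeatedly until the changes of the centers and covariance matrices become negligible.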
3 Simulations and discussion

We test the proposed local PCA method on synthetic data of two and three dimensions, and on high-dimensional data in pattern-completion learning and recalling tasks.

3.1 Synthetic data

The following setting was used in training and demonstration. The initial neuron centers are randomly located at data points. The initial covariance matrices are identity matrices multiplied by a scalar related to the spatial extension of the distribution, specified for each experiment. Similarly to $\sigma$ in (12), we set the scalar as $\max_{i=1,\ldots,d} \lambda_i / N$, where $\lambda_i$ are the eigenvalues of the covariance matrix $\frac{1}{M}\sum_{j=1}^{M}(x_j - \bar{x})(x_j - \bar{x})^\top$. In the following figures, training vectors are depicted as small gray dots. For ease of illustration, the neurons are visualized as a set of ellipsoids centered at the unit centers. The orientations and half-axes of the ellipsoids are obtained from the estimated covariance $C_j$ at training step $t$. For graphical reasons, the length of each half-axis of a unit is shown as 1.5 times the square root of the corresponding eigenvalue of $C_j$.

To evaluate the performance, we compute the log-likelihood per data point $L$ [4] based on Gaussian density estimation. The Gaussian probability density function for each unit $j$ is

\[
p_j(x) = (2\pi)^{-\frac{d}{2}}\, |C_j|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - c_j)^\top C_j^{-1}(x - c_j)\right).
\]

The prior probability, or mixing parameter [14], of each unit is estimated as $P_j = M_j/M$, where $M_j$ is the number of training patterns assigned to unit $j$ and $M$ is the total number of training data. The probability density function of a Gaussian mixture model with $N$ units is $p(x) = \sum_{j=1}^{N} P_j\, p_j(x)$. The log-likelihood per pattern $L$ is computed as

\[
L = \frac{1}{M}\sum_{i=1}^{M} \log p(x_i).
\]

For a given data set, $L$ decreases from the unordered initial state to the final well-ordered approximation of the data distribution obtained by the local PCA algorithms.

In the two-dimensional example, the data distribution is approximated by NGAS-PCA, PCA-SOM and the proposed method. Fig. 1 shows the training process of the proposed method ($N = 7$) at different time steps $t$. In Fig. 2, the training processes of NGAS-PCA ($p = d = 2$) and PCA-SOM ($p = 1$) are shown, where the initial neuron states are the same as in Fig. 1. It can be seen that NGAS-PCA can also approximate the training data by local linear units, but "dead units" may occur. In PCA-SOM, minimizing the reconstruction error alone results in a data assignment related only to the principal directions, which is not suitable for the present task: as shown in Fig. 2 (d)-(f), the structure of the data distribution cannot be extracted.

Fig. 1. Snapshots of the training process of the proposed method at different time steps t for a two-dimensional distribution with 700 points (N = 7). Panels: (a) t = 0, L = 4.7465; (b) t = 1000, L = 3.6898; (c) t = 4000, L = 3.1522; (d) t = 8000, L = 2.8863; (e) t = 12000, L = 2.5091; (f) t = 20000, L = 2.4001.

To further evaluate the performance, NGAS-PCA ($p = d = 2$) and the proposed model are trained on the two-dimensional distribution (Fig. 1), starting from 100 random initial states (number of neurons $N = 7$). Fig. 3 shows the comparison of the training results by the log-likelihood of the density estimation
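For completeness, the evaluation measure of Section 3.1 (Gaussian mixture density with empirical mixing weights and log-likelihood per pattern) can be sketched as follows; Python/NumPy is assumed, and counting each pattern toward the unit that wins the NR competition is an assumption, not stated explicitly above:

```python
import numpy as np

def log_likelihood_per_pattern(X, centers, covs, assignments):
    """L = (1/M) * sum_i log p(x_i), with p(x) = sum_j P_j * p_j(x) and
    P_j = M_j / M estimated from the given per-pattern unit assignments.
    Sketch only; assumes full-rank covariance matrices C_j."""
    X = np.asarray(X, dtype=float)
    assignments = np.asarray(assignments)
    M, N = len(X), len(centers)
    priors = np.array([(assignments == j).sum() for j in range(N)]) / M

    logp = np.zeros(M)
    for i, x in enumerate(X):
        dens = 0.0
        for P, c, C in zip(priors, centers, covs):
            d = len(c)
            diff = x - c
            quad = diff @ np.linalg.solve(C, diff)   # (x-c)^T C^{-1} (x-c)
            norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(C) ** (-0.5)
            dens += P * norm * np.exp(-0.5 * quad)
        logp[i] = np.log(dens)
    return logp.mean()
```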
