




Outlier Detection and Robust PCA Using a Convex Measure of Innovation

Mostafa Rahmani and Ping Li
Cognitive Computing Lab, Baidu Research USA
10900 NE 8th St. Bellevue, WA 98004
{mostafarahmani, liping11}

Abstract

This paper presents a provable and strong algorithm, termed Innovation Search (iSearch), for robust Principal Component Analysis (PCA) and outlier detection. An outlier, by definition, is a data point which does not participate in forming a low-dimensional structure with a large number of the other data points. In other words, an outlier carries some innovation with respect to most of the other data points. iSearch ranks the data points based on their values of innovation. A convex optimization problem is proposed whose optimal value is used as our measure of innovation. We derive analytical performance guarantees for the proposed robust PCA method under different models for the distribution of the outliers, including randomly distributed outliers, clustered outliers, and linearly dependent outliers. Moreover, it is shown that iSearch provably recovers the span of the inliers when the inliers lie in a union of subspaces. In the challenging scenarios in which the outliers are close to each other or close to the span of the inliers, iSearch is shown to outperform most of the existing methods.

1 Introduction

Outlier detection is an important research problem in unsupervised machine learning. Outliers are associated with important rare events such as malignant tissues [14], the failures of a system [10, 12, 31], web attacks [16], and misclassified data points [9, 27]. In this paper, the proposed outlier detection method is introduced as a robust Principal Component Analysis (PCA) algorithm, i.e., the inliers lie in a low-dimensional subspace. In the literature of robust PCA, two main models for data corruption are considered: the element-wise model and the column-wise model. These two models correspond to two different robust PCA problems. In the element-wise model, it is assumed that a small subset of the elements of the data matrix is corrupted and that the support of the corrupted elements is random. This problem is known as the low-rank-plus-sparse matrix decomposition problem [1, 3, 4, 23, 24]. In the column-wise model, a subset of the columns of the data is affected by the corruption [5, 7, 8, 11, 17, 20, 25, 26, 36-39]. Section 2 provides a review of the robust (to column-wise corruption) PCA methods.
This paper focuses on the column-wise model, i.e., we assume that the given data follows Data Model 1.

Data Model 1. The data matrix D ∈ R^(M_1×M_2) can be expressed as D = [B (A + N)] T, where A ∈ R^(M_1×n_i), B ∈ R^(M_1×n_o), T is an arbitrary permutation matrix, and [B (A + N)] represents the concatenation of B and (A + N). The columns of A lie in an r-dimensional subspace U. The columns of B do not lie entirely in U, i.e., the n_i columns of A are the inliers and the n_o columns of B are the outliers. The matrix N represents additive noise. The orthonormal matrix U ∈ R^(M_1×r) is a basis for U. Evidently, M_2 = n_i + n_o.

In the robust PCA problem, the main task is to recover U. Clearly, if U is estimated accurately, the outliers can be located using a simple subspace projection [22].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
Summary of Contributions: The main contributions can be summarized as follows. The proposed approach introduces a new idea to the robust PCA problem: iSearch uses a convex optimization problem to measure the innovation of the data points. It is shown that iSearch mostly outperforms the existing methods in handling close outliers and noisy data. To the best of our knowledge, the proposed approach and the CoP method presented in [27] are the only robust PCA methods which are supported with analytical performance guarantees under different models for the distribution of the outliers, including randomly distributed outliers, clustered outliers, and linearly dependent outliers. In addition to considering different models for the distribution of the outliers, we provide analytical performance guarantees under different models for the distribution of the inliers too. The presumed models include the union of subspaces and the uniformly-at-random distribution on U ∩ S^(M_1−1), where S^(M_1−1) denotes the unit ℓ_2-norm sphere in R^(M_1).

Notation: Given a matrix A, ‖A‖ denotes its spectral norm. For a vector a, ‖a‖_p denotes its ℓ_p-norm and a(i) its i-th element. Given two matrices A_1 and A_2 with an equal number of rows, the matrix A_3 = [A_1 A_2] is the matrix formed by concatenating their columns. For a matrix A, a_i denotes its i-th column. The subspace U^⊥ is the complement of U. The cardinality of set I is defined as |I|. Also, for any positive integer n, the index set {1, ..., n} is denoted [n]. The coherence between vector a and subspace H with orthonormal basis H is defined as ‖a^T H‖_2.
2 Related Work

In this section, we briefly review some of the related works. We refer readers to [18, 27] for a more comprehensive review of the topic. One of the early approaches to robust PCA was to replace the Frobenius norm in the cost function of PCA with the ℓ_1-norm, because the ℓ_1-norm was shown to be robust to the presence of outliers [2, 15]. The method proposed in [6] leveraged the column-wise structure of the corruption matrix and replaced the ℓ_1-norm minimization problem with an ℓ_{1,2}-norm minimization problem. In [19] and [39], the optimization problem used in [6] was relaxed to a convex optimization problem, and it was proved that under some sufficient conditions the optimal point is a projection matrix which spans U. In [34], a provable outlier rejection method was presented. However, [34] assumed that the outliers are randomly distributed on S^(M_1−1) and the inliers are distributed randomly on U ∩ S^(M_1−1). In [36], a convex optimization problem was proposed which decomposes the data into a low-rank component and a column-sparse component. The approach presented in [36] is provable, but it requires n_o to be significantly smaller than n_i. In [32], it was assumed that the outliers are randomly distributed on S^(M_1−1) and a small number of them are not linearly dependent. The method presented in [32] detects a data point as an outlier if it does not have a sparse representation with respect to the other data points.

Connection and Contrast to Coherence Pursuit: In [27], Coherence Pursuit (CoP) was proposed as a provable robust PCA method. CoP computes the Coherence Values of all the data points in order to rank them. The Coherence Value corresponding to a data column d is a measure of resemblance between d and the rest of the data columns; CoP uses the inner products between d and the rest of the data points to measure this resemblance. In sharp contrast, iSearch finds an optimal direction corresponding to each data column. The optimal direction corresponding to data column d is used to measure the innovation of d with respect to the rest of the data columns. We show through theoretical studies and numerical experiments that finding the optimal directions makes iSearch significantly stronger than CoP in detecting outliers which carry weak innovation.

Connection and Contrast to Innovation Pursuit: In [28, 29], Innovation Pursuit was proposed as a new subspace clustering method. The optimization problem proposed in [28] finds a direction in the span of the data which is orthogonal to the maximum number of data points. We present a new discovery about the applications of Innovation Pursuit: it is shown that the idea of innovation search can be used to design a strong outlier detection algorithm. iSearch uses an optimization problem similar to the linear optimization problem used in [28] to measure the innovation of the data points.

3 Proposed Approach

Algorithm 1 presents the proposed method along with the definitions of the symbols used. iSearch consists of 4 steps.
In the next subsections, Step 2 and Step 4 are discussed. In this paper, we use an ADMM solver to solve (1). The computation complexity of the solver is O(max(M_1 M_2^2, M_1^2 M_2)).

Algorithm 1 Subspace Recovery Using iSearch
1. Data Preprocessing. The input is the data matrix D ∈ R^(M_1×M_2).
1.1 Define Q ∈ R^(M_1×r_d) as the matrix of the first r_d left singular vectors of D, where r_d is the number of non-zero singular values. Set D = Q^T D. If dimensionality reduction is not required, skip this step.
1.2 Normalize the ℓ_2-norm of the columns of D, i.e., set d_i equal to d_i/‖d_i‖_2 for all 1 ≤ i ≤ M_2.
2. Direction Search. Define C* ∈ R^(r_d×M_2) such that c*_i ∈ R^(r_d×1) is the optimal point of

    min_c ‖c^T D‖_1   subject to   c^T d_i = 1,

or define C* ∈ R^(r_d×M_2) as the optimal point of

    min_C ‖(C^T D)^T‖_1   subject to   diag(C^T D) = 1.   (1)

3. Computing the Innovation Values. Define the vector x ∈ R^(M_2×1) such that x(i) = 1/‖D^T c*_i‖_1.
4. Building Basis. Construct the matrix Y from the columns of D corresponding to the smallest elements of x such that they span an r-dimensional subspace.
Output: The column-space of Y is the identified subspace.

If PCA is used in the preprocessing step to reduce the dimensionality of the data to r_d, the computation complexity of the solver is O(max(r_d M_2^2, r_d^2 M_2)).^1

^1 If the data is noisy, r_d should be set equal to the number of dominant singular values. In this paper, we do not theoretically analyze iSearch in the presence of noise. In the numerical experiments, we set r_d equal to the index of the largest singular value which is less than or equal to 0.01% of the first singular value.
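To make Steps 1-3 concrete, the following is a minimal sketch in Python. It solves the column-wise form of the direction search with CVXPY as a stand-in for the ADMM solver used in the paper; the function name innovation_values and all implementation details are illustrative assumptions, not the authors' code.

```python
# Sketch of Steps 1-3 of Algorithm 1. CVXPY is used here for clarity as a
# stand-in for the paper's ADMM solver; an illustrative sketch only.
import numpy as np
import cvxpy as cp

def innovation_values(D, r_d=None):
    """Innovation Values x(i) = 1 / ||D^T c*_i||_1 for the columns of D."""
    # Step 1.1: optional dimensionality reduction via the left singular vectors.
    if r_d is not None:
        Q, _, _ = np.linalg.svd(D, full_matrices=False)
        D = Q[:, :r_d].T @ D
    # Step 1.2: normalize every column to unit ell_2-norm.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    m, M2 = D.shape
    x = np.zeros(M2)
    for i in range(M2):
        c = cp.Variable(m)
        # Direction search: min ||c^T D||_1 subject to c^T d_i = 1.
        problem = cp.Problem(cp.Minimize(cp.norm1(D.T @ c)),
                             [D[:, i] @ c == 1])
        problem.solve()
        # Step 3: the Innovation Value of column i.
        x[i] = 1.0 / np.linalg.norm(D.T @ c.value, 1)
    return x
```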
3.1 An Illustrative Example for Innovation Value

We use a synthetic numerical example to explain the idea of computing the Innovation Value. Suppose D ∈ R^(20×250), n_i = 200, n_o = 50, and r = 3. Assume that D follows Assumption 1.

Assumption 1. The columns of A are drawn uniformly at random from U ∩ S^(M_1−1). The columns of B are drawn uniformly at random from S^(M_1−1).

To simplify the exposition and notation, it is assumed without loss of generality that T in Data Model 1 is the identity matrix, i.e., D = [B A]. Suppose d is a column of D, define c* as the optimal point of

    min_c ‖c^T D‖_1   subject to   c^T d = 1,   (2)

and define the Innovation Value corresponding to d as 1/‖D^T c*‖_1. The main idea of iSearch is that c* behaves in two completely different ways with respect to U, depending on whether d is an outlier or an inlier. Suppose d is an outlier. The optimization problem (2) searches for a direction whose projection on d is non-zero and whose projection on the rest of the data points is as small as possible. As d is an outlier, d has a non-zero projection on U^⊥. In addition, as n_i is large, (2) searches for a direction in the ambient space whose projection on U is as weak as possible. Thus, c* lies in U^⊥ or it is close to U^⊥.

[Figure 1: three panels sharing the x-axis "Element Index": D^T c* when d is an outlier (left), D^T c* when d is an inlier (middle), and the Innovation Values x (right).]

Figure 1: The first 50 columns are outliers. The left panel shows the vector D^T c* when d is an outlier. The middle panel depicts D^T c* when d is an inlier. The right panel shows the Innovation Values corresponding to all the data points (the vector x was defined in Algorithm 1).

The left plot of Figure 1 shows D^T c* when d is an outlier. In this case, c* is orthogonal to all the inliers. Accordingly, when d is an outlier, ‖D^T c*‖_1 is approximately equal to ‖B^T c*‖_1.
On the other hand, when d is an inlier, the linear constraint strongly discourages c* from lying in U^⊥ or from being close to U^⊥. Inliers lie in a low-dimensional subspace and mostly they are close to each other. Since c* has a strong projection on d, it has strong projections on many of the inliers. Accordingly, the value of ‖A^T c*‖_1 is much larger when d is an inlier. Therefore, the Innovation Value corresponding to an inlier is smaller than the Innovation Value corresponding to an outlier. Figure 1 compares the vector D^T c* when d is an outlier with the same vector when d is an inlier. In addition, it shows the vector of Innovation Values (right plot). One can observe that the Innovation Values make the outliers clearly distinguishable.
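The example of this section can be reproduced, up to randomness, with the sketch below. It samples D per Assumption 1 (with T the identity, so the first n_o columns are outliers) and reuses the illustrative innovation_values function from the sketch in Section 3; the expected outcome is that the 50 outlier columns receive visibly larger Innovation Values, as in the right panel of Figure 1.

```python
# Synthetic data per Assumption 1: inliers uniform on the intersection of a
# random r-dimensional subspace U with the unit sphere, outliers uniform on
# the sphere. Reuses innovation_values() from the sketch above (illustrative).
import numpy as np

rng = np.random.default_rng(0)
M1, ni, no, r = 20, 200, 50, 3

U, _ = np.linalg.qr(rng.standard_normal((M1, r)))   # orthonormal basis of U
A = U @ rng.standard_normal((r, ni))                # inliers in span(U)
A /= np.linalg.norm(A, axis=0, keepdims=True)       # uniform on U intersect sphere
B = rng.standard_normal((M1, no))                   # unstructured outliers
B /= np.linalg.norm(B, axis=0, keepdims=True)
D = np.hstack([B, A])                               # T = identity: D = [B A]

x = innovation_values(D)
print("mean Innovation Value, outliers:", x[:no].mean())
print("mean Innovation Value, inliers: ", x[no:].mean())
```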
3.2 Building the Basis Matrix

The data points corresponding to the least Innovation Values are used to construct the basis matrix Y. If the data follows Assumption 1, the r data points corresponding to the r smallest Innovation Values span U with overwhelming probability [35]. In practice, the algorithm should continue adding new columns to Y until the columns of Y span an r-dimensional subspace. This approach requires checking the singular values of Y several times. We propose two techniques to avoid this extra step. The first approach is based on the side information that we mostly have about the data: in many applications, we can have an upper bound on n_o because outliers are mostly associated with rare events. If we know that the number of outliers is less than y percent of the data, the matrix Y can be constructed using the (100 − y) percent of the data columns corresponding to the least Innovation Values. The second approach is the adaptive column sampling method proposed in [27], which avoids sampling redundant columns.
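The following is a minimal sketch of Step 4 under the basic rank-checking strategy described above: columns are added in increasing order of Innovation Value until they span an r-dimensional subspace. The singular-value tolerance used in the rank test is an illustrative choice, not prescribed by the paper.

```python
# Step 4 sketch: build Y from the columns with the smallest Innovation Values,
# stopping once Y spans an r-dimensional subspace (illustrative tolerance).
import numpy as np

def build_basis(D, x, r, tol=1e-6):
    order = np.argsort(x)                  # columns ranked by Innovation Value
    cols = []
    for i in order:
        cols.append(D[:, i])
        Y = np.column_stack(cols)
        # Count singular values above tol to test whether rank(Y) reached r.
        if np.sum(np.linalg.svd(Y, compute_uv=False) > tol) == r:
            # Orthonormal basis: the top-r left singular vectors of Y.
            return np.linalg.svd(Y, full_matrices=False)[0][:, :r]
    raise ValueError("columns never spanned an r-dimensional subspace")
```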
4 Theoretical Studies

In this section, we analyze the performance of the proposed approach with three different models for the distribution of the outliers: unstructured outliers, clustered outliers, and linearly dependent outliers. Moreover, we analyze iSearch with two different models for the distribution of the inliers. These models include the union of subspaces and the uniformly-at-random distribution on U ∩ S^(M_1−1). In the theoretical investigations, we do not consider noisy inliers. In Section 5, it is shown with real and synthetic data that iSearch accurately estimates U even in the low signal-to-noise-ratio cases and that it mostly outperforms the existing approaches when the data is noisy. The theoretical results are followed by short discussions which highlight the important aspects of the theorems. The proofs of the presented theorems are available in an extended version of this work [30].

4.1 Randomly Distributed Outliers

In this section, it is assumed that D follows Assumption 1. In order to guarantee the performance of the proposed approach, it is enough to show that the Innovation Values corresponding to the outliers are greater than the Innovation Values corresponding to the inliers.
In other words, it suffices to show

    max { 1/‖D^T c*_i‖_1 : n_o + 1 ≤ i ≤ M_2 } < min { 1/‖D^T c*_i‖_1 : 1 ≤ i ≤ n_o }.   (3)

Theorem 1. Suppose D follows Assumption 1 and define c*_i as the optimal point of (2) with d = d_i. If n_i/r sufficiently dominates both of the following quantities (the exact constants are given in the extended version [30]):

    n_o/M_1 + 2 √(n_o/M_1) + √(2 n_o log(1/δ) / ((M_1 − 1) M_1)) + √(n_o c″_δ log(n_o/δ) / M_1^2) + n_o η √(c″_δ / M_1^2) + √((n_o/M_1^2 + 1/M_1) log(n_o/δ))

and

    max { n_o/M_1 + (2 n_o (1 + η) + 2 √(2 n_o log(1/δ)))/M_1 ,  2 n_o η √(c′_δ r)/M_1 + 2 √(n_o c′_δ r log(n_o/δ))/M_1 },   (4)

then (3) holds and U is recovered exactly with probability at least 1 − 7δ, where

    c′_δ = 3 max { 1, √(8 M_1 / ((M_1 − 1) r)), √(8 M_1 log(n_o/δ) / ((M_1 − 1) r)) },
    c″_δ = 3 max { 1, √(8 M_1 / (M_1 − 1)), √(16 M_1 log(n_o/δ) / (M_1 − 1)) },
    η = max { (4/3) log(2 M_1/δ), √(4 (n_o/M_1) log(2 M_1/δ)) }.
Theorem 1 shows that as long as n_i/r is sufficiently larger than n_o/M_1, the proposed approach is guaranteed to detect the randomly distributed outliers exactly. It is important to note that in the sufficient conditions n_i is scaled with 1/r but n_o is scaled with 1/M_1. This shows that if r is sufficiently smaller than M_1, iSearch provably detects the unstructured outliers even if n_o is much larger than n_i. The numerical experiments presented in Section 5 confirm this feature of iSearch and show that if the outliers are unstructured, iSearch can yield exact recovery even if n_o is 100 times larger than n_i. It is important to note that when the outliers are structured, by the definition of outlier, n_o cannot be larger than n_i.
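As a quick numerical illustration of this regime (an illustrative experiment, not one from the paper), the snippet below applies the earlier innovation_values sketch to a data set in which the unstructured outliers vastly outnumber the inliers; with r much smaller than M_1, the inliers should still receive the smallest Innovation Values.

```python
# Unstructured outliers can outnumber inliers: here n_o = 20 * n_i.
# Uses innovation_values() from the earlier sketch (illustrative).
import numpy as np

rng = np.random.default_rng(1)
M1, r, ni = 50, 2, 10
no = 20 * ni                              # far more outliers than inliers

U, _ = np.linalg.qr(rng.standard_normal((M1, r)))
A = U @ rng.standard_normal((r, ni))      # inliers in span(U)
B = rng.standard_normal((M1, no))         # unstructured outliers
D = np.hstack([B, A])
D /= np.linalg.norm(D, axis=0, keepdims=True)

x = innovation_values(D)
smallest = np.argsort(x)[:r]              # columns with the smallest values
print("smallest Innovation Values are inliers:", np.all(smallest >= no))
```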
4.2 Structured Outliers

In this section, we analyze the proposed approach with structured outliers. In contrast to unstructured outliers, structured outliers can form a low-dimensional structure different from the structure of the majority of the data points. Structured outliers are associated with important rare events such as malignant tissues [14] or web attacks [16]. In this section, we assume that the outliers form a cluster outside of U. The following assumption specifies the presumed model for the distribution of the structured outliers.

Assumption 2. A column of B is formed as b_i = (1/√(1 + α^2)) (q + α v_i). The unit ℓ_2-norm vector q does not lie in U, the vectors {v_i}_{i=1}^{n_o} are drawn uniformly at random from S^(M_1−1), and α is a positive number.

According to Assumption 2, the outliers cluster around the vector q, where q ∉ U; the smaller α is, the tighter the cluster.
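For concreteness, here is a small sampler for Assumption 2 (an illustrative sketch; the parameter values are arbitrary): the outliers concentrate around q, with α controlling how tight the cluster is.

```python
# Clustered outliers per Assumption 2:
# b_i = (q + alpha * v_i) / sqrt(1 + alpha^2), with q a fixed unit vector
# outside U and v_i uniform on the unit sphere.
import numpy as np

def sample_clustered_outliers(q, no, alpha, rng):
    M1 = q.shape[0]
    V = rng.standard_normal((M1, no))
    V /= np.linalg.norm(V, axis=0, keepdims=True)   # v_i uniform on S^{M1-1}
    return (q[:, None] + alpha * V) / np.sqrt(1 + alpha**2)

rng = np.random.default_rng(2)
q = rng.standard_normal(20)
q /= np.linalg.norm(q)                              # unit ell_2-norm vector q
B = sample_clustered_outliers(q, no=50, alpha=0.2, rng=rng)  # small alpha: tight cluster
```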
In Algorithm 1, if the dimensionality reduction step is performed, the direction search optimization problem is applied to Q^T D. Thus, (2) is equivalent to

    min_c ‖c^T D‖_1   subject to   c^T d = 1  and  c ∈ Q,   (5)

where c ∈ R^(M_1×1) and D ∈ R^(M_1×M_2). The subspace Q is the column-space of D. In this section, we are interested in studying the performance of iSearch in identifying tightly clustered outliers because some of the existing outlier detection algorithms fail if the outliers form a tight cluster. For instance, the thresholding-based method [13] and the sparse-representation-based algorithm [32] fail when the outliers are close to each other. Therefore, we assume that the span of Q is approximately equal to the column-space of [U q].
The following theorem shows that even if the outliers are close to each other, iSearch successfully identifies the outliers provided that n_i/√r is sufficiently larger than n_o.

Theorem 2. Suppose the distribution of the inliers follows Assumption 1 and the distribution of the outliers follows Assumption 2. Assume that Q is equal to the column-space of [U q]. Define q⊥ = (I − U U^T) q / ‖(I − U U^T) q‖_2, define β = max { 1/|d_i^T q⊥| : d_i ∈ B }, and define c*_i as the optimal point of (5) with d = d_i. If n_i/√r sufficiently dominates both of the following quantities (the exact constants are given in the extended version [30]):

    n_o ‖U^T q‖_2 + √(n_o c′_δ r log(n_o/δ) / M_1)   and   n_o |q^T q⊥| + n_o √(c″_δ log(n_o/δ) / M_1),   (6)

then (3) holds and U is recovered exactly with probability at least 1 − 5δ.

In contrast to (4), in (6) n_o is not scaled with 1/M_1. Theorem 2 shows that, in contrast to the unstructured outliers, the number of the structured outliers should be sufficiently smaller than the number of the inliers for small values of α.