Slide 1: Principal Components Analysis (Xuhua Xia)

Objectives:
- Understand the principles of principal components analysis (PCA)
- Recognize conditions under which PCA may be useful
- Use the R function princomp to perform a principal components analysis
- Interpret princomp output

Slide 2: Typical Form of Data

A data set in an 8x3 matrix. The rows could be species and the columns sampling sites:

X =  100   97   99
      96   90   90
      80   75   60
      75   85   95
      62   40   28
      77   80   78
      92   91   80
      75   85  100

A matrix is often referred to as an n x p matrix (n for the number of rows and p for the number of columns). Our matrix has 8 rows and 3 columns, and is an 8x3 matrix.
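Not in the slides: a minimal sketch of entering such a data matrix in R. The row-by-row arrangement of the 24 values is an assumption about how the slide's matrix reads; rows are species and columns are sampling sites.

# Hypothetical entry of the 8x3 example matrix (row order assumed from the slide)
X <- matrix(c(100, 97, 99,
               96, 90, 90,
               80, 75, 60,
               75, 85, 95,
               62, 40, 28,
               77, 80, 78,
               92, 91, 80,
               75, 85, 100),
            nrow = 8, ncol = 3, byrow = TRUE)
dim(X)  # 8 3: an n x p matrix with n = 8 rows (species) and p = 3 columns (sites)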

Slide 3: What are Principal Components?

Principal components are linear combinations of the observed variables:

Y = b1*X1 + b2*X2 + ... + bn*Xn

The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

Slide 4: What are Principal Components?

The three criteria:
- There are exactly p principal components (PCs), each being a linear combination of the observed variables;
- The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated);
- The components are extracted in order of decreasing variance.
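Not in the slides: a small sketch illustrating the last two criteria with princomp on arbitrary made-up data; the component variances come out in decreasing order, and the PC scores are mutually uncorrelated.

# Illustrative only: random 8x3 data, not the example matrix from slide 2
set.seed(1)
dat <- matrix(rnorm(24), nrow = 8, ncol = 3)
pc  <- princomp(dat)
pc$sdev^2                 # p = 3 component variances, in decreasing order
round(cor(pc$scores), 6)  # identity matrix: the PC scores are uncorrelated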

Slide 5: A Simple Data Set

[Scatter plot of the five (X, Y) points used in this example.]

Cov(X,Y) = Σ (xi - x̄)(yi - ȳ) / (n - 1)

r(X,Y) = Σ (xi - x̄)(yi - ȳ) / sqrt( Σ (xi - x̄)^2 · Σ (yi - ȳ)^2 )

Covariance matrix:
        X      Y
X       1      1.414
Y       1.414  2

Correlation matrix:
        X  Y
X       1  1
Y       1  1

Slide 6: General Patterns

- The total variance is 3 (= 1 + 2).
- The two variables, X and Y, are perfectly correlated, with all points falling on the regression line.
- The spatial relationship among the 5 points can therefore be represented by a single dimension.
- PCA is a dimension-reduction technique. What would happen if we applied PCA to the data?

Slide 7: Graphic PCA

[The same five points plotted in the X-Y plane, illustrating PCA graphically.]
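A quick check of these numbers, not from the slides; x and y are the five points of the simple data set, entered exactly as in the R program on the next slide.

x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
var(x); var(y)  # 1 and 2: the diagonal of the covariance matrix
cov(x, y)       # 1.414 (= sqrt(2))
cor(x, y)       # 1: X and Y are perfectly correlated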

Slide 8: R Program

# Principal Components Analysis
# entering raw data and extracting PCs
# from the correlation matrix
x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
mydata <- cbind(x, y)
fit <- princomp(mydata, cor = TRUE)
summary(fit)               # print variance accounted for
loadings(fit)              # pc loadings
plot(fit, type = "lines")  # scree plot
fit$scores                 # the principal components
biplot(fit)
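Not in the slides: prcomp is an alternative (SVD-based, and often preferred) R function for the same analysis; a minimal sketch on the same data.

fit2 <- prcomp(mydata, scale. = TRUE)  # scale. = TRUE ~ analysis of the correlation matrix
summary(fit2)                          # variance accounted for
fit2$rotation                          # loadings (eigenvectors)
fit2$x                                 # PC scores
screeplot(fit2, type = "lines")        # scree plot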

Slide 9: Steps in a PCA

- Have at least two variables
- Generate a correlation or variance-covariance matrix
- Obtain eigenvalues and eigenvectors (this is called an eigenvalue problem, and will be illustrated with a simple numerical example)
- Generate principal component (PC) scores
- Plot the PC scores in the space with reduced dimensions
- All these can be automated by using R.

Slide 10: Covariance or Correlation Matrix?

[Plot of abundance (0 to 40) for species Sp1 and Sp2.]

Slide 11: Covariance or Correlation Matrix?

[Plot of abundance (0 to 35) for species Sp2 and Sp3.]

Slide 12: Covariance or Correlation Matrix?

[Plot of abundance (0 to 35) for species Sp1, Sp2 and Sp3.]
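These plots contrast species whose abundances are on very different scales. Not in the slides, an illustration with made-up numbers of why the choice matters: with the covariance matrix a high-variance variable dominates the first PC, while the correlation matrix gives every variable equal weight.

# Illustrative data only (not the abundances shown in the plots)
set.seed(2)
sp1 <- rnorm(10, mean = 30, sd = 8)    # abundant, highly variable species
sp2 <- rnorm(10, mean = 3,  sd = 0.5)  # rare, nearly constant species
abund <- cbind(sp1, sp2)
princomp(abund, cor = FALSE)$loadings  # covariance-based: PC1 is essentially sp1
princomp(abund, cor = TRUE)$loadings   # correlation-based: both species contribute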

Slide 13: The Eigenvalue Problem

The covariance matrix:

A = | 1   √2 |
    | √2  2  |

The eigenvalues are the values of λ that satisfy |A - λI| = 0:

(1 - λ)(2 - λ) - 2 = 0
λ^2 - 3λ = 0
λ1 = 3, λ2 = 0

There are n eigenvalues for n variables, and the sum of the eigenvalues is equal to the sum of the variances in the covariance matrix. Finding the eigenvalues and eigenvectors is called an eigenvalue problem (or a characteristic value problem).

Slide 14: Get the Eigenvectors

An eigenvector is a vector x that satisfies the condition A x = λ x. In our case A is the variance-covariance matrix of order 2, and x is a vector specified by x1 and x2.

For λ = 3: A x = 3 x, which is equivalent to
  x1 + √2·x2 = 3·x1
  √2·x1 + 2·x2 = 3·x2
  so that x2 = √2·x1

For λ = 0: A x = 0·x, which is equivalent to
  x1 + √2·x2 = 0
  √2·x1 + 2·x2 = 0
  so that x1 = -√2·x2

Slide 15: Get the Eigenvectors

We want eigenvectors of unit length, i.e., x1^2 + x2^2 = 1. Combining this with the relations from the previous slide:

For λ = 3: x2 = √2·x1, so x1^2 + 2·x1^2 = 1, giving x1 = 0.5774 and x2 = 0.8165
For λ = 0: x1 = -√2·x2, so 2·x2^2 + x2^2 = 1, giving x2 = 0.5774 and x1 = -0.8165

The first eigenvector is the one associated with the largest eigenvalue.
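Not in the slides: R's eigen function reproduces these results directly. The eigenvectors it returns already have unit length, although each one may come back multiplied by -1.

A <- matrix(c(1, sqrt(2), sqrt(2), 2), nrow = 2)  # the covariance matrix above
e <- eigen(A)
e$values   # 3 and 0 (the second is zero up to rounding error)
e$vectors  # columns ~ (0.5774, 0.8165) and (-0.8165, 0.5774), possibly times -1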

Slide 16: Get the PC Scores

Original data (x and y):
 1.26491106   1.78885438
 0.63245553   0.89442719
 0.00000000   0.00000000
-0.63245553  -0.89442719
-1.26491106  -1.78885438

Eigenvectors (columns, for λ = 3 and λ = 0):
 0.577350  -0.816497
 0.816497   0.577350

First PC score   Second PC score
 2.19089          0
 1.09545          0
 0.00000          0
-1.09545          0
-2.19089          0

The original data in a two-dimensional space are reduced to one dimension.

Slide 17: What Are Principal Components?

Principal components are a new set of variables, which are linear combinations of the observed ones, with these properties:
- Because of the decreasing variance property, much of the variance (information in the original set of p variables) tends to be concentrated in the first few PCs. This implies that we can drop the last few PCs without losing much information. PCA is therefore considered a dimension-reduction technique.
- Because PCs are orthogonal, they can be used instead of the original variables in situations where having orthogonal variables is desirable (e.g., regression).
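Returning to the score calculation on slide 16, a small check that is not in the slides: multiplying the data matrix by the eigenvector matrix reproduces the PC scores in the table above (x and y are already centered, so no further centering is needed).

# mydata as defined in the R program on slide 8
V <- cbind(c(0.577350, 0.816497),   # eigenvector for lambda = 3 (first PC)
           c(-0.816497, 0.577350))  # eigenvector for lambda = 0 (second PC)
scores <- mydata %*% V
round(scores, 5)  # column 1: -2.19089 ... 2.19089; column 2: essentially 0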

Slide 18: Index of Hidden Variables

- The ranking of Asian universities by Asiaweek: HKU is ranked second in financial resources, but seventh in academic research.
- How did HKU get ranked third? Is there a more objective way of ranking?
- An illustrative example:

School   Math   English   Physics   Chemistry   Chinese
1         60      55        65         64         67
2         70      65        69         71         77
3         80      75        72         85         82
4         90      85        85         88         88
5        100      95        95         95         93

Slide 19: A Simple Data Set

School   Math    English
1         60       55
2         70       65
3         80       75
4         90       85
5        100       95
Mean     80.0     75.0
Var       250      250

[Scatter plot of English against Math, both axes from 50 to 100.]

- School 5 is clearly the best school.
- School 1 is clearly the worst school.

Slide 20: Graphic PCA

[The five schools plotted along a single principal-component axis, with scores -1.7889, -0.8944, 0, 0.8944 and 1.7889.]
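Not in the slides: a sketch of the same idea in R. A PCA on the standardized Math and English marks collapses the two variables into a single index; prcomp (which standardizes with n - 1) reproduces the scores -1.7889, -0.8944, 0, 0.8944 and 1.7889 shown above, possibly with the sign flipped. princomp would give the same ranking but a slightly different scale, because it divides by n when standardizing.

math    <- c(60, 70, 80, 90, 100)
english <- c(55, 65, 75, 85, 95)
fit <- prcomp(cbind(math, english), scale. = TRUE)
round(fit$x[, 1], 4)  # one PC1 score per school: a single overall index (sign may flip)
summary(fit)          # PC1 carries all the variance in this perfectly correlated example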

Slide 21: Crime Data in 50 States

STATE         MURDER  RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY   AUTO
ALABAMA         14.2  25.2     96.8    278.3    1135.5   1881.9  280.7
ALASKA          10.8  51.6     96.8    284.0    1331.7   3369.8  753.3
ARIZONA          9.5  34.2    138.2    312.3    2346.1   4467.4  439.5
ARKANSAS         8.8  27.6     83.2    203.4     972.6   1862.1  183.4
CALIFORNIA      11.5  49.4    287.0    358.0    2139.4   3499.8  663.5
COLORADO         6.3  42.0    170.7    292.9    1935.2   3903.2  477.1
CONNECTICUT      4.2  16.8    129.5    131.8    1346.0   2620.7  593.2
DELAWARE         6.0  24.9    157.0    194.2    1682.6   3678.4  467.0
FLORIDA         10.2  39.6    187.9    449.1    1859.9   3840.5  351.4
GEORGIA         11.7  31.1    140.5    256.5    1351.1   2170.2  297.9
HAWAII           7.2  25.5    128.0     64.1    1911.5   3920.4  489.4
IDAHO            5.5  19.4     39.6    172.5    1050.8   2599.6  237.6
ILLINOIS         9.9  21.8    211.3    209.0    1085.0   2828.5  528.6
...

PROC PRINCOMP OUT=CRIMCOMP;

Slides 22-23: SAS Program

DATA CRIME;
  TITLE 'CRIME RATES PER 100,000 POP BY STATE';
  INPUT STATENAME $ 1-15 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
CARDS;
Alabama         14.2 25.2  96.8 278.3 1135.5 1881.9  280.7
Alaska          10.8 51.6  96.8 284.0 1331.7 3369.8  753.3
Arizona          9.5 34.2 138.2 312.3 2346.1 4467.4  439.5
Arkansas         8.8 27.6  83.2 203.4  972.6 1862.1  183.4
California      11.5 49.4 287.0 358.0 2139.4 3499.8  663.5
Colorado         6.3 42.0 170.7 292.9 1935.2 3903.2  477.1
Connecticut      4.2 16.8 129.5 131.8 1346.0 2620.7  593.2
Delaware         6.0 24.9 157.0 194.2 1682.6 3678.4  467.0
Florida         10.2 39.6 187.9 449.1 1859.9 3840.5  351.4
Georgia         11.7 31.1 140.5 256.5 1351.1 2170.2  297.9
Hawaii           7.2 25.5 128.0  64.1 1911.5 3920.4  489.4
Idaho            5.5 19.4  39.6 172.5 1050.8 2599.6  237.6
Illinois         9.9 21.8 211.3 209.0 1085.0 2828.5  528.6
Indiana          7.4 26.5 123.2 153.5 1086.2 2498.7  377.4
Iowa             2.3 10.6  41.2  89.8  812.5 2685.1  219.9
Kansas           6.6 22.0 100.7 180.5 1270.4 2739.3  244.3
Kentucky        10.1 19.1  81.1 123.3  872.2 1662.1  245.4
Louisiana       15.5 30.9 142.9 335.5 1165.5 2469.9  337.7
Maine            2.4 13.5  38.7 170.0 1253.1 2350.7  246.9
Maryland         8.0 34.8 292.1 358.9 1400.0 3177.7  428.5
Massachusetts    3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan         9.3 38.9 261.9 274.6 1522.7 3159.0  545.5
Minnesota        2.7 19.5  85.9  85.8 1134.7 2559.3  343.1
Mississippi     14.3 19.6  65.7 189.1  915.6 1239.9  144.4
Missouri         9.6 28.3 189.0 233.5 1318.3 2424.2  378.4
Montana          5.4 16.7  39.2 156.8  804.9 2773.2  309.2
Nebraska         3.9 18.1  64.7 112.7  760.0 2316.1  249.1
Nevada          15.8 49.1 323.1 355.0 2453.1 4212.6  559.2
New Hampshire    3.2 10.7  23.2  76.0 1041.7 2343.9  293.4
New Jersey       5.6 21.0 180.4 185.1 1435.8 2774.5  511.5
New Mexico       8.8 39.1 109.6 343.4 1418.7 3008.6  259.5
New York        10.7 29.4 472.6 319.1 1728.0 2782.0  745.8
North Carolina  10.6 17.0  61.3 318.3 1154.1 2037.8  192.1
North Dakota     0.9  9.0  13.3  43.8  446.1 1843.0  144.7
Ohio             7.8 27.3 190.5 181.1 1216.0 2696.8  400.4
Oklahoma         8.6 29.2  73.8 205.0 1288.2 2228.1  326.8
Oregon           4.9 39.9 124.1 286.9 1636.4 3506.1  388.9
Pennsylvania     5.6 19.0 130.3 128.0  877.5 1624.1  333.2
Rhode Island     3.6 10.5  86.5 201.0 1489.5 2844.1  791.4
South Carolina  11.9 33.0 105.9 485.3 1613.6 2342.4  245.1
South Dakota     2.0 13.5  17.9 155.7  570.5 1704.4  147.5
Tennessee       10.1 29.7 145.8 203.9 1259.7 1776.5  314.0
Texas           13.3 33.8 152.4 208.2 1603.1 2988.7  397.6
Utah             3.5 20.3  68.8 147.3 1171.6 3004.6  334.5
Vermont          1.4 15.9  30.8 101.2 1348.2 2201.0  265.2
Virginia         9.0 23.3  92.1 165.7  986.2 2521.2  226.7
Washington       4.3 39.6 106.2 224.8 1605.6 3386.9  360.3
West Virginia    6.0 13.2  42.2  90.9  597.4 1341.7  163.3
Wisconsin        2.8 12.9  52.2  63.7  846.9 2614.2  220.7
Wyoming          5.4 21.9  39.7 173.9  811.6 2772.2  282.0
;
PROC PRINCOMP out=crimcomp;
run;
PROC PRINT;
  ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
PROC GPLOT;
  PLOT PRIN2*PRIN1=STATENAME;
  TITLE2 'PLOT OF THE FIRST TWO PRINCIPAL COMPONENTS';
run;
PROC PRINCOMP data=CRIME COV OUT=crimcomp;
run;
PROC PRINT;
  ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
/* Add to have a map view */
proc sort data=crimcomp out=crimcomp; by STATENAME; run;
proc sort data=maps.us2 out=mymap; by STATENAME; run;
data both;
  merge mymap crimcomp;
  by STATENAME;
run;
proc gmap data=both;
  id _map_geometry_;
  choro PRIN1 PRIN2 / levels=15;  /* choro PRIN1/discrete; */
run;
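Not in the slides: the same analysis can be run in R. A sketch, assuming the crime data have been saved to a tab-separated file crime.txt with the column names used above (STATENAME, MURDER, RAPE, ROBBERY, ASSAULT, BURGLARY, LARCENY, AUTO); the file name and separator are assumptions.

crime <- read.table("crime.txt", header = TRUE, sep = "\t")   # hypothetical data file
vars  <- crime[, c("MURDER", "RAPE", "ROBBERY", "ASSAULT", "BURGLARY", "LARCENY", "AUTO")]
fit.cor <- princomp(vars, cor = TRUE)   # like PROC PRINCOMP (correlation matrix)
fit.cov <- princomp(vars, cor = FALSE)  # like PROC PRINCOMP with the COV option
summary(fit.cor)                        # eigenvalue proportions (cf. the tables below)
fit.cor$loadings                        # eigenvectors
# like PROC GPLOT's PLOT PRIN2*PRIN1=STATENAME:
plot(fit.cor$scores[, 1], fit.cor$scores[, 2], type = "n", xlab = "PRIN1", ylab = "PRIN2")
text(fit.cor$scores[, 1], fit.cor$scores[, 2], labels = crime$STATENAME, cex = 0.6)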

Slide 24: Correlation Matrix

           MURDER    RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY    AUTO
MURDER     1.0000  0.6012   0.4837   0.6486    0.3858   0.1019  0.0688
RAPE       0.6012  1.0000   0.5919   0.7403    0.7121   0.6140  0.3489
ROBBERY    0.4837  0.5919   1.0000   0.5571    0.6372   0.4467  0.5907
ASSAULT    0.6486  0.7403   0.5571   1.0000    0.6229   0.4044  0.2758
BURGLARY   0.3858  0.7121   0.6372   0.6229    1.0000   0.7921  0.5580
LARCENY    0.1019  0.6140   0.4467   0.4044    0.7921   1.0000  0.4442
AUTO       0.0688  0.3489   0.5907   0.2758    0.5580   0.4442  1.0000

If variables are not correlated, there would be no point in doing PCA. The correlation matrix is symmetric, so we only need to inspect either the upper or the lower triangle.

Slide 25: Eigenvalues

        Eigenvalue  Difference  Proportion  Cumulative
PRIN1      4.11496     2.87624    0.587851     0.58785
PRIN2      1.23872     0.51291    0.176960     0.76481
PRIN3      0.72582     0.40938    0.103688     0.86850
PRIN4      0.31643     0.05846    0.045205     0.91370
PRIN5      0.25797     0.03593    0.036853     0.95056
PRIN6      0.22204     0.09798    0.031720     0.98228
PRIN7      0.12406           .    0.017722     1.00000
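The Proportion and Cumulative columns are just each eigenvalue divided by the total (7, the number of standardized variables). Not in the slides, a short check in R, assuming fit.cor from the sketch above:

eig <- fit.cor$sdev^2             # eigenvalues of the correlation matrix
round(eig / sum(eig), 4)          # proportion of variance per PC
round(cumsum(eig) / sum(eig), 4)  # cumulative proportion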

Slide 26: Eigenvectors

           PRIN1   PRIN2   PRIN3   PRIN4   PRIN5   PRIN6   PRIN7
MURDER    0.3002  -.6291  0.1782  -.2321  0.5381  0.2591  0.2675
RAPE      0.4317  -.1694  -.2441  0.0622  0.1884  -.7732  -.2964
ROBBERY   0.3968  0.0422  0.4958  -.5579  -.5199  -.1143  -.0039
ASSAULT   0.3966  -.3435  -.0695  0.6298  -.5066  0.1723  0.1917
BURGLARY  0.4401  0.2033  -.2098  -.0575  0.1010  0.5359  -.6481
LARCENY   0.3573  0.4023  -.5392  -.2348  0.0300  0.0394  0.6016
AUTO      0.2951  0.5024  0.5683  0.4192  0.3697  -.0572  0.1470

Do these eigenvectors mean anything? All crimes are positively ...
