




Slide 1: Principal Components Analysis (Xuhua Xia)
Objectives:
- Understand the principles of principal components analysis (PCA)
- Recognize conditions under which PCA may be useful
- Use the R function princomp to perform a principal components analysis
- Interpret princomp output

Slide 2: Typical Form of Data
A data set in an 8x3 matrix. The rows could be species and the columns sampling sites.

X =
  100   97   99
   96   90   90
   80   75   60
   75   85   95
   62   40   28
   77   80   78
   92   91   80
   75   85  100

A matrix is often referred to as an n x p matrix (n for the number of rows and p for the number of columns). Our matrix has 8 rows and 3 columns, so it is an 8x3 matrix.

Slide 3: What are Principal Components?
Principal components are linear combinations of the observed variables:

  Y = b1X1 + b2X2 + ... + bnXn

The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

Slide 4: What are Principal Components?
The three criteria:
- There are exactly p principal components (PCs), each being a linear combination of the observed variables.
- The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated).
- The components are extracted in order of decreasing variance.
(These properties are checked numerically in the sketch below.)
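The following R sketch is not from the original slides; the data set is made up purely for illustration. It verifies all three criteria on a small random data set:

# Check the three criteria on an arbitrary made-up data set (3 variables, 50 rows).
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
fit <- princomp(dat, cor = TRUE)
L <- unclass(fit$loadings)    # p x p coefficient matrix, one column per PC
ncol(L)                       # exactly p = 3 components
round(crossprod(L), 10)       # identity matrix: the PCs are mutually orthogonal
fit$sdev^2                    # component variances, extracted in decreasing order
round(cor(fit$scores), 10)    # PC scores are uncorrelated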
Slide 5: A Simple Data Set
[Figure: scatter plot of the five (X, Y) points, with X on the horizontal axis (-1.5 to 1.5) and Y on the vertical axis (-2 to 2).]

  Cov(X, Y) = Σ (Xi - mean(X)) (Yi - mean(Y)) / (n - 1)

  r(X, Y) = Σ (Xi - mean(X)) (Yi - mean(Y)) / sqrt[ Σ (Xi - mean(X))² · Σ (Yi - mean(Y))² ]

Correlation matrix:
      X  Y
  X   1  1
  Y   1  1

Covariance matrix:
      X      Y
  X   1      1.414
  Y   1.414  2

Slide 6: General Patterns
- The total variance is 3 (= 1 + 2).
- The two variables, X and Y, are perfectly correlated, with all points falling on the regression line.
- The spatial relationship among the 5 points can therefore be represented by a single dimension.
- PCA is a dimension-reduction technique. What would happen if we applied PCA to these data?

Slide 7: Graphic PCA
[Figure: the same scatter plot of Y against X as on Slide 5.]
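To connect the numbers above to code, here is a small R check (assumed, not part of the slides) using the five (x, y) points that appear in the R program on Slide 8. It reproduces the correlation and covariance matrices, the total variance of 3, and the eigenvalues 3 and 0:

# The five-point data set behind Slides 5-7.
x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
round(cor(cbind(x, y)), 3)      # correlation matrix: all entries 1
round(cov(cbind(x, y)), 3)      # covariance matrix: var(x) = 1, var(y) = 2, cov = 1.414
sum(diag(cov(cbind(x, y))))     # total variance = 3
eigen(cov(cbind(x, y)))$values  # eigenvalues 3 and 0: one dimension carries all the variance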
Slide 8: R Program

# Principal Components Analysis
# Entering raw data and extracting PCs from the correlation matrix
x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
mydata <- cbind(x, y)
fit <- princomp(mydata, cor = TRUE)
summary(fit)               # print variance accounted for
loadings(fit)              # PC loadings
plot(fit, type = "lines")  # scree plot
fit$scores                 # the principal component scores
biplot(fit)
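A side note not on the original slides: base R also offers prcomp(), which computes the same decomposition through a singular value decomposition and is generally recommended for numerical accuracy. A minimal sketch, reusing mydata from above:

# prcomp() equivalent; scale. = TRUE plays the role of cor = TRUE in princomp().
fit2 <- prcomp(mydata, scale. = TRUE)
summary(fit2)     # standard deviation and proportion of variance of each PC
fit2$rotation     # loadings (eigenvectors)
fit2$x            # PC scores

One minor difference: prcomp() standardizes with the divisor n - 1 while princomp() uses n, so the two functions report slightly different score and standard-deviation scales.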
Slide 9: Steps in a PCA
- Have at least two variables.
- Generate a correlation or variance-covariance matrix.
- Obtain eigenvalues and eigenvectors (this is called an eigenvalue problem, and will be illustrated with a simple numerical example).
- Generate principal component (PC) scores.
- Plot the PC scores in the space with reduced dimensions.
All of these steps can be automated in R (a step-by-step sketch follows).
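As a hedged illustration (not from the slides), the same steps can be carried out "by hand" with eigen(), again using the mydata matrix defined on Slide 8:

# Step-by-step PCA without princomp().
R <- cor(mydata)             # step 2: correlation matrix of the variables
e <- eigen(R)                # step 3: eigenvalues and eigenvectors
e$values                     # PC variances; they sum to the number of variables
e$vectors                    # loadings, one column per PC
Z <- scale(mydata)           # standardize, because we used the correlation matrix
scores <- Z %*% e$vectors    # step 4: PC scores
plot(scores[, 1], rep(0, nrow(scores)),   # step 5: plot in the reduced (1-D) space
     xlab = "PC1", ylab = "", yaxt = "n")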
Slide 10: Covariance or Correlation Matrix?
[Figure: abundance (0 to 40) of species Sp1 and Sp2.]

Slide 11: Covariance or Correlation Matrix?
[Figure: abundance (0 to 35) of species Sp2 and Sp3.]

Slide 12: Covariance or Correlation Matrix?
[Figure: abundance (0 to 35) of species Sp1, Sp2, and Sp3.]
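The figures pose the choice graphically. As an illustration with made-up abundances (these are not the values behind the original figures), the sketch below shows the usual consideration: when one variable is on a much larger scale, a covariance-based PCA is dominated by it, whereas a correlation-based PCA gives each variable equal weight:

# Made-up abundances: Sp1 varies on a much larger scale than Sp2 and Sp3.
sp1 <- c(35, 30, 25, 20, 15, 10)
sp2 <- c( 6,  5,  5,  4,  3,  2)
sp3 <- c( 2,  3,  2,  4,  3,  5)
abund <- cbind(sp1, sp2, sp3)
princomp(abund, cor = FALSE)$loadings   # covariance PCA: PC1 is essentially sp1
princomp(abund, cor = TRUE)$loadings    # correlation PCA: loadings spread across species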
Slide 13: The Eigenvalue Problem
The covariance matrix from Slide 5 (with 1.414 = √2):

  A = | 1   √2 |
      | √2  2  |

The eigenvalues are the values λ that satisfy |A - λI| = 0. Here:

  (1 - λ)(2 - λ) - (√2)² = λ² - 3λ = 0, so λ1 = 3 and λ2 = 0.

- There are n eigenvalues for n variables.
- The sum of the eigenvalues equals the sum of the variances in the covariance matrix (here 3 = 1 + 2).
- Finding the eigenvalues and eigenvectors is called an eigenvalue problem (or a characteristic value problem).

Slide 14: Get the Eigenvectors
An eigenvector is a vector x that satisfies A·x = λ·x. In our case A is a variance-covariance matrix of order 2, and x is specified by its elements x1 and x2.

For λ = 0: A·x = 0·x, which is equivalent to
  x1 + √2·x2 = 0
  √2·x1 + 2·x2 = 0,
so x1 = -√2·x2.

For λ = 3: A·x = 3·x, which is equivalent to
  x1 + √2·x2 = 3·x1
  √2·x1 + 2·x2 = 3·x2,
so x2 = √2·x1.

Slide 15: Get the Eigenvectors
We want eigenvectors of unit length, i.e., x1² + x2² = 1. Combining this with the relations from the previous slide:
- For λ = 3: x2 = √2·x1, so x1² + 2·x1² = 1, giving x1 = 0.5774 and x2 = 0.8165.
- For λ = 0: x1 = -√2·x2, giving x1 = -0.8165 and x2 = 0.5774.
The first eigenvector is the one associated with the largest eigenvalue.
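A quick R check of this hand calculation (not part of the slides):

# Eigenvalues and unit-length eigenvectors of the covariance matrix A.
A <- matrix(c(1,       sqrt(2),
              sqrt(2), 2      ), nrow = 2, byrow = TRUE)
e <- eigen(A)
e$values     # 3 and 0
e$vectors    # columns (0.5774, 0.8165) and (-0.8165, 0.5774), up to an arbitrary sign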
Slide 16: Get the PC Scores

Original data (x and y):
  -1.26491106  -1.78885438
  -0.63245553  -0.89442719
   0.00000000   0.00000000
   0.63245553   0.89442719
   1.26491106   1.78885438

Eigenvectors (columns are the first and second eigenvectors):
   0.577350  -0.816497
   0.816497   0.577350

PC scores (the data matrix multiplied by the eigenvector matrix):
  First PC score   Second PC score
     -2.19089           0
     -1.09545           0
      0.00000           0
      1.09545           0
      2.19089           0

The original data in a two-dimensional space are thus reduced to
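A minimal sketch (assumed) of the same score calculation in R, multiplying the data matrix from Slide 8 by the eigenvectors of its covariance matrix; note that the sign of any PC is arbitrary, so a column may come out flipped:

# PC scores = centered data matrix %*% eigenvectors of the covariance matrix.
S <- cov(mydata)                                      # covariance matrix of the Slide 8 data
V <- eigen(S)$vectors                                 # eigenvectors as columns
scores <- scale(mydata, center = TRUE, scale = FALSE) %*% V
round(scores, 5)    # column 1: -2.19089 ... 2.19089 (up to sign); column 2: all zero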
one dimension.

Slide 17: What Are Principal Components?
Principal components are a new set of variables, which are linear combinations of the observed ones, with these properties:
- Because of the decreasing-variance property, much of the variance (the information in the original set of p variables) tends to be concentrated in the first few PCs. This implies that we can drop the last few PCs without losing much information. PCA is therefore considered a dimension-reduction technique.
- Because PCs are orthogonal, they can be used instead of the original variables in situations where having orthogonal variables is desirable (e.g., regression).

Slide 18: Index of Hidden Variables
- The ranking of Asian universities by the Asian Week: HKU was ranked second in financial resources, but seventh in academic research.
- How did HKU get ranked third? Is there a more objective way of ranking?
An illustrative example:

  School  Math  English  Physics  Chemistry  Chinese
  1         60     55       65       64        67
  2         70     65       69       71        77
  3         80     75       72       85        82
  4         90     85       85       88        88
  5        100     95       95       95        93

Slide 19: A Simple Data Set

  School  Math  English
  1         60     55
  2         70     65
  3         80     75
  4         90     85
  5        100     95
  Mean      80.0   75.0
  Var      250    250

[Figure: scatter plot of English against Math, both axes from 50 to 100.]
- School 5 is clearly the best school.
- School 1 is clearly the worst school.
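A hedged R sketch (assumed, built from the Slide 19 numbers) showing how the first principal component supplies the single "hidden" index that ranks the schools:

# Two-subject school data from Slide 19; PC1 acts as an overall performance index.
school <- data.frame(Math    = c(60, 70, 80, 90, 100),
                     English = c(55, 65, 75, 85, 95))
fit <- princomp(school, cor = TRUE)
summary(fit)               # PC1 carries essentially all of the variance
round(fit$scores[, 1], 4)  # one composite score per school (sign is arbitrary)
order(fit$scores[, 1])     # an objective ranking of the five schools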
Slide 20: Graphic PCA
[Figure: the five schools placed along the first principal component; PC scores: -1.7889, -0.8944, 0, 0.8944, 1.7889.]

Slide 21: Crime Data in 50 States

  STATE        MURDER  RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY   AUTO
  ALABAMA        14.2  25.2     96.8    278.3    1135.5   1881.9  280.7
  ALASKA         10.8  51.6     96.8    284.0    1331.7   3369.8  753.3
  ARIZONA         9.5  34.2    138.2    312.3    2346.1   4467.4  439.5
  ARKANSAS        8.8  27.6     83.2    203.4     972.6   1862.1  183.4
  CALIFORNIA     11.5  49.4    287.0    358.0    2139.4   3499.8  663.5
  COLORADO        6.3  42.0    170.7    292.9    1935.2   3903.2  477.1
  CONNECTICUT     4.2  16.8    129.5    131.8    1346.0   2620.7  593.2
  DELAWARE        6.0  24.9    157.0    194.2    1682.6   3678.4  467.0
  FLORIDA        10.2  39.6    187.9    449.1    1859.9   3840.5  351.4
  GEORGIA        11.7  31.1    140.5    256.5    1351.1   2170.2  297.9
  HAWAII          7.2  25.5    128.0     64.1    1911.5   3920.4  489.4
  IDAHO           5.5  19.4     39.6    172.5    1050.8   2599.6  237.6
  ILLINOIS        9.9  21.8    211.3    209.0    1085.0   2828.5  528.6
  ...

  PROC PRINCOMP OUT=CRIMCOMP;

Slides 22-23: SAS Program

DATA CRIME;
  TITLE 'CRIME RATES PER 100,000 POP BY STATE';
  INPUT STATENAME $1-15 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
CARDS;
Alabama         14.2 25.2  96.8 278.3 1135.5 1881.9  280.7
Alaska          10.8 51.6  96.8 284.0 1331.7 3369.8  753.3
Arizona          9.5 34.2 138.2 312.3 2346.1 4467.4  439.5
Arkansas         8.8 27.6  83.2 203.4  972.6 1862.1  183.4
California      11.5 49.4 287.0 358.0 2139.4 3499.8  663.5
Colorado         6.3 42.0 170.7 292.9 1935.2 3903.2  477.1
Connecticut      4.2 16.8 129.5 131.8 1346.0 2620.7  593.2
Delaware         6.0 24.9 157.0 194.2 1682.6 3678.4  467.0
Florida         10.2 39.6 187.9 449.1 1859.9 3840.5  351.4
Georgia         11.7 31.1 140.5 256.5 1351.1 2170.2  297.9
Hawaii           7.2 25.5 128.0  64.1 1911.5 3920.4  489.4
Idaho            5.5 19.4  39.6 172.5 1050.8 2599.6  237.6
Illinois         9.9 21.8 211.3 209.0 1085.0 2828.5  528.6
Indiana          7.4 26.5 123.2 153.5 1086.2 2498.7  377.4
Iowa             2.3 10.6  41.2  89.8  812.5 2685.1  219.9
Kansas           6.6 22.0 100.7 180.5 1270.4 2739.3  244.3
Kentucky        10.1 19.1  81.1 123.3  872.2 1662.1  245.4
Louisiana       15.5 30.9 142.9 335.5 1165.5 2469.9  337.7
Maine            2.4 13.5  38.7 170.0 1253.1 2350.7  246.9
Maryland         8.0 34.8 292.1 358.9 1400.0 3177.7  428.5
Massachusetts    3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan         9.3 38.9 261.9 274.6 1522.7 3159.0  545.5
Minnesota        2.7 19.5  85.9  85.8 1134.7 2559.3  343.1
Mississippi     14.3 19.6  65.7 189.1  915.6 1239.9  144.4
Missouri         9.6 28.3 189.0 233.5 1318.3 2424.2  378.4
Montana          5.4 16.7  39.2 156.8  804.9 2773.2  309.2
Nebraska         3.9 18.1  64.7 112.7  760.0 2316.1  249.1
Nevada          15.8 49.1 323.1 355.0 2453.1 4212.6  559.2
New Hampshire    3.2 10.7  23.2  76.0 1041.7 2343.9  293.4
New Jersey       5.6 21.0 180.4 185.1 1435.8 2774.5  511.5
New Mexico       8.8 39.1 109.6 343.4 1418.7 3008.6  259.5
New York        10.7 29.4 472.6 319.1 1728.0 2782.0  745.8
North Carolina  10.6 17.0  61.3 318.3 1154.1 2037.8  192.1
North Dakota     0.9  9.0  13.3  43.8  446.1 1843.0  144.7
Ohio             7.8 27.3 190.5 181.1 1216.0 2696.8  400.4
Oklahoma         8.6 29.2  73.8 205.0 1288.2 2228.1  326.8
Oregon           4.9 39.9 124.1 286.9 1636.4 3506.1  388.9
Pennsylvania     5.6 19.0 130.3 128.0  877.5 1624.1  333.2
Rhode Island     3.6 10.5  86.5 201.0 1489.5 2844.1  791.4
South Carolina  11.9 33.0 105.9 485.3 1613.6 2342.4  245.1
South Dakota     2.0 13.5  17.9 155.7  570.5 1704.4  147.5
Tennessee       10.1 29.7 145.8 203.9 1259.7 1776.5  314.0
Texas           13.3 33.8 152.4 208.2 1603.1 2988.7  397.6
Utah             3.5 20.3  68.8 147.3 1171.6 3004.6  334.5
Vermont          1.4 15.9  30.8 101.2 1348.2 2201.0  265.2
Virginia         9.0 23.3  92.1 165.7  986.2 2521.2  226.7
Washington       4.3 39.6 106.2 224.8 1605.6 3386.9  360.3
West Virginia    6.0 13.2  42.2  90.9  597.4 1341.7  163.3
Wisconsin        2.8 12.9  52.2  63.7  846.9 2614.2  220.7
Wyoming          5.4 21.9  39.7 173.9  811.6 2772.2  282.0
;
PROC PRINCOMP out=crimcomp;
run;
PROC PRINT;
  ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
PROC GPLOT;
  PLOT PRIN2*PRIN1=STATENAME;
  TITLE2 'PLOT OF THE FIRST TWO PRINCIPAL COMPONENTS';
run;
PROC PRINCOMP data=CRIME COV OUT=crimcomp;
run;
PROC PRINT;
  ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
/* Add to have a map view */
proc sort data=crimcomp out=crimcomp;
  by STATENAME;
run;
proc sort data=maps.us2 out=mymap;
  by STATENAME;
run;
data both;
  merge mymap crimcomp;
  by STATENAME;
run;
proc gmap data=both;
  id _map_geometry_;
  choro PRIN1 PRIN2 / levels=15;
  /* choro PRIN1 / discrete; */
run;
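For readers following along in R rather than SAS, a rough counterpart of the correlation-based analysis above might look like the sketch below. The file name crime.txt and its layout (a header row plus a STATENAME column) are assumptions, not part of the original slides:

# Approximate R counterpart of PROC PRINCOMP on the crime data (assumed file layout).
crime <- read.table("crime.txt", header = TRUE, row.names = "STATENAME")
fit <- princomp(crime, cor = TRUE)       # correlation-based, like the default PROC PRINCOMP
summary(fit)                              # eigenvalues / proportion of variance
loadings(fit)                             # eigenvectors (compare with Slide 26)
plot(fit$scores[, 1], fit$scores[, 2],    # PRIN2 against PRIN1, like the PROC GPLOT step
     xlab = "PRIN1", ylab = "PRIN2", type = "n")
text(fit$scores[, 1], fit$scores[, 2], labels = rownames(crime), cex = 0.6)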
Slide 24: Correlation Matrix

            MURDER    RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY    AUTO
  MURDER    1.0000  0.6012   0.4837   0.6486    0.3858   0.1019  0.0688
  RAPE      0.6012  1.0000   0.5919   0.7403    0.7121   0.6140  0.3489
  ROBBERY   0.4837  0.5919   1.0000   0.5571    0.6372   0.4467  0.5907
  ASSAULT   0.6486  0.7403   0.5571   1.0000    0.6229   0.4044  0.2758
  BURGLARY  0.3858  0.7121   0.6372   0.6229    1.0000   0.7921  0.5580
  LARCENY   0.1019  0.6140   0.4467   0.4044    0.7921   1.0000  0.4442
  AUTO      0.0688  0.3489   0.5907   0.2758    0.5580   0.4442  1.0000

- If the variables were not correlated, there would be no point in doing PCA.
- The correlation matrix is symmetric, so we only need to inspect the upper or the lower triangle.

Slide 25: Eigenvalues

          Eigenvalue  Difference  Proportion  Cumulative
  PRIN1      4.11496     2.87624    0.587851     0.58785
  PRIN2      1.23872     0.51291    0.176960     0.76481
  PRIN3      0.72582     0.40938    0.103688     0.86850
  PRIN4      0.31643     0.05846    0.045205     0.91370
  PRIN5      0.25797     0.03593    0.036853     0.95056
  PRIN6      0.22204     0.09798    0.031720     0.98228
  PRIN7      0.12406     .          0.017722     1.00000
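As a brief check (not on the slides): each proportion is simply the eigenvalue divided by the total variance, which for a correlation matrix of 7 standardized variables is 7.

# Proportion and cumulative proportion of variance from the Slide 25 eigenvalues.
ev <- c(4.11496, 1.23872, 0.72582, 0.31643, 0.25797, 0.22204, 0.12406)
sum(ev)                          # 7: one unit of variance per standardized variable
round(ev / sum(ev), 6)           # proportions: 0.587851, 0.176960, ...
round(cumsum(ev) / sum(ev), 5)   # cumulative: 0.58785, 0.76481, ..., 1.00000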
Slide 26: Eigenvectors

            PRIN1    PRIN2    PRIN3    PRIN4    PRIN5    PRIN6    PRIN7
  MURDER    0.3002  -0.6291   0.1782  -0.2321   0.5381   0.2591   0.2675
  RAPE      0.4317  -0.1694  -0.2441   0.0622   0.1884  -0.7732  -0.2964
  ROBBERY   0.3968   0.0422   0.4958  -0.5579  -0.5199  -0.1143  -0.0039
  ASSAULT   0.3966  -0.3435  -0.0695   0.6298  -0.5066   0.1723   0.1917
  BURGLARY  0.4401   0.2033  -0.2098  -0.0575   0.1010   0.5359  -0.6481
  LARCENY   0.3573   0.4023  -0.5392  -0.2348   0.0300   0.0394   0.6016
  AUTO      0.2951   0.5024   0.5683   0.4192   0.3697  -0.0572   0.1470

Do these eigenvectors mean anything? All crimes are positively correlated with the first principal component.