
Principal Component Analysis of a Set of Air-Pollution Data

[Note] The multivariate statistics exercises below are taken from Applied Multivariate Statistical Analysis (5th edition) by R. A. Johnson et al. The original is: Richard A. Johnson and Dean W. Wichern. Applied Multivariate Statistical Analysis (5th Ed). Pearson Education, Inc. 2003. I used the reprint issued in 2003 by China Statistics Press (中国统计出版社). The first problem is Exercise 1.6 of the original book (problem 6 of Chapter 1); the second is Exercise 8.12 (problem 12 of Chapter 8). The second problem uses the data of the first.

1 Exercises

1.6. The data in Table 1.5 are 42 measurements on air-pollution variables recorded at 12:00 noon in the Los Angeles area on different days.
(a) Plot the marginal dot diagrams for all the variables.
(b) Construct the x̄, S, and R arrays, and interpret the entries in R.

TABLE 1.5 AIR-POLLUTION DATA
Variables: Wind (x1), Solar radiation (x2), CO (x3), NO (x4), NO2 (x5), O3 (x6), HC (x7)

Wind  Solar  CO  NO  NO2  O3  HC
8     98     7   2   12   8   2
7     107    4   3   9    5   3
7     103    4   3   5    6   3
10    88     5   2   8    15  4
6     91     4   2   8    10  3
8     90     5   2   12   12  4
9     84     7   4   12   15  5
5     72     6   4   21   14  4
7     82     5   1   11   11  3
8     64     5   2   13   9   4
6     71     5   4   10   3   3
6     91     4   2   12   7   3
7     72     7   4   18   10  3
10    70     4   2   11   7   3
10    72     4   1   8    10  3
9     77     4   1   9    10  3
8     76     4   1   7    7   3
8     71     5   3   16   4   4
9     67     4   2   13   2   3
9     69     3   3   9    5   3
10    62     5   3   14   4   4
9     88     4   2   7    6   3
8     80     4   2   13   11  4
5     30     3   3   5    2   3
6     83     5   1   10   23  4
8     84     3   2   7    6   3
6     78     4   2   11   11  3
8     79     2   1   7    10  3
6     62     4   3   9    8   3
10    37     3   1   7    2   3
8     71     4   1   10   7   3
7     52     4   1   12   8   4
5     48     6   5   8    4   3
6     75     4   1   10   24  3
10    35     4   1   6    9   2
8     85     4   1   9    10  2
5     86     3   1   6    12  2
5     86     7   2   13   18  2
7     79     7   4   9    25  3
7     79     5   2   8    6   2
6     68     6   2   11   14  3
8     40     4   3   6    5   2
Source: Data courtesy of Professor G.C. Tiao.

8.12. Consider the air-pollution data listed in Table 1.5. Your job is to summarize these data in fewer than p = 7 dimensions if possible. Conduct a principal component analysis of the data using both the covariance matrix S and the correlation matrix R. What have you learned? Does it make any difference which matrix is chosen for analysis? Can the data be summarized in three or fewer dimensions? Can you interpret the principal components?

2 Partial Solutions

2.1 Summary statistics

Means (x̄) and standard deviations computed in Excel

         Wind       Solar radiation  CO         NO         NO2        O3         HC
Average  7.5        73.857143        4.547619   2.1904762  10.047619  9.4047619  3.0952381
Stdev    1.5811388  17.335388        1.2337209  1.0873574  3.3709837  5.5658345  0.6917466

Covariance matrix S from Excel (population form, divisor n = 42; see section 2.3)

                 Wind       Solar radiation  CO         NO         NO2        O3         HC
Wind             2.4404762
Solar radiation  -2.714286  293.36054
CO               -0.369048  3.8163265        1.4858277
NO               -0.452381  -1.353741        0.6575964  1.154195
NO2              -0.571429  6.6020408        2.2596372  1.0623583  11.092971
O3               -2.178571  30.057823        2.7545351  -0.791383  3.0521542  30.24093
HC               0.1666667  0.6088435        0.138322   0.1723356  1.0192744  0.5804989  0.4671202

Correlation matrix R from Excel

                 Wind       Solar radiation  CO         NO         NO2        O3         HC
Wind             1
Solar radiation  -0.101442  1
CO               -0.193803  0.1827934        1
NO               -0.269543  -0.073569        0.5021525  1
NO2              -0.109825  0.115732         0.5565838  0.2968981  1
O3               -0.253593  0.3191237        0.4109288  -0.133952  0.1666422  1
HC               0.1560979  0.0520104        0.1660323  0.2347043  0.4477678  0.1544506  1

The correlation matrix shows that CO is clearly correlated with NO and NO2 (0.502 and 0.557), and that O3 is clearly correlated with solar radiation and CO (0.319 and 0.411). The principal component analysis below groups CO, NO, and NO2 into one component.
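The summary statistics of section 2.1 can be cross-checked outside Excel. The following is a minimal sketch, assuming NumPy is available; the names `DATA` and `summary` are illustrative and not part of the original report, which used Excel. It takes the 42 rows of Table 1.5 and computes the mean vector, the sample standard deviations (divisor n - 1, which is what the Stdev row matches), the population covariance matrix (divisor n, matching the Excel matrix S), and the correlation matrix R.

```python
import numpy as np

# Table 1.5 rows: Wind, Solar radiation, CO, NO, NO2, O3, HC (42 cases).
DATA = np.array([
    [8, 98, 7, 2, 12, 8, 2], [7, 107, 4, 3, 9, 5, 3], [7, 103, 4, 3, 5, 6, 3],
    [10, 88, 5, 2, 8, 15, 4], [6, 91, 4, 2, 8, 10, 3], [8, 90, 5, 2, 12, 12, 4],
    [9, 84, 7, 4, 12, 15, 5], [5, 72, 6, 4, 21, 14, 4], [7, 82, 5, 1, 11, 11, 3],
    [8, 64, 5, 2, 13, 9, 4], [6, 71, 5, 4, 10, 3, 3], [6, 91, 4, 2, 12, 7, 3],
    [7, 72, 7, 4, 18, 10, 3], [10, 70, 4, 2, 11, 7, 3], [10, 72, 4, 1, 8, 10, 3],
    [9, 77, 4, 1, 9, 10, 3], [8, 76, 4, 1, 7, 7, 3], [8, 71, 5, 3, 16, 4, 4],
    [9, 67, 4, 2, 13, 2, 3], [9, 69, 3, 3, 9, 5, 3], [10, 62, 5, 3, 14, 4, 4],
    [9, 88, 4, 2, 7, 6, 3], [8, 80, 4, 2, 13, 11, 4], [5, 30, 3, 3, 5, 2, 3],
    [6, 83, 5, 1, 10, 23, 4], [8, 84, 3, 2, 7, 6, 3], [6, 78, 4, 2, 11, 11, 3],
    [8, 79, 2, 1, 7, 10, 3], [6, 62, 4, 3, 9, 8, 3], [10, 37, 3, 1, 7, 2, 3],
    [8, 71, 4, 1, 10, 7, 3], [7, 52, 4, 1, 12, 8, 4], [5, 48, 6, 5, 8, 4, 3],
    [6, 75, 4, 1, 10, 24, 3], [10, 35, 4, 1, 6, 9, 2], [8, 85, 4, 1, 9, 10, 2],
    [5, 86, 3, 1, 6, 12, 2], [5, 86, 7, 2, 13, 18, 2], [7, 79, 7, 4, 9, 25, 3],
    [7, 79, 5, 2, 8, 6, 2], [6, 68, 6, 2, 11, 14, 3], [8, 40, 4, 3, 6, 5, 2],
], dtype=float)


def summary(data):
    """Return mean, sample std, population covariance, and correlation."""
    mean = data.mean(axis=0)
    stdev = data.std(axis=0, ddof=1)               # divisor n - 1
    cov_pop = np.cov(data, rowvar=False, ddof=0)   # divisor n, as in the Excel matrix
    corr = np.corrcoef(data, rowvar=False)         # independent of the divisor
    return mean, stdev, cov_pop, corr


mean, stdev, cov_pop, corr = summary(DATA)
print(np.round(mean, 4))
print(np.round(stdev, 4))
print(np.round(cov_pop, 4))
print(np.round(corr, 4))
```

The correlation matrix does not depend on the choice of divisor, which is why the Excel and SPSS correlation matrices agree even though their variances differ by the factor 42/41.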

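The principal component analysis also groups O3 with solar radiation into one component, and HC with Wind into another. The correlation between HC and Wind is not high (0.156), but it is the largest, and indeed the only, positive correlation involving Wind. After a varimax orthogonal rotation, HC is grouped with CO, NO, and NO2 into a single factor, because HC has a relatively high correlation with NO2 (0.448) and its correlations with CO and NO are higher than those with the remaining variables.

2.2 Principal component analysis I: data not standardized

The results below were produced by SPSS starting from the correlation matrix R. The raw data were not standardized. Starting from R means selecting Correlation Matrix under the Analyze options of the SPSS Factor Analysis: Extraction dialog.

The correlation matrix reported by SPSS (Correlation Matrix) agrees with the one computed in Excel.

Correlation Matrix
                 WIND    Solar radiation  CO      NO      NO2     O3      HC
WIND             1.000   -.101            -.194   -.270   -.110   -.254   .156
Solar radiation  -.101   1.000            .183    -.074   .116    .319    .052
CO               -.194   .183             1.000   .502    .557    .411    .166
NO               -.270   -.074            .502    1.000   .297    -.134   .235
NO2              -.110   .116             .557    .297    1.000   .167    .448
O3               -.254   .319             .411    -.134   .167    1.000   .154
HC               .156    .052             .166    .235    .448    .154    1.000

The communalities are shown in the table below. They range from 0.544 to 0.795, a fairly narrow spread. None of them reaches 0.8, however, so the three principal components capture less than 80% of the variance of every individual variable.

Communalities
                 Initial  Extraction
WIND             1.000    .737
Solar radiation  1.000    .544
CO               1.000    .725
NO               1.000    .795
NO2              1.000    .681
O3               1.000    .722
HC               1.000    .722
Extraction Method: Principal Component Analysis.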
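The communalities can be cross-checked from the correlation matrix alone: each one is the sum of a variable's squared loadings on the three retained components. Below is a minimal sketch, assuming NumPy and using the three-decimal SPSS correlation matrix as input; `pca_from_matrix` and the other names are illustrative, not SPSS functions. Because the input is rounded to three decimals, the results should agree with the SPSS output only to about two or three decimals.

```python
import numpy as np

NAMES = ["Wind", "Solar radiation", "CO", "NO", "NO2", "O3", "HC"]

# Correlation matrix as printed by SPSS (three decimals).
R = np.array([
    [1.000, -.101, -.194, -.270, -.110, -.254, .156],
    [-.101, 1.000, .183, -.074, .116, .319, .052],
    [-.194, .183, 1.000, .502, .557, .411, .166],
    [-.270, -.074, .502, 1.000, .297, -.134, .235],
    [-.110, .116, .557, .297, 1.000, .167, .448],
    [-.254, .319, .411, -.134, .167, 1.000, .154],
    [.156, .052, .166, .235, .448, .154, 1.000],
])


def pca_from_matrix(m, k):
    """Eigen-decompose a symmetric matrix and keep the k largest components."""
    vals, vecs = np.linalg.eigh(m)              # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]              # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    loadings = vecs[:, :k] * np.sqrt(vals[:k])  # loading = eigenvector * sqrt(eigenvalue)
    communalities = (loadings ** 2).sum(axis=1)
    return vals, loadings, communalities


eigenvalues, loadings, communalities = pca_from_matrix(R, k=3)
cumulative = 100 * np.cumsum(eigenvalues) / eigenvalues.sum()

for name, h2 in zip(NAMES, communalities):
    print(f"{name:16s} {h2:.3f}")
print(np.round(eigenvalues, 3))
print(np.round(cumulative, 3))
```

The sign of each eigenvector is arbitrary, so a whole column of loadings may come out with the opposite sign from SPSS; the communalities and eigenvalues are unaffected.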

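The eigenvalues and variance contributions (Total Variance Explained) are shown in the table below. Extracting three principal components explains 70.384% of the total variance of the original 7 variables.

Total Variance Explained
           Initial Eigenvalues                        Extraction Sums of Squared Loadings
Component  Total   % of Variance  Cumulative %        Total   % of Variance  Cumulative %
1          2.337   33.383         33.383              2.337   33.383         33.383
2          1.386   19.800         53.183              1.386   19.800         53.183
3          1.204   17.201         70.384              1.204   17.201         70.384
4          .727    10.387         80.771
5          .653    9.335          90.106
6          .537    7.667          97.773
7          .156    2.227          100.000
Extraction Method: Principal Component Analysis.

[Figure: Scree Plot; x-axis: Component Number]

The principal component loading matrix (Component Matrix) is shown in the table below.

Component Matrix (a)
                 Component
                 1       2          3
WIND             -.362   .328       .706
Solar radiation  .314    -.620      .246
CO               .842    -8.03E-03  -.125
NO               .577    .512       -.447
NO2              .761    .235       .216
O3               .496    -.667      .175
HC               .488    .362       .594
Extraction Method: Principal Component Analysis.
a. 3 components extracted.

The table above was copied from SPSS into Excel and color-coded by group; the result is shown below.

                 Component
                 1          2          3
WIND             -0.36202   0.327809   0.706084
Solar radiation  0.31424    -0.61997   0.24631
CO               0.842417   -0.00803   -0.12466
NO               0.577243   0.511736   -0.44671
NO2              0.761294   0.235183   0.215682
O3               0.496126   -0.66749   0.175399
HC               0.488257   0.362466   0.593692

The variables group by principal component as follows:
Main variables associated with the first principal component: CO, NO, NO2.
Main variables associated with the second principal component: Solar radiation, O3.
Main variables associated with the third principal component: Wind, HC.

In the component loading plot (Component Plot), the three groups of variables fall into three regions, each represented by a different principal component.

[Figure: Component Plot; axis label: Component 2]

The principal component scores are listed below. The last column gives a brief interpretation for a few typical samples. When interpreting, pay attention to the signs of the loadings in the component matrix.

Case  f1        f2        f3        Notes on typical samples
S1    0.61591   -0.8186   -0.38418
S2    0.03194   -0.36015  -0.26343
S3    -0.34752  -0.54481  -0.49701
S4    0.2425    -0.30293  1.80367   Area represented by sample 4: high Wind and HC
S5    -0.12729  -0.91941  -0.4042
S6    0.72612   -0.19278  1.21954
S7    2.03686   0.89982   1.4607    Areas represented by samples 7 and 8: clearly associated with CO, NO, NO2 pollution
S8    2.57309   0.77732   -0.34124
S9    0.09802   -0.81736  0.30334
S10   0.50664   0.78803   0.88735
S11   0.3904    0.97744   -1.48345
S12   0.14485   -0.45848  -0.27016
S13   1.92477   0.88883   -0.66029
S14   -0.50662  0.63139   0.91242
S15   -0.89378  -0.17036  1.19632
S16   -0.66037  -0.39862  0.93758
S17   -0.87787  -0.3635   0.3701
S18   0.88733   1.5306    0.65731
S19   -0.42935  1.09253   0.48155
S20   -0.751    0.92424   0.11384
S21   0.42826   1.96133   1.18659   Area represented by sample 21: low solar radiation and O3
S22   -0.69373  -0.09747  0.51522
S23   0.41484   0.20681   1.21242
S24   -1.16263  1.39047   -2.12097
S25   0.86691   -1.70335  0.91799
S26   -0.91899  -0.13915  0.18106
S27   0.09994   -0.51948  -0.37202
S28   -1.32458  -0.6911   0.65186
S29   -0.10472  0.39184   -1.08681
S30   -1.8593   1.37933   0.6047
S31   -0.62672  -0.08347  0.47051
S32   -0.14264  0.64941   0.72066
S33   0.67421   1.56899   -2.63096  Area represented by sample 33: low Wind and HC
S34   0.24874   -1.95681  0.22088
S35   -1.71429  0.39216   -0.08554
S36   -0.80238  -1.13269  -0.0517
S37   -1.00653  -1.92662  -1.17569  Areas represented by samples 37 and 38: high solar radiation and O3
S38   1.29486   -1.77265  -1.32357
S39   1.68145   -1.04272  -0.66334
S40   -0.48079  -0.49683  -1.07633
S41   0.72122   -0.53042  -0.57934
S42   -1.17776  0.98919   -1.55538

2.3 Principal component analysis II: data not standardized

The results below were produced by SPSS starting from the covariance matrix S. The raw data were not standardized. Starting from S means selecting Covariance Matrix under the Analyze options of the SPSS Factor Analysis: Extraction dialog.

The communalities are shown in the table below. In the Raw columns, the Initial values are the variances of the raw data. They differ from the diagonal of the Excel covariance matrix given earlier, because Excel reports the population variance (divisor n) while SPSS reports the sample variance (divisor n - 1). Taking the Initial value for Wind as an example, 2.4404762 × 42/41 = 2.5, or equivalently 2.5 × 41/42 = 2.4404762 (compare the covariance matrix above). The Rescaled result is the ratio of the Raw Extraction value to the Raw Initial value.

Communalities
                 Raw                    Rescaled
                 Initial   Extraction   Initial  Extraction
WIND             2.500     3.067E-02    1.000    1.227E-02
Solar radiation  300.516   300.134      1.000    .999
CO               1.522     6.017E-02    1.000    3.953E-02
NO               1.182     6.750E-03    1.000    5.709E-03
NO2              11.364    .179         1.000    1.575E-02
O3               30.979    3.846        1.000    .124
HC               .479      1.667E-03    1.000    3.484E-03
Extraction Method: Principal Component Analysis.

The communalities and their column totals are as follows:

                 Raw                     Rescaled
                 Initial     Extraction  Initial  Extraction
WIND             2.5         0.0306651   1        0.012266
Solar radiation  300.51568   300.13367   1        0.9987288
CO               1.5220674   0.0601666   1        0.0395295
NO               1.1823461   0.0067502   1        0.0057091
NO2              11.363531   0.1790059   1        0.0157527
O3               30.978513   3.8459428   1        0.1241487
HC               0.4785134   0.0016671   1        0.0034839
Total            348.54065   304.25786   7        1.1996188

The eigenvalues and variance contributions (Total Variance Explained) are shown in the table below. In the Raw rows, extracting one principal component appears to explain 87.295% of the total variance of the original 7 variables, but after rescaling the figure is only 17.137%. Both figures follow from the communality totals: before rescaling, the variance explained is 304.25786/348.54065 × 100 = 87.295%; after rescaling, it is 1.1996188/7 × 100 = 17.137%.

Total Variance Explained
                     Initial Eigenvalues (a)                  Extraction Sums of Squared Loadings
          Component  Total     % of Variance  Cumulative %    Total     % of Variance  Cumulative %
Raw       1          304.258   87.295         87.295          304.258   87.295         87.295
          2          28.276    8.113          95.408
          3          11.464    3.289          98.697
          4          2.524     .724           99.421
          5          1.280     .367           99.788
          6          .529      .152           99.940
          7          .210      6.014E-02      100.000
Rescaled  1          304.258   87.295         87.295          1.200     17.137         17.137
          2          28.276    8.113          95.408
          3          11.464    3.289          98.697
          4          2.524     .724           99.421
          5          1.280     .367           99.788
          6          .529      .152           99.940
          7          .210      6.014E-02      100.000
Extraction Method: Principal Component Analysis.
a. When analyzing a covariance matrix, the initial eigenvalues are the same across the raw and rescaled solution.

[Figure: Scree Plot; x-axis: Component Number]

The principal component loading matrix (Component Matrix) is shown in the table below. Because the variance of Solar radiation is so large, this variable completely dominates the first principal component.

Component Matrix (a)
                 Raw        Rescaled
                 Component  Component
                 1          1
WIND             -.175      -.111
Solar radiation  17.324     .999
CO               .245       .199
NO               -.082      -.076
NO2              .423       .126
O3               1.961      .352
HC               .041       .059
Extraction Method: Principal Component Analysis.
a. 1 components extracted.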
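The raw and rescaled figures of section 2.3 can be reproduced from the Excel covariance matrix of section 2.1. The sketch below assumes NumPy; the names `LOWER`, `S_pop`, `raw_loading`, and `rescaled` are illustrative and do not come from SPSS. It converts the population covariance matrix to the sample form with the factor 42/41, extracts the first principal component, and then rescales the loadings by each variable's standard deviation, which is one way to read the Rescaled columns.

```python
import numpy as np

# Lower triangle of the Excel covariance matrix (divisor n = 42).
# Order: Wind, Solar radiation, CO, NO, NO2, O3, HC.
LOWER = [
    [2.4404762],
    [-2.714286, 293.36054],
    [-0.369048, 3.8163265, 1.4858277],
    [-0.452381, -1.353741, 0.6575964, 1.154195],
    [-0.571429, 6.6020408, 2.2596372, 1.0623583, 11.092971],
    [-2.178571, 30.057823, 2.7545351, -0.791383, 3.0521542, 30.24093],
    [0.1666667, 0.6088435, 0.138322, 0.1723356, 1.0192744, 0.5804989, 0.4671202],
]
n = 42

S_pop = np.zeros((7, 7))
for i, row in enumerate(LOWER):
    S_pop[i, : len(row)] = row
S_pop = S_pop + np.tril(S_pop, -1).T       # mirror the lower triangle upward

S = S_pop * n / (n - 1)                    # sample covariance, as SPSS reports it

vals, vecs = np.linalg.eigh(S)             # ascending eigenvalues
vals, vecs = vals[::-1], vecs[:, ::-1]     # reorder to descending

share_raw = 100 * vals[0] / np.trace(S)    # first eigenvalue over total variance

raw_loading = vecs[:, 0] * np.sqrt(vals[0])
if raw_loading[1] < 0:                     # eigenvector sign is arbitrary;
    raw_loading = -raw_loading             # make the Solar radiation loading positive

rescaled = raw_loading / np.sqrt(np.diag(S))        # divide by standard deviations
share_rescaled = 100 * (rescaled ** 2).sum() / 7    # mean rescaled communality

print(round(S[0, 0], 4), round(vals[0], 3))
print(round(share_raw, 3), round(share_rescaled, 3))
print(np.round(raw_loading, 3))
print(np.round(rescaled, 3))
```

The gap between the two percentages comes from the weighting: the raw figure weights each variable by its variance, so Solar radiation counts for most of the total, while the rescaled figure gives every variable the same weight of 1.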

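2.4 Principal component analysis III: data standardized

The results below were produced by SPSS starting from the covariance matrix S. The raw data were standardized first. All results are identical before and after rescaling, and they also match the results computed from the correlation matrix R.

The communalities are shown in the table below; the raw and rescaled results are identical.

Communalities
                 Raw                  Rescaled
                 Initial  Extraction  Initial  Extraction
WIND             1.000    .737        1.000    .737
Solar radiation  1.000    .544        1.000    .544
CO               1.000    .725        1.000    .725
NO               1.000    .795        1.000    .795
NO2              1.000    .681        1.000    .681
O3               1.000    .722        1.000    .722
HC               1.000    .722        1.000    .722
Extraction Method: Principal Component Analysis.

The eigenvalues and variance contributions (Total Variance Explained) are shown in the table below. The raw and rescaled results are identical.

Total Variance Explained
                     Initial Eigenvalues (a)                Extraction Sums of Squared Loadings
          Component  Total   % of Variance  Cumulative %    Total   % of Variance  Cumulative %
Raw       1          2.337   33.383         33.383          2.337   33.383         33.383
          2          1.386   19.800         53.183          1.386   19.800         53.183
          3          1.204   17.201         70.384          1.204   17.201         70.384
          4          .727    10.387         80.771
          5          .653    9.335          90.106
          6          .537    7.667          97.773
          7          .156    2.227          100.000
Rescaled  1          2.337   33.383         33.383          2.337   33.383         33.383
          2          1.386   19.800         53.183          1.386   19.800         53.183
          3          1.204   17.201         70.384          1.204   17.201         70.384
          4          .727    10.387         80.771
          5          .653    9.335          90.106
          6          .537    7.667          97.773
          7          .156    2.227          100.000
Extraction Method: Principal Component Analysis.
a. When analyzing a covariance matrix, the initial eigenvalues are the same across the raw and rescaled solution.

[Figure: Scree Plot; x-axis: Component Number]

The principal component loading matrix (Component Matrix) is shown in the table below; the raw and rescaled results are identical. The relative importance of the first principal component is strongly affected by standardization: it accounts for 87.295% of the variance in the unstandardized covariance analysis but only 33.383% here. The natural conclusion is that when the variables are measured on widely different scales, or in units of different dimensions, they must be standardized. If they are not, the principal component analysis should start from the correlation matrix instead.

Component Matrix
                 Raw                     Rescaled
                 Component               Component
                 1      2      3         1      2      3
WIND             -.362  .328   .706      -.362  .328   .706
Solar radiation  .314   -.620  .246      .314   -.620  .246
CO               .842   -.008  -.125     .842   -.008  -.125
NO               .577   .512   -.447     .577   .512   -.447
NO2              .761   .235   .216      .761   .235   .216
O3               .496   -.667  .175      .496   -.667  .175
HC               .488   .362   .594      .488   .362   .594
Extraction Method: Principal Component Analysis.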
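The reason the standardized covariance analysis coincides with the correlation analysis is that the covariance matrix of standardized variables is the correlation matrix itself. The sketch below illustrates this, assuming NumPy; `LOWER`, `shares`, and the other names are illustrative. It rescales the Excel covariance matrix of section 2.1 by the standard deviations to obtain R, then compares the percentage of variance carried by each component under S and under R.

```python
import numpy as np

# Lower triangle of the Excel covariance matrix.
# Order: Wind, Solar radiation, CO, NO, NO2, O3, HC.
LOWER = [
    [2.4404762],
    [-2.714286, 293.36054],
    [-0.369048, 3.8163265, 1.4858277],
    [-0.452381, -1.353741, 0.6575964, 1.154195],
    [-0.571429, 6.6020408, 2.2596372, 1.0623583, 11.092971],
    [-2.178571, 30.057823, 2.7545351, -0.791383, 3.0521542, 30.24093],
    [0.1666667, 0.6088435, 0.138322, 0.1723356, 1.0192744, 0.5804989, 0.4671202],
]

S = np.zeros((7, 7))
for i, row in enumerate(LOWER):
    S[i, : len(row)] = row
S = S + np.tril(S, -1).T              # mirror the lower triangle upward

d = np.sqrt(np.diag(S))               # standard deviations
R = S / np.outer(d, d)                # covariance of the standardized variables


def shares(m):
    """Percentage of total variance per component, largest first."""
    vals = np.linalg.eigvalsh(m)[::-1]
    return 100 * vals / vals.sum()


share_S = shares(S)                   # unaffected by the n versus n - 1 divisor
share_R = shares(R)

print(np.round(share_S[:3], 3))
print(np.round(share_R[:3], 3))
print(round(share_R[:3].sum(), 3))
```

Multiplying S by a constant such as 42/41 changes neither the percentages nor R, so the population form of the covariance matrix is sufficient here.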
