Data Analysis Final Exam

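Data Analysis Methods Course Project

Problem overview:

3. The rates of seven types of crime were surveyed in the 50 US states. The results are listed in Table 1, which gives the rate of each crime per 100,000 people in each state. The seven crimes are: murder, rape, robbery, assault, burglary, larceny, and auto (auto crime).

Table 1. Rates of seven crimes in the 50 US states (per 100,000 people)

| State | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto |
|---|---|---|---|---|---|---|---|
| ALABAMA | 14.2 | 25.2 | 96.8 | 278.3 | 1135.5 | 1881.9 | 280.7 |
| ALASKA | 10.8 | 51.6 | 96.8 | 284.0 | 1331.7 | 3369.8 | 753.3 |
| ARIZONA | 9.5 | 34.2 | 138.2 | 312.3 | 2346.1 | 4467.4 | 439.5 |
| ARKANSAS | 8.8 | 27.6 | 83.2 | 203.4 | 972.6 | 1862.1 | 183.4 |
| CALIFORNIA | 11.5 | 49.4 | 287.0 | 358.0 | 2139.4 | 3499.8 | 663.5 |
| COLORADO | 6.3 | 42.0 | 170.7 | 292.9 | 1935.2 | 3903.2 | 477.1 |
| CONNECTICUT | 4.2 | 16.8 | 129.5 | 131.8 | 1346.0 | 2620.7 | 593.2 |
| DELAWARE | 6.0 | 24.9 | 157.0 | 194.2 | 1682.6 | 3678.4 | 467.0 |
| FLORIDA | 10.2 | 39.6 | 187.9 | 449.1 | 1859.9 | 3840.5 | 351.4 |
| GEORGIA | 11.7 | 31.1 | 140.5 | 256.5 | 1351.1 | 2170.2 | 297.9 |
| HAWAII | 7.2 | 25.5 | 128.0 | 64.1 | 1911.5 | 3920.4 | 489.4 |
| IDAHO | 5.5 | 19.4 | 39.6 | 172.5 | 1050.8 | 2599.6 | 237.6 |
| ILLINOIS | 9.9 | 21.8 | 211.3 | 209.0 | 1085.0 | 2828.5 | 528.6 |
| INDIANA | 7.4 | 26.5 | 123.2 | 153.5 | 1086.2 | 2498.7 | 377.4 |
| IOWA | 2.3 | 10.6 | 41.2 | 89.8 | 812.5 | 2685.1 | 219.9 |
| KANSAS | 6.6 | 22.0 | 100.7 | 180.5 | 1270.4 | 2739.3 | 244.3 |
| KENTUCKY | 10.1 | 19.1 | 81.1 | 123.3 | 872.2 | 1662.1 | 245.4 |
| LOUISIANA | 15.5 | 30.9 | 142.9 | 335.5 | 1165.5 | 2469.9 | 337.7 |
| MAINE | 2.4 | 13.5 | 38.7 | 170.0 | 1253.1 | 2350.7 | 246.9 |
| MARYLAND | 8.0 | 34.8 | 292.1 | 358.9 | 1400.0 | 3177.7 | 428.5 |
| MASSACHUSETTS | 3.1 | 20.8 | 169.1 | 231.6 | 1532.2 | 2311.3 | 1140.1 |
| MICHIGAN | 9.3 | 38.9 | 261.9 | 274.6 | 1522.7 | 3159.0 | 545.5 |
| MINNESOTA | 2.7 | 19.5 | 85.9 | 85.8 | 1134.7 | 2559.3 | 343.1 |
| MISSISSIPPI | 14.3 | 19.6 | 65.7 | 189.1 | 915.6 | 1239.9 | 144.4 |
| MISSOURI | 9.6 | 28.3 | 189.0 | 233.5 | 1318.3 | 2424.2 | 378.4 |
| MONTANA | 5.4 | 16.7 | 39.2 | 156.8 | 804.9 | 2773.2 | 309.2 |
| NEBRASKA | 3.9 | 18.1 | 64.7 | 112.7 | 760.0 | 2316.1 | 249.1 |
| NEVADA | 15.8 | 49.1 | 323.1 | 355.0 | 2453.1 | 4212.6 | 559.2 |
| NEW HAMPSHIRE | 3.2 | 10.7 | 23.2 | 76.0 | 1041.7 | 2343.9 | 293.4 |
| NEW JERSEY | 5.6 | 21.0 | 180.4 | 185.1 | 1435.8 | 2774.5 | 511.5 |
| NEW MEXICO | 8.8 | 39.1 | 109.6 | 343.4 | 1418.7 | 3008.6 | 259.5 |
| NEW YORK | 10.7 | 29.4 | 472.6 | 319.1 | 1728.0 | 2782.0 | 745.8 |
| NORTH CAROLINA | 10.6 | 17.0 | 61.3 | 318.3 | 1154.1 | 2037.8 | 192.1 |
| NORTH DAKOTA | 0.9 | 9.0 | 13.3 | 43.8 | 446.1 | 1843.0 | 144.7 |
| OHIO | 7.8 | 27.3 | 190.5 | 181.1 | 1216.0 | 2696.8 | 400.4 |
| OKLAHOMA | 8.6 | 29.2 | 73.8 | 205.0 | 1288.2 | 2228.1 | 326.8 |
| OREGON | 4.9 | 39.9 | 124.1 | 286.9 | 1636.4 | 3506.1 | 388.9 |
| PENNSYLVANIA | 5.6 | 19.0 | 130.3 | 128.0 | 877.5 | 1624.1 | 333.2 |
| RHODE ISLAND | 3.6 | 10.5 | 86.5 | 201.0 | 1489.5 | 2844.1 | 791.4 |
| SOUTH CAROLINA | 11.9 | 33.0 | 105.9 | 485.3 | 1613.6 | 2342.4 | 245.1 |
| SOUTH DAKOTA | 2.0 | 13.5 | 17.9 | 155.7 | 570.5 | 1704.4 | 147.5 |
| TENNESSEE | 10.1 | 29.7 | 145.8 | 203.9 | 1259.7 | 1776.5 | 314.0 |
| TEXAS | 13.3 | 33.8 | 152.4 | 208.2 | 1603.1 | 2988.7 | 397.6 |
| UTAH | 3.5 | 20.3 | 68.8 | 147.3 | 1171.6 | 3004.6 | 334.5 |
| VERMONT | 1.4 | 15.9 | 30.8 | 101.2 | 1348.2 | 2201.0 | 265.2 |
| VIRGINIA | 9.0 | 23.3 | 92.1 | 165.7 | 986.2 | 2521.2 | 226.7 |
| WASHINGTON | 4.3 | 39.6 | 106.2 | 224.8 | 1605.6 | 3386.9 | 360.3 |
| WEST VIRGINIA | 6.0 | 13.2 | 42.2 | 90.9 | 597.4 | 1341.7 | 163.3 |
| WISCONSIN | 2.8 | 12.9 | 52.2 | 63.7 | 846.9 | 2614.2 | 220.7 |
| WYOMING | 5.4 | 21.9 | 39.7 | 173.9 | 811.6 | 2772.2 | 282.0 |

1) From the observations of the variables (murder, rape, robbery, assault, burglary, larceny, auto), compute the sample covariance matrix and the sample correlation matrix.

2) Carry out principal component analysis starting from S and from R respectively:
(1) Find the contribution rate and cumulative contribution rate of the sample principal components, and the sample principal components themselves.
(2) In each of the two cases, how many principal components do you think should be retained, and how can they be interpreted? (Hint: the cumulative contribution rate should reach at least 80%.) For this problem, which do you consider more reasonable, the analysis based on S or the one based on R?
(3) Rank the 50 states by their first principal component score. What is the result?
(4) Draw a scatter plot with the first principal component score on the horizontal axis and the second principal component score on the vertical axis.

3) For the crime rate data in Table 1, divide the 50 states into 4 clusters using both fast clustering and average-linkage hierarchical clustering, then analyze and compare the results. Judging by the clustering results, which method do you think is better?

Problem 1

Sample covariance matrix S, obtained with SAS:

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto |
|---|---|---|---|---|---|---|---|
| Murder | 14.9519 | 25.0138 | 165.2459 | 251.4141 | 645.1653 | 286.0809 | 51.4603 |
| Rape | 25.0138 | 115.7696 | 562.6393 | 798.5073 | 3313.586 | 4795.56 | 726.0126 |
| Robbery | 165.2459 | 562.6393 | 7805.469 | 4934.161 | 24347 | 28650.77 | 10092.42 |
| Assault | 251.4141 | 798.5073 | 4934.161 | 10050.67 | 27006.2 | 29427.36 | 5348.142 |
| Burglary | 645.1653 | 3313.586 | 24347 | 27006.2 | 187017.9 | 248665.3 | 46664.15 |
| Larceny | 286.0809 | 4795.56 | 28650.77 | 29427.36 | 248665.3 | 526943.5 | 62356.95 |
| Auto | 51.4603 | 726.0126 | 10092.42 | 5348.142 | 46664.15 | 62356.95 | 37401.4 |

Sample correlation matrix R (Pearson correlation coefficients, N = 50):

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto |
|---|---|---|---|---|---|---|---|
| Murder | 1 | 0.60122 | 0.48371 | 0.64855 | 0.38582 | 0.10192 | 0.06881 |
| Rape | 0.60122 | 1 | 0.59188 | 0.74026 | 0.71213 | 0.61399 | 0.3489 |
| Robbery | 0.48371 | 0.59188 | 1 | 0.55708 | 0.63724 | 0.44674 | 0.59068 |
| Assault | 0.64855 | 0.74026 | 0.55708 | 1 | 0.62291 | 0.40436 | 0.27584 |
| Burglary | 0.38582 | 0.71213 | 0.63724 | 0.62291 | 1 | 0.79212 | 0.55795 |
| Larceny | 0.10192 | 0.61399 | 0.44674 | 0.40436 | 0.79212 | 1 | 0.44418 |
| Auto | 0.06881 | 0.3489 | 0.59068 | 0.27584 | 0.55795 | 0.44418 | 1 |

Problem 2

1. Principal component analysis starting from R:

(1) Contribution rates, cumulative contribution rates, and the sample principal components.

Contribution rates:

Eigenvalues of the Correlation Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 4.11496 | 2.876238 | 0.5879 | 0.5879 |
| 2 | 1.238722 | 0.512905 | 0.1770 | 0.7648 |
| 3 | 0.725817 | 0.409385 | 0.1037 | 0.8685 |
| 4 | 0.316432 | 0.058458 | 0.0452 | 0.9137 |
| 5 | 0.257974 | 0.035935 | 0.0369 | 0.9506 |
| 6 | 0.222039 | 0.097983 | 0.0317 | 0.9823 |
| 7 | 0.124056 | | 0.0177 | 1.0000 |

(2) For the cumulative contribution rate to exceed 80%, three principal components must be retained; the cumulative contribution rate of the first three components reaches 86.9%.

| | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
|---|---|---|---|---|---|---|---|
| Murder | 0.300279 | -0.62917 | 0.178245 | -0.23211 | 0.538123 | 0.259117 | 0.267593 |
| Rape | 0.431759 | -0.16944 | -0.2442 | 0.062216 | 0.188471 | -0.77327 | -0.29649 |
| Robbery | 0.396875 | 0.042247 | 0.495861 | -0.55799 | -0.51998 | -0.11439 | -0.0039 |
| Assault | 0.396652 | -0.34353 | -0.06951 | 0.629804 | -0.50665 | 0.172363 | 0.191745 |
| Burglary | 0.440157 | 0.203341 | -0.2099 | -0.05756 | 0.101033 | 0.535987 | -0.64812 |
| Larceny | 0.35736 | 0.402319 | -0.53923 | -0.23489 | 0.030099 | 0.039406 | 0.60169 |
| Auto | 0.295177 | 0.502421 | 0.568384 | 0.419238 | 0.369753 | -0.0573 | 0.147046 |

This gives the three principal components:

PRIN1 = 0.300279 murder + 0.431759 rape + 0.396875 robbery + 0.396652 assault + 0.440157 burglary + 0.357360 larceny + 0.295177 auto

PRIN2 = -0.629174 murder - 0.169435 rape + 0.042247 robbery - 0.343528 assault + 0.203341 burglary + 0.402319 larceny + 0.502421 auto

PRIN3 = 0.178245 murder - 0.2442 rape + 0.495861 robbery - 0.06951 assault - 0.2099 burglary - 0.5392 larceny + 0.568 auto

2. Principal component analysis starting from S:

Contribution rates:

Eigenvalues of the Covariance Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 672099.9 | 608440.3 | 0.8736 | 0.8736 |
| 2 | 63659.67 | 39443.59 | 0.0827 | 0.9563 |
| 3 | 24216.08 | 17902.62 | 0.0315 | 0.9878 |
| 4 | 6313.464 | 3295.814 | 0.0082 | 0.9960 |
| 5 | 3017.65 | 2980.468 | 0.0039 | 0.9999 |
| 6 | 37.183 | 31.51 | 0.0000 | 1.0000 |
| 7 | 5.673 | | 0.0000 | 1.0000 |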
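The Proportion and Cumulative columns of the two eigenvalue tables can be recomputed from the eigenvalues alone: each contribution rate is an eigenvalue divided by the sum of all eigenvalues. The following is a minimal Python sketch of that arithmetic, not the SAS procedure used in the report. The function names `contribution_rates` and `components_needed` are illustrative, and the last two S eigenvalues are read from a partly garbled table row.

```python
from itertools import accumulate

# Eigenvalues copied from the two SAS eigenvalue tables
# (R = correlation matrix, S = covariance matrix).
EIGVALS_R = [4.11496, 1.238722, 0.725817, 0.316432, 0.257974, 0.222039, 0.124056]
EIGVALS_S = [672099.9, 63659.67, 24216.08, 6313.464, 3017.65, 37.183, 5.673]


def contribution_rates(eigenvalues):
    """Return (proportions, cumulative proportions) for a list of eigenvalues."""
    total = sum(eigenvalues)
    proportions = [value / total for value in eigenvalues]
    return proportions, list(accumulate(proportions))


def components_needed(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative rate reaches threshold."""
    _, cumulative = contribution_rates(eigenvalues)
    for count, rate in enumerate(cumulative, start=1):
        if rate >= threshold:
            return count
    return len(eigenvalues)


if __name__ == "__main__":
    for label, eigvals in (("R", EIGVALS_R), ("S", EIGVALS_S)):
        props, cumul = contribution_rates(eigvals)
        print(label, "proportion:", [round(p, 4) for p in props])
        print(label, "cumulative:", [round(c, 4) for c in cumul])
        print(label, "components needed for 80%:", components_needed(eigvals))
```

Under the 80% rule stated in the problem, this arithmetic retains three components for R and one for S, matching the conclusions drawn from the tables.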
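Eigenvectors:

| | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
|---|---|---|---|---|---|---|---|
| Murder | 0.000864 | 0.007077 | -0.00738 | 0.022236 | 0.005032 | 0.184911 | 0.982437 |
| Rape | 0.008773 | 0.011477 | -0.0104 | 0.051813 | -0.00599 | 0.981012 | -0.18595 |
| Robbery | 0.056993 | 0.165921 | 0.110301 | 0.457211 | 0.864522 | -0.02224 | -0.01101 |
| Assault | 0.059196 | 0.174243 | -0.15051 | 0.849046 | -0.46866 | -0.0536 | -0.00917 |
| Burglary | 0.465346 | 0.774439 | -0.34551 | -0.25358 | -0.00125 | -0.0039 | -0.0021 |
| Larceny | 0.872863 | -0.48178 | 0.059703 | 0.049105 | 0.001277 | -0.00361 | 0.002712 |
| Auto | 0.121384 | 0.331752 | 0.917649 | -0.0136 | -0.18137 | 0.005253 | 0.00464 |

Cumulative contribution rate: the first component alone already reaches 87.36%.

Principal component expression:

PRIN1 = 0.000864 murder + 0.008773 rape + 0.056993 robbery + 0.059196 assault + 0.465346 burglary + 0.872863 larceny + 0.121384 auto

Analysis: In the R-based analysis, the first principal component has approximately equal loadings on all variables, so it can be regarded as an overall measure of the crime rate. The second principal component has high positive loadings on auto and larceny and high negative loadings on murder and assault; it has a small positive loading on burglary and a small negative loading on rape. This component can therefore be read as measuring the share of violent crime in the overall crime mix. The third principal component has no obvious interpretation.

In the table sorted by PRIN1, the states at the top, with small PRIN1 values, have low crime rates: NORTH DAKOTA (PRIN1 = -3.96408) has the lowest. States with large PRIN1 values have high crime rates: NEVADA (PRIN1 = 5.26699) has the highest. In the table sorted by PRIN2 (Table 35.4), the states at the top, with small PRIN2 values, have a larger share of violent crime.

By contrast, the S-based first component is dominated by larceny (0.872863) and burglary (0.465346), the two variables with the largest variances. We therefore conclude that the principal component analysis based on R is the more reasonable one: it measures the overall crime rate, it captures the relative weight of different combinations of crimes, and it brings out regional differences in crime rates more clearly.

(3) The 50 states sorted by the first principal component:

| State | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
|---|---|---|---|---|---|---|---|
| NORTH DAKOTA | -3.96407765 | 0.387671 | -0.08603 | -0.18059 | -0.38235 | -0.4047 | 0.096798 |
| SOUTH DAKOTA | -3.17202684 | -0.25446 | -0.13842 | 0.487677 | -0.71435 | -0.32213 | -0.03645 |
| WEST VIRGINIA | -3.14772196 | -0.81425 | 0.536901 | -0.16669 | 0.042973 | -0.16643 | -0.20532 |
| IOWA | -2.58156193 | 0.824753 | -0.51548 | -0.30085 | -0.29187 | 0.094644 | 0.442398 |
| WISCONSIN | -2.50296098 | 0.780831 | -0.42647 | -0.5309 | -0.10821 | -0.05771 | 0.253503 |
| NEW HAMPSHIRE | -2.46562286 | 0.82503 | -0.20949 | -0.08807 | 0.190737 | 0.391119 | -0.0941 |
| NEBRASKA | -2.15070745 | 0.225739 | -0.11052 | -0.1684 | -0.16361 | -0.42195 | 0.184246 |
| VERMONT | -2.0643276 | 0.944967 | -0.51079 | 0.104676 | -0.12899 | 0.310747 | -0.91333 |
| MAINE | -1.82631078 | 0.578785 | -0.53241 | 0.337634 | -0.47706 | 0.553981 | -0.39406 |
| KENTUCKY | -1.72690862 | -1.14663 | 0.657767 | -0.38311 | 0.558667 | 0.023296 | -0.10775 |
| PENNSYLVANIA | -1.72006943 | -0.1959 | 1.009178 | -0.19285 | -0.21512 | -0.34819 | -0.38228 |
| MONTANA | -1.66801293 | 0.270992 | -0.368 | 0.147962 | 0.092177 | -0.04933 | 0.769386 |
| MINNESOTA | -1.55434237 | 1.056437 | -0.14623 | -0.31594 | -0.01757 | -0.22692 | -0.27826 |
| MISSISSIPPI | -1.50735805 | -2.54671 | 0.703693 | -0.20981 | 0.709555 | 0.46267 | -0.19614 |
| IDAHO | -1.43245398 | -0.00801 | -0.63414 | 0.121909 | -0.01495 | 0.106364 | 0.165056 |
| WYOMING | -1.42463468 | 0.062683 | -0.57752 | 0.222765 | 0.043421 | -0.37798 | 0.62723 |
| ARKANSAS | -1.05441043 | -1.34544 | -0.01834 | 0.021536 | 0.022688 | -0.38604 | -0.31067 |
| UTAH | -1.04995698 | 0.936561 | -0.64009 | -0.03263 | -0.09174 | -0.03047 | 0.180693 |
| VIRGINIA | -0.91620755 | -0.69265 | -0.20438 | -0.4296 | 0.226637 | -0.10014 | 0.30802 |
| NORTH CAROLINA | -0.69925167 | -1.67027 | -0.09041 | 0.650179 | -0.29795 | 0.954193 | 0.20695 |
| KANSAS | -0.63406694 | -0.02804 | -0.49573 | -0.32463 | -0.14645 | 0.20564 | -0.02609 |
| CONNECTICUT | -0.54132949 | 1.50123 | 0.783885 | 0.08619 | 0.184886 | 0.2814 | -0.08962 |
| INDIANA | -0.4998955 | 2.6E-05 | 0.24333 | -0.26745 | 0.249195 | -0.4205 | 0.030475 |
| OKLAHOMA | -0.32136304 | -0.62429 | -0.12134 | 0.262641 | 0.433196 | -0.13097 | -0.4257 |
| RHODE ISLAND | -0.20155515 | 2.146576 | 0.9568 | 1.130351 | 0.316122 | 0.999867 | 0.297514 |
| TENNESSEE | -0.13659511 | -1.13498 | 0.652999 | -0.16398 | 0.182649 | -0.21754 | -0.6823 |
| ALABAMA | -0.04988023 | -2.0961 | 0.501645 | 0.250985 | 0.498486 | 0.433618 | 0.118075 |
| NEW JERSEY | 0.217873825 | 0.964209 | 0.603874 | -0.19902 | -0.2445 | 0.242962 | -0.078 |
| OHIO | 0.239530335 | 0.090527 | 0.459642 | -0.57002 | -0.13419 | -0.32606 | 0.073085 |
| GEORGIA | 0.490407599 | -1.38079 | 0.244629 | -0.06248 | 0.202101 | 0.025782 | -0.33221 |
| ILLINOIS | 0.512902495 | 0.09423 | 1.121196 | -0.43125 | 0.018262 | 0.037782 | 0.825377 |
| MISSOURI | 0.556366491 | -0.55851 | 0.563358 | -0.30673 | -0.15163 | -0.06676 | -0.12561 |
| HAWAII | 0.823131325 | 1.823918 | -0.78176 | -1.18025 | 1.093292 | 0.544922 | -0.1003 |
| WASHINGTON | 0.930579634 | 0.737764 | -1.30387 | 0.156067 | -0.08755 | -0.72808 | -0.46312 |
| DELAWARE | 0.964580654 | 1.296743 | -0.52586 | -0.41732 | -0.01872 | 0.403561 | 0.206165 |
| MASSACHUSETTS | 0.978438979 | 2.631054 | 2.54225 | 1.813103 | 0.440739 | 0.09248 | -0.20652 |
| LOUISIANA | 1.12020258 | -2.08327 | 0.367458 | 0.203396 | 0.359219 | 0.201958 | 0.644098 |
| NEW MEXICO | 1.214170935 | -0.95076 | -1.07276 | 0.53541 | -0.34152 | -0.41341 | -0.02135 |
| TEXAS | 1.396960942 | -0.68131 | -0.07992 | -0.60375 | 0.929548 | 0.167733 | -0.01221 |
| OREGON | 1.44900212 | 0.586025 | -1.24505 | 0.418182 | -0.35117 | -0.58967 | -0.2375 |
| SOUTH CAROLINA | 1.60336057 | -2.16211 | -0.55261 | 1.387271 | -0.72192 | 0.691138 | -0.22241 |
| MARYLAND | 2.182798286 | -0.19474 | 0.381652 | -0.18256 | -1.35485 | -0.43169 | 0.360049 |
| MICHIGAN | 2.273334399 | 0.154874 | 0.535674 | -0.33238 | -0.24676 | -0.62867 | 0.066708 |
| ALASKA | 2.42151498 | 0.166523 | -0.06973 | 1.160472 | 1.470051 | -1.49781 | 0.464808 |
| COLORADO | 2.509292519 | 0.916596 | -1.15158 | 0.112603 | -0.16923 | -0.33103 | -0.24066 |
| ARIZONA | 3.014138281 | 0.844945 | -1.75195 | -0.11621 | 0.280211 | 1.07044 | 0.057515 |
| FLORIDA | 3.111753982 | -0.60392 | -1.21541 | 0.495075 | -0.81967 | 0.289585 | 0.35866 |
| NEW YORK | 3.452480181 | 0.432893 | 2.736615 | -0.99366 | -1.26802 | 0.126246 | 0.033301 |
| CALIFORNIA | 4.283803733 | 0.143187 | 0.276155 | 0.025122 | 0.057931 | -0.37708 | -0.46401 |
| NEVADA | 5.26698533 | -0.25262 | -0.30241 | -0.98008 | 0.357216 | 0.339124 | -0.12412 |

(4) Plot

Problem 3

1. Fast clustering, SAS output:

(1) Initial seeds

Initial Seeds

| Cluster | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto |
|---|---|---|---|---|---|---|---|
| 1 | 4.3 | 39.6 | 106.2 | 224.8 | 1605.6 | 3386.9 | 360.3 |
| 2 | 3.1 | 20.8 | 169.1 | 231.6 | 1532.2 | 2311.3 | 1140.1 |
| 3 | 9.5 | 34.2 | 138.2 | 312.3 | 2346.1 | 4467.4 | 439.5 |
| 4 | 6 | 13.2 | 42.2 | 90.9 | 597.4 | 1341.7 | 163.3 |

(2) Classification results:

| Cluster | Obs | Distance from Seed |
|---|---|---|
| 1 | 2 | 359.3 |
| 1 | 20 | 170.5 |
| 1 | 22 | 91.6614 |
| 1 | 30 | 329.5 |
| 1 | 31 | 278.4 |
| 1 | 32 | 473.5 |
| 1 | 37 | 430.7 |
| 1 | 39 | 360.4 |
| 1 | 43 | 186.1 |
| 1 | 44 | 363.7 |
| 1 | 47 | 327.8 |
| 2 | 7 | 280.8 |
| 2 | 10 | 379.3 |
| 2 | 12 | 195.3 |
| 2 | 13 | 348.8 |
| 2 | 14 | 65.936 |
| 2 | 15 | 387.5 |
| 2 | 16 | 284.6 |
| 2 | 18 | 154.6 |
| 2 | 19 | 222.5 |
| 2 | 21 | 807.4 |
| 2 | 23 | 122.9 |
| 2 | 25 | 200.4 |
| 2 | 26 | 408.6 |
| 2 | 27 | 420.1 |
| 2 | 29 | 220.5 |
| 2 | 35 | 212.9 |
| 2 | 36 | 300 |
| 2 | 40 | 546.4 |
| 2 | 45 | 364 |
| 2 | 46 | 216.6 |
| 2 | 49 | 346.3 |
| 2 | 50 | 406.4 |
| 3 | 3 | 561.7 |
| 3 | 5 | 473.4 |
| 3 | 6 | 132.4 |
| 3 | 8 | 441.5 |
| 3 | 9 | 283.1 |
| 3 | 11 | 251.1 |
| 3 | 28 | 453.6 |
| 4 | 1 | 325.1 |
| 4 | 4 | 203.8 |
| 4 | 17 | 56.4826 |
| 4 | 24 | 439.2 |
| 4 | 33 | 442 |
| 4 | 34 | 450.6 |
| 4 | 38 | 130.7 |
| 4 | 41 | 304 |
| 4 | 42 | 402.7 |
| 4 | 48 | 407.7 |

The distance matrix of the 4 clusters is as follows:

| Cluster | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | . | 659.7157 | 969.5538 | 1502.383 |
| 2 | 659.7157 | . | 1623.102 | 852.5916 |
| 3 | 969.5538 | 1623.102 | . | 2452.157 |
| 4 | 1502.383 | 852.5916 | 2452.157 | . |

The clustering results show that cluster 1 is dominated by larceny. Cluster 2 is dominated by rape, robbery and assault, and it contains the most states, which suggests that these crimes are relatively widespread. Cluster 3 is dominated by auto crime, and cluster 4 by murder and assault. The distance matrix shows that the differences between the clusters are fairly small, which suggests that the classification is reasonable.

2. Average-linkage hierarchical clustering

Clustering results:

| Number of clusters | Clusters joined | | Frequency | Average distance |
|---|---|---|---|---|
| 49 | MONTANA | WYOMING | 2 | 0.0268 |
| 48 | IOWA | WISCONSIN | 2 | 0.0675 |
| 47 | KENTUCKY | PENNSYLVANIA | 2 | 0.087 |
| 46 | INDIANA | MINNESOTA | 2 | 0.0928 |
| 45 | IDAHO | VIRGINIA | 2 | 0.0929 |
| 44 | GEORGIA | OKLAHOMA | 2 | 0. |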
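The merge history above comes from average-linkage hierarchical clustering, which at each step joins the two clusters with the smallest mean pairwise distance between their members. The following is a minimal Python sketch of that rule on six rows of Table 1. It assumes plain Euclidean distance on the raw, unstandardized rates, so it does not reproduce the normalized distances in the SAS output. The names `average_distance` and `average_linkage` are illustrative.

```python
from itertools import combinations
from math import dist

# Six rows copied from Table 1
# (murder, rape, robbery, assault, burglary, larceny, auto).
STATES = {
    "MONTANA": (5.4, 16.7, 39.2, 156.8, 804.9, 2773.2, 309.2),
    "WYOMING": (5.4, 21.9, 39.7, 173.9, 811.6, 2772.2, 282.0),
    "IOWA": (2.3, 10.6, 41.2, 89.8, 812.5, 2685.1, 219.9),
    "WISCONSIN": (2.8, 12.9, 52.2, 63.7, 846.9, 2614.2, 220.7),
    "ARIZONA": (9.5, 34.2, 138.2, 312.3, 2346.1, 4467.4, 439.5),
    "NEVADA": (15.8, 49.1, 323.1, 355.0, 2453.1, 4212.6, 559.2),
}


def average_distance(cluster_a, cluster_b, points):
    """Group-average linkage: mean Euclidean distance over all cross-cluster pairs."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(points[a], points[b]) for a, b in pairs) / len(pairs)


def average_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [frozenset([name]) for name in points]
    history = []  # one (members_a, members_b, distance) entry per merge
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], points),
        )
        history.append((sorted(a), sorted(b), average_distance(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, history


if __name__ == "__main__":
    final_clusters, merges = average_linkage(STATES, n_clusters=2)
    for members_a, members_b, distance in merges:
        print(members_a, "+", members_b, "at average distance", round(distance, 1))
    print([sorted(c) for c in final_clusters])
```

On these six rows the first merge joins MONTANA and WYOMING and the second joins IOWA and WISCONSIN, the same pairs as the first two lines of the SAS merge history, although the distance values differ because of the scaling assumption.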
