




Chapter 2 The Description of Quantitative Data
Liu Pei (刘沛)
Department of Epidemiology and Health Statistics

Outline
- Frequency and frequency distribution
- Central tendency and dispersion
- Statistical tables and graphs

Data presentation
- Using graphs
  - Frequency polygons
  - Histogram
- Using statistics
  - Measures of location
    - Arithmetic mean
    - Median
    - Mode
    - Geometric mean
  - Measures of dispersion
    - Range
    - Interquartile range
    - Standard deviation, variance
    - Coefficient of variation

Part I: Description of quantitative data by frequency table and graph

Raw data
120 values of height (cm) for 12-year-old boys in 1997:

142.3 156.6 142.7 145.7 138.2 141.6 142.5 130.5 134.5 148.8
134.4 148.8 137.9 151.3 140.8 149.8 145.2 141.8 146.8 135.1
150.3 133.1 142.7 143.9 151.1 144.0 145.4 146.2 143.3 156.3
141.9 140.7 141.2 141.5 148.8 140.1 150.6 139.5 146.4 143.8
143.5 139.2 144.7 139.3 141.9 147.8 140.5 138.9 134.7 147.3
138.1 140.2 137.4 145.1 145.8 147.9 150.8 144.5 137.1 147.1
142.9 134.9 143.6 142.3 125.9 132.7 152.9 147.9 141.8 141.4
140.9 141.4 160.9 154.2 137.9 139.9 149.7 147.5 136.9 148.1
134.7 138.5 138.9 137.7 138.5 139.6 143.5 142.9 129.4 142.5
141.2 148.9 154.0 147.7 152.3 146.6 132.1 145.9 146.7 144.0
135.5 144.4 143.4 137.4 143.6 150.0 143.3 146.5 149.0 142.1
140.2 145.4 142.4 148.9 146.7 139.2 139.6 142.4 138.7 139.9

How to get a frequency table?
- Decide the number of classes and the intervals corresponding to
  each class.
- Examine each observation and assign it to the corresponding class.
- Count the observations in each class.
- Present the result in a table.

Frequency table for quantitative data

Class interval for height (cm)   Frequency (f)   Relative frequency
124-                                   1             0.0083
128-                                   2             0.0167
132-                                  10             0.0833
136-                                  22             0.1834
140-                                  37             0.3083
144-                                  26             0.2167
148-                                  15             0.1250
152-                                   4             0.0333
156-                                   2             0.0167
160-164                                1             0.0083
Total                                120             1.0000

Histogram for quantitative data
[Figure: histogram of the frequency distribution of the heights of 120 12-year-old boys; x-axis: height (cm), 124 to 164; y-axis: frequency, 0 to 40]
- The peak is in the middle and the two sides are roughly symmetric.
- Normal distribution
- Very useful information for the final analysis

From the histogram, we found
- Characteristics of the data
  - It is symmetric.
  - It has one peak, located at the center of the data.
  - This is called a "symmetric and unimodal distribution".
  - A large part of the boys are about 140-152 cm tall.
  - The further from the average level, the fewer boys.
- Very useful information for the final analysis.

Frequency table for categorical data

Blood type   Freq.   Relative freq. (%)
O             205        40.43
A             112        22.09
B             150        29.59
AB             40         7.89
Total         507       100.00

Skewness
- Skewness means the lack of symmetry in a probability distribution. (The Cambridge Dictionary of Statistics in the Medical Sciences.)
- An asymmetric distribution is called skew. (Armitage: Statistical Methods in Medical Research.)

Positive & negative skewness
- A distribution has positive skewness when it has a long thin tail to the right.
- A distribution has negative skewness when it has a long thin tail to the left.
  - A distribution in which the upper tail is longer than the lower would be called positively skew.

Kurtosis
The extent to which the peak of a unimodal distribution (probability or frequency) departs from the shape of the normal distribution.
- Leptokurtic: more peaked (more pointed)
- Platykurtic: flatter
- Leptokurtosis, platykurtosis

Kurtosis
[Figure: two panels, leptokurtosis and platykurtosis; x-axis: outcome variable, 0 to 20; y-axis: number of observations, 0 to 50]

[Fig. The distribution of Hg (hydrargyrum) in the hair of 237 adults; x-axis: Hg (umol/kg), 1 to 21; y-axis: frequency, 0 to 70]

[Fig. The distribution of QOL (quality of life) scores of 892 senior citizens; x-axis: QOL, 0 to 100; y-axis: frequency, 0 to 400]

[Fig. The distribution of survival times for 102 malignant melanoma patients; x-axis: survival time (month), 1 to 45; y-axis: frequency, 0 to 40]

[Fig. The distribution of ages at death of males in 1990-1992; x-axis: age at death (year), 0 to 85; y-axis: frequency, 0 to 2500]

Other types of charts
- Pie chart
- Scatter plot
- All these will be introduced in a lecture two weeks from now by Zhao.

Examples of pie charts

Examples of scatter plot
[Figure: scatter plot; y-axis: height at age 20 (inch), 63 to 71; x-axis: height at 2 years old (inch), 30 to 40]

Examples of scatter plot
[Figure: scatter plot; y-axis: accidents, 400 to 1200; x-axis: time (month), 0 to 170]

Part II: Description of quantitative data by statistics: measures of central tendency & dispersion

Data presentation
- Using statistics
  - Central tendency: arithmetic mean, median, mode, geometric mean, weighted average
  - Measures of dispersion: range, interquartile range, standard deviation, variance, coefficient of variation (CV)

Central tendency: average
- Arithmetic mean
- Geometric mean
- Median & percentiles
- Mode

Arithmetic mean, mean
- The most widely used measure of location
- Population mean: $\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Sample mean, a letter with a bar over it: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Preferred when the data are "symmetric with a single peak".

An example
- A small population consists of 10 boys.
- The heights (cm) are: 136 160 155 142 138 152 148 140 145 161
  $\mu = \frac{136 + 160 + \cdots + 161}{10} = 147.7$
- If 5 of the 10 boys were sampled randomly and we want to calculate the sample mean:
- 136 142 138 148 145
  $\bar{x} = \frac{136 + 142 + 138 + 148 + 145}{5} = 141.8$

Weighted mean
The mean is a special case of the weighted mean, with every weight equal to $1/n$:
$\bar{x} = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \cdots + \frac{1}{n}x_n$
$\bar{x}_w = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$

An example
- Your final score in biostatistics:
  $\bar{x}_w = 0.60 \times \text{practice} + 0.40 \times \text{report}$

Geometric mean
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$
$G = \exp\left(\frac{\ln x_1 + \ln x_2 + \cdots + \ln x_n}{n}\right) = \exp\left(\frac{\sum \ln x}{n}\right)$

Example for geometric mean
- Titre values of five antibody samples
- 1:10, 1:20, 1:40, 1:80, 1:160
  $G = \sqrt[5]{10 \times 20 \times 40 \times 80 \times 160} = 40$
  $\frac{\sum \ln x}{n} = \frac{\ln 10 + \ln 20 + \ln 40 + \ln 80 + \ln 160}{5} = 3.6889$, so $G = e^{3.6889} = 40$
  For comparison, the arithmetic mean is $\bar{x} = \frac{10 + 20 + 40 + 80 + 160}{5} = 62$

Median
- The middle value in the ranked list.
  - Half of the values fall above it and half fall below.
  - Advantage: less affected by extremes.
  - Drawback: wasteful of information.
  - Preferable when the data are not symmetric.
  $M = x_{(n+1)/2}$ when $n$ is odd
  $M = \left(x_{n/2} + x_{n/2+1}\right)/2$ when $n$ is even

Examples for median
- Values of Hg (hydrargyrum) in hair
  1.1 1.8 3.5 4.2 4.8 5.6 5.9 7.1 10.5         M = 4.8
  1.1 1.8 3.5 4.2 4.8 5.6 5.9 7.1 16           M = 4.8
  1.1 1.8 3.5 4.2 4.8 5.6 5.9 7.1 10.5 16      M = (4.8 + 5.6)/2 = 5.2

Percentile
- Percentile (centile): $P_x$ divides the ranked data so that x% of the values fall below it and (100 - x)% fall above it.
- Quartiles:
  - First quartile: 25% ($Q_L$)
  - Second quartile: median
  - Third quartile: 75% ($Q_U$)

Mode
- The mode is defined as the most frequently occurring value in the data set.
- Example: 4 5 5 6 1 4 8 9 5 2
- The mode is 5.

Which measure should we use?
- Mean:
  - symmetric, unimodal;
- G:
  - if a log transformation makes the data symmetric and unimodal;
- Mode:
  - unimodal;
- M:
  - distribution free; uncertain data.
- The subjects should be homogeneous when we calculate an average!

Mean, mode and median
[Figure: symmetric distribution; mean = median = mode]

Mean, mode and median
[Figure: skewed distribution; from left to right: mode, median, mean]

Mean, mode & median
[Figure: skewed distribution; from left to right: mean, median, mode]

It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable on average!

A real-life joke: the National Bureau of Statistics cannot calculate an average!?
- Feng Nailin, director of the Department of Population and Employment Statistics of the National Bureau of Statistics, said that wages "being increased" is a misunderstanding: during the financial crisis, jobs and enterprises at the low end of the wage scale decreased while those at the high end changed little, which is one of the reasons the average wage figure still rose.
  The National Bureau of Statistics admits that overly narrow statistical coverage caused the average wage to "be increased".
- On 29 July 2009 the National Bureau of Statistics stated that in the first half of the year the average wage of on-the-job employees of urban units in China was 14,638 yuan, up 12.9% year on year. Some netizens said the figures did not match their own income.

Measures of dispersion
Group A: 26 28 30 32 34
Group B: 24 27 30 33 36
Group C: 26 29 30 31 34

Range
$R = \max - \min$
Advantage: easy to calculate.
Drawback: unstable, not sensitive.

Example
- Group A: 26 28 30 32 34
- Group B: 24 27 30 33 36
- Group C: 26 29 30 31 34
- Their ranges are 8, 12 and 8 respectively.
- However, groups A and C have different degrees of dispersion.

Interquartile range
$Q_U - Q_L = P_{75} - P_{25}$

Variance
- A population variance is denoted by $\sigma^2$,
- a sample variance is denoted by $s^2$.
  $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard deviation (SD)
- A population SD is denoted by $\sigma$,
- a sample SD is denoted by $s$.
  $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example:
A: 26 28 30 32 34
B: 24 27 30 33 36
C: 26 29 30 31 34

           Range   Variance    SD     Mean
Group A      8       10.0     3.16     30
Group B     12       22.5     4.74     30
Group C      8        8.5     2.92     30

Coefficient of variation, CV
$CV = \frac{s}{\bar{x}} \times 100\%$
- Requires a nonzero mean.
- Used to compare the dispersion of different distributions:
  - for variables with different scales or units;
  - for variables with very different means.

Example: comparing the dispersion of two variables

           Mean           SD
Height:    166.06 (cm)    4.95 (cm)
Weight:    53.72 (kg)     4.96 (kg)

Height: $CV = \frac{4.95}{166.06} \times 100\% = 2.98\%$
Weight: $CV = \frac{4.96}{53.72} \times 100\% = 9.23\%$

Which measure should we use?
- SD, variance
  - for unimodal, symmetric distributions.
- CV
  - for different units; for very different means.
- Range
  - for any distribution; wasteful of information.
- Interquartile range
  - for any distribution; robust; wasteful of information.
The subjects should be homogeneous!
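
The measures of location and dispersion in Part II can be checked numerically. The following is a minimal Python sketch, not part of the original lecture, that recomputes the summary table for groups A, B and C, the geometric mean of the five titres, and the two CVs. The function names (`value_range`, `geometric_mean`, `coefficient_of_variation`, `summarize`) are hypothetical helpers introduced for illustration; `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the sample formulas above.

```python
import math
import statistics

# Data copied from the slides above.
groups = {
    "A": [26, 28, 30, 32, 34],
    "B": [24, 27, 30, 33, 36],
    "C": [26, 29, 30, 31, 34],
}
titres = [10, 20, 40, 80, 160]  # reciprocals of the titres 1:10 ... 1:160


def value_range(xs):
    """Range R = max - min."""
    return max(xs) - min(xs)


def geometric_mean(xs):
    """G = exp(mean of ln x); every value must be positive."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def coefficient_of_variation(sd, mean):
    """CV = s / mean * 100 (%); the mean must be nonzero."""
    return sd / mean * 100


def summarize(xs):
    """Location and dispersion measures for one sample."""
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "range": value_range(xs),
        "variance": statistics.variance(xs),  # sample variance, n - 1 denominator
        "sd": statistics.stdev(xs),           # sample standard deviation
    }


summaries = {name: summarize(xs) for name, xs in groups.items()}
g_titre = geometric_mean(titres)
cv_height = coefficient_of_variation(4.95, 166.06)
cv_weight = coefficient_of_variation(4.96, 53.72)

for name, s in summaries.items():
    print(f"Group {name}: range={s['range']}, variance={s['variance']:.1f}, "
          f"sd={s['sd']:.2f}, mean={s['mean']}")
print(f"Geometric mean titre: 1:{g_titre:.0f}")
print(f"CV height = {cv_height:.2f}%, CV weight = {cv_weight:.2f}%")
```

Groups A and C share the same range (8) and mean (30) but differ in variance (10.0 against 8.5), which is the point the range example makes: the range uses only the two extreme values, while the variance uses every observation.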
What do the variance and SD tell us?
- Large variance (SD) means:
  - more variable, wider range,
  - lower degree of representativeness of the mean.
- Small variance (SD) means:
  - less variable, narrower range,
  - higher degree of representativeness of the mean.

Average and dispersion
- Mean ± SD (min, max)
- Median, interquartile range (min, max)
- Use both average and dispersion.

Summary
- Each variable has its own distribution.
- Description
  - using graphs
  - using statistics
    - average: mean, G, M, mode
    - dispersion: SD, variance, Q, CV, R
- Choose an appropriate measure.
- Use an average together with a measure of dispersion.

Part III: Description of the relationship between two quantitative variables

- Example
  - What is the relationship between a mother's weight and her baby's weight?
  - What is the relationship between the height and age of young boys?

Description of association between 2 quantitative variables
- Correlation: when two variables vary together, we say they are correlated.

Measure of correlation
- Pearson's linear correlation coefficient
- Coefficient of product-moment correlation
- Usually abbreviated as "correlation coefficient"
- Takes values in $[-1, 1]$
- It measures the strength and direction of the correlation.
  - Population correlation coefficient: $\rho$
  - Sample correlation coefficient: $r$

- The larger the absolute value of the correlation coefficient, the stronger the correlation.
- If the sign is positive, the two variables vary in the same direction. Otherwise, they vary in opposite directions.

Values of correlation coefficient
- Values near 1 indicate a strong positive association.
- Values near -1 indicate a strong negative association.
- Values around 0 indicate a weak association.

[Figure: different patterns of correlation; panels: positive (0 < r < 1), negative (-1 < r < 0), completely positive (r = 1), completely negative (r = -1), and four null patterns (r = 0)]

Computation of correlation coefficient
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{l_{xy}}{\sqrt{l_{xx} l_{yy}}}$
$r = \frac{1}{n - 1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$

[Figure: scatter plot; x-axis: x, 11 to 16; y-axis: y, 5.0 to 6.5; regions annotated "big" and "small"]

- The association of heights at 2 years old and at 20 years old.

ID              1    2    3    4    5    6    7    8
2 years old    39   30   32   34   35   36   36   30
20 years old   71   63   63   67   68   68   70   64

$l_{xx} = \sum (x - \bar{x})^2 = 9318 - \frac{272^2}{8} = 70.00$
$l_{yy} = \sum (y - \bar{y})^2 = 35712 - \frac{534^2}{8} = 67.50$
$l_{xy} = \sum (x - \bar{x})(y - \bar{y}) = 18221 - \frac{272 \times 534}{8} = 65.00$

$r = \frac{l_{xy}}{\sqrt{l_{xx} l_{yy}}} = \frac{65.00}{\sqrt{70.00 \times 67.50}} = 0.9456$

Awful computation?
- Practical class in your schedule
- Enjoy the power of Stata!

Statistical description of categorical data

Outline
- Table and graph
- Relative number

Frequency table of blood type

Blood type   Freq.   Relative freq. (%)
O             205        40.43
A             112        22.09
B             150        29.59
AB             40         7.89
Total         507       100.00

Binary data

Gender    Frequency   Relative frequency (%)
Male         45            41.7
Female       63            58.3
Total       108           100.0

Multiple data

Occupation   Frequency   Relative frequency (%)
Labour          28            25.9
Farmer          23            21.3
Office          24            22.2
Business        18            16.7
Others          15            13.9
Total          108           100

Ranked data

Result   Frequency   Relative frequency (%)   Cumulative frequency   Cumulative relative frequency (%)
-            80            53.3                      80                      53.3
±            20            13.3                     100                      66.6
+            25            16.7                     125                      83.3
++           15            10                       140                      93.3
+++          10             6.7                     150                     100
Total       150             -

Two-way frequency table
- Table for analysis

                  Effect of drug
Treatment   Effective   Not effective   Total
Drug            41            4           45
Placebo         24           11           35
Total           65           15           80

Numerical methods: relative number
- Rate
- Proportion
- Ratio

Rate: force index
- A single figure that measures the force of specific events, for example death or disease (mortality & morbidity).
  $\text{rate} = \frac{a}{a + b} \times k$
- a = the frequency with which an event has occurred during some specified period of time.
- a + b = the number of persons exposed to the risk of the event during the same period of time.
- k = a base multiplier such as 100%, and so on.
- The denominator should not be too small (=50%)?

Vital statistics: rates as measures of health status
- Incidence ra
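
The rate formula $k \cdot a / (a + b)$ from the "Rate: force index" slide can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original lecture: the function name `rate` is hypothetical, and applying it to the counts of the two-way drug/placebo table (with "effective" treated as the event) is an assumption made only to have concrete numbers.

```python
def rate(a, b, k=100):
    """Return k * a / (a + b).

    a: number of events observed in the period
    b: number exposed to the risk who did not have the event
    k: base multiplier (100 gives a percentage)
    """
    exposed = a + b
    if exposed == 0:
        raise ValueError("no one was exposed, so the rate is undefined")
    return k * a / exposed


# Counts from the two-way frequency table above (assumed use of the formula):
# drug: 41 effective, 4 not effective; placebo: 24 effective, 11 not effective.
drug_rate = rate(41, 4)
placebo_rate = rate(24, 11)

print(f"Drug: {drug_rate:.1f}% effective")
print(f"Placebo: {placebo_rate:.1f}% effective")
```

With group sizes of only 45 and 35, these percentages are sensitive to a change of a few counts, which is the practical concern behind the slide's warning about small denominators.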