




Week Six
Analysing categorical data: Chi-squared tests

This week's lecture will cover
- Analysing categorical (nominal) data
- Chi-square test of differences between proportions
- Chi-square test of independence

SPSS one-sample nonparametric tests: chi-square test of the population distribution
(1) Purpose: use sample data to infer whether the distribution of the population differs significantly from a known distribution (a goodness-of-fit test). The test applies to statistical inference on categorical data.
(2) Basic hypothesis: $H_0$: the population distribution does not differ significantly from the theoretical distribution.
(3) Basic method: from the known population proportions, compute the expected frequency of each category in the sample, then measure the gap between the observed and expected frequencies, that is, compute the chi-square value. A small chi-square value means the observed and expected frequencies are close. If the p-value is greater than $\alpha$, $H_0$ cannot be rejected and the population distribution is taken not to differ significantly from the known distribution. Otherwise $H_0$ is rejected.
(4) Basic steps:
- Menu: Analyze > Nonparametric Tests > Chi-Square
- Move the variable to be tested into the Test Variable List box
- Set the range of values to be tested (Expected Range): "Get from data" uses all cases; "Use specified range" uses a user-defined range
- Specify the expected frequencies (Expected Values): "All categories equal" gives every category the same proportion; "Values" takes user-defined proportions

Categorical variable
- Variables that describe categories of entities
- We deal with them all the time in statistics
- Making comparisons among variables, for example whether consumers prefer a particular brand of a product over competing brands
- Checking whether there is a relationship between two categorical variables, for example gender and preference for a product: is the preference for a product independent of gender?

Chi-square test for differences between proportions
- This test involves nominal data produced by a multinomial experiment
- A multinomial experiment is a generalisation of a binomial experiment
- The test evaluates the null hypothesis that the data in the target population follow a particular probability distribution

Example 1
We might test whether consumers are indifferent to which of four materials (glass, plastic, steel or aluminium) is used to make soft drink containers. The null hypothesis is that they are indifferent (that is, equal numbers prefer glass, plastic, steel and aluminium).

Example 1: Data
Let $p_G$ be the probability that an individual selected at random will nominate glass as his/her preference if required to make a choice. Similarly for $p_P$ (plastic), $p_S$ (steel) and $p_A$ (aluminium).

Hypotheses
$H_0$: $p_G = p_P = p_S = p_A = 0.25$
$H_A$: at least one $p_i \neq 0.25$
The alternative is that at least one material is more preferred (or less preferred) than the others.

Example 1 cont.
Procedure: select a random sample of, say, 100 consumers and determine their preferences.
Under the null hypothesis we expect 25 consumers to nominate glass, 25 to nominate plastic, 25 to nominate steel and 25 to nominate aluminium. These are the expected frequencies, $E_i$:
$$E_i = n\,p_i$$
We compare the expected frequencies with the sample results, the observed frequencies $O_i$. If they are approximately the same, we have no reason to reject the null hypothesis.
$O_i \approx E_i \Rightarrow H_0$ is probably true.

Example 1 cont., Chi square
$$\chi^2_{G-1} = \sum_{i=1}^{G} \frac{(O_i - E_i)^2}{E_i}$$
We require a test statistic to decide whether the difference is large enough to reject the null hypothesis. We use chi square with $G - 1$ degrees of freedom, where $G$ is the number of groups.
Suppose in our example 39 prefer glass, 16 prefer plastic, 20 prefer steel and 25 prefer aluminium. Recall that the expected frequencies were all 25.
$$\chi^2_3 = \frac{(39-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(25-25)^2}{25} = 12.08$$
Obtain the critical value of chi square at the 5% significance level with 3 d.f.: critical $\chi^2_3 = 7.82$ (Table E4, page 742, Berenson et al. 2013), i.e. there is only a 5 percent chance or less that $\chi^2_3 \geq 7.82$ if $H_0$ is true.
Comparison of chi square values: $\chi^2_3 = 12.08 > 7.82 \Rightarrow$ reject $H_0$.
Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis. At least one of the probabilities ($p_i$) is different. The sample results indicate that the materials are not equally preferred by consumers in the target population. Thus, preferences for at least two materials differ.

Chi square test using SPSS
Example: suppose that we want to test whether or not customers have a colour preference for packaging. Three different colours, Blue, Green and Purple, are considered. The null hypothesis is that they don't have a colour preference.
Use Analyse > Nonparametric Tests > Chi-Square. The default is that the probabilities are equal.

Main display colour
          Observed N   Expected N   Residual
Blue      26           30.0         -4.0
Green     37           30.0          7.0
Purple    27           30.0         -3.0
Total     90

- Observed N: numbers of consumers actually choosing particular colours.
- Expected N: numbers of consumers expected to choose particular colours if the null is true.
- The two are different, but different enough to reject the null?

Test Statistics (Main Display Colour)
Chi-Square (a)   2.467
df               2
Asymp. Sig.      .291
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.

- Chi-Square is the chi-square statistic.
- df is the degrees of freedom, groups - 1.
- Check Asymp. Sig. to test $H_0$.
We cannot reject the null ($H_0$) that all three colours are equally preferred because Sig. = 0.291 > 0.05.
Conclusion: at the 5% significance level there is insufficient evidence to conclude that consumers in the target population prefer at least one of the three packaging colours over the others.

Tests of independence: Chi-squared test of a contingency table
This test satisfies two different problem objectives:
- Are two nominal variables related?
- Are there differences among two or more populations of nominal variables?
Consider the following 3 features: height in centimetres, weight in kilograms and colour of eyes.
- Whilst some people are tall and thin, on average taller people weigh more than shorter people. Weight and height are not independent.
- It seems unlikely that people with blue eyes weigh more, on average, than people with brown eyes. Weight and eye colour are almost certainly independent.

Frequency analysis under cross-classification
Purpose: understand how the data are distributed across the levels of different variables.
- Example: is academic performance associated with gender? (two variables)
- Example: are occupation, gender and fondness for shopping associated? (three variables)
Main steps of the analysis:
- Produce the cross-tabulation (contingency table)
- Analyse the relationships among the variables in the contingency table

Producing a contingency table: what is a contingency table
The slide shows an empty template with job title as the row variable (senior engineer, engineer, assistant engineer, technician, total), income as the column variable (high, medium, low, in persons), region as a control variable, and frequencies in the cells.

Producing a contingency table: basic steps
(1) Menu: Analyze > Descriptive Statistics > Crosstabs
(2) Move one variable into the Row box as the row variable.
(3) Move one variable into the Column box as the column variable.
(4) Optionally move one or more variables into the Layer box as control variables. Layer settings: control variables placed in the same layer contribute the sum of their numbers of levels; control variables placed in different layers contribute the product of their numbers of levels.
(5) Choose whether to display a bar chart for each group (Display clustered bar charts).

Producing a contingency table: further calculation
The Cells option selects which percentages to output in the frequency table: Row gives row percentages (Row pct), Column gives column percentages (Col pct), Total gives percentages of the grand total (Tot pct).

Analysing the relationship between variables in a contingency table
Purpose: use the contingency table to test whether the row and column variables are independent.
Method: the chi-square test, which measures association in qualitative (categorical) data.

Chi-square test: age by income contingency tables
          Low    Medium   High
Young     400    0        0
Middle    0      500      0
Old       0      0        600

          Low    Medium   High
Young     0      0        500
Middle    0      600      0
Old       400    0        0

Chi-square test: basic steps
(1) $H_0$: the row and column variables are not associated (they are independent).
(2) Construct the chi-square statistic
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
which follows a chi-square distribution with $(r-1)(c-1)$ degrees of freedom.
- Count: the observed (actual) frequency $f_o$
- Expected count: the expected frequency $f_e$ (it reflects how the data would be distributed if $H_0$ held)
- Residual: observed frequency minus expected frequency

             Excellent   Good   Fair   Pass   Total
Male         10          5      5      3      23
Female       8           12     4      1      25
Total        18          17     9      4      48
% of total   37.5        35.4   18.8   8.3    100

1. Contingency table
              No lung cancer   Lung cancer   Total
Non-smoker    7775             42            7817
Smoker        2099             49            2148
Total         9874             91            9965

2. Three-dimensional column chart, 3. Two-dimensional bar chart
[Figures: the counts from the table above plotted by smoking status (smoker, non-smoker) and outcome (no lung cancer, lung cancer); frequency axis from 0 to 8000.]
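The hand calculations in this lecture (Example 1, the SPSS colour example, and the smoking table) all rest on the same Pearson statistic, the sum of $(O - E)^2 / E$ over the cells. The following is a minimal Python sketch of that arithmetic using only the standard library. The function name `pearson_chi_square` and all variable names are illustrative assumptions; they come neither from the lecture nor from SPSS.

```python
import math


def pearson_chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example 1: 100 consumers, four container materials, H0: every p_i = 0.25.
materials = [39, 16, 20, 25]  # glass, plastic, steel, aluminium
expected_materials = [sum(materials) * 0.25] * len(materials)
chi2_materials = pearson_chi_square(materials, expected_materials)
# 7.82 is the 5% critical value for 3 d.f. quoted in the lecture.
print(round(chi2_materials, 2), chi2_materials > 7.82)  # 12.08 True

# SPSS colour example: 90 consumers, three colours, equal expected counts.
colours = [26, 37, 27]  # blue, green, purple
expected_colours = [sum(colours) / len(colours)] * len(colours)
chi2_colours = pearson_chi_square(colours, expected_colours)
# For exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_colours = math.exp(-chi2_colours / 2)
print(round(chi2_colours, 3), round(p_colours, 3))  # 2.467 0.291

# Smoking table: expected counts come from row total * column total / n.
smoking = [[7775, 42],   # non-smoker: no lung cancer, lung cancer
           [2099, 49]]   # smoker:     no lung cancer, lung cancer
row_totals = [sum(row) for row in smoking]
col_totals = [sum(col) for col in zip(*smoking)]
n = sum(row_totals)
expected_smoking = [r * c / n for r in row_totals for c in col_totals]
observed_smoking = [cell for row in smoking for cell in row]
chi2_smoking = pearson_chi_square(observed_smoking, expected_smoking)
df_smoking = (len(row_totals) - 1) * (len(col_totals) - 1)
print(round(chi2_smoking, 2), df_smoking)  # 56.63 1
```

The first two results agree with the values worked out by hand and reported by SPSS (12.08, and 2.467 with Asymp. Sig. .291). The `exp(-x / 2)` shortcut holds only for 2 degrees of freedom; other cases need a chi-square table or a statistics library.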
The three-dimensional column chart shows clearly the relative size of each frequency. The two-dimensional bar chart shows that the proportion of smokers with lung cancer is higher than the proportion of non-smokers with lung cancer. Charts like these give a visual check on whether two categorical variables are related.

Tests of independence cont.
Example 2
Suppose we interviewed 400 people and asked them which of three age groups they are in (under 25, 25 to 60, and over 60). We also asked for their response to the statement "All imports of automobiles should be banned in order to protect the local industry" (agree, no view either way, disagree).

Attitudes towards banning imports
Age group    Agree   No view   Disagree   Total
Under 25     19      53        25         97
25 - 60      46      94        47         187
Over 60      30      56        30         116
Total        95      203       102        400

Tests of independence cont.
Example 2 cont.
Null hypothesis: the answers to the two questions are independent.
Under the null:
Prob{over 60 and agree} = Prob{over 60} × Prob{agree} (multiplication rule for independent events)
Expected frequency = Prob{over 60} × Prob{agree} × sample size

Short-cut for expected frequencies:
$$E_{ij} = \frac{R_i}{n} \cdot \frac{C_j}{n} \cdot n = \frac{R_i C_j}{n}$$
where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$ and $n$ is the sample size.

Procedure
- We set up a cross-tabulation showing the observed frequencies of answers to the two questions.
- We calculate the expected frequencies.
Test
- Our test is based on a comparison of the observed and expected frequencies.

Age * attitude to banning imports Cross tabulation
                              Attitude to ban imports
Age group                     Agree   No view   Disagree   Total
Under 25   Count              19.0    53.0      25.0       97.0
           Expected Count     23.0    49.2      24.7       96.9
25-60      Count              46.0    94.0      47.0       187.0
           Expected Count     44.4    94.9      47.7       187.0
Over 60    Count              30.0    56.0      30.0       116.0
           Expected Count     27.6    58.9      29.6       116.1
Total      Count              95.0    203.0     102.0      400.0
           Expected Count     95.0    203.0     102.0      400.0
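The short-cut $E_{ij} = R_i C_j / n$ can be checked against the Expected Count rows that SPSS reports for Example 2. This is a small sketch using only the standard library; `expected_counts` and the variable names are hypothetical helpers introduced for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Example 2; columns are agree, no view, disagree.
age_by_attitude = [
    [19, 53, 25],  # under 25
    [46, 94, 47],  # 25 - 60
    [30, 56, 30],  # over 60
]

expected = expected_counts(age_by_attitude)
for label, row in zip(["Under 25", "25-60", "Over 60"], expected):
    print(label, [f"{e:.2f}" for e in row])

# The cell singled out in the lecture: over 60 and agree = 95 * 116 / 400.
over_60_agree = expected[2][0]
print(over_60_agree)  # 27.55
```

SPSS shows these values to one decimal place, so the expected row totals it prints (96.9 and 116.1) differ slightly from the observed totals through rounding alone.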
Calculation for the expected frequency of agree and over 60: 95 × 116 / 400 = 27.55, shown as 27.6 in the table.

The count (observed) and the expected count are different, but different enough to reject the null?

Chi-squared test for independence
$$\chi^2_{(r-1)(c-1)} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Rationale: $O_{ij} \approx E_{ij} \Rightarrow H_0$ is probably true.
Test statistic
We requi
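As a hedged illustration of the formula above, the sketch below computes the chi-squared statistic, its $(r-1)(c-1)$ degrees of freedom, and an upper-tail probability for the Example 2 table. The names `chi2_sf` and `chi_square_independence` are assumptions made for this sketch. The tail probability uses the standard closed-form series for integer degrees of freedom rather than a statistics library, and the result printed here is a calculation from the observed counts, not a figure taken from the lecture.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df >= x)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    chi = math.sqrt(x)
    normal_tail = math.erfc(chi / math.sqrt(2))
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return normal_tail + 2 * normal_pdf * total


def chi_square_independence(table):
    """Return (statistic, df, p-value) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, chi2_sf(statistic, df)


# Example 2: age group (rows) by attitude to banning imports (columns).
age_by_attitude = [
    [19, 53, 25],  # under 25
    [46, 94, 47],  # 25 - 60
    [30, 56, 30],  # over 60
]
stat, df, p_value = chi_square_independence(age_by_attitude)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

With 4 degrees of freedom the statistic for this table comes out well below the usual 5% critical value of about 9.49, so under this calculation the observed and expected counts are not different enough to reject independence.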