
STAT7055 Topic 10: Chi-Squared Tests

Introduction: Tests for Categorical Data

- Testing population proportions can be thought of as making inferences about populations of categorical data with two categories.
  - For example, “Coke” and “Pepsi”. We can denote Coke as a success and Pepsi as a failure, and perform tests on the population proportion of successes, $p$.
- Chi-squared tests are designed to make inferences about populations of categorical data with two or more categories.
  - For example, “Coke”, “Pepsi” and “Dr Pepper”.

Introduction: Chi-Squared Tests

1. Chi-squared goodness-of-fit test.
   - Used to analyse a population of categorical data arising from one categorical variable with $k$ categories.
   - Specifically, used to describe the population proportion of observations in each category.
2. Chi-squared test of a contingency table.
   - Used to analyse a population of categorical data arising from two categorical variables with $r$ and $c$ categories, respectively.
   - Specifically, used to determine whether the two categorical variables are independent.

Chi-Squared Goodness-of-Fit Test, Example 1: College Example

- Suppose we surveyed a sample of 206 students and recorded the college in which they studied.
- The data are summarised below:

| College | Business | Arts | CS | Science | Law |
|---------|----------|------|----|---------|-----|
| Count   | 65       | 20   | 35 | 62      | 24  |

- We want to test whether the population proportions of students in each college are equal.

Chi-Squared Goodness-of-Fit Test, Example 1: Hypotheses

- Let $p_1$, $p_2$, $p_3$, $p_4$ and $p_5$ denote the population proportions of students in Business, Arts, Computer Science, Science and Law, respectively.
- $H_0$: Population proportions in each college are equal. That is, $p_1 = p_2 = p_3 = p_4 = p_5 = \frac{1}{5}$.
- $H_1$: Not all population proportions are equal.

Chi-Squared Goodness-of-Fit Test, Example 1: Test Statistic

- If we assume $H_0$ is true, what would be the expected count in each college?
- We would expect one-fifth of the total students in each college, i.e., $\frac{1}{5} \times 206 = 41.2$ students.

| College  | Business | Arts | CS   | Science | Law  |
|----------|----------|------|------|---------|------|
| Observed | 65       | 20   | 35   | 62      | 24   |
| Expected | 41.2     | 41.2 | 41.2 | 41.2    | 41.2 |

- If the observed counts are close to the expected counts, then that is evidence supporting $H_0$.
- If the observed counts are far from the expected counts, then that is evidence against $H_0$.
- So we need a test statistic that measures how close the observed counts are to the expected counts.
- The test statistic that we use is called the chi-squared goodness-of-fit statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

  where $k$ is the number of categories, $f_i$ are the observed counts and $e_i$ are the expected counts.
- For our data:

$$\chi^2 = \sum_{i=1}^{5} \frac{(f_i - e_i)^2}{e_i} = \frac{(65 - 41.2)^2}{41.2} + \frac{(20 - 41.2)^2}{41.2} + \frac{(35 - 41.2)^2}{41.2} + \frac{(62 - 41.2)^2}{41.2} + \frac{(24 - 41.2)^2}{41.2} = 43.2718$$

Chi-Squared Goodness-of-Fit Test, Example 1: Decision Rule

- A small $\chi^2$-statistic supports $H_0$, whereas a large $\chi^2$-statistic supports $H_1$.
- So we should reject $H_0$ when the $\chi^2$-statistic is large, meaning that this is an upper-tailed test.
- Under $H_0$, the sampling distribution of this $\chi^2$-statistic is (approximately, for large samples) the chi-squared distribution with $k - 1$ degrees of freedom.

Chi-Squared Goodness-of-Fit Test, Example 1: Chi-Squared Distribution

- Another special continuous distribution (see Keller, pages 292-295 (10e) or 277-280 (11e)).
- It has one parameter called the degrees of freedom.
- There is a $\chi^2$-table listing critical values for selected upper-tail probabilities.

| Degrees of Freedom | $\chi^2_{.995}$ | $\chi^2_{.990}$ | $\chi^2_{.975}$ | $\chi^2_{.950}$ | $\chi^2_{.900}$ | $\chi^2_{.100}$ | $\chi^2_{.050}$ | $\chi^2_{.025}$ | $\chi^2_{.010}$ | $\chi^2_{.005}$ |
|-----|----------|----------|----------|---------|--------|------|------|------|------|------|
| 1   | 0.000039 | 0.000157 | 0.000982 | 0.00393 | 0.0158 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |
| 2   | 0.0100   | 0.0201   | 0.0506   | 0.103   | 0.211  | 4.61 | 5.99 | 7.38 | 9.21 | 10.6 |
| 3   | 0.072    | 0.115    | 0.216    | 0.352   | 0.584  | 6.25 | 7.81 | 9.35 | 11.3 | 12.8 |
| 4   | 0.207    | 0.297    | 0.484    | 0.711   | 1.06   | 7.78 | 9.49 | 11.1 | 13.3 | 14.9 |
| 5   | 0.412    | 0.554    | 0.831    | 1.15    | 1.61   | 9.24 | 11.1 | 12.8 | 15.1 | 16.7 |
| 6   | 0.676    | 0.872    | 1.24     | 1.64    | 2.20   | 10.6 | 12.6 | 14.4 | 16.8 | 18.5 |
| 7   | 0.989    | 1.24     | 1.69     | 2.17    | 2.83   | 12.0 | 14.1 | 16.0 | 18.5 | 20.3 |
| 8   | 1.34     | 1.65     | 2.18     | 2.73    | 3.49   | 13.4 | 15.5 | 17.5 | 20.1 | 22.0 |
| 9   | 1.73     | 2.09     | 2.70     | 3.33    | 4.17   | 14.7 | 16.9 | 19.0 | 21.7 | 23.6 |
| 10  | 2.16     | 2.56     | 3.25     | 3.94    | 4.87   | 16.0 | 18.3 | 20.5 | 23.2 | 25.2 |
| 11  | 2.60     | 3.05     | 3.82     | 4.57    | 5.58   | 17.3 | 19.7 | 21.9 | 24.7 | 26.8 |
| 12  | 3.07     | 3.57     | 4.40     | 5.23    | 6.30   | 18.5 | 21.0 | 23.3 | 26.2 | 28.3 |
| 13  | 3.57     | 4.11     | 5.01     | 5.89    | 7.04   | 19.8 | 22.4 | 24.7 | 27.7 | 29.8 |
| 14  | 4.07     | 4.66     | 5.63     | 6.57    | 7.79   | 21.1 | 23.7 | 26.1 | 29.1 | 31.3 |
| 15  | 4.60     | 5.23     | 6.26     | 7.26    | 8.55   | 22.3 | 25.0 | 27.5 | 30.6 | 32.8 |
| 16  | 5.14     | 5.81     | 6.91     | 7.96    | 9.31   | 23.5 | 26.3 | 28.8 | 32.0 | 34.3 |
| 17  | 5.70     | 6.41     | 7.56     | 8.67    | 10.1   | 24.8 | 27.6 | 30.2 | 33.4 | 35.7 |
| 18  | 6.26     | 7.01     | 8.23     | 9.39    | 10.9   | 26.0 | 28.9 | 31.5 | 34.8 | 37.2 |
| 19  | 6.84     | 7.63     | 8.91     | 10.1    | 11.7   | 27.2 | 30.1 | 32.9 | 36.2 | 38.6 |
| 20  | 7.43     | 8.26     | 9.59     | 10.9    | 12.4   | 28.4 | 31.4 | 34.2 | 37.6 | 40.0 |
| 21  | 8.03     | 8.90     | 10.3     | 11.6    | 13.2   | 29.6 | 32.7 | 35.5 | 38.9 | 41.4 |
| 22  | 8.64     | 9.54     | 11.0     | 12.3    | 14.0   | 30.8 | 33.9 | 36.8 | 40.3 | 42.8 |
| 23  | 9.26     | 10.2     | 11.7     | 13.1    | 14.8   | 32.0 | 35.2 | 38.1 | 41.6 | 44.2 |
| 24  | 9.89     | 10.9     | 12.4     | 13.8    | 15.7   | 33.2 | 36.4 | 39.4 | 43.0 | 45.6 |
| 25  | 10.5     | 11.5     | 13.1     | 14.6    | 16.5   | 34.4 | 37.7 | 40.6 | 44.3 | 46.9 |
| 26  | 11.2     | 12.2     | 13.8     | 15.4    | 17.3   | 35.6 | 38.9 | 41.9 | 45.6 | 48.3 |
| 27  | 11.8     | 12.9     | 14.6     | 16.2    | 18.1   | 36.7 | 40.1 | 43.2 | 47.0 | 49.6 |
| 28  | 12.5     | 13.6     | 15.3     | 16.9    | 18.9   | 37.9 | 41.3 | 44.5 | 48.3 | 51.0 |
| 29  | 13.1     | 14.3     | 16.0     | 17.7    | 19.8   | 39.1 | 42.6 | 45.7 | 49.6 | 52.3 |
| 30  | 13.8     | 15.0     | 16.8     | 18.5    | 20.6   | 40.3 | 43.8 | 47.0 | 50.9 | 53.7 |
| 40  | 20.7     | 22.2     | 24.4     | 26.5    | 29.1   | 51.8 | 55.8 | 59.3 | 63.7 | 66.8 |
| 50  | 28.0     | 29.7     | 32.4     | 34.8    | 37.7   | 63.2 | 67.5 | 71.4 | 76.2 | 79.5 |
| 60  | 35.5     | 37.5     | 40.5     | 43.2    | 46.5   | 74.4 | 79.1 | 83.3 | 88.4 | 92.0 |
| 70  | 43.3     | 45.4     | 48.8     | 51.7    | 55.3   | 85.5 | 90.5 | 95.0 | 100  | 104  |
| 80  | 51.2     | 53.5     | 57.2     | 60.4    | 64.3   | 96.6 | 102  | 107  | 112  | 116  |
| 90  | 59.2     | 61.8     | 65.6     | 69.1    | 73.3   | 108  | 113  | 118  | 124  | 128  |
| 100 | 67.3     | 70.1     | 74.2     | 77.9    | 82.4   | 118  | 124  | 130  | 136  | 140  |

Table 5: Critical Values of the $\chi^2$ Distribution (Appendix B, page B-11). Each entry $\chi^2_A$ is the value with area $A$ to its right under the chi-squared density.
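The tabulated critical values can be sanity-checked without statistical software, because the upper-tail probability of a chi-squared variable has a simple closed form when the degrees of freedom are 1 or even. The sketch below is illustrative only: the function name `chi2_upper_tail` is hypothetical, and the restriction to df = 1 or even df is an assumption made to keep the code within the Python standard library.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Return P(X > x) for a chi-squared variable X (df = 1 or even df only)."""
    if x < 0:
        return 1.0
    if df == 1:
        # X is the square of a standard normal Z, so P(X > x) = P(|Z| > sqrt(x)).
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        half = x / 2.0
        terms = (half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * sum(terms)
    raise ValueError("this sketch only covers df = 1 or even df")


# Each tabulated critical value should leave roughly its column's area in the upper tail.
for df, critical_value, area in [(1, 3.84, 0.05), (2, 5.99, 0.05), (4, 9.49, 0.05), (2, 9.21, 0.01)]:
    print(df, critical_value, area, round(chi2_upper_tail(critical_value, df), 4))
```

The printed tail areas should agree with the column labels up to the rounding used in the table.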

Chi-Squared Goodness-of-Fit Test, Example 1: Decision Rule and Conclusion

- Decision rule:
  - We compare the $\chi^2$-statistic to a chi-squared distribution with $k - 1 = 4$ degrees of freedom.
  - For $\alpha = 5\%$, the critical value is 9.49, so the rejection region is $\chi^2 > 9.49$.
- Conclusion:
  - Since $43.27 > 9.49$, our test statistic lies in the rejection region, so we reject $H_0$.
  - Conclude that the population proportions of students in each college are not all equal.
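The whole goodness-of-fit test for the college data can be reproduced in a few lines. This is a minimal sketch, not a definitive implementation: the helper name `chi2_gof_statistic` is hypothetical, and the critical value 9.49 is simply copied from the table (5% upper tail, 4 degrees of freedom) rather than computed.

```python
def chi2_gof_statistic(observed, probs):
    """Return (statistic, expected counts) for a chi-squared goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]  # e_i = n * p_i under H0
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, expected


# Observed counts: Business, Arts, CS, Science, Law.
observed = [65, 20, 35, 62, 24]
probs = [1 / 5] * 5  # H0: all five proportions are equal

statistic, expected = chi2_gof_statistic(observed, probs)
critical_value = 9.49  # from the table: alpha = 0.05, df = k - 1 = 4
reject_h0 = statistic > critical_value

print(f"expected counts: {[round(e, 1) for e in expected]}")
print(f"chi-squared statistic: {statistic:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

Passing different values in `probs` tests any other set of hypothesised proportions, provided they sum to 1.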

Chi-Squared Goodness-of-Fit Test: Summary

- The chi-squared goodness-of-fit test is used in the situation where we have one categorical variable with $k$ categories.

- Hypotheses:
  - $H_0$: Population proportions in each category are given by some specified values.
  - $H_1$: At least one population proportion is not equal to its specified value.
- Test statistic:

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

  where $k$ is the number of categories, $f_i$ are the observed counts in each category and $e_i$ are the expected counts (based on $H_0$) in each category.
- Decision rule and conclusion:
  - Compare the $\chi^2$-statistic to a chi-squared distribution with $k - 1$ degrees of freedom.
  - At a significance level of $\alpha$, reject $H_0$ if $\chi^2 > \chi^2_{\alpha,k-1}$, where $\chi^2_{\alpha,k-1}$ is the critical value that cuts off $100\alpha\%$ in the upper tail of the chi-squared distribution with $k - 1$ degrees of freedom.

Chi-Squared Goodness-of-Fit Test, Example 2: Network Satisfaction Example

- A survey was performed within a company to assess the degree to which staff were satisfied with the computer network.
- Staff were classified as either administrative, technical or management.
- Responses were classified as either satisfied, neutral or dissatisfied.

| Response     | Admin | Tech | Manage | Total |
|--------------|-------|------|--------|-------|
| Satisfied    | 37    | 11   | 35     | 83    |
| Neutral      | 18    | 25   | 10     | 53    |
| Dissatisfied | 20    | 24   | 20     | 64    |
| Total        | 75    | 60   | 65     | 200   |

- Test whether there are half as many staff who are satisfied as staff who are either neutral or dissatisfied.
- That is, for every staff member who is satisfied, there are two who are neutral or dissatisfied.

| Response     | Total |
|--------------|-------|
| Satisfied    | 83    |
| Neutral      | 53    |
| Dissatisfied | 64    |
| Total        | 200   |

Chi-Squared Goodness-of-Fit Test, Example 2: Hypotheses

- $H_0$: There are half as many staff satisfied with the network as not satisfied (either neutral or dissatisfied). That is, $p_{\text{sat}} = \frac{1}{3}$ and $p_{\text{not-sat}} = \frac{2}{3}$.
- $H_1$: The population proportions do not match those given above.
- How did we get the probabilities for $H_0$?
  - The question asked us to test whether the ratio of satisfied to not-satisfied staff was 1 to 2.
  - To convert the ratio 1 : 2 to probabilities, add together the two numbers in the ratio and use that as the denominator.
  - That is, $p_{\text{sat}} = \frac{1}{1+2} = \frac{1}{3}$ and $p_{\text{not-sat}} = \frac{2}{1+2} = \frac{2}{3}$.

Chi-Squared Goodness-of-Fit Test, Example 2: Test Statistic

- First calculate the expected counts:

| Response                | Observed | Expected                             |
|-------------------------|----------|--------------------------------------|
| Satisfied               | 83       | $200 \times \frac{1}{3} = 66.67$     |
| Neutral or Dissatisfied | 117      | $200 \times \frac{2}{3} = 133.33$    |
| Total                   | 200      | 200                                  |
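The ratio-to-probability step generalises to any number of categories: divide each part of the ratio by the sum of the parts, then multiply by the sample size to get expected counts. A small sketch, assuming a hypothetical helper named `ratio_to_probs`:

```python
from fractions import Fraction


def ratio_to_probs(ratio):
    """Convert a ratio such as 1 : 2 into exact probabilities that sum to 1."""
    total = sum(ratio)
    return [Fraction(part, total) for part in ratio]


probs = ratio_to_probs([1, 2])  # satisfied : not satisfied = 1 : 2
n = 200                         # total number of staff surveyed

# Expected count in each category under H0 is n * p.
expected = [float(n * p) for p in probs]

print(probs)
print([round(e, 2) for e in expected])
```

Using `Fraction` keeps the probabilities exact until the final conversion to a float, which avoids accumulating rounding error in the expected counts.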

- Then calculate the $\chi^2$-statistic (using the unrounded expected counts):

$$\chi^2 = \sum_{i=1}^{2} \frac{(f_i - e_i)^2}{e_i} = \frac{(83 - 66.67)^2}{66.67} + \frac{(117 - 133.33)^2}{133.33} = 6.0025$$

Chi-Squared Goodness-of-Fit Test, Example 2: Decision Rule and Conclusion

- Decision rule:
  - Compare to a chi-squared distribution with $k - 1 = 1$ degree of freedom.
  - Reject $H_0$ at the 5% significance level if $\chi^2 > \chi^2_{0.05,1} = 3.84$ (from table).
- Conclusion:
  - Since $6.0025 > 3.84$, reject $H_0$ and conclude that there are not half as many satisfied staff as there are neutral or dissatisfied staff.
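As a cross-check on Example 2, the statistic can be recomputed and a p-value attached. The slides use only the critical-value approach; the p-value here is an addition, and it relies on the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so its upper tail can be written with `math.erfc`. Variable names are illustrative.

```python
import math

observed = [83, 117]     # satisfied, neutral or dissatisfied
probs = [1 / 3, 2 / 3]   # H0 proportions from the 1 : 2 ratio

n = sum(observed)
expected = [n * p for p in probs]
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# For df = 1: P(chi-squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2.0))

critical_value = 3.84    # from the table: alpha = 0.05, df = 1
reject_h0 = statistic > critical_value

print(f"chi-squared statistic: {statistic:.4f}")
print(f"p-value (df = 1): {p_value:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

A p-value below 0.05 and a statistic above 3.84 are two views of the same decision, so both checks should agree.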

Chi-Squared Test of a Contingency Table

- Contingency table:
  - A cross-classification table of counts that summarises the joint distribution of two categorical variables, each with a finite number of categories.
- Chi-squared test of a contingency table:
  - Used to determine whether two categorical variables are independent. That is, to determine whether the distribution of one categorical variable is the same across all categories of the other categorical variable.

Chi-Squared Test of a Contingency Table, Example 1: College Example

- Let's go back to the example where we surveyed 206 students and recorded the college in which they studied.
- Suppose now we also recorded whether each student was male or female.
- Now we have two categorical variables.

| College  | Female | Male | Total |
|----------|--------|------|-------|
| Arts     | 6      | 14   | 20    |
| Business | 33     | 32   | 65    |
| CS       | 9      | 26   | 35    |
| Law      | 15     | 9    | 24    |
| Science  | 33     | 29   | 62    |
| Total    | 96     | 110  | 206   |

Chi-Squared Test of a Contingency Table, Example 1: Hypotheses

- $H_0$: College and gender are independent. That is, the proportion of females is the same across all colleges.
- $H_1$: College and gender are not independent. That is, the proportion of females differs across colleges.

Chi-Squared Test of a Contingency Table, Example 1: Test Statistic

- Very similar to the chi-squared goodness-of-fit statistic:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

  where $r$ and $c$ are the number of rows and columns, respectively, in the table, and $f_{ij}$ and $e_{ij}$ are the observed and expected counts, respectively, for the cell in the $i$th row and $j$th column of the table.

Chi-Squared Test of a Contingency Table, Example 1: Expected Counts

- But the expected counts are calculated differently.

- Remember that to calculate the test statistic we assume $H_0$ is true.
- Under $H_0$, the two categorical variables are independent.
- Expected counts are calculated from the margins of the table and are based on the independence of discrete random variables.

- If the variables are independent, then the joint probability of falling in the $(i,j)$th cell of the table is equal to:

$$p(i,j) = p_r(i) \times p_c(j)$$

- The marginal probabilities are estimated from the sample by:

$$p_r(i) = \frac{i\text{th row total}}{\text{sample size}}, \qquad p_c(j) = \frac{j\text{th column total}}{\text{sample size}}$$

- Therefore, the expected counts are given by:

$$e_{ij} = \text{sample size} \times p(i,j) = \text{sample size} \times p_r(i) \times p_c(j) = \text{sample size} \times \frac{i\text{th row total}}{\text{sample size}} \times \frac{j\text{th column total}}{\text{sample size}} = \frac{i\text{th row total} \times j\text{th column total}}{\text{sample size}}$$

| College  | Female                                  | Male                                     | Total |
|----------|-----------------------------------------|------------------------------------------|-------|
| Arts     | $\frac{20 \times 96}{206} = 9.32$       | $\frac{20 \times 110}{206} = 10.68$      | 20    |
| Business | $\frac{65 \times 96}{206} = 30.29$      | $\frac{65 \times 110}{206} = 34.71$      | 65    |
| CS       | $\frac{35 \times 96}{206} = 16.31$      | $\frac{35 \times 110}{206} = 18.69$      | 35    |
| Law      | $\frac{24 \times 96}{206} = 11.18$      | $\frac{24 \times 110}{206} = 12.82$      | 24    |
| Science  | $\frac{62 \times 96}{206} = 28.89$      | $\frac{62 \times 110}{206} = 33.11$      | 62    |
| Total    | 96                                      | 110                                      | 206   |

Chi-Squared Test of a Contingency Table, Example 1: Test Statistic

- For our data,

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} = \frac{(6 - 9.32)^2}{9.32} + \frac{(33 - 30.29)^2}{30.29} + \cdots + \frac{(29 - 33.11)^2}{33.11} = 12.3361$$
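The expected counts and the statistic for the college-by-gender table can be recomputed directly from the observed counts. This is a sketch in plain Python with illustrative variable names; it keeps the expected counts unrounded, which is why the result matches 12.3361 rather than the value obtained from the two-decimal entries shown in the table.

```python
# Observed counts: rows are colleges, columns are (Female, Male).
colleges = ["Arts", "Business", "CS", "Law", "Science"]
observed = [
    [6, 14],
    [33, 32],
    [9, 26],
    [15, 9],
    [33, 29],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence: e_ij = (ith row total * jth column total) / sample size.
expected = [[r * c / n for c in col_totals] for r in row_totals]

statistic = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

for name, row in zip(colleges, expected):
    print(name, [round(e, 2) for e in row])
print(f"chi-squared statistic: {statistic:.4f}")
```

The same code works for any $r \times c$ table of counts, since nothing in it depends on there being five rows and two columns.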

Chi-Squared Test of a Contingency Table, Example 1: Decision Rule and Conclusion

- Decision rule:
  - The sampling distribution of the $\chi^2$-statistic for a test of a contingency table is a chi-squared distribution with $(r - 1) \times (c - 1)$ degrees of freedom.
  - So compare our $\chi^2$-statistic to a chi-squared distribution with $(5 - 1) \times (2 - 1) = 4$ degrees of freedom.
  - For $\alpha = 5\%$, reject $H_0$ if $\chi^2 > 9.49$.
- Conclusion:
  - Since $12.3361 > 9.49$, the test statistic falls within the rejection region and we reject the null hypothesis.
  - College and gender are not independent; that is, the proportion of females is not the same across all colleges.
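The degrees of freedom and the decision can be checked numerically. The p-value below is an addition to the critical-value approach used in the slides; it uses the closed form $P(\chi^2_4 > x) = e^{-x/2}(1 + x/2)$, which holds only for 4 degrees of freedom. The function name `contingency_df` is hypothetical.

```python
import math


def contingency_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a chi-squared test of an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


df = contingency_df(5, 2)  # 5 colleges, 2 genders
statistic = 12.3361        # test statistic reported for the college example
critical_value = 9.49      # from the table: alpha = 0.05, df = 4

# Closed-form upper tail, valid for df = 4 only.
p_value = math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)
reject_h0 = statistic > critical_value

print(f"degrees of freedom: {df}")
print(f"p-value (df = 4): {p_value:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```

For other table sizes the degrees of freedom change, and the tail probability would need the table or a statistics library instead of this closed form.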

Chi-Squared Test of a Contingency Table: Summary

- A chi-squared test of a contingency table is used in the situation where we have two categorical variables, each with a finite number of categories.
- Hypotheses:
  - $H_0$: The variables are independent. (The distribution of one variable is the same across all categories of the second variable.)
  - $H_1$: The variables are not independent. (The distribution of one variable differs across the categories of the second variable.)
