Lecture 10: Dummy Variables (DUMMY VARIABLE)

Contents: dummy variable classification with two categories; dummy classification with more than two categories; two sets of dummy variables; slope dummy variables.

DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES

This sequence explains how you can include qualitative explanatory variables in your regression model.

Suppose that you have data on the annual recurrent expenditure, COST, and the number of students enrolled, N, for a sample of secondary schools of two types: regular and occupational.

The occupational schools aim to provide skills for specific occupations, and they tend to be relatively expensive to run because they need to maintain specialized workshops.

One way of dealing with the difference in costs would be to run separate regressions for the two types of school. However, this would have the drawback that you would be running regressions with two small samples instead of one large one, which would reduce the precision of the coefficient estimates.

Another way of handling the difference would be to hypothesize that the cost function for occupational schools has an intercept $\beta_1'$ that is greater than the intercept $\beta_1$ for regular schools:

Regular school (OCC = 0): $COST = \beta_1 + \beta_2 N + u$
Occupational school (OCC = 1): $COST = \beta_1' + \beta_2 N + u$

Effectively, we are hypothesizing that the annual overhead cost differs between the two types of school but that the marginal cost is the same. The marginal cost assumption is not very plausible, and we will relax it in due course.

Let us define $\delta$ to be the difference in the intercepts: $\delta = \beta_1' - \beta_1$. Then $\beta_1' = \beta_1 + \delta$, and we can rewrite the cost function for occupational schools as

Occupational school (OCC = 1): $COST = \beta_1 + \delta + \beta_2 N + u$

We can now combine the two cost functions by defining a dummy variable OCC that has value 0 for regular schools and 1 for occupational schools:

Combined equation: $COST = \beta_1 + \delta\,OCC + \beta_2 N + u$

Dummy variables always take one of two values, 0 or 1. If OCC is equal to 0, the combined equation reduces to the cost function for regular schools. If OCC is equal to 1, it reduces to the cost function for occupational schools, with intercept $\beta_1 + \delta$.

We will now fit a function of this type using actual data for a sample of 74 secondary schools in Shanghai. The table shows the data for the first 10 schools in the sample. The annual cost is measured in yuan, one yuan being worth about 20 U.S. cents at the time. N is the number of students in the school, and OCC is the dummy variable for the type of school.

School  Type          COST (yuan)    N  OCC
 1      Occupational      345,000  623    1
 2      Occupational      537,000  653    1
 3      Regular           170,000  400    0
 4      Occupational      526,000  663    1
 5      Regular           100,000  563    0
 6      Regular            28,000  236    0
 7      Regular           160,000  307    0
 8      Occupational       45,000  173    1
 9      Occupational      120,000  146    1
10      Occupational       61,000   99    1

We now run the regression of COST on N and OCC, treating OCC just like any other explanatory variable, despite its artificial nature.

Dependent Variable: COST
Method: Least Squares
Date: 05/16/04   Time: 19:22
Sample: 1 74
Included observations: 74

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -33612.55     23573.47     -1.425864   0.1583
N              331.4493     39.75844      8.336578   0.0000
OCC            133259.1     20827.59      6.398201   0.0000

R-squared            0.615637    Mean dependent var      187418.0
Adjusted R-squared   0.604810    S.D. dependent var      141969.9
S.E. of regression   89248.09    Akaike info criterion   25.67592
Sum squared resid    5.66E+11    Schwarz criterion       25.76933
Log likelihood      -947.0092    F-statistic             56.86072
Durbin-Watson stat   2.422989    Prob(F-statistic)       0.000000

The regression results can be rewritten in equation form (coefficients rounded):

$\widehat{COST} = -34{,}000 + 133{,}000\,OCC + 331N$

From it we can derive cost functions for the two types of school by setting OCC equal to 0 or 1:

Regular school (OCC = 0): $\widehat{COST} = -34{,}000 + 331N$
Occupational school (OCC = 1): $\widehat{COST} = -34{,}000 + 133{,}000 + 331N = 99{,}000 + 331N$

If OCC is equal to 0, we get the equation for regular schools. It implies that the marginal cost per student per year is 331 yuan and that the annual overhead cost is -34,000 yuan. Obviously a negative overhead cost makes no sense at all, and it suggests that the model is misspecified in some way. We will come back to this later.

Putting OCC equal to 1, we estimate the annual overhead cost of an occupational school to be 99,000 yuan. The marginal cost is the same as for regular schools; given the model specification, it must be.

The scatter diagram shows the data and the two cost functions derived from the regression results.

[Figure: COST plotted against N for the 74 schools, with the fitted cost functions for regular and occupational schools.]

We will perform a t test on the coefficient of the dummy variable. Our null hypothesis is that there is no difference in the overhead costs of the two types of school, $H_0: \delta = 0$. The t statistic is 6.40, well above the critical value of about 3.4 for a two-sided test at the 0.1% level with 74 - 3 = 71 degrees of freedom, so the null hypothesis is rejected at the 0.1% significance level.

DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES

This sequence explains how to extend the dummy variable technique to handle a qualitative explanatory variable which has more than two categories.

$COST = \beta_1 + \delta_T\,TECH + \delta_W\,WORKER + \delta_V\,VOC + \beta_2 N + u$

In the previous sequence we used a dummy variable to differentiate between regular and occupational schools when fitting a cost function. In actual fact there are two types of regular secondary school in Shanghai: general schools, which provide the usual academic education, and vocational schools. As their name implies, the vocational schools are meant to impart occupational skills as well as give an academic education. However, the vocational component of the curriculum is typically quite small, and the schools are similar to the general schools. Often they are just general schools with a couple of workshops added.

Likewise, there are two types of occupational school: technical schools, which train technicians, and skilled workers schools, which train craftsmen.

So now the qualitative variable has four categories. The standard procedure is to choose one category as the reference category and to define dummy variables for each of the others. In general it is good practice to select the most normal or basic category as the reference category, if one category is in some sense more normal or basic than the others. In the Shanghai sample it is sensible to choose the general schools as the reference category: they are the most numerous, and the other schools are variations of them.

Accordingly, we will define dummy variables for the other three types. TECH will be the dummy for the technical schools: TECH is equal to 1 if the observation relates to a technical school, and 0 otherwise. WORKER and VOC are defined in the same way for skilled workers schools and vocational schools. Each of the dummy variables has a coefficient which represents the extra overhead cost of that type of school, relative to the reference category.

Note that you do not include a dummy variable for the reference category, which is why the reference category is usually described as the omitted category. (If a dummy for general schools were included as well, the four dummies would sum to 1 in every observation, exactly duplicating the constant term, and the coefficients could not be estimated.)

If an observation relates to a general school, the dummy variables are all 0 and the regression model reduces to its basic components. If an observation relates to a technical school, TECH is equal to 1 and the other dummy variables are 0, and the model simplifies accordingly. It simplifies in a similar manner for observations relating to skilled workers schools and vocational schools:

General school (TECH = WORKER = VOC = 0): $COST = \beta_1 + \beta_2 N + u$
Technical school (TECH = 1; WORKER = VOC = 0): $COST = (\beta_1 + \delta_T) + \beta_2 N + u$
Skilled workers school (WORKER = 1; TECH = VOC = 0): $COST = (\beta_1 + \delta_W) + \beta_2 N + u$
Vocational school (VOC = 1; TECH = WORKER = 0): $COST = (\beta_1 + \delta_V) + \beta_2 N + u$

[Figure: COST plotted against N, showing four parallel cost functions with intercepts $\beta_1$ (General), $\beta_1 + \delta_T$ (Technical), $\beta_1 + \delta_W$ (Workers), and $\beta_1 + \delta_V$ (Vocational).]

The diagram illustrates the model graphically. The $\delta$ coefficients are the extra overhead costs of running technical, skilled workers, and vocational schools, relative to the overhead cost of general schools. Note that we do not make any prior assumption about the size, or even the sign, of the $\delta$ coefficients. They will be estimated from the sample data.

Here are the data for the first 10 of the 74 schools. Note how the values of the dummy variables TECH, WORKER, and VOC are determined by the type of school in each observation.

School  Type        COST (yuan)    N  TECH  WORKER  VOC
 1      Technical       345,000  623     1       0    0
 2      Technical       537,000  653     1       0    0
 3      General         170,000  400     0       0    0
 4      Workers         526,000  663     0       1    0
 5      General         100,000  563     0       0    0
 6      Vocational       28,000  236     0       0    1
 7      Vocational      160,000  307     0       0    1
 8      Technical        45,000  173     1       0    0
 9      Technical       120,000  146     1       0    0
10      Workers          61,000   99     0       1    0

The scatter diagram shows the data for the entire sample, differentiating by type of school.

[Figure: COST plotted against N for all 74 schools, by type of school.]

Dependent Variable: COST
Method: Least Squares
Date: 05/16/04   Time: 20:32
Sample: 1 74
Included observations: 74

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -54893.09     26673.08     -2.057996   0.0434
N              342.6335     40.21950      8.519090   0.0000
TECH           154110.9     26760.41      5.758915   0.0000
WORKER         143362.4     27852.80      5.147144   0.0000
VOC            53228.64     31061.65      1.713645   0.0911

R-squared            0.632050    Mean dependent var      187418.0
Adjusted R-squared   0.610719    S.D. dependent var      141969.9
S.E. of regression   88578.37    Akaike info criterion   25.68634
Sum squared resid    5.41E+11    Schwarz criterion       25.84202
Log likelihood      -945.3946    F-statistic             29.63132
Durbin-Watson stat   2.503728    Prob(F-statistic)       0.000000

The coefficient of N indicates that the marginal cost per student per year is 343 yuan. The coefficients of TECH, WORKER, and VOC are 154,000, 143,000, and 53,000, respectively, and should be interpreted as the additional annual overhead costs, relative to those of general schools.

$\widehat{COST} = -55{,}000 + 154{,}000\,TECH + 143{,}000\,WORKER + 53{,}000\,VOC + 343N$

General school (TECH = WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 343N$
Technical school (TECH = 1; WORKER = VOC = 0): $\widehat{COST} = -55{,}000 + 154{,}000 + 343N = 99{,}000 + 343N$
Skilled workers school (WORKER = 1; TECH = VOC = 0): $\widehat{COST} = -55{,}000 + 143{,}000 + 343N = 88{,}000 + 343N$
Vocational school (VOC = 1; TECH = WORKER = 0): $\widehat{COST} = -55{,}000 + 53{,}000 + 343N = -2{,}000 + 343N$

Note that in each case the annual marginal cost per student is estimated at 343 yuan. The model specification assumes that this figure does not differ according to type of school.

The four cost functions are illustrated graphically.

[Figure: the four fitted cost functions for general, technical, skilled workers, and vocational schools, plotted against N.]

TWO SETS OF DUMMY VARIABLES

The explanatory variables in a regression model may include multiple sets of dummy variables. This sequence provides an example of a model with two sets.

We will continue with the school cost function model and extend it to take account of the fact that some of the schools are residential.

$COST = \beta_1 + \delta\,OCC + \varepsilon\,RES + \beta_2 N + u$
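The dummies in this equation are built in the same way as TECH, WORKER, and VOC in the previous sequence: choose a reference category and create a 0/1 column for each of the other categories. Below is a minimal Python sketch of that procedure. The helper names `make_dummies` and `implied_overhead` are illustrative and do not come from the slides. The sketch checks the coding against the first 10 schools of the four-category table and turns the unrounded four-category coefficients into the overhead cost implied for each type of school.

```python
# A minimal sketch (helper names are illustrative, not from the slides):
# code a categorical variable as 0/1 dummies with one omitted reference
# category, then read off the overhead cost implied for each category.

def make_dummies(labels, reference):
    """Return {category: [0/1 per observation]} for every category except `reference`."""
    others = sorted(set(labels) - {reference})
    return {cat: [1 if lab == cat else 0 for lab in labels] for cat in others}

# School types of the first 10 schools in the four-category table.
types = ["Technical", "Technical", "General", "Workers", "General",
         "Vocational", "Vocational", "Technical", "Technical", "Workers"]
dummies = make_dummies(types, reference="General")  # no column for General

# Each row sums to 0 (general school) or 1 (any other type). A fourth dummy
# for "General" would make every row sum to exactly 1, duplicating the
# constant term, so the coefficients could not be estimated.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(types))]

# Unrounded estimates from the four-category regression output, keyed by
# category rather than by dummy name (TECH, WORKER, VOC).
coef = {"C": -54893.09, "N": 342.6335,
        "Technical": 154110.9, "Workers": 143362.4, "Vocational": 53228.64}

def implied_overhead(category):
    """Overhead cost: C plus the category's dummy coefficient (0 for the reference)."""
    return coef["C"] + coef.get(category, 0.0)

for cat in ["General", "Technical", "Workers", "Vocational"]:
    print(f"{cat:<10} COST = {implied_overhead(cat):>10,.0f} + {coef['N']:.0f}N")
```

It prints overheads of -54,893, 99,218, 88,469 and -1,664 yuan, each with a slope of 343. The slides arrive at -55,000, 99,000, 88,000 and -2,000 because they round each coefficient to the nearest thousand before adding.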

To model the higher overhead costs of residential schools, we introduce a dummy variable RES which is equal to 1 for them and 0 for non-residential schools. $\varepsilon$ is the extra annual overhead cost of a residential school, relative to that of a non-residential one.

We will also distinguish between occupational and regular schools, using the dummy variable OCC defined in the first sequence. In the case of a non-residential occupational school, RES is 0 and OCC is 1, so the overhead cost increases by $\delta$. If the school is both occupational and residential, it increases by $\delta + \varepsilon$.

Regular, non-residential (OCC = RES = 0): $COST = \beta_1 + \beta_2 N + u$
Regular, residential (OCC = 0; RES = 1): $COST = (\beta_1 + \varepsilon) + \beta_2 N + u$
Occupational, non-residential (OCC = 1; RES = 0): $COST = (\beta_1 + \delta) + \beta_2 N + u$
Occupational, residential (OCC = RES = 1): $COST = (\beta_1 + \delta + \varepsilon) + \beta_2 N + u$

[Figure: COST plotted against N, showing four parallel cost functions with intercepts $\beta_1$ (regular, non-residential), $\beta_1 + \varepsilon$ (regular, residential), $\beta_1 + \delta$ (occupational, non-residential), and $\beta_1 + \delta + \varepsilon$ (occupational, residential).]

The diagram illustrates the model graphically. Note that in this specification the effects of the different components of the model are assumed to be separate and additive.

Here are the data for the first 10 schools. Note how the values of the dummy variables vary according to the characteristics of the school.

School  Type          Residential?  COST (yuan)    N  OCC  RES
 1      Occupational  No                345,000  623    1    0
 2      Occupational  Yes               537,000  653    1    1
 3      Regular       No                170,000  400    0    0
 4      Occupational  Yes               526,000  663    1    1
 5      Regular       No                100,000  563    0    0
 6      Regular       No                 28,000  236    0    0
 7      Regular       Yes               160,000  307    0    1
 8      Occupational  No                 45,000  173    1    0
 9      Occupational  No                120,000  146    1    0
10      Occupational  No                 61,000   99    1    0

Dependent Variable: COST
Method: Least Squares
Date: 05/16/04   Time: 21:06
Sample: 1 74
Included observations: 74

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -29045.27     23291.54     -1.247031   0.2165
N              321.8330     39.40225      8.167884   0.0000
OCC            109564.6     24039.58      4.557674   0.0000
RES            57909.01     30821.31      1.878863   0.0644

R-squared            0.634090    Mean dependent var      187418.0
Adj…
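The output above is cut off, but its coefficient rows are complete. Below is a minimal Python sketch that works under the additive specification $COST = \beta_1 + \delta\,OCC + \varepsilon\,RES + \beta_2 N + u$ and uses those unrounded coefficients. It derives the overhead cost for each combination of OCC and RES and recomputes the t statistic for RES as coefficient divided by standard error. The names `fitted_cost`, `labels`, and `overhead` are illustrative, not from the slides.

```python
# A minimal sketch of the two-set dummy model
#   COST = b1 + d*OCC + e*RES + b2*N + u
# using the unrounded coefficients from the output above (names are illustrative).
b1, b2 = -29045.27, 321.8330   # C and the coefficient of N
d, e = 109564.6, 57909.01      # coefficients of OCC and RES

def fitted_cost(n, occ, res):
    """Fitted annual cost in yuan; the OCC and RES effects simply add."""
    return b1 + d * occ + e * res + b2 * n

# Overhead cost (fitted cost at N = 0) for each (OCC, RES) combination.
labels = {
    (0, 0): "Regular, non-residential",
    (0, 1): "Regular, residential",
    (1, 0): "Occupational, non-residential",
    (1, 1): "Occupational, residential",
}
overhead = {key: fitted_cost(0, *key) for key in labels}

for key, label in labels.items():
    print(f"{label:<30} COST = {overhead[key]:>9,.0f} + {b2:.0f}N")

# t statistic for RES: coefficient divided by its standard error (30821.31).
t_res = e / 30821.31
print(f"t(RES) = {t_res:.2f}")
```

It prints overheads of -29,045, 28,864, 80,519 and 138,428 yuan with a common slope of 322 yuan per student, and t(RES) = 1.88. Because the effects are additive, the residential premium is the same 57,909 yuan for regular and occupational schools. The reported Prob. of 0.0644 for RES means this premium is not significant at the 5% level.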
