




Chapter 7. Multiple Regression Analysis: Dealing with Heteroskedasticity

Contents
- What is heteroskedasticity?
- Why worry about heteroskedasticity?
- How to test for heteroskedasticity?
- Corrections for heteroskedasticity

What Is Heteroskedasticity?

Recall that the assumption of homoskedasticity implies that, conditional on the explanatory variables, the variance of the unobserved error $u$ is constant: $\mathrm{Var}(u \mid x) = \sigma^2$.
If this is not true, that is, if the variance of $u$ differs across values of the $x$'s, then the errors are heteroskedastic: $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$.

Example: in a cross section of firms in one industry, the error terms associated with very large firms might have larger variances than those associated with smaller firms, because sales of larger firms might be more volatile than sales of smaller firms.
Example: in a cross-section study of family income and expenditures, it seems plausible that low-income families spend at a rather steady rate, while the spending patterns of high-income families are relatively volatile.

[Figure: Example of heteroskedasticity. Axes $x$ and $y$; conditional densities $f(y \mid x)$ at $x_1$, $x_2$, $x_3$ around the line $E(y \mid x) = \beta_0 + \beta_1 x$.]

[Figure: Patterns of heteroskedasticity.]
Why Worry About Heteroskedasticity?

- OLS is still unbiased and consistent, even if we do not assume homoskedasticity.
- The $R^2$ and adjusted $R^2$ are unaffected by heteroskedasticity.
- The usual standard errors of the estimates are biased if we have heteroskedasticity.
- The OLS estimates are no longer efficient; that is, their variances are not the smallest attainable.
- Because the standard errors are biased, we cannot use the usual t, F, or LM statistics for drawing inferences.

How to Test for Heteroskedasticity?

Testing for Heteroskedasticity: The Goldfeld-Quandt Test

We essentially want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$, against $H_1: \sigma_i^2 = c\,x_i^2$.

Goldfeld-Quandt test procedure:
- Order the data by the magnitude of the independent variable $x$ that is thought to be related to the error variance.
- Omit the middle $d$ observations. $d$ might be chosen, for example, to be approximately 1/5 of the total sample size.
- Fit two separate regressions, the first on the portion of the data with low values of $x$ and the second on the portion with high values of $x$. Each regression uses $(n-d)/2$ observations and has $(n-d)/2 - k - 1$ degrees of freedom.
- Calculate the residual sum of squares of each regression: $SSR_1$ for the low $x$'s and $SSR_2$ for the high $x$'s.
- Under $H_0$, the statistic $SSR_2 / SSR_1$ is distributed as an F statistic with $[n - d - 2(k+1)]/2$ degrees of freedom in both the numerator and the denominator.

Example: Goldfeld-Quandt Test (HR, Ex. 6.2, p. 154)

    insheet using pathex61.txt
    sort inc

Regress hexp on inc separately on the low-income half and on the high-income half of the sorted sample (split at inc = 15). The second regression gives $SSR_2 = 2.024$, with $n_1 = n_2 = 10$ and $k + 1 = 2$, so each regression has $10 - 2 = 8$ degrees of freedom.
Form the statistic $F = SSR_2 / SSR_1 = 6.7467$. The 5% critical value is $F_{8,8} = 3.438$, so we reject the null hypothesis and conclude that the errors are heteroskedastic.
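The Goldfeld-Quandt steps listed above can be sketched in a few lines of Python. This is a minimal illustration for a single regressor ($k = 1$), not the code used for the HR example: the function names `ols_ssr` and `goldfeld_quandt` and the small data set are hypothetical, and the resulting ratio still has to be compared with an F critical value from a table.

```python
def ols_ssr(x, y):
    """Fit y = b0 + b1*x by OLS and return the residual sum of squares."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, drop):
    """Return (F, df) with F = SSR_high / SSR_low after omitting `drop` middle observations."""
    pairs = sorted(zip(x, y))            # order the data by the regressor
    n = len(pairs)
    half = (n - drop) // 2               # observations in each sub-regression
    low, high = pairs[:half], pairs[n - half:]
    ssr1 = ols_ssr(*zip(*low))           # low-x regression
    ssr2 = ols_ssr(*zip(*high))          # high-x regression
    df = half - 2                        # (n - d)/2 - (k + 1) with k = 1
    return ssr2 / ssr1, df


# Hypothetical data: the error magnitude grows in proportion to x.
x = [float(i) for i in range(1, 21)]
y = [1.0 + 0.5 * xi + 0.1 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]

f_stat, df = goldfeld_quandt(x, y, drop=4)
print(f_stat, df)
```

With `drop=0` and 20 observations the function returns 8 degrees of freedom, which matches the $F_{8,8}$ reference distribution used in the example above.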
Testing for Heteroskedasticity

We essentially want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$.
If we assume the relationship between $u^2$ and the $x_j$ is linear, we can test this as a linear restriction.
So, for $u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v$, this means testing $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$.

The Breusch-Pagan Test

- We do not observe the error, but we can estimate it with the residuals from the OLS regression: regress $y$ on $x_1, x_2, \dots, x_k$ and obtain the residuals $\hat{u}_i$.
- Regress $\hat{u}^2$ on $x_1, x_2, \dots, x_k$ and test the joint hypothesis that all slope coefficients are zero, using the $R^2$ of this regression to form an F or LM test.
- The F statistic is just the reported F statistic for overall significance of the regression, $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$, which is distributed $F_{k,\,n-k-1}$.
- The LM statistic is $LM = nR^2$, which is distributed $\chi^2_k$.

Example: Ex. 6.2, HR book

    reg hexp inc         /* use all observations */
    predict res, r       /* get the residuals */
    gen ressq = res^2    /* square of res */
    reg ressq inc

The F value is 10.13 with a p-value of 0.52%, so we reject the null hypothesis of homoskedasticity at the 1% significance level.
Using the LM test, $nR^2 = 20 \times 0.36 = 7.2$. The 5% critical value is $\chi^2(1) = 3.84$ and the p-value is 0.73%, so we get the same result.

Example: Housing Price Equation (Wooldridge, p. 267)

Estimated model: $\widehat{price} = -21770.31 + 2.068\,lotsize + 122.778\,sqrft + 13852.52\,bdrms$

    predict res, r
    gen ressq = res^2
    reg ressq lotsize sqrft bdrms

Auxiliary regression: $\widehat{ressq} = -5.52\mathrm{e}9 + 201520.9\,lotsize + 1691037\,sqrft + 1.04\mathrm{e}9\,bdrms$
$F = 5.34$, p-value = 0.20%. $nR^2 = 88 \times 0.1601 \approx 14.09$, compared with the 5% critical value $\chi^2(3) = 7.8147$; p-value = 0.28%.
So we have strong evidence to reject the null hypothesis of homoskedasticity.

Example: Housing Price Equation (Wooldridge, p. 267), cont.

We check whether there is heteroskedasticity in the log form. The estimated model is
$\widehat{\log(price)} = 5.611 + 0.168\log(lotsize) + 0.700\log(sqrft) + 0.037\,bdrms$

    predict resid, r
    gen residsq = resid^2

Regress residsq on log(lotsize), log(sqrft), and bdrms:
$\widehat{residsq} = 0.510 - 0.007\log(lotsize) - 0.063\log(sqrft) + 0.017\,bdrms$
$F = 1.41$, p-value = 24.51%. $nR^2 = 88 \times 0.048 = 4.224$, p-value = 23.83%.
So we cannot reject the null hypothesis: there is no evidence of heteroskedasticity.

The White Test

- The Breusch-Pagan test will detect any linear form of heteroskedasticity.
- The White test allows for nonlinearities by using squares and cross products of all the $x$'s. For example, with $k = 3$:
  $\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + v$
- We still just use an F or LM statistic to test whether all the $x_j$, $x_j^2$, and $x_j x_h$ are jointly significant.
- This can get unwieldy pretty quickly.

Alternate Form of the White Test

- The fitted values from OLS, $\hat{y}$, are a function of all the $x$'s.
- Thus $\hat{y}^2$ is a function of the squares and cross products, and $\hat{y}$ and $\hat{y}^2$ can proxy for all of the $x_j$, $x_j^2$, and $x_j x_h$.
- So regress the squared residuals on $\hat{y}$ and $\hat{y}^2$ and use the $R^2$ to form an F or LM statistic. Note that we are now testing only 2 restrictions.

Procedure for this special case of the White test:
- Regress $y$ on $x_1, x_2, \dots, x_k$ and obtain the residuals $\hat{u}_i$.
- Calculate $\hat{y}$ and $\hat{y}^2$ (in Stata: predict ybar, xb and gen ybarsq = ybar^2).
- Regress $\hat{u}^2$ on $\hat{y}$ and $\hat{y}^2$ and test the joint hypothesis that both coefficients are zero.
- Use the F statistic or the LM statistic to test the null hypothesis of homoskedasticity.

Example: White Test in the Log Housing Price Equation

$\widehat{\log(price)} = 5.611 + 0.168\log(lotsize) + 0.700\log(sqrft) + 0.037\,bdrms$

    predict resid, r
    predict lpbar
    gen residsq = resid^2
    gen lpbarsq = lpbar^2
    regress residsq lpbar lpbarsq

$\widehat{residsq} = 23.778 - 3.714\,lpbar + 0.145\,lpbarsq$
$F = 1.73$, p-value = 18.30%. $nR^2 = 88 \times 0.0392 = 3.4496$, p-value = 17.82%.
We get the same result as with the BP test: there is no evidence of heteroskedasticity.
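The Breusch-Pagan statistics above depend only on $n$, $k$, and the $R^2$ of the auxiliary regression, so they are easy to check by hand or in code. The sketch below is an illustration under stated assumptions: the helper names `bp_statistics`, `ols_fit`, and `breusch_pagan` are hypothetical, the full test is implemented for one regressor only, and the data set is made up. The last lines plug in the $n = 20$, $R^2 = 0.36$ figures quoted for Ex. 6.2, which should give values close to the reported LM = 7.2 and F = 10.13.

```python
def bp_statistics(r2, n, k):
    """Return (LM, F) from the R^2 of the regression of squared residuals on k regressors."""
    lm = n * r2                                   # LM = n * R^2, chi-square with k df
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))     # F with (k, n - k - 1) df
    return lm, f


def ols_fit(x, y):
    """OLS of y on a constant and x; returns (residuals, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sum(e * e for e in resid) / sst
    return resid, r2


def breusch_pagan(x, y):
    """Breusch-Pagan (LM, F) for a simple regression of y on x."""
    resid, _ = ols_fit(x, y)                      # step 1: OLS residuals
    u2 = [e * e for e in resid]                   # step 2: squared residuals
    _, r2_aux = ols_fit(x, u2)                    # step 3: regress u^2 on x
    return bp_statistics(r2_aux, len(x), 1)


# Hypothetical data whose error spread grows with x.
x = [float(i) for i in range(1, 21)]
y = [1.0 + 0.5 * xi + 0.1 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]
lm_sim, f_sim = breusch_pagan(x, y)
print(lm_sim, f_sim)

# Figures quoted in the text for Ex. 6.2: n = 20, R^2 = 0.36, k = 1.
lm_ex, f_ex = bp_statistics(0.36, 20, 1)
print(lm_ex, f_ex)
```

The same helper applied to the housing price equation in levels (`bp_statistics(0.1601, 88, 3)`) gives an F value of about 5.34, in line with the figure reported in that example.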
Corrections for Heteroskedasticity

Known variances: $\mathrm{Var}(u_i \mid x) = \sigma_i^2$
- The original model is $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$.
- Divide both sides by $\sigma_i$. The new disturbance is $u_i^* = u_i / \sigma_i$, so $\mathrm{Var}(u_i^*) = \mathrm{Var}(u_i)/\sigma_i^2 = 1$.
- The new model is $y_i/\sigma_i = \beta_0/\sigma_i + \beta_1 x_{i1}/\sigma_i + \dots + \beta_k x_{ik}/\sigma_i + u_i/\sigma_i$, that is, $y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \dots + \beta_k x_{ik}^* + u_i^*$, where $x_{i0}^* = 1/\sigma_i$ and each starred variable is the original divided by $\sigma_i$.
- We can estimate the new model by OLS; this is called weighted least squares (WLS).
- But usually we do not know the variances.

Case of Form Being Known up to a Multiplicative Constant

- Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$, where the trick is to figure out what $h(x) \equiv h_i$ looks like.
- $E(u_i/\sqrt{h_i} \mid x) = 0$, because $h_i$ is only a function of $x$, and $\mathrm{Var}(u_i/\sqrt{h_i} \mid x) = \sigma^2$, because $\mathrm{Var}(u \mid x) = \sigma^2 h_i$.
- So if we divide the whole equation by $\sqrt{h_i}$, we obtain a model in which the error is homoskedastic.

Example: Simple Savings Function

Using the data in saving.raw, the OLS regression is
$\widehat{sav}_i = -124.84 + 0.147\,inc_i$
The WLS regression is
$\widehat{sav}_i^* = -124.95\,wb_i + 0.172\,inc_i^*$, with standard errors (480.86) and (0.057), $n = 100$, $R^2 = 0.2259$,
where $wb_i = 1/\sqrt{inc_i}$. In the original variables this can be written as $\widehat{sav}_i = -124.95 + 0.172\,inc_i$.

Generalized Least Squares

- Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
- GLS is BLUE in this case, because the transformed equation satisfies the Gauss-Markov assumptions.
- GLS is a weighted least squares (WLS) procedure in which each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.

Summary of WLS

- WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
- In most cases, we will not know the form of the heteroskedasticity.
- One example where we do is when the data are aggregated but the model is at the individual level. We then want to weight each aggregate observation by the inverse of the number of individuals.

Feasible GLS

- More typical is the case where you do not know the form of the heteroskedasticity. In this case, you need to estimate $h(x_i)$.
- Typically, we start by assuming a fairly flexible model, such as $\mathrm{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$.
- Since we do not know the $\delta_j$, we must estimate them.

Feasible GLS (continued)

- Our assumption implies that $u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\,v$, where $E(v \mid x) = 1$.
- If we assume that $v$ is independent of $x$, then $\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, where $E(e) = 0$ and $e$ is independent of $x$.
- We know that $\hat{u}$ is an estimate of $u$, so we can estimate this equation by OLS.

Feasible GLS (continued)

An estimate of $h$ is obtained as $\hat{h} = \exp(\hat{g})$, and the inverse of this is our weight. So, what did we do?
- Run the original OLS model, save the residuals $\hat{u}$, square them, and take the log.
- Regress $\ln(\hat{u}^2)$ on all of the independent variables and get the fitted values $\hat{g}$.
- Do WLS using $1/\exp(\hat{g})$ as the weight.

Example of FGLS: Demand for Cigarettes (Smoke.raw)

What determines people's demand for cigarettes? The OLS model is
$\widehat{cigs} = -3.64 + 0.88\log(income) - 0.75\log(cigpric) - 0.50\,educ + 0.77\,age - 0.009\,age^2 - 2.83\,restaurn$
Use the Breusch-Pagan test for heteroskedasticity: obtain $\hat{u}^2$ and regress it on all the independent variables. This gives $F = 5.55$ with a p-value of approximately 0, or $LM = 807 \times 0.04 = 32.28$ with a p-value of 0.000014.
Regress $\ln(\hat{u}^2)$ on all the independent variables and get the fitted values $\hat{g}$. Transform all the data, including the constant, by $1/\sqrt{\hat{h}} = \exp(-\hat{g}/2)$, and estimate the transformed equation without a separate constant:
$\widehat{cigs} = 5.63 + 1.295\log(income) - 2.94\log(cigpric) - 0.463\,educ + 0.482\,age - 0.0056\,age^2 - 3.461\,restaurn$
The income effect is now statistically significant and larger in magnitude. The estimates changed somewhat, but the basic story is still the same: cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.

Variance with Heteroskedasticity

Robust Standard Errors

Now that we have a consistent estimate of the variance, its square root can be used as a standard error for inference. These are typically called robust standard errors.
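Returning to the feasible GLS recipe summarized earlier (OLS residuals, a regression of $\ln(\hat{u}^2)$ on the regressors, then WLS with weights $1/\exp(\hat{g})$), the following Python sketch shows the three steps for a single regressor. It is only an illustration under assumptions: the function names `wls_fit` and `fgls` are hypothetical, the data are made up rather than taken from Smoke.raw, and the code assumes no OLS residual is exactly zero so that the logarithm is defined.

```python
import math


def wls_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x, minimising sum of w_i * resid_i**2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw      # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw      # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def fgls(x, y):
    """Feasible GLS assuming Var(u|x) = sigma^2 * exp(d0 + d1*x), one regressor."""
    ones = [1.0] * len(x)
    # Step 1: OLS (WLS with equal weights) and its residuals.
    b0, b1 = wls_fit(x, y, ones)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Step 2: regress ln(resid^2) on x and keep the fitted values g_hat.
    log_u2 = [math.log(e * e) for e in resid]
    a0, d1 = wls_fit(x, log_u2, ones)
    g_hat = [a0 + d1 * xi for xi in x]
    # Step 3: WLS with weights 1 / exp(g_hat).
    weights = [1.0 / math.exp(g) for g in g_hat]
    b0_f, b1_f = wls_fit(x, y, weights)
    return b0_f, b1_f, weights


# Hypothetical data: true line 1 + 0.5*x, error magnitude proportional to x.
x = [float(i) for i in range(1, 41)]
y = [1.0 + 0.5 * xi + 0.1 * xi * math.sin(xi) for xi in x]

b0_f, b1_f, weights = fgls(x, y)
print(b0_f, b1_f)
```

Weighting each squared residual by $1/\hat{h}_i$ is equivalent to dividing every variable, including the constant, by $\sqrt{\hat{h}_i}$ and running OLS on the transformed data, which is the form used in the cigarette example.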