




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、1Multiple Regression Analysis: Further Issues y = b0 + b1x1 + b2x2 + . . . bkxk + u2.6.1 数据的测度单位对OLS统计量的影响nChanging the scale of the y variable will lead to a corresponding change in the scale of the coefficients and standard errors, so no change in the significance or interpretationnChanging the sc
2、ale of one x variable will lead to a change in the scale of that coefficient and standard error, so no change in the significance or interpretation3. 1111122112221122122,var,1/,/11nniiiiiiniiiniinxiijjjjjrurjjurxxyyr yrxxyySSERsSSTyytseSSTRseSSRSSRqFc seSSRnkSERSSRnkbbbbbbbb.n因变量或自变量以对数形式出现,改
3、变度量单位不会影响斜率系数,只会改变截距项。 11logloglogiic ycy 11logloglogjjc xcx8.Beta CoefficientsnOccasional youll see reference to a “standardized coefficient” or “beta coefficient” which has a specific meaningnIdea is to replace y and each x variable with a standardized version i.e. subtract mean and divide by stan
4、dard deviationnCoefficient reflects standard deviation of y for a one standard deviation change in x 9.10.Beta Coefficients (cont)1 12211j.where denote the z-score of , is the z score of ,and so on. Andb(/) for 1,., are called beta coefficients.ykkyjyjzb zb zb zerrorzy zxjkb11.nExample 6.1 :Effects
5、of Pollution on Housing Prices(数据名:HPRICE2)Stata 命令语句: reg price nox crime rooms dist stradio,beta (标准化后的回归分析)12.6.2 对函数形式的进一步讨论n OLS can be used for relationships that are not strictly linear in x and y by using nonlinear functions of x and y will still be linear in the parametersn Can take the nat
6、ural log of x, y or bothn Can use quadratic forms of xn Can use interactions of x variables13.Interpretation of Log Modelsnln(y) = b0 + b1ln(x) + u b1 is the elasticity of y with respect to xnln(y) = b0 + b1x + u b1 is approximately the percentage change in y given a 1 unit change in x ny = b0 + b1l
7、n(x) + u b1 is approximately the change in y for a 100 percent change in x14.Why use log models?n1.使用自然对数使得对系数的解释颇具有吸引力,可以直接以弹性的形式体现出来。n2.斜率系数不随测量单位的变化而变化。n3.取对数后,即使不能消除异方差的影响,但可以使之有所缓解。n4.取对数通常会缩小变量的取值范围,在某些情况下还相当可观。n5.缺点:变量不能取零和负值;更难预测原变量的值,原模型使我们预测log(y),而不是y。15.Quadratic Models(二次式模型)(二次式模型)n Fo
8、r a model of the form y = b0 + b1x + b2x2 + u we cant interpret b1 alone as measuring the change in y with respect to x, we need to take into account b2 as well, since1212122, so 20,maxmin2yxxyxxywhenthat isxy reach itsi orvaluexbbbbbb 16.More on Quadratic Modelsn Suppose that the coefficient on x i
9、s positive and the coefficient on x2 is negativen Then y is increasing in x at first, but will eventually turn around and be decreasing in x21*212at be willpoint turning the0 and 0For bbbbx17.More on Quadratic Modelsn Suppose that the coefficient on x is negative and the coefficient on x2 is positiv
10、en Then y is decreasing in x at first, but will eventually turn around and be increasing in x0 and 0 when as same theis which ,2at be willpoint turning the0 and 0For 2121*21bbbbbbx18.How to describe decreasing effect19.How to describe increasing effect20.二次项模型案例:污染对住房价格的影响(数据名:HPRICE2)2log(price) 13
11、.39 0.902log(nox) 0.087log(dist) 0.545rooms 0.0620.048roomsstratio (0.57)(0.115) (0.043) (0.165) (0.013) (0.006)N=506 ,R2=0.603 log(price)(0.54520.062) room sroom s% (price)100( 0.5452 0.062)rooms rooms 100( 0.5452 12.4)rooms rooms 在room*=0.545/(2*0.062)4.4的右边,增加一个卧室对价格的百分比变化具有递增的影响。21.n比如,rooms从5增加
12、到6会导致价格提高约为 -54.5+12.4*5=7.5%; rooms从6增加到7会导致价格提高约为-54.5+12.4*6=19.9%。这是一个很强的递增影响。STATA命令语句:gen rooms2=rooms*rooms gen ldist=log(dist)reg lprice lnox ldist rooms rooms2 stratiodisplay -1*_brooms/(2*_brooms2)(求转折点)display 100*(_brooms+2*_brooms2*6(求rooms从6增加到7会导致价格提高的百分比)。22.Interaction Terms(有交互作用项的
13、模型)(有交互作用项的模型)n For a model of the form y = b0 + b1x1 + b2x2 + b3x1x2 + u we cant interpret b1 alone as measuring the change in y with respect to x1, we need to take into account b3 as well, since 132112, so to summarizethe effect of on we typicallyevaluate the above at yxxxyxbb23.n交互效应通常需将模型重新参数化:n
14、原模型:01 1223 12yxxx xbbbb2b 是x2=0时,X2对y的偏效应,这通常没有什么意义,我们转而将模型重新参数化为:01 1221122()(x)yxxx其中, 和 分别是x1和x2的总体均值。很容易计算出:我们立即得到在均值的偏效应。 122231bb 24.nExample: Effects of Attendance on Final Exam Performance(数据名:ATTEND.DTA)natndrte系数为负,是否意味着听课对期末考试分数具有负面影响? b1仅考虑了priGPA=0时的影响。natndrte和priGPAatndrte系数估计值t值不显著,
15、是否意味着两者对期末考试分数无影响? F检验的p值为0.014.25.nAtndrte对stndfnl的偏效应: 其含义是:在priGPA的平均水平(2.59)上,atndrte提高10个百分点,使stndfnl比期末考试平均分数高出0.078倍。 0.00670.0056 2.590.00781161162.59,2.59bbbb01660612.592.59stndfnlatndrtepriGPA atndrteuatndrtepriGPAatndrteubbbbb10.0078 0.00263t26.STATA命令语句:nsum priGPAngen priGPA2=priGPA*pri
16、GPAngen ACT2=ACT*ACTngen priatn=priGPA*atndrtenreg stndfnl atndrte priGPA priGPA2 ACT2 priatn27.6.3 拟合优度和回归元选择的进一步探讨n Recall that the R2 will always increase as more variables are added to the modelnThe adjusted R2 takes into account the number of variables in a model, and may decrease 2222221111111
17、1111uySSR nRSST nSSRnkRSSTnSSTnRnnk 28.n调整R方的作用:为在一个模型中另外增加自变量施加了惩罚。 随着一个新的自变量加入回归方程,SSR下降,但回归中的自由度df=n-k-1也下降。因此,SSR/(n-k-1)可能上升,也可能下降。n作为一个结论有: 在回归中增加一个新变量,当且仅当新变量的t统计量在绝对值上大于1,调整R方才会有所提高; 在回归中增加一组变量时,当且仅当这组新变量联合显著性的F统计量大于1,调整R方才会有所提高。29.Adjusted R-Squared (cont)n Its easy to see that the adjusted
18、 R2 is just (1 R2)(n 1) / (n k 1), but most packages will give you both R2 and adj-R2n You can compare the fit of 2 models (with the same y) by comparing the adj-R2n You cannot use the adj-R2 to compare models with different ys (e.g. y vs. ln(y)30.Using adjusted R-squared to choose between nonnested
19、 models.20.6211R 20.6226R 220.061,0.03RR220.148,0.09RR31.391732982salarySST66.72lsalarySST32.Controlling too many factors in regression analysis回归分析中控制了过多的因素nImportant not to fixate too much on adj-R2 and lose sight of theory and common sensenIf economic theory clearly predicts a variable belongs, g
20、enerally leave it innDont want to include a variable that prohibits a sensible interpretation of the variable of interest remember ceteris paribus interpretation of multiple regression33.u在研究啤酒税对交通死亡率影响的回归模型中, 是否应该将人均啤酒消费量变量包括在模型之中?u在保持beercons不变的情况下,死亡率因tax提高 1个百分点而导致的差异。这一说法是否有意义?34.Adding regress
21、ors to reduce the error of variance-增加回归元以减少误差方差 n在回归中增加一个新的自变量会加剧多重共线性问题;另一方面,从误差项中取出一些因素作为解释变量可以减少误差方差。n应该将那些影响y而又与所有我们关心的自变量都无关的自变量包括进来。35.6.4 预测和残差分析n Suppose we want to use our estimates to obtain a specific prediction.n First, suppose that we want an estimate of E(y|x1=c1,xk=ck) = 0 = b0+b1c1+
22、 + bkckn This is easy to obtain by substituting the xs in our estimated model with cs , but what about a standard error?n Really just a test of a linear combination36.Predictions (cont)n Can rewrite as b0 = 0 b1c1 bkckn Substitute in to obtain y = 0 + b1 (x1 - c1) + + bk (xk - ck) + u n So, if you r
23、egress yi on (xij - cij) the intercept will give the predicted value and its standard errorn Note that the standard error will be smallest when the cs equal the means of the xs37.Example: CI for predicted college GPA2.7 1.96 0.020: 2.66 2.74CI01200,030,05satsathsperchsperchsizehsize38.Predictions (c
24、ont)n This standard error for the expected value is not the same as a standard error for an new outcome on yn We need to also take into account the variance in the unobserved error. 39. 00000001 1000001 10000110001 10000012200202000 , so kkkkkkkkeyyxxuyyxxE yEExExxxE eE uVar eVar yVar usVar yye ese
25、yebbbbbbbbbbbbbj的方差有两个来源:第一个是 的抽样误差,来自我们对20var1jjsecnynb的估计。因为,所以与成比例。第二个是总体误差的方差,它不随样本容量的变化而变化。40. 00000CLM10.95,95%j000.0250.025ueese enktP -tese etyb在经典线性模型()假定下,和都是正态分布,所以也是正态分布的。服从一个自由度为的 分布。则 的一个的预测区间: 000.025ytse e41.42.Prediction interval 0025.00000100for interval prediction 95% a have we gi
26、ven that so ,esetyyyyeteseeknnUsually the estimate of s2 is much larger than the variance of the prediction, thusnThis prediction interval will be a lot wider than the simple confidence interval for the prediction43.Residual AnalysisnInformation can be obtained from looking at the residuals (i.e. pr
27、edicted vs. observed)nExample: Regress price of cars on characteristics big negative residuals indicate a good dealnExample: Regress average earnings for students from a school on student characteristics big positive residuals indicate greatest value-addediiiuyy44.n例如,HPRICE1.RAW的住房价格模型中。01238181811
28、20.206pricelotsizesqrftbdrmsuupricepricebbbb 45.46.Predicting y in a log modelnSimple exponentiation of the predicted ln(y) will underestimate the expected value of ynInstead need to scale this up by an estimate of the expected value of exp(u)47. 01 101 101 101 1200121logexpexpexp|expexpIn this case can predict y as followsexp2 exp
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 皖豫联盟体2025届物理高二下期末经典试题含解析
- 新疆乌鲁木齐市天山区兵团第二中学2024-2025学年高二下数学期末教学质量检测模拟试题含解析
- 部队药品及疫苗采购及仓储服务合同
- 某自然博物馆插班生入学协议及自然科学教育服务合同
- 仓储企业仓单质押贷款业务合同范本
- 车辆质押贷款及售后服务合同
- 2024年攀枝花市仁和区向招考社区工作者笔试真题
- 简版房屋租赁合同(17篇)
- 湖南中烟工业有限责任公司招聘考试真题2024
- 能源知识竞赛复习测试有答案(一)
- 2025年小学一年级语文考试趣味试题及答案
- 社会科学领域课题研究报告范文
- 成人脓毒症患者医学营养治疗指南(2025版)
- 生物工程细胞培养技术试题
- 2024年山东枣庄技师学院招聘考试真题
- 静脉采血室工作制度
- 液压缸设计模板
- 2024年全国高中数学联赛(四川预赛)试题含答案
- 2024北京西城区初一(下)期末道法试题和答案
- 《基于STM32单片机健康监测模块的设计与实现》7200字(论文)
- 静脉留置针留置护理
评论
0/150
提交评论