
3 Multiple Regression

3.1 Introduction

The bivariate regression model analysed so far is quite restrictive, as it allows Y to be influenced by only a single regressor X. A moment's reflection on the two examples that we have been using shows how limited this model is: why shouldn't salary be influenced by both post-school education and experience, and ought we not to consider determinants of consumption other than just income, such as inflation and interest rates?

The question that we now address is how we should go about estimating the parameters of the model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i, \qquad i = 1, \ldots, n.$$
One possibility might be to regress Y on $X_2$ and on $X_3$ separately, thus obtaining the bivariate regression slope estimates $\tilde\beta_2$ and $\tilde\beta_3$, but two intercept estimates. This non-uniqueness of the intercept estimates is just one problem with this approach. An arguably much more serious defect is that, in general, and as we will show later, $\tilde\beta_2$ and $\tilde\beta_3$ are biased estimates of $\beta_2$ and $\beta_3$.

To obtain BLUE estimates, we must estimate a multiple regression by OLS. The PRF now becomes
$$E(Y_i \mid X_{2i}, X_{3i}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i},$$
and assumption 4 generalises to $\mathrm{Cov}(X_{2i}, u_i) = \mathrm{Cov}(X_{3i}, u_i) = 0$. OLS estimates are obtained by minimising
$$\sum \hat u_i^2 = \sum \left(Y_i - \hat\beta_1 - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
with respect to $\hat\beta_1$, $\hat\beta_2$ and $\hat\beta_3$. This yields three normal equations, which can be solved to obtain (writing $y$, $x_2$ and $x_3$ for deviations from sample means)
$$\hat\beta_2 = \frac{\sum y x_2 \sum x_3^2 - \sum y x_3 \sum x_2 x_3}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2}, \qquad \hat\beta_3 = \frac{\sum y x_3 \sum x_2^2 - \sum y x_2 \sum x_2 x_3}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2},$$
with $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$. Formulae for the variances of these estimates can also be derived; they contain $\hat\sigma^2$, an estimate of the error variance $\sigma^2$, which is now given by
$$\hat\sigma^2 = \frac{\sum \hat u_i^2}{n-3} = \frac{RSS}{n-3}.$$
(Note that the degrees of freedom are $n-3$ because we now have two regressors.) Hypothesis tests and confidence intervals can be constructed in a fashion analogous to simple, bivariate, regression.

However, with two regressors, further hypotheses are of interest. One in particular is the joint hypothesis
$$H_0: \beta_2 = \beta_3 = 0,$$
which is to be tested against the alternative that $\beta_2$ and $\beta_3$ are not both zero. This can be tested using the method introduced previously, being based on the statistic
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \sim F(k-1, n-k),$$
where $RSS = \sum \hat u_i^2$, $ESS = TSS - RSS$ and, here, $k = 3$. Now, analogous to $r^2$ in bivariate regression, we can define the coefficient of multiple determination as $R^2 = ESS/TSS$, and some algebra then yields
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}.$$

Salaries, education and experience yet again

Returning to our salaries example, we now estimate the multiple regression by OLS, denoting education as $X_2$ and experience as $X_3$; note that the two slope formulae share the common denominator $\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2$, which need only be computed once.
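These formulae can be checked numerically. The sketch below fits a two-regressor model by evaluating the normal-equation solutions directly; the data are simulated purely for illustration, and none of the numbers come from the salary example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x2 = rng.uniform(0, 10, n)                  # first regressor (e.g. education)
x3 = rng.uniform(0, 10, n)                  # second regressor (e.g. experience)
y = 2.0 + 1.5 * x2 + 0.8 * x3 + rng.normal(0, 1, n)

# Deviations from sample means
yd, x2d, x3d = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()

# Common denominator of the two slope formulae
D = (x2d**2).sum() * (x3d**2).sum() - (x2d * x3d).sum()**2

b2 = ((yd * x2d).sum() * (x3d**2).sum() - (yd * x3d).sum() * (x2d * x3d).sum()) / D
b3 = ((yd * x3d).sum() * (x2d**2).sum() - (yd * x2d).sum() * (x2d * x3d).sum()) / D
b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()

resid = y - b1 - b2 * x2 - b3 * x3
sigma2_hat = (resid**2).sum() / (n - 3)     # RSS/(n-3): intercept plus two regressors
print(b1, b2, b3, sigma2_hat)
```

Any least-squares routine produces the same estimates; computing the common denominator D once avoids repeating the sums-of-squares arithmetic.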

The variances, and hence standard errors, of the slope estimates are thus
$$V(\hat\beta_2) = \frac{\hat\sigma^2 \sum x_3^2}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2}, \qquad V(\hat\beta_3) = \frac{\hat\sigma^2 \sum x_2^2}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2}.$$
For the variance, and hence the standard error, of the intercept we need $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X_2 - \hat\beta_3 \bar X_3$, so that
$$V(\hat\beta_1) = \frac{\hat\sigma^2}{n} + \bar X_2^2\, V(\hat\beta_2) + \bar X_3^2\, V(\hat\beta_3) + 2 \bar X_2 \bar X_3\, \mathrm{Cov}(\hat\beta_2, \hat\beta_3).$$
The t-ratios for testing the individual hypotheses $H_0: \beta_2 = 0$ and $H_0: \beta_3 = 0$ are $\hat\beta_2/se(\hat\beta_2)$ and $\hat\beta_3/se(\hat\beta_3)$. Since the 5% critical value of the $t(2)$ distribution is 4.30, $H_0: \beta_3 = 0$ can be rejected, while $H_0: \beta_2 = 0$ cannot be rejected at the 5% level: strictly it cannot be rejected, but it can be at the 5.02% level (which is known as the p- or prob-value of the test).

Furthermore, the joint hypothesis $H_0: \beta_2 = \beta_3 = 0$ can be tested with
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F(2, 2).$$
As the computed F-statistic exceeds its critical value, we can reject $H_0$. The regressors explain 94.4% of the variation in Y.

The estimated multiple regression has some interesting features that are worth comparing with the two bivariate regressions of Y on $X_2$ and of Y on $X_3$.
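The relation between the F-statistic and $R^2$ is easy to verify in a few lines; the numbers below are illustrative rather than those of the salary example:

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Joint-significance F statistic, distributed F(k-1, n-k), computed from R-squared."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Illustrative values: k = 3 coefficients (two slopes plus intercept), n = 20 observations
print(f_from_r2(0.9, n=20, k=3))   # a large F leads to rejection of H0: all slopes zero
```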

First, the estimates of $\beta_3$ from the multiple regression and from the bivariate regression of Y on $X_3$ are close to each other, but the estimates $\hat\beta_2$ and $\tilde\beta_2$ of $\beta_2$ are very different: indeed, they are of different signs, and the former is insignificant. Consequently, it appears that $X_2$, post-school education, is not a significant factor in determining salaries after all, and only experience counts. Education appears significant only when experience is excluded, so that it is acting as a proxy for experience, and the bivariate regression of Y on $X_2$ is spurious.

Second, in the multiple regression we have only two degrees of freedom, so information is extremely limited. This is why we need t-ratios in excess of 4.30 for coefficients to be significant at the 5% level, and this can be difficult to achieve. It is often wise to choose larger significance levels (and hence smaller critical values) for small sample sizes and, conversely, low significance levels for very large samples. Further, note that the F-statistic rejects $H_0: \beta_2 = \beta_3 = 0$ even though $H_0: \beta_2 = 0$ cannot be rejected on a t-test: just one of the coefficients needs to be non-zero for $H_0$ to be rejected.

The spurious nature of the Y on $X_2$ regression can be explained algebraically. Comparing the formulae for $\tilde\beta_2$ and $\hat\beta_2$,
$$\tilde\beta_2 = \frac{\sum y x_2}{\sum x_2^2} \qquad \text{and} \qquad \hat\beta_2 = \frac{\sum y x_2 \sum x_3^2 - \sum y x_3 \sum x_2 x_3}{\sum x_2^2 \sum x_3^2 - \left(\sum x_2 x_3\right)^2},$$
we see that $\tilde\beta_2 = \hat\beta_2$ only if $\sum x_2 x_3 = 0$, i.e., only if $X_2$ and $X_3$ are uncorrelated (in which case we will also have $\tilde\beta_3 = \hat\beta_3$). If the two estimates are identical, multiple regression collapses to a pair of simple regressions. In general, though, substituting $y = \beta_2 x_2 + \beta_3 x_3 + (u - \bar u)$ into the formula for $\tilde\beta_2$ gives
$$\tilde\beta_2 = \beta_2 + \beta_3 \frac{\sum x_2 x_3}{\sum x_2^2} + \frac{\sum x_2 (u - \bar u)}{\sum x_2^2}.$$
Thus
$$E(\tilde\beta_2) = \beta_2 + \beta_3 b_{32}, \qquad \text{where} \quad b_{32} = \frac{\sum x_2 x_3}{\sum x_2^2},$$
since the last term in the formula for $\tilde\beta_2$ has zero expectation from assumption 4. Hence, if $\beta_3$ and $b_{32}$ are of the same sign, the bias is positive, whereas if they are of opposite sign the bias is negative. Two related points are worth noting: (i) $b_{32}$ has the same sign as the correlation between $X_2$ and $X_3$, and (ii) $b_{32}$ is the slope coefficient in the regression of $X_3$ on $X_2$. We can thus explain why we obtained the results that we did from simple regression: $X_2$ and $X_3$ are positively correlated, so $b_{32} > 0$, and if $\beta_2$ is actually zero while $\beta_3$ is positive, then $E(\tilde\beta_2) = \beta_3 b_{32} > 0$, so that obtaining a positive $\tilde\beta_2$ when $\beta_2 = 0$ is consistent with the theory.
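The bias formula can be illustrated by simulation. In the sketch below (all design choices are our own), $X_2$ is truly irrelevant ($\beta_2 = 0$) but is positively correlated with $X_3$, so the bivariate slope of Y on $X_2$ centres on $\beta_3 b_{32}$ rather than on zero:

```python
import numpy as np

rng = np.random.default_rng(1)
beta2, beta3 = 0.0, 2.0                    # X2 truly irrelevant, X3 matters
n, reps = 200, 2000
biased = []
for _ in range(reps):
    x2 = rng.uniform(0, 10, n)
    x3 = 0.5 * x2 + rng.normal(0, 1, n)    # X3 positively correlated with X2
    y = 1.0 + beta2 * x2 + beta3 * x3 + rng.normal(0, 1, n)
    # Bivariate (omitted-variable) slope of Y on X2 alone
    x2d = x2 - x2.mean()
    biased.append((x2d * (y - y.mean())).sum() / (x2d**2).sum())

# b32, the slope of X3 on X2, is 0.5 by construction, so the bias
# should be roughly beta3 * b32 = 2.0 * 0.5 = 1.0 rather than zero
print(np.mean(biased))
```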

3.2 Regression with k Explanatory Variables

Let us now consider the general multiple regression case in which we have k regressors:
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i.$$
OLS estimates are BLUE under our set of assumptions. We do not provide formulae for them, as they are impossible to obtain algebraically without using a matrix formulation (which we shall present later in the course), and hence they can realistically be calculated only by using an appropriate econometric computer package. Nevertheless, all the standard results carry forward, with minor alterations to reflect the number of regressors, e.g.,
$$\hat\sigma^2 = \frac{RSS}{n-k}.$$
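In practice the estimates come from a package; the following numpy sketch (simulated data, with variable names of our own choosing) shows a general-$k$ fit together with the degrees-of-freedom correction $RSS/(n-k)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 4                              # intercept plus three regressors
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k - 1))])
beta = np.array([1.0, 2.0, 0.0, -1.5])     # true coefficients (one regressor irrelevant)
y = X @ beta + rng.normal(0, 1, n)

betahat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ betahat
rss = (resid**2).sum()
sigma2_hat = rss / (n - k)                 # RSS/(n-k), as in the text
tss = ((y - y.mean())**2).sum()
r2 = 1 - rss / tss
r2_bar = 1 - (1 - r2) * (n - 1) / (n - k)  # degrees-of-freedom-adjusted R-squared
print(betahat, sigma2_hat, r2, r2_bar)
```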

We have referred to the quantity $n-k$ as the degrees of freedom of the regression. Why is this? Suppose $k = 3$, so that the normal equations are
$$\sum \hat u_i = 0, \qquad \sum \hat u_i X_{2i} = 0, \qquad \sum \hat u_i X_{3i} = 0.$$
These three equations fix the values of three residuals, so that only $n-3$ are free to vary. Thus, if there are $k = n$ regressors then there are NO degrees of freedom and the regression technique breaks down (in practice, it becomes problematic well before this limit is reached: recall the salary example!).

Including an additional variable in the regression cannot increase the RSS, for the new variable will always explain some part of the variation of Y, even if only a tiny (and insignificant) amount. Hence, from its definition, $R^2$ will increase towards 1 as more regressors are added, even though they may be unnecessary. To adjust for this effect, we can define the R-bar-squared statistic
$$\bar R^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k}.$$
$\bar R^2$ will increase when an additional regressor is included only if the t-ratio associated with that regressor exceeds unity, and it can even go negative!

3.3 Hypothesis Tests in Multiple Regression

There are a variety of hypotheses in multiple regression models that we might wish to consider. All can be treated within the general framework of the F-test.

The null hypothesis is interpreted as a set of (linear) restrictions imposed on the regression model, and we test whether these restrictions are acceptable using
$$F = \frac{(RSS_R - RSS_U)/r}{RSS_U/(n-k)} \sim F(r, n-k),$$
where:
$RSS_U$: the RSS from the unrestricted regression, i.e., the regression estimated under the alternative hypothesis (without the restrictions);
$RSS_R$: the RSS from the restricted regression, i.e., the regression estimated under the null hypothesis (with the restrictions imposed);
$r$: the number of restrictions imposed by the null hypothesis.
An equivalent form of the test statistic, which may be easier to compute from regression output, is
$$F = \frac{(R_U^2 - R_R^2)/r}{(1 - R_U^2)/(n-k)},$$
where $R_U^2$ and $R_R^2$ are the $R^2$s from the unrestricted and restricted regressions respectively.

Some examples of hypotheses that might be investigated are the following:

(a) We might be interested in testing whether a subset of the coefficients is zero, e.g.,
$$H_0: \beta_{k-r+1} = \beta_{k-r+2} = \cdots = \beta_k = 0,$$
i.e., that the last r regressors are irrelevant (the theory that suggests including them in the model is false).
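A test of type (a) can be sketched as follows: fit the model with and without the last $r$ regressors and compare residual sums of squares. The data here are simulated, with the last two regressors truly irrelevant:

```python
import numpy as np

def rss_of(X, y):
    """Residual sum of squares from an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return (resid**2).sum()

rng = np.random.default_rng(3)
n, k, r = 120, 5, 2                        # k columns incl. intercept; test last r slopes = 0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, (n, k - 1))])
y = 1.0 + 2.0 * X[:, 1] + X[:, 2] + rng.normal(0, 1, n)   # X[:,3], X[:,4] irrelevant

rss_u = rss_of(X, y)                       # unrestricted: all k columns
rss_r = rss_of(X[:, :k - r], y)            # restricted: drop the last r regressors
F = ((rss_r - rss_u) / r) / (rss_u / (n - k))
print(F)                                   # small when the restrictions are acceptable
```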

Here the restricted regression is one that contains only the first $k-r$ regressors (note that the ordering of the regressors is our choice):
$$Y_i = \beta_1 + \beta_2 X_{2i} + \cdots + \beta_{k-r} X_{k-r,i} + u_i.$$

(b) A more complicated type of restriction is where the coefficients obey a linear restriction of the general form
$$r_1 \beta_1 + r_2 \beta_2 + \cdots + r_k \beta_k = c,$$
where $r_1, \ldots, r_k$ and $c$ are constants. An example of this type of restriction, which occurs quite regularly in economics, is two coefficients summing to zero, e.g., $\beta_2 + \beta_3 = 0$. This is obtained from the general form by setting $r_2 = r_3 = 1$, $c = 0$ and all other $r_j$ equal to zero. To construct a test of this hypothesis, we have to be able to estimate the restricted regression. Suppose $k = 3$ for simplicity. The restricted model is then, since the hypothesis implies $\beta_3 = -\beta_2$,
$$Y_i = \beta_1 + \beta_2 X_{2i} - \beta_2 X_{3i} + u_i,$$
or
$$Y_i = \beta_1 + \beta_2 (X_{2i} - X_{3i}) + u_i,$$
so that the restricted model is the regression of Y on $X^* = X_2 - X_3$. An equivalent test is the t-ratio on $X_3$ in the regression of Y on $X^*$ and $X_3$ - why?

Modelling food expenditure

Let us now consider a detailed example of multiple regression modelling. Here we use data provided by Dougherty, Introduction to Econometrics, to model the determinants of food expenditure in the U.S. from 1959 to 1983. The assumed model is that aggregate expenditure on food, Y, depends upon aggregate personal income, Z, aggregate personal taxation, T, and the relative price of food, P. (Note that we have given distinct letters to the regressors for easier recognition, and t subscripts are used below because we are dealing with time series data.)

The estimated regression, using annual observations, is of the form
$$\hat Y_t = \hat\beta_1 + \hat\beta_Z Z_t + \hat\beta_T T_t + \hat\beta_P P_t,$$
with t-ratios reported in parentheses beneath the coefficient estimates. These show that each individual regressor is significant, and $R^2$ is very close to one.

This, together with the individual t-ratios, confirms the overall significance of the regression (indeed, the associated F-statistic is 1012!). The residual sum of squares, RSS, is reported so that it can be used for subsequent calculations, and the residual standard error, $\hat\sigma = \sqrt{RSS/(n-k)}$, is reported rather than the residual variance $\hat\sigma^2$ because, being in the same units of measurement as Y, it is easier to interpret.

Suppose that an alternative theory suggests that food expenditure depends upon income alone, i.e., that $\beta_T = \beta_P = 0$. The restricted regression is then the bivariate regression of Y on Z. The F-statistic testing $H_0: \beta_T = \beta_P = 0$,
$$F = \frac{(RSS_R - RSS_U)/2}{RSS_U/(n-4)},$$
far exceeds its 5% critical value. Thus the restrictions are unacceptable and the alternative theory is therefore invalid. Note the large changes in the coefficient estimates in the restricted regression compared with those in the original regression: they have roughly halved in size, providing another example of how omitting important variables, here T and P, can seriously bias the coefficients of the remaining regressors!

Now, note that in the unrestricted regression $\hat\beta_Z \approx -\hat\beta_T$, which suggests testing the hypothesis $H_0: \beta_Z + \beta_T = 0$. The easiest way of doing this is to impose the restriction directly onto the regression,
$$Y_t = \beta_1 + \beta_Z (Z_t - T_t) + \beta_P P_t + u_t,$$
so that we now have a new, combined regressor $Z - T$. Estimating this regression and computing the F-statistic for testing $H_0: \beta_Z + \beta_T = 0$, we see that the restriction fits almost perfectly. $\beta_Z$ is estimated much more precisely (its standard error is now 0.003), and the remaining estimates are almost unchanged.

Is there an economic rationale for this restriction? Indeed there is! Personal disposable income (PDI) is defined as $PDI = Z - T$, so that
$$\beta_Z Z_t - \beta_Z T_t = \beta_Z (Z_t - T_t) = \beta_Z PDI_t,$$
i.e., Y depends (partially) on personal disposable income, and not on personal income and tax separately.

3.4 Multicollinearity

When estimating multiple regressions, there is an important and frequently encountered problem that has no counterpart in simple regression: multicollinearity. We investigate this problem by employing another simulation example. Suppose that the true PRF is
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i,$$
and that the two regressors are correlated. To design this correlation, the following generating process for $X_3$ was used:
$$X_{3i} = X_{2i} + v_i,$$
where $X_2$ was drawn from a uniform distribution between 0 and 10. The choice of the variance of $v$, $\sigma_v^2$, is at our disposal. We now investigate what happens when we reduce $\sigma_v^2$: i.e., what happens when we make the relationship between $X_2$ and $X_3$ tighter and tighter.

The regression estimates as $\sigma_v^2$ is reduced are shown in the table below (standard errors in parentheses; later rows of the table, and the final $R^2$, are not legible in the source):

$\sigma_v^2$ | $\hat\beta_1$ | $\hat\beta_2$ | $\hat\beta_3$ | $R^2$
40.82 | 48.56 (3.42) | 3.28 (0.94) | 2.64 (0.38) | 0.872
30.89 | 16.15 (4.11) | 4.02 (1.22) | 2.03 (0.55) | 0.

The coefficient estimates become increasingly volatile and their standard errors blow up. Yet the fit of the regression hardly alters: even at the smallest $\sigma_v^2$, although the estimates are all individually insignificant, $R^2$ implies a strong relationship between Y and the regressors. $X_2$ and $X_3$ are therefore jointly, but not individually, significant. What is going on here? We can analyse the problem analytically using the standard least squares formulae.
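The experiment can be replicated in a few lines (a sketch under the design described above, with error scales of our own choosing): as the variance of $v$ shrinks, the slope standard errors inflate while $R^2$ barely moves.

```python
import numpy as np

def fit(sigma_v, rng, n=30):
    """OLS fit of Y on X2 and X3 = X2 + v; returns the slope s.e.'s and R-squared."""
    x2 = rng.uniform(0, 10, n)
    x3 = x2 + rng.normal(0, sigma_v, n)          # link tightens as sigma_v shrinks
    y = 2.0 + 3.0 * x2 + 2.0 * x3 + rng.normal(0, 4, n)
    X = np.column_stack([np.ones(n), x2, x3])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = (resid**2).sum() / (n - 3)
    cov = sigma2 * np.linalg.inv(X.T @ X)        # estimated OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    r2 = 1 - (resid**2).sum() / ((y - y.mean())**2).sum()
    return se[1], se[2], r2

rng = np.random.default_rng(4)
for sigma_v in (5.0, 1.0, 0.1):                  # progressively tighter collinearity
    print(sigma_v, fit(sigma_v, rng))
```

The standard errors explode because, as $X_2$ and $X_3$ become nearly identical, the data contain almost no independent variation with which to separate their individual effects, even though their joint effect remains well determined.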
