Probability and Statistics (in English), Chapter 12: Simple Linear Regression
12 Simple Linear Regression and Correlation

Introduction

Regression analysis is the part of statistics that deals with investigating the relationship between two or more variables that are related in a nondeterministic fashion. In this chapter, we generalize the linear relation $y = \beta_0 + \beta_1 x$ to a linear probabilistic relationship, develop procedures for making inferences about the parameters of the model, and obtain a quantitative measure of the extent to which the two variables are related.

12.1 The Simple Linear Regression Model

The variable whose value is fixed by the experimenter is denoted by $x$. It may be called the independent variable, predictor variable, or explanatory variable.

For fixed $x$, the second variable $y$ is random. It is called the dependent variable, response variable, or explained variable.

A correlation relationship between variables: let $x$ = the age of a child and $y$ = the size of the child's vocabulary. For a fixed age such as $x = 5$, the value of $y$ could be 100, 200, 300, 400, 500, or 1000. Such a relationship is written as

$$E(y) = \beta_0 + \beta_1 x \quad \text{or} \quad y = \beta_0 + \beta_1 x + \varepsilon.$$

Example 12.1: Visual and musculoskeletal problems associated with the use of visual display terminals (VDTs) have become rather common in recent years. Some researchers have focused on vertical gaze direction as a source of eye strain and irritation. This direction is known to be closely related to ocular surface area (OSA), so a method of measuring OSA is needed. The accompanying representative data on $y$ = OSA (cm²) and $x$ = width of the palpebral fissure is from the article "Analysis of Ocular Surface Area for Comfortable VDT Workstation Layout". The order in which observations were obtained was not given, so for convenience they are listed in increasing order of $x$ values.

The first step in a regression analysis is to draw a scatter plot.

| $i$ | $x_i$ | $y_i$ | $i$ | $x_i$ | $y_i$ |
|---|---|---|---|---|---|
| 1 | 0.40 | 1.02 | 16 | 1.15 | 3.18 |
| 2 | 0.42 | 1.21 | 17 | 1.20 | 3.76 |
| 3 | 0.48 | 0.88 | 18 | 1.25 | 3.68 |
| 4 | 0.51 | 0.98 | 19 | 1.25 | 3.82 |
| 5 | 0.57 | 1.52 | 20 | 1.28 | 3.21 |
| 6 | 0.60 | 1.83 | 21 | 1.30 | 4.27 |
| 7 | 0.70 | 1.50 | 22 | 1.34 | 3.12 |
| 8 | 0.75 | 1.80 | 23 | 1.37 | 3.99 |
| 9 | 0.75 | 1.74 | 24 | 1.40 | 3.75 |
| 10 | 0.78 | 1.63 | 25 | 1.43 | 4.10 |
| 11 | 0.84 | 2.00 | 26 | 1.46 | 4.18 |
| 12 | 0.95 | 2.80 | 27 | 1.49 | 3.77 |
| 13 | 0.99 | 2.48 | 28 | 1.55 | 4.34 |
| 14 | 1.03 | 2.47 | 29 | 1.58 | 4.21 |
| 15 | 1.12 | 3.05 | 30 | 1.60 | 4.92 |

Example 12.2: Forest growth and decline phenomena throughout the world have attracted considerable public and scientific interest. The article "Relationship among crown condition, growth, and stand nutrition in seven northern sugarbushes" included a scatter plot of $y$ = mean crown dieback (%), one indicator of growth retardation, and $x$ = soil pH (higher pH corresponds to less acidic soil), from which the following observations were taken:

| $x$ | $y$ | $x$ | $y$ |
|---|---|---|---|
| 3.3 | 7.3 | 3.9 | 6.6 |
| 3.4 | 10.8 | 4.0 | 10.0 |
| 3.4 | 13.1 | 4.1 | 9.2 |
| 3.5 | 10.4 | 4.2 | 12.4 |
| 3.6 | 5.8 | 4.3 | 2.3 |
| 3.6 | 9.3 | 4.4 | 4.3 |
| 3.7 | 12.4 | 4.5 | 3.0 |
| 3.7 | 14.9 | 5.0 | 1.6 |
| 3.8 | 11.2 | 5.1 | 1.0 |
| 3.8 | 8.0 | | |

[Figure: scatter plot of the data]

A Linear Probabilistic Model

For the deterministic model $y = \beta_0 + \beta_1 x$, the actual observed value of $y$ is a linear function of $x$. The appropriate generalization of this to a probabilistic model assumes that the expected value of $Y$ is a linear function of $x$, but that for fixed $x$, the variable $Y$ differs from its expected value by a random amount.

The Simple Linear Regression Model

There exist parameters $\beta_0$, $\beta_1$, and $\sigma^2$ such that for any fixed value of the independent variable $x$, the dependent variable is related to $x$ through the model equation

$$Y = \beta_0 + \beta_1 x + \varepsilon \tag{12.1}$$

The quantity $\varepsilon$ in the model equation is a random variable, assumed to be normally distributed with $E(\varepsilon) = 0$ and $V(\varepsilon) = \sigma^2$.

[Figure: an observed point $(x_i, y_i)$ and the true regression line $y = \beta_0 + \beta_1 x$]

12.2 Estimating Model Parameters

Principle of Least Squares

The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is

$$\text{height of point} - \text{height of line} = y_i - (b_0 + b_1 x_i).$$

The sum of squared vertical deviations from the points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to the line is then

$$f(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2.$$

The point estimates of $\beta_0$ and $\beta_1$, denoted by $\hat\beta_0$ and $\hat\beta_1$ and called the least squares estimates, are those values that minimize $f(b_0, b_1)$. That is, $\hat\beta_0$ and $\hat\beta_1$ are such that $f(\hat\beta_0, \hat\beta_1) \le f(b_0, b_1)$ for any $b_0$, $b_1$. The estimated regression line or least squares line is then the line whose equation is $\hat y = \hat\beta_0 + \hat\beta_1 x$.

To find the minimizing values, write

$$Q = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

and set both partial derivatives equal to zero:

$$\frac{\partial Q}{\partial \hat\beta_0} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad \frac{\partial Q}{\partial \hat\beta_1} = -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0.$$

The normal equations:

$$n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i.$$

Solving them gives

$$\hat\beta_1 = \frac{\sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2}, \qquad \hat\beta_0 = \frac{1}{n} \sum y_i - \hat\beta_1 \cdot \frac{1}{n} \sum x_i.$$

Let

$$\bar x = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar y = \frac{1}{n} \sum_{i=1}^{n} y_i,$$

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2,$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum y_i^2 - \frac{1}{n} \left(\sum y_i\right)^2,$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i.$$

The least squares estimates of the coefficients $\beta_0$ and $\beta_1$ of the true regression line are

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x.$$

The regression equation is $\hat y = \hat\beta_0 + \hat\beta_1 x$.

Example 12.4: No-fines concrete, made from a uniformly graded coarse aggregate and a cement-water paste, is beneficial in areas prone to excessive rainfall because of its excellent drainage properties. Consider the following representative data, displayed in a tabular format convenient for calculating the values of the summary statistics.

| Obs. | $x$ | $y$ | $x^2$ | $xy$ | $y^2$ |
|---|---|---|---|---|---|
| 1 | 99.0 | 28.8 | 9801.00 | 2851.20 | 829.44 |
| 2 | 101.1 | 27.9 | 10221.21 | 2820.69 | 778.41 |
| 3 | 102.7 | 27.0 | 10547.29 | 2772.90 | 729.00 |
| 4 | 103.0 | 25.2 | 10609.00 | 2595.60 | 635.04 |
| 5 | 105.4 | 22.8 | 11109.16 | 2403.12 | 519.84 |
| 6 | 107.0 | 21.5 | 11449.00 | 2300.50 | 462.25 |
| 7 | 108.7 | 20.9 | 11815.69 | 2271.83 | 436.81 |
| 8 | 110.8 | 19.6 | 12276.64 | 2171.68 | 384.16 |
| 9 | 112.1 | 17.1 | 12566.41 | 1916.91 | 292.41 |
| 10 | 112.4 | 18.9 | 12633.76 | 2124.36 | 357.21 |
| 11 | 113.6 | 16.0 | 12904.96 | 1817.60 | 256.00 |
| 12 | 113.8 | 16.7 | 12950.44 | 1900.46 | 278.89 |
| 13 | 115.1 | 13.0 | 13248.01 | 1496.30 | 169.00 |
| 14 | 115.4 | 13.6 | 13317.16 | 1569.44 | 184.96 |
| 15 | 120.0 | 10.8 | 14400.00 | 1296.00 | 116.64 |
| Sum | 1640.1 | 299.8 | 179849.73 | 32308.59 | 6430.06 |

Solution:

$$\sum x_i = 1640.1, \quad \sum y_i = 299.8, \quad \sum x_i^2 = 179849.73, \quad \sum y_i^2 = 6430.06, \quad \sum x_i y_i = 32308.59$$

$$\bar x = \frac{1}{n} \sum x_i = 109.34, \qquad \bar y = \frac{1}{n} \sum y_i = 19.986667$$

$$S_{xx} = \sum x_i^2 - \frac{1}{n} \left(\sum x_i\right)^2 = 179849.73 - \frac{(1640.1)^2}{15} = 521.196$$

$$S_{xy} = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i = 32308.59 - \frac{(1640.1)(299.8)}{15} = -471.542$$

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-471.542}{521.196} = -0.90473066 \approx -0.905$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = 19.986667 - (-0.90473066)(109.34) = 118.91$$

The equation of the estimated regression line is $\hat y = 118.91 - 0.905x$.

Estimating $\sigma^2$

Definition: The fitted (or predicted) values $\hat y_1, \hat y_2, \ldots, \hat y_n$ are obtained by successively substituting $x_1, x_2, \ldots, x_n$ into the equation of the estimated regression line: $\hat y_1 = \hat\beta_0 + \hat\beta_1 x_1$, $\hat y_2 = \hat\beta_0 + \hat\beta_1 x_2$, ..., $\hat y_n = \hat\beta_0 + \hat\beta_1 x_n$. The residuals $y_i - \hat y_i$ are the vertical deviations from the estimated line.

The error sum of squares, denoted by SSE, is

$$SSE = \sum (y_i - \hat y_i)^2 = \sum [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2.$$

The regression sum of squares, denoted by SSR, is

$$SSR = \sum (\hat y_i - \bar y)^2.$$

The total sum of squares, denoted by SST, is

$$SST = \sum (y_i - \bar y)^2.$$

The estimate of $\sigma^2$ is

$$\hat\sigma^2 = s^2 = \frac{SSE}{n-2} = \frac{\sum (y_i - \hat y_i)^2}{n-2}.$$

The three sums of squares are related by

$$SST = SSE + SSR.$$
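As a rough numerical check of the formulas in this section, the following Python sketch recomputes the least squares estimates and the three sums of squares for the 15 observations of Example 12.4. The function name `least_squares_fit` and the dictionary keys are illustrative choices, not notation from the text. The data are copied from the example's table.

```python
# (x, y) pairs copied from the Example 12.4 table (no-fines concrete data).
x = [99, 101.1, 102.7, 103, 105.4, 107, 108.7, 110.8,
     112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120]
y = [28.8, 27.9, 27, 25.2, 22.8, 21.5, 20.9, 19.6,
     17.1, 18.9, 16, 16.7, 13, 13.6, 10.8]


def least_squares_fit(x, y):
    """Return least squares estimates and sums of squares for paired data.

    Hypothetical helper written for illustration; it follows the
    definitions of S_xx, S_xy, SSE, SSR, SST and s^2 given in the text.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # slope estimate
    b0 = y_bar - b1 * x_bar          # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s2 = sse / (n - 2)               # estimate of sigma^2
    return {"b0": b0, "b1": b1, "sse": sse, "ssr": ssr, "sst": sst, "s2": s2}


fit = least_squares_fit(x, y)
for name, value in fit.items():
    print(f"{name} = {value:.4f}")
```

Under these assumptions the sketch should reproduce, up to rounding, the slope of about -0.905 and the intercept of about 118.91 obtained in Example 12.4, and the computed SSE and SSR should add up to SST.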

Definition: The coefficient of determination, denoted by $r^2$, is given by

$$r^2 = 1 - \frac{SSE}{SST}.$$

It is interpreted as the proportion of observed $y$ variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between $y$ and $x$).

The sample correlation coefficient r

Definition: The sample correlation coefficient for the $n$ pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is

$$r = \frac{S_{xy}}{\sqrt{S_{xx}} \sqrt{S_{yy}}} = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum (x_i - \bar x)^2} \sqrt{\sum (y_i - \bar y)^2}}.$$

Because $\hat y_i - \bar y = \hat\beta_1 (x_i - \bar x)$, the coefficient of determination can be written as

$$r^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST} = \frac{\sum (\hat y_i - \bar y)^2}{S_{yy}} = \frac{\hat\beta_1^2 \sum (x_i - \bar x)^2}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}^2} \cdot \frac{S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}.$$

Properties of r

The most important properties of $r$ are as follows:

1) The value of $r$ does not depend on which of the two variables under study is labeled $x$ and which is labeled $y$.
2) The value of $r$ is independent of the units in which $x$ and $y$ are measured.
3) $-1 \le r \le 1$.
4) $r = 1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with positive slope, and $r = -1$ if and only if all $(x_i, y_i)$ pairs lie on a straight line with negative slope.
5) The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model; in symbols, $(r)^2 = r^2$.

Example 12.9: The scatter plot of the no-fines concrete data in Figure 12.8 certainly portends a very high $r^2$ value.

Solution:

$$\hat\beta_0 = 118.909917, \quad \hat\beta_1 = -0.90473066, \quad \sum y_i = 299.8, \quad \sum x_i y_i = 32308.59, \quad \sum y_i^2 = 6430.06$$

$$SST = \sum y_i^2 - \frac{1}{n} \left(\sum y_i\right)^2 = 6430.06 - \frac{(299.8)^2}{15} = 438.057333 \approx 438.06$$

$$SSE = \sum y_i^2 - \hat\beta_0 \sum y_i - \hat\beta_1 \sum x_i y_i = 6430.06 - (118.909917)(299.8) - (-0.90473066)(32308.59) = 11.4388 \approx 11.44$$

So the coefficient of determination is

$$r^2 = 1 - \frac{11.44}{438.06} = 1 - 0.026 = 0.974.$$

Exercise 18 (p. 509): The following summary statistics were obtained from a study that used regression analysis to investigate the relationship between pavement deflection and surface temperature of the pavement at various locations on a state highway. Here $x$ = temperature (°F) and $y$ = deflection adjustment factor ($y \ge 0$):

$$n = 15, \quad \sum x_i = 1425, \quad \sum y_i = 10.68, \quad \sum x_i^2 = 139037.25, \quad \sum x_i y_i = 987.645, \quad \sum y_i^2 = 7.8518$$

a. Compute $\hat\beta_1$, $\hat\beta_0$, and the equation of the estimated regression line. Graph the estimated line.
b. What is the estimate of expected change in the deflecti
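The computing formulas that work from summary statistics alone ($n$, $\sum x_i$, $\sum y_i$, $\sum x_i^2$, $\sum x_i y_i$, $\sum y_i^2$) can be sketched in Python as follows. The helper name `fit_from_sums` and its argument names are hypothetical. The first call uses the column sums of the no-fines concrete table, and the second uses the summary statistics of the pavement exercise as read from the source, so its output is only as reliable as that reading.

```python
import math


def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_xy, sum_yy):
    """Least squares fit, r^2 and r from summary statistics only.

    Hypothetical helper for illustration; uses the computing formulas
    S_xx = sum(x^2) - (sum x)^2 / n, and similarly for S_yy and S_xy.
    """
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n           # this is also SST
    s_xy = sum_xy - sum_x * sum_y / n
    b1 = s_xy / s_xx                         # slope estimate
    b0 = sum_y / n - b1 * sum_x / n          # intercept estimate
    sse = sum_yy - b0 * sum_y - b1 * sum_xy  # computing formula for SSE
    r2 = 1 - sse / s_yy                      # coefficient of determination
    r = s_xy / math.sqrt(s_xx * s_yy)        # sample correlation coefficient
    return {"b0": b0, "b1": b1, "sse": sse, "sst": s_yy, "r2": r2, "r": r}


# No-fines concrete data: sums taken from the Example 12.4 table.
concrete = fit_from_sums(15, 1640.1, 299.8, 179849.73, 32308.59, 6430.06)
print(concrete)

# Pavement exercise: summary statistics as read from the exercise statement.
pavement = fit_from_sums(15, 1425, 10.68, 139037.25, 987.645, 7.8518)
print(pavement["b1"], pavement["b0"])
```

For the concrete data this should agree, up to rounding, with the value $r^2 = 0.974$ found in Example 12.9, and the sign of $r$ matches the negative slope, in line with property 4.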
