
Correlation and Linear Regression
Microbiology 3053 Microbiological Procedures

Correlation
  • Correlation analysis is used when you have measured two continuous variables and want to quantify how consistently they vary together
  • The stronger the correlation, the more accurately the value of one variable can be estimated from the other
  • Direction and magnitude of correlation are quantified by Pearson's correlation coefficient, r
      - Perfectly negative (-1.00) to perfectly positive (1.00)
      - No relationship (0.00)

Correlation
  • The closer |r| is to 1, the stronger the relationship
  • r = 0 means that knowing the value of one variable tells us nothing about the value of the other
  • Correlation analysis uses data that have already been collected
      - Archival
      - Data not produced by experimentation
  • Correlation does not show cause and effect but may suggest such a relationship

Correlation ≠ Causation
  There is a strong, positive correlation between
  • the number of churches and bars in a town
  • smoking and alcoholism (consider the relationship between smoking and lung cancer)
  • students who eat breakfast and school performance
  • marijuana usage and heroin addiction (vs. heroin addiction and marijuana usage)

Visualizing Correlation
  • Scatterplots are used to illustrate correlation analysis
  • Assignment of axes does not matter (no independent and dependent variables)
  • Order in which data pairs are plotted does not matter
  • In strict usage, lines are not drawn through correlation scatterplots

Correlations

Linear Regression
  • Used to measure the relationship between two variables
      - Prediction and a cause-and-effect relationship
      - Does one variable change in a consistent manner with another variable?
  • x = independent variable (cause)
  • y = dependent variable (effect)
  • If it is not clear which variable is the cause and which is the effect, linear regression is probably an inappropriate test

Linear Regression
  • Calculated from experimental data
  • Independent variable is under the control of the investigator (exact value)
  • Dependent variable is normally distributed
  • Differs from correlation, where both variables are normally distributed and selected at random by the investigator
  • Regression analysis with more than one independent variable is termed multiple (linear) regression

Linear Regression
  • The best-fit line minimizes the sum of the squares of the distances of the data points from the predicted values (on the line)

Linear Regression
  y = a + bx, where
  • a = y-intercept (point where x = 0 and the line crosses the y-axis)
  • b = slope of the line, (y2 - y1)/(x2 - x1)
  • The slope indicates the nature of the correlation
      - Positive = y increases as x increases
      - Negative = y decreases as x increases
      - 0 = no correlation (same as Pearson's correlation: no relationship between the variables)

Correlation Coefficient (r)
  • Shows the strength of the linear relationship between two variables, symbolized by r
  • The closer the data points are to the line, the closer r is to 1 or -1
  • r varies from -1 (perfect negative correlation) to 1 (perfect positive correlation); strength is judged by |r|
      - 0 - 0.2: no or very weak association
      - 0.2 - 0.4: weak association
      - 0.4 - 0.6: moderate association
      - 0.6 - 0.8: strong association
      - 0.8 - 1.0: very strong to perfect association
  • Null hypothesis is no association (r = 0)
  Salkind, N. J. (2000). Statistics for people who think they hate statistics. Thousand Oaks, CA: Sage.

Coefficient of Determination (r²)
  • Used to estimate the extent to which the dependent variable (y) is under the influence of the independent variable (x)
  • r² (the square of the correlation coefficient)
  • Varies from 0 to 1
  • r² = 1 means that the value of y is completely dependent on x (no error or other contributing factors)
  • r² < 1 indicates that the value of y is influenced by more than the value of x

Coefficient of Determination
  • A measurement of the proportion of variance of y explained by its dependence on x
  • Remainder (1 - r²) is the variance of y that is not explained by x (i.e., error or other factors)
  • e.g., r² = 0.84 indicates a strong relationship between the variables (the direction comes from the sign of r or b): x accounts for 84% of the variability of y, and 16% is due to other factors
  • r² can be calculated for correlation analysis by squaring r, but
      - It is not a measure of variation of y explained by variation in x
      - Variation in y is associated with the variation of x (and vice versa)

Assumptions of Linear Regression
  • Independent variable (x) is selected by the investigator (not random) and has no associated variance
  • For every value of x, values of y have a normal distribution
  • Observed values of y differ from the mean value of y at that x (the value on the line) by an amount called a residual (residuals are normally distributed)
  • The variances of y for all values of x are equal (homoscedasticity)
  • Observations are independent (each individual in the sample is measured only once)

Linear Regression Data
  Anscombe, F. J. 1973. Graphs in Statistical Analysis. The American Statistician 27(1):17-21.
  The numbers alone do not guarantee that the data have been fitted well!

Linear Regression Data
  Figure 1: Acceptable regression model with observations distributed evenly around the regression line
  Figure 2: Strong curvature suggests that linear regression may not be appropriate (an additional variable may be required)

Linear Regression Data
  Figure 3: A single outlier alters the slope of the line. The point may be erroneous, but if not, a different test may be necessary
  Figure 4: Actually a regression line connecting only two points. If the rightmost point were different, the regression line would shift.

What if we're not sure if linear regression is appropriate?

Residuals
  • Homoscedastic: variance appears random; good regression model
  • Heteroscedastic: "funnel" shaped and may be bowed; suggests that a transformation and inclusion of additional variables may be warranted
  Helsel, D.R., and R.M. Hirsch. 2002. Statistical Methods in Water Resources. USGS (/pubs/twri/twri4a3/)

Outliers
  • Values that appear very different from others in the data set
  • Rule of thumb: an outlier is more than three standard deviations from the mean
  • Three causes
      - Measurement or recording error
      - Observation from a different population
      - A rare event from within the population
  • Outliers need to be considered and not simply dismissed
      - May indicate an important phenomenon
      - e.g., ozone hole data (outliers removed automatically by analysis program, delaying observation about 10 years)

Outliers
  Helsel, D.R., and R.M. Hirsch. 2002. Statistical Methods in Water Resources. USGS (/pubs/twri/twri4a3/)

When is Linear Regression Appropriate?
  • Data should be interval or ratio
  • The dependent and independent variables should be identifiable
  • The relationship between variables should be linear (if not, a transformation might be appropriate)
  • Have you chosen the values of the independent variable?
  • Does the residual plot show a random spread (homoscedastic), and does the normal probability plot display a straight line (or does a histogram of residuals show a normal distribution)?

(Normal Probability Plot
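
Worked sketch: least-squares line, r, r² and residuals

The quantities defined in the slides (intercept a, slope b, Pearson's r, the coefficient of determination r², and the residuals) can be computed by hand in a few lines. The sketch below is only an illustration in plain Python: the function names fit_line and residuals and the six data pairs are made up for this example and are not part of the original lecture.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x. Returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = mean_y - b * mean_x          # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient
    return a, b, r


def residuals(x, y, a, b):
    """Observed y minus the value predicted by the line at each x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data: x chosen by the investigator, y measured
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

a, b, r = fit_line(x, y)
r_squared = r ** 2                   # proportion of variance of y explained by x
res = residuals(x, y, a, b)

print(f"y = {a:.3f} + {b:.3f}x, r = {r:.4f}, r^2 = {r_squared:.4f}")
print("residuals:", [round(e, 3) for e in res])
```

Plotting res against x (or against the predicted values) gives the residual plot discussed above: a random scatter is consistent with homoscedasticity, while a funnel or bow shape suggests that a transformation may be needed.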
