




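Regression Analysis

Correlation
Correlation describes the linear association between two variables X and Y.
Example: height and weight in data1.
How can such a linear relationship be quantified? With correlation, specifically linear correlation.

Correlation
By definition, the correlation between X and Y is
$$\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}.$$
Its estimate is Pearson's correlation coefficient
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$

Correlation
r > 0: positively correlated
r < 0: negatively correlated
r = 0: no linear correlation
r = 0 does not mean that X and Y are unrelated; their relationship may simply not be linear. Plotting the data is still necessary.

Correlation
R code: cor(x, y, method = c("pearson", "kendall", "spearman"))
x: a numeric vector or a matrix
y: a numeric vector; it may be omitted when x is a matrix

Correlation
To go further and test $H_0: \rho = 0$ vs. $H_1: \rho \neq 0$:
Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which follows a t-distribution with n-2 degrees of freedom under $H_0$.
95% confidence interval (via Fisher's z transformation): $\tanh\left(\operatorname{artanh}(r) \pm \dfrac{1.96}{\sqrt{n-3}}\right)$.

Correlation
R code: cor.test(x, y, alternative = c("two.sided", "less", "greater"), method = c("pearson", "kendall", "spearman"), exact = NULL, conf.level = 0.95, continuity = FALSE, ...)
x: a numeric vector
y: a numeric vector
exact: TRUE or FALSE, whether to compute an exact p-value
continuity: whether to apply a continuity correction
Reading the output for the height and weight example: height and weight have a statistically significant positive correlation.

Practice
Draw the scatter plot of liver and clot in the Surgical data. Can the relationship between liver and clot be seen from the plot?
Compute the correlation coefficient between liver and clot.
Test whether the correlation coefficient between liver and clot is 0.
Q: Beyond the strength of the correlation, can we see how the variables affect each other? Regression!

Linear Regression
Step 1: What is the distribution of blood pressure, and does it differ between men and women?
Step 2: Is blood pressure linearly related to weight?
Step 3: How can that linear relationship be described?
Step 4: How can the relationship between blood pressure and weight, sex, and so on be described?
Y: response variable, dependent variable (say, bp)
X: covariate, explanatory variable, independent variable (say, weight)

Linear Regression
Q: How does X affect Y? Can we fit a line in the scatter plot?
In fact, we should say $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is called the error and is normal with zero mean and variance $\sigma^2$.

Regression model: simple linear regression
The points on the line are estimates, called fitted values. Each one is the expected blood pressure given weight X. Because it is an expected value (a mean), the method is said to "regress toward the mean". Fitted values differ from observed values, which carry sampling variation.

Estimate coefficients
How do we find $\beta_0$ (intercept) and $\beta_1$ (slope)? Least squares!
Minimize the residual sum of squares
$$\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$
Take the derivatives with respect to $\beta_0$ and $\beta_1$ and set them to zero.
A "residual" is the difference between the observed and fitted values, that is, the vertical (Y-axis) distance.

Estimate coefficients
Rearranging the terms gives the normal equations
$$\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2.$$
Solving the normal equations, we get the estimates
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

Are these LSE good?
Are they unbiased?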
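One way to explore whether the least squares estimates are unbiased is a small simulation. The sketch below is only illustrative: the data are simulated, and the true values (intercept 70, slope 0.7, error SD 5), the sample size, and the helper name `lse` are assumptions made for this example, not values from the slides. It computes the closed-form estimates, checks them against `lm()`, and then averages the estimates over many simulated samples.

```r
# Hypothetical simulated data: the true parameters below are assumptions.
set.seed(1)
beta0 <- 70; beta1 <- 0.7; sigma <- 5
x <- runif(50, min = 50, max = 100)  # covariate values, held fixed across samples

# Closed-form least squares estimates (solution of the normal equations).
lse <- function(x, y) {
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  c(b0 = b0, b1 = b1)
}

# One sample: the closed form should agree with lm().
y <- beta0 + beta1 * x + rnorm(length(x), sd = sigma)
one_fit <- lse(x, y)
lm_fit <- coef(lm(y ~ x))

# Many samples with the same x: average and spread of the estimates.
est <- replicate(2000, lse(x, beta0 + beta1 * x + rnorm(length(x), sd = sigma)))
avg_est <- rowMeans(est)      # should be close to c(70, 0.7)
sd_est <- apply(est, 1, sd)   # empirical standard errors of b0 and b1

print(rbind(closed_form = one_fit, lm = lm_fit))
print(rbind(average = avg_est, std_error = sd_est))
```

The averages landing near the true values is what unbiasedness predicts; the empirical standard deviations give a feel for the standard errors of the two estimates.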
What are the standard errors of these estimates?

Are these LSE good?
In statistics, to ask "Are these estimates good?" is the same as asking "Are they close to the true values?"
They are good in the sense that they are unbiased.
They are the best linear unbiased estimators (BLUE).
Gauss-Markov theorem: Under the conditions of the regression model (zero-mean, constant-variance, uncorrelated errors), the least squares estimators are unbiased and have minimum variance among all linear unbiased estimators.

Estimation of variance
$\sigma^2$ can be estimated by
$$s^2 = \frac{\mathrm{SSE}}{n-2} = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2.$$
Therefore, with $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$,
$$\operatorname{se}(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}}, \qquad \operatorname{se}(\hat{\beta}_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}.$$

Linear regression using R
R code: lm(formula, data, ...)
formula: y ~ x, where y is the response and x is the covariate
Reading the output: the t value is the estimate divided by its standard error, e.g. 3.943 = 70.8432 / 17.9663 for the intercept.

Linear regression
Confidence intervals for $\beta_0$ and $\beta_1$? Use the t-distribution with df = n-2.
Testing whether a coefficient equals 0 (for example, $\beta_1 = 0$)? Use t with df = n-2.
An increase of 1 kg in Weight is associated with an increase of 0.7167 in Bp.
If someone weighs 70 kg, then his or her bp is estimated by 70.84 + 0.72 × 70 = 121.24 (slope rounded to 0.72). This is interpolation.

Linear regression
Is it meaningful to estimate bp at 120 kg? Not really: that is outside the range of the data, a dangerous extrapolation.
Regression does not imply causality. It simply reflects the regression relation between X (weight) and Y (bp). This regression does not say that X causes Y.
Can we use bp to predict weight? Yes, if weight is the variable of interest.

Practice
We want to know how clot affects liver in the Surgical data. Build a regression model of liver on clot.
How should this model be interpreted?
Is the effect of clot on liver significant?

Homework
We want to know how enzyme affects SVtime in the Surgical data. Build a regression model for enzyme and SVtime.
How should this model be interpreted?
Is the effect of enzyme on SVtime significant?

How good is the regression?
How well does the line explain the variation in y?
How well does the fitted correlation of (X, Y) explain Y?
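The fitting, inference, and prediction steps on the "Linear regression" slides can be sketched in R as follows. The data here are simulated stand-ins for the bp and weight example (the generating values 70, 0.7, and SD 8 are assumptions, not the course data), so the printed numbers will not match the slides. The sketch reproduces by hand the t values, p-values, and 95% confidence intervals that `summary()` and `confint()` report, using the t-distribution with n-2 degrees of freedom.

```r
# Hypothetical simulated data standing in for the bp / weight example.
set.seed(2)
n <- 40
d <- data.frame(weight = runif(n, 50, 100))
d$bp <- 70 + 0.7 * d$weight + rnorm(n, sd = 8)

fit <- lm(bp ~ weight, data = d)
coefs <- summary(fit)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = estimate / standard error, referred to t with n - 2 df.
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_by_hand <- 2 * pt(abs(t_by_hand), df = n - 2, lower.tail = FALSE)

# 95% confidence intervals: estimate +/- t quantile * standard error.
tcrit <- qt(0.975, df = n - 2)
ci_by_hand <- cbind(coefs[, "Estimate"] - tcrit * coefs[, "Std. Error"],
                    coefs[, "Estimate"] + tcrit * coefs[, "Std. Error"])
ci <- confint(fit, level = 0.95)

# Estimated bp at 70 kg, which lies inside the observed range (interpolation).
pred70 <- predict(fit, newdata = data.frame(weight = 70))
by_hand70 <- unname(coef(fit)[1] + coef(fit)[2] * 70)

print(coefs)
print(ci)
print(pred70)
```

Calling `predict()` at a weight such as 120 kg would run without any warning, which is exactly why extrapolation has to be checked against `range(d$weight)` by the analyst.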
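Because $\mathrm{SSTO} = \mathrm{SSR} + \mathrm{SSE}$, define the coefficient of determination:
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SSTO}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SSTO}}.$$
In simple linear regression, $R^2 = r^2$, where r is Pearson's correlation coefficient.
SSTO: total deviation in responses around the grand mean, $\sum (y_i - \bar{y})^2$
SSE: deviation of observations around the fitted line, $\sum (y_i - \hat{y}_i)^2$
SSR: deviation of fitted values around the grand mean, $\sum (\hat{y}_i - \bar{y})^2$
$R^2$: percentage of variation explained by the regression line

Example
$R^2 = 0.4149$

ANOVA table of regression
[R output: ANOVA table, with the SSR and SSE entries marked]

Practice
In the Surgical data, the model is liver ~ clot. What is the coefficient of determination of this model?

Diagnostics
Basic assumptions: the residuals have mean 0, the residuals have constant variance, and the residuals are uncorrelated with each other.
Look at the distribution of the residuals.
Look at the residuals against the index (there should be no relationship).
The residuals should be unrelated to the fitted values.
The residuals should be unrelated to the explanatory variables.

Diagnostics
Patterns to look for in a residual plot:
Randomly scattered around zero: this is what we want.
Going from negative to positive: the model may not be proper. Is there a time effect (if x = time)?
A problem with linearity: try a polynomial, or transform X.
A problem with constant variance, where the variance grows as X grows: try adding other X variables, or weighted least squares.

Example
[Figure: residual plots for the example]

Q-Q plot
If the residuals follow a normal distribution, then besides their histogram looking normal, do the values at each rank resemble the values at the same rank in a normal population?
Plot the quantile of the residual versus the normal quantile:
Standardize the residuals, then sort them. The 2/6 (= 0.33) quantile is -1.33, that is, $P(e_i \le -1.33) = 2/6$.
Compute the rank of each residual.
For the normal distribution, the 2/6 (= 0.33) quantile is -0.43, that is, $P(Z \le -0.43) = 2/6 = 33\%$.
For the normal distribution, the 0.26 quantile is -0.64, that is, $P(Z \le -0.64) = 26\%$.
Plot these two columns.

Q-Q plot
If the points are close to the straight line X = Y, then the residuals are close to normal.
R code: qqnorm(model1$residuals)
Each point pairs, for example, the residual ranked 4/6 with the N(0, 1) value whose cumulative probability is 4/6.

Q-Q plot
[Figure: Q-Q plots when Y is right skewed and when Y is left skewed]

Diagnostics in R
[Figure: diagnostic plots produced in R]

Diagnostics
Plots to examine:
The linear effect of each predictor: residuals against that predictor, or against the fitted values
Constant variance: residuals against the fitted values
Independence of samples: residuals against the index, or against time
Normality assumption: Q-Q plot
Other important predictors? Plot the residuals against the candidate predictor.
Are there outliers? Residual plot, scatter plot.
If there is an outlier, examine whether it is a true outlier or a gross error. If it is a true outlier, collect more data near this point. If it is not, delete the data point before the regression analysis.

Practice
In the Surgical data, the model is liver ~ clot. Does this model satisfy the regression assumptions?

Multiple linear regression
An extension of SLR that includes more than one predictor in the model.
[Figure: panels labeled "Linear?", "Linear?", "Difference?"]

Multiple linear regression
Model: $Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$
$\beta_0, \dots, \beta_p$: regression coefficients
$X_{i1}, \dots, X_{ip}$: observed data
$\varepsilon_i$: errors, which are independent
In matrix form: $Y = X\beta + \varepsilon$

Multiple linear regression
Which terms can be placed in X?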
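The sums-of-squares decomposition and the Q-Q construction described in the "How good is the regression?" and "Q-Q plot" slides can be checked numerically. This is a minimal sketch on simulated data (the generating values are assumptions, not the course data). It verifies SSTO = SSE + SSR and R² = r², and it uses `rstandard()` as one way to standardize residuals. It also shows that `ppoints(6)` gives 0.26 for the second of six ranks, which matches the 0.26 quantile quoted on the slide.

```r
# Hypothetical simulated data; true values are assumptions for illustration.
set.seed(3)
n <- 40
weight <- runif(n, 50, 100)
bp <- 70 + 0.7 * weight + rnorm(n, sd = 8)
fit <- lm(bp ~ weight)

# Decomposition of the total variation.
ssto <- sum((bp - mean(bp))^2)           # total, around the grand mean
sse  <- sum(residuals(fit)^2)            # observations around the fitted line
ssr  <- sum((fitted(fit) - mean(bp))^2)  # fitted values around the grand mean
r2   <- ssr / ssto

# Q-Q plot coordinates without drawing: sorted standardized residuals
# are paired with standard normal quantiles at plotting positions.
std_res <- rstandard(fit)
qq <- qqnorm(std_res, plot.it = FALSE)   # qq$x: normal quantiles, qq$y: residuals

# Plotting positions for six residuals: (i - 3/8) / (n + 1/4) when n <= 10.
p6 <- ppoints(6)
z2 <- qnorm(p6[2])                       # normal quantile paired with rank 2 of 6

print(c(SSTO = ssto, SSE = sse, SSR = ssr, R2 = r2))
print(round(c(position = p6[2], normal_quantile = z2), 2))
```

To draw the plot itself, `qqnorm(std_res)` followed by `qqline(std_res)` adds a reference line through the quartiles.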
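Predictors: such as weight, age, and sex in the example
Transformations of predictors
Polynomials: for example, $X_1$ and $X_1^2$
Dummy variables and factors
Interactions and other combinations of predictors: for example, $X_1 X_2$

Example
[R output: multiple regression example]

Inference of regression coefficients
As in SLR, use the least squares method:
$$\hat{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y.$$
These estimators satisfy the Gauss-Markov theorem.

Inference of regression coefficients
As in SLR, to build a confidence interval for $\beta_j$, or to carry out a test, we first need an estimate of $\sigma^2$.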
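The matrix form of the least squares estimate and the variance estimate it needs can be sketched in R. Everything below is illustrative: the data are simulated, and the variable names, generating coefficients, and the choice of terms in `fit2` are assumptions rather than the example from the slides. The sketch compares the matrix formula with `lm()`, estimates the error variance as SSE divided by n minus the number of estimated coefficients, and shows how polynomial, factor, and interaction terms are written in a formula.

```r
# Hypothetical simulated data with three predictors; all values are assumptions.
set.seed(4)
n <- 60
d <- data.frame(
  weight = runif(n, 50, 100),
  age    = runif(n, 20, 70),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE))
)
d$bp <- 60 + 0.6 * d$weight + 0.3 * d$age + 5 * (d$sex == "M") + rnorm(n, sd = 8)

fit <- lm(bp ~ weight + age + sex, data = d)

# Design matrix: intercept column, weight, age, and a dummy column for sex.
X <- model.matrix(fit)

# Least squares in matrix form: solve (X'X) b = X'y.
b <- solve(crossprod(X), crossprod(X, d$bp))

# Error variance estimate: SSE / (n - k), k = number of estimated coefficients.
k  <- ncol(X)
s2 <- sum(residuals(fit)^2) / (n - k)

# Standard errors: square roots of the diagonal of s2 * (X'X)^{-1}.
se <- sqrt(s2 * diag(solve(crossprod(X))))

# Polynomial and interaction terms written directly in the formula.
fit2 <- lm(bp ~ weight + I(weight^2) + age + sex + weight:sex, data = d)

print(cbind(matrix_form = drop(b), lm = coef(fit), std_error = se))
print(names(coef(fit2)))
```

`I(weight^2)` protects the power from formula syntax, and `weight:sex` adds only the interaction, whereas `weight * sex` would add both main effects and the interaction.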