StatisticsandR2010R语言.ppt_第1页
StatisticsandR2010R语言.ppt_第2页
StatisticsandR2010R语言.ppt_第3页
StatisticsandR2010R语言.ppt_第4页
StatisticsandR2010R语言.ppt_第5页
已阅读5页,还剩66页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、1,漫谈生物统计和R的应用,李欣海 中国科学院动物研究所,2,Contents of my class,History and development of biostatistics 生物统计学的历史和发展 Probability distribution 概率分布 Hypothesis testing 假设检验 Analysis of variance (ANOVA) 方差分析 Simple linear regression and correlation 简单线性回归和相关 Analysis of covariance (ANCOVA) 协方差分析 Data transformatio

2、n and Nonparametric statistics 数据转化和非参数检验 Multivariate analysis 多元统计分析 Generalized linear model 广义线形模型 Sample survey and recent development of biostatistics and modeling 抽样调查和现代生物统计与模型进展,Sokal, R. R. and F. J. Rohlf. 1995. Biometry: the principles and practice of statistics in biological research. T

3、hird Edition. W. H. Freeman and Co.: New York. 887 pp.,Zar, J. H. 1999. Biostatistical Analysis. Fourth Edition. Prentice Hall: New Jersey, 663 pp.,Barnett, Vic. 2002. Sample Survey: Principles abline(v = 0, lty = 2) head(trees) use=sample(0:1,length(trees$Girth),rep=TRUE) data1=cbind(trees, use) na

4、mes(data1); table(use); summary(data1$Girth); sd(data1$Girth) fit = glm(useGirth+Height+Volume,data=data1,family=binomial() summary(fit) # display results confint(fit) # 95% CI for the coefficients exp(coef(fit) # exponentiated coefficients exp(confint(fit) # 95% CI for exponentiated coefficients pr

5、ed = predict(fit, type=response) # predicted values res = residuals(fit, type=deviance) # residuals x11(); plot(pred,res) plot(data1$Girth, pred) op = par(mfrow = c(2, 2), pty = s); plot(fit) source(D:/models/R_code/logistic/logistic_functions.txt) x11();logit.roc.plot(logit.roc(fit,80),55,R code fo

6、r logistic regression - lrm,#Fit a logistic model containing predictors age, blood.pressure, sex #and cholesterol, with age fitted with a smooth 5-knot restricted cubic #spline function and a different shape of the age relationship for males and females. library(Design) n - 1000 # define sample size

7、 set.seed(17) # so can reproduce the results age - rnorm(n, 50, 10); blood.pressure - rnorm(n, 120, 15) cholesterol - rnorm(n, 200, 25); sex - factor(sample(c(female,male), n,TRUE) label(age) - Age # label is in Hmisc label(cholesterol) - Total Cholesterol label(blood.pressure) - Systolic Blood Pres

8、sure label(sex) - Sex units(cholesterol) - mg/dl # uses units.default in Hmisc units(blood.pressure) - mmHg # Specify population model for log odds that Y=1 L - .4*(sex=male) + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=female) + 2*(sex=male) # Simulate binary y to have Prob(y=1) = 1/1+exp

9、(-L) y - ifelse(runif(n) plogis(L), 1, 0) ddist - datadist(age, blood.pressure, cholesterol, sex); options(datadist=ddist) fit - lrm(y blood.pressure + sex * (age + rcs(cholesterol,4),x=TRUE, y=TRUE) op - par(mfrow = c(2, 2), pty = s); plot(fit) anova(fit) plot(fit, age=NA, sex=NA) plot(fit, age=20:

10、70, sex=male) # need if datadist not used,56,R code for logistic regression an example for habitat prediction,x=seq(116,120, by=0.1) #long y=seq(36,40, by=0.1) #lat Geo = expand.grid(x,y) names(Geo) = c(Lon,Lat) use=sample(0:1,length(Geo$Lat),rep=TRUE) #present/absent elev=rnorm(length(Geo$Lat),1000

11、,200) #elevation aspect=sample(1:100,length(Geo$Lat),rep=TRUE)/100 temperature=rnorm(length(Geo$Lat),20,5)*1000/elev distr=cbind(Geo,use,elev,aspect,temperature) #the database head(distr) fit = glm(useelev+aspect+temperature,data=distr,family=binomial() summary(fit) logit_current = predict(fit, type

12、=response) # predicted values hist(logit_current) pred_current = exp(logit_current)/(1+exp(logit_current) hist(pred_current) coplot(pred_current elev | Lon*Lat, data=distr, overlap = 0, number = c(8,8) result=cbind(distr,pred_current) head(result); attach(distr); x11() #plot current distribution plo

13、t(116:120, 36:40, type = n, xlab = Longitude, ylab = Latitude) for (i in 1:length(distr$Lat) if(pred_currenti0.62) points(Loni, Lati, col = red, cex=1.5, pch=19) if(pred_currenti0.62) points(Loni, Lati, col = blue, cex=1.5, pch=19),57,R code for logistic regression an example for habitat prediction

14、in future,#predict future distribution distr_future=distr distr_future$temperature=distr_future$temperature+2 # a globle warming scenario logit_future = predict(fit ,newdata = distr_future) hist(logit_future) pred_future = exp(logit_future)/(1+exp(logit_future) hist(pred_future) #plot future distrib

15、ution x11() plot(116:120, 36:40, type = n, xlab = Longitude, ylab = Latitude) for (i in 1:length(distr_future$Lat) if(pred_futurei0.5) points(Loni, Lati, col = red, cex = 1.5, pch=19) if(pred_futurei0.5) points(Loni, Lati, col = blue, cex = 1.5, pch=19),58,Multivariate analysis,59,R code for princip

16、al component analysis,trees fit = princomp(trees, cor=TRUE) summary(fit) # print variance accounted for loadings(fit) # pc loadings plot(fit,type=lines) # scree plot fit$scores # the principal components biplot(fit),60,PCA: General methodology,From k original variables: x1,x2,.,xk: Produce k new var

17、iables: y1,y2,.,yk: y1 = a11x1 + a12x2 + . + a1kxk y2 = a21x1 + a22x2 + . + a2kxk . yk = ak1x1 + ak2x2 + . + akkxk,such that: yks are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.,yks are Principa

18、l Components,PCA,61,Principal Components Analysis,PCA,62,Principal Components Analysis,a11,a12,.,a1k is 1st Eigenvector of correlation/covariance matrix, and coefficients of first principal component a21,a22,.,a2k is 2nd Eigenvector of correlation/covariance matrix, and coefficients of 2nd principal

19、 component ak1,ak2,.,akk is kth Eigenvector ofcorrelation/covariance matrix, and coefficients of kth principal component,PCA,63,Principal Components Analysis,Score of ith unit on jth principal component yi,j = aj1xi1 + aj2xi2 + . + ajkxik,PCA,64,PCA Scores,PCA,65,Principal Components Analysis,Amount

20、 of variance accounted for by: 1st principal component, 1, 1st eigenvalue 2nd principal component, 2, 2nd eigenvalue . 1 2 3 4 . Average j = 1 (correlation matrix),PCA,66,Principal Components Analysis:Eigenvalues,PCA,67,PCA: Terminology,jth principal component is jth eigenvector ofcorrelation/covari

21、ance matrix coefficients, ajk, are elements of eigenvectors and relate original variables (standardized if using correlation matrix) to components scores are values of units on components (produced using coefficients) amount of variance accounted for by component is given by eigenvalue, j proportion

22、 of variance accounted for by component is given by j / j loading of kth original variable on jth component is given by ajkj -correlation between variable and component,PCA,68,PCA: Potential Problems,Lack of Independence NO PROBLEM Lack of Normality Normality desirable but not essential Lack of Prec

23、ision Precision desirable but not essential Many Zeroes in Data Matrix Problem (use Correspondence Analysis),PCA,69,Bayesian statistics,Suppose that our prior density for p is denoted by g(p). If we regard a “success” as sleeping at least eight hours and we take a random sample with s successes and

24、f failures, then the likelihood function is given by L(p) = ps (1p)f , 0p1 The posterior density for p, by Bayes rule, is obtained, up to a proportionality constant, by multiplying the prior density by the likelihood. g(p|data) g(p)L(p).,Albert, Jim. Bayesian computation with R. 2009. Springer.,70,B

25、ayes computation with R,# an example of the percentage of heavy sleepers library(LearnBayes) library(lattice) p = seq(0.05, 0.95, by = 0.1) prior = c(2, 4, 8, 8, 4, 2, 1, 1, 1, 1) prior = prior/sum(prior) plot(p, prior, type = h, ylab=Prior Probability) data = c(11, 16) #a sample with 11 successes and 17 fails post = pdisc(p, prior, data) round(cbind(p, prior, post),2) PRIOR=data.frame(prior,p,prior) POST=data.frame(posterior,p,post) names(PRIOR)=c(Type,P

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论