




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、1,漫谈生物统计和R的应用,李欣海 中国科学院动物研究所,2,Contents of my class,History and development of biostatistics 生物统计学的历史和发展 Probability distribution 概率分布 Hypothesis testing 假设检验 Analysis of variance (ANOVA) 方差分析 Simple linear regression and correlation 简单线性回归和相关 Analysis of covariance (ANCOVA) 协方差分析 Data transformatio
2、n and Nonparametric statistics 数据转化和非参数检验 Multivariate analysis 多元统计分析 Generalized linear model 广义线形模型 Sample survey and recent development of biostatistics and modeling 抽样调查和现代生物统计与模型进展,Sokal, R. R. and F. J. Rohlf. 1995. Biometry: the principles and practice of statistics in biological research. T
3、hird Edition. W. H. Freeman and Co.: New York. 887 pp.,Zar, J. H. 1999. Biostatistical Analysis. Fourth Edition. Prentice Hall: New Jersey, 663 pp.,Barnett, Vic. 2002. Sample Survey: Principles abline(v = 0, lty = 2) head(trees) use=sample(0:1,length(trees$Girth),rep=TRUE) data1=cbind(trees, use) na
4、mes(data1); table(use); summary(data1$Girth); sd(data1$Girth) fit = glm(useGirth+Height+Volume,data=data1,family=binomial() summary(fit) # display results confint(fit) # 95% CI for the coefficients exp(coef(fit) # exponentiated coefficients exp(confint(fit) # 95% CI for exponentiated coefficients pr
5、ed = predict(fit, type=response) # predicted values res = residuals(fit, type=deviance) # residuals x11(); plot(pred,res) plot(data1$Girth, pred) op = par(mfrow = c(2, 2), pty = s); plot(fit) source(D:/models/R_code/logistic/logistic_functions.txt) x11();logit.roc.plot(logit.roc(fit,80),55,R code fo
6、r logistic regression - lrm,#Fit a logistic model containing predictors age, blood.pressure, sex #and cholesterol, with age fitted with a smooth 5-knot restricted cubic #spline function and a different shape of the age relationship for males and females. library(Design) n - 1000 # define sample size
7、 set.seed(17) # so can reproduce the results age - rnorm(n, 50, 10); blood.pressure - rnorm(n, 120, 15) cholesterol - rnorm(n, 200, 25); sex - factor(sample(c(female,male), n,TRUE) label(age) - Age # label is in Hmisc label(cholesterol) - Total Cholesterol label(blood.pressure) - Systolic Blood Pres
8、sure label(sex) - Sex units(cholesterol) - mg/dl # uses units.default in Hmisc units(blood.pressure) - mmHg # Specify population model for log odds that Y=1 L - .4*(sex=male) + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=female) + 2*(sex=male) # Simulate binary y to have Prob(y=1) = 1/1+exp
9、(-L) y - ifelse(runif(n) plogis(L), 1, 0) ddist - datadist(age, blood.pressure, cholesterol, sex); options(datadist=ddist) fit - lrm(y blood.pressure + sex * (age + rcs(cholesterol,4),x=TRUE, y=TRUE) op - par(mfrow = c(2, 2), pty = s); plot(fit) anova(fit) plot(fit, age=NA, sex=NA) plot(fit, age=20:
10、70, sex=male) # need if datadist not used,56,R code for logistic regression an example for habitat prediction,x=seq(116,120, by=0.1) #long y=seq(36,40, by=0.1) #lat Geo = expand.grid(x,y) names(Geo) = c(Lon,Lat) use=sample(0:1,length(Geo$Lat),rep=TRUE) #present/absent elev=rnorm(length(Geo$Lat),1000
11、,200) #elevation aspect=sample(1:100,length(Geo$Lat),rep=TRUE)/100 temperature=rnorm(length(Geo$Lat),20,5)*1000/elev distr=cbind(Geo,use,elev,aspect,temperature) #the database head(distr) fit = glm(useelev+aspect+temperature,data=distr,family=binomial() summary(fit) logit_current = predict(fit, type
12、=response) # predicted values hist(logit_current) pred_current = exp(logit_current)/(1+exp(logit_current) hist(pred_current) coplot(pred_current elev | Lon*Lat, data=distr, overlap = 0, number = c(8,8) result=cbind(distr,pred_current) head(result); attach(distr); x11() #plot current distribution plo
13、t(116:120, 36:40, type = n, xlab = Longitude, ylab = Latitude) for (i in 1:length(distr$Lat) if(pred_currenti0.62) points(Loni, Lati, col = red, cex=1.5, pch=19) if(pred_currenti0.62) points(Loni, Lati, col = blue, cex=1.5, pch=19),57,R code for logistic regression an example for habitat prediction
14、in future,#predict future distribution distr_future=distr distr_future$temperature=distr_future$temperature+2 # a globle warming scenario logit_future = predict(fit ,newdata = distr_future) hist(logit_future) pred_future = exp(logit_future)/(1+exp(logit_future) hist(pred_future) #plot future distrib
15、ution x11() plot(116:120, 36:40, type = n, xlab = Longitude, ylab = Latitude) for (i in 1:length(distr_future$Lat) if(pred_futurei0.5) points(Loni, Lati, col = red, cex = 1.5, pch=19) if(pred_futurei0.5) points(Loni, Lati, col = blue, cex = 1.5, pch=19),58,Multivariate analysis,59,R code for princip
16、al component analysis,trees fit = princomp(trees, cor=TRUE) summary(fit) # print variance accounted for loadings(fit) # pc loadings plot(fit,type=lines) # scree plot fit$scores # the principal components biplot(fit),60,PCA: General methodology,From k original variables: x1,x2,.,xk: Produce k new var
17、iables: y1,y2,.,yk: y1 = a11x1 + a12x2 + . + a1kxk y2 = a21x1 + a22x2 + . + a2kxk . yk = ak1x1 + ak2x2 + . + akkxk,such that: yks are uncorrelated (orthogonal) y1 explains as much as possible of original variance in data set y2 explains as much as possible of remaining variance etc.,yks are Principa
18、l Components,PCA,61,Principal Components Analysis,PCA,62,Principal Components Analysis,a11,a12,.,a1k is 1st Eigenvector of correlation/covariance matrix, and coefficients of first principal component a21,a22,.,a2k is 2nd Eigenvector of correlation/covariance matrix, and coefficients of 2nd principal
19、 component ak1,ak2,.,akk is kth Eigenvector ofcorrelation/covariance matrix, and coefficients of kth principal component,PCA,63,Principal Components Analysis,Score of ith unit on jth principal component yi,j = aj1xi1 + aj2xi2 + . + ajkxik,PCA,64,PCA Scores,PCA,65,Principal Components Analysis,Amount
20、 of variance accounted for by: 1st principal component, 1, 1st eigenvalue 2nd principal component, 2, 2nd eigenvalue . 1 2 3 4 . Average j = 1 (correlation matrix),PCA,66,Principal Components Analysis:Eigenvalues,PCA,67,PCA: Terminology,jth principal component is jth eigenvector ofcorrelation/covari
21、ance matrix coefficients, ajk, are elements of eigenvectors and relate original variables (standardized if using correlation matrix) to components scores are values of units on components (produced using coefficients) amount of variance accounted for by component is given by eigenvalue, j proportion
22、 of variance accounted for by component is given by j / j loading of kth original variable on jth component is given by ajkj -correlation between variable and component,PCA,68,PCA: Potential Problems,Lack of Independence NO PROBLEM Lack of Normality Normality desirable but not essential Lack of Prec
23、ision Precision desirable but not essential Many Zeroes in Data Matrix Problem (use Correspondence Analysis),PCA,69,Bayesian statistics,Suppose that our prior density for p is denoted by g(p). If we regard a “success” as sleeping at least eight hours and we take a random sample with s successes and
24、f failures, then the likelihood function is given by L(p) = ps (1p)f , 0p1 The posterior density for p, by Bayes rule, is obtained, up to a proportionality constant, by multiplying the prior density by the likelihood. g(p|data) g(p)L(p).,Albert, Jim. Bayesian computation with R. 2009. Springer.,70,B
25、ayes computation with R,# an example of the percentage of heavy sleepers library(LearnBayes) library(lattice) p = seq(0.05, 0.95, by = 0.1) prior = c(2, 4, 8, 8, 4, 2, 1, 1, 1, 1) prior = prior/sum(prior) plot(p, prior, type = h, ylab=Prior Probability) data = c(11, 16) #a sample with 11 successes and 17 fails post = pdisc(p, prior, data) round(cbind(p, prior, post),2) PRIOR=data.frame(prior,p,prior) POST=data.frame(posterior,p,post) names(PRIOR)=c(Type,P
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 面向2025的工业互联网平台云计算资源动态分配能耗管理策略报告
- 2022年贵阳市三年级语文第三单元考试试卷
- 2022年敦化市小学三年级语文第一单元考试试卷
- 结婚试卷题目及答案
- 血管病变组织学诊断进展-洞察及研究
- 产业政策影响评估-第2篇-洞察及研究
- 基于物联网技术的2025年校园安全监控解决方案应用报告
- 历史情调题目及答案
- 2025版仓储物流冬季劳务扫雪合同标准
- 二零二五年度公共设施招标合同编制与实施办法
- 人教版高中物理选择性必修第二册第一章安培力与洛伦兹力
- 中国脑出血诊治指南(2023年)-1
- GB 16869-2005鲜、冻禽产品
- 第五章-航空气象知识课件
- 化学化工化学实验医疗器材
- 学校“三重一大”事项决策制度
- 超星尔雅学习通《大学英语口语》章节测试含答案
- 变压器原理分类及应用
- 指数函数、对数函数、幂函数的图像及性质
- 【环境课件】环境生物工程-(NXPowerLite)
- 流动资金自动测算表(内自带计算公式)
评论
0/150
提交评论