Gene Differential Expression Analysis (PPT courseware)

Differential Expression Analysis of Microarray Data
Chunsheng Han, Ph.D.
Division of Bioinformatics, State Key Laboratory of Reproductive Biology, Institute of Zoology, CAS
April 20, 2011
Lecture 8, Bioinformatics

Outline
Microarray data: Warming up!
T-test and other tests
SAM: Significance Analysis of Microarrays
Rank Products and a real application

Data, data, data: the Affymetrix system
1. Scan the probe array: image data, .dat file created.
2. Compute cell intensities from the image data: intensity data, .cel file created.
3. Analyze expression from the cell intensity data: expression probe analysis data, .chp file created.

.DAT file
.CEL file: raw data, not background corrected.
.CHP file: background subtraction; IM: Ideal MM.
Performance on corrupted data

Discrimination score R
R = (PM − MM) / (PM + MM)
If PM ≫ MM, then R ≈ 1. If PM = MM, then R = 0.

Detection call significance
One-sided Wilcoxon's signed rank test, giving a p value.
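As a rough illustration of the discrimination score defined above, the sketch below computes R for a few probe pairs. The intensities and the helper name `discrimination_scores` are hypothetical and are not taken from the lecture or from any Affymetrix software.

```python
def discrimination_scores(pm, mm):
    """Return R = (PM - MM) / (PM + MM) for each probe pair."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


# Hypothetical intensities for four probe pairs of one probe set.
pm = [1200.0, 800.0, 500.0, 300.0]
mm = [100.0, 800.0, 250.0, 450.0]

scores = discrimination_scores(pm, mm)
for p, m, r in zip(pm, mm, scores):
    # R approaches 1 when PM dominates MM, is 0 when PM equals MM,
    # and turns negative when MM exceeds PM.
    print(f"PM={p:7.1f}  MM={m:7.1f}  R={r:+.3f}")
```

The detection call then applies the one-sided signed rank test to these per-pair scores; that step is left out here so the sketch stays within the standard library.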

Where does the threshold τ come from?

Differential expression: the Affymetrix report
How is it calculated?
$SPV_{i,j} = PV_{i,j} + \log_2(nf_i \cdot sf_i)$

Change call
1. Decide the baseline array and the experiment array.
2. Calculate PM − MM for the two arrays.
3. Apply the one-sided Wilcoxon's signed rank test.
4. Generate a change p value.
5. Make a change call.

T-test and other tests

[Figure: Population 1 and Population 2 (μ1, μ2), with Sample 1 and Sample 2 drawn from them]

The t-distribution
Founder: W. S. Gosset (1876 to 1937), who wrote under the pseudonym "Student".
Mostly worked in tea (t) time? Hence known as Student's t test.
Preferable when n < 60; certainly if n < 30.

Is there a difference?
Between you ... means, who is meaner?

T-test
1. Test for a single mean: is the sample mean equal to the predefined population mean?
2. Test for a difference in means: is the CD4 level of patients taking treatment A equal to the CD4 level of patients taking treatment B?
3. Test for paired observations: did the treatment confer any significant benefit?

Setting up the hypothesis
Left tail: H0: μ1 ≥ μ2, H1: μ1 < μ2 OR H0: μ1 − μ2 ≥ 0, H1: μ1 − μ2 < 0
Right tail: H0: μ1 ≤ μ2, H1: μ1 > μ2 OR H0: μ1 − μ2 ≤ 0, H1: μ1 − μ2 > 0
Two tail: H0: μ1 = μ2, H1: μ1 ≠ μ2 OR H0: μ1 − μ2 = 0, H1: μ1 − μ2 ≠ 0

[Figure: one-tailed test, 0.05 in one tail; axis 100 to 140] Mean systolic BP in nephritis is significantly higher than that of a normal person.
[Figure: two-tailed test, 0.025 in each tail; axis 100 to 140] Mean systolic BP in nephritis is significantly different from that of a normal person.

Statistical analysis
[Figure: control group mean vs. treatment group mean] Is there a difference?

What does difference mean?
[Figure: three pairs of distributions with medium, high, and low variability] The mean difference is the same for all three cases. Which one shows the greatest difference? (The low-variability case.)
A statistical difference is a function of the difference between means relative to the variability.
A small difference between means with large variability could be due to chance.
It is like a signal-to-noise ratio.

So we estimate:
$$\frac{\text{signal}}{\text{noise}} = \frac{\text{difference between group means}}{\text{variability of groups}} = \frac{\bar{X}_T - \bar{X}_C}{SE(\bar{X}_T - \bar{X}_C)} = t\text{-value}$$

Determining the p-value
[Figure: density f(t) with area 0.95 between t = −1.96 and t = 1.96, and 0.025 in each tail]

Assumptions
Normal distribution; equal variance; random sampling.

t-statistic
When the sampled population is normally distributed, the t statistic is Student t distributed with n − 1 degrees of freedom.

T-test for a single mean
The following are the weights (mg) of each of 20 rats drawn at random from a large stock. Is it likely that the mean weight of these 20 rats is similar to the mean weight (24 mg) of the whole stock?
9 18 21 26 18 22 27 19 22 29 19 24 30 16 20 24 32

Steps for the test for a single mean
1. Question to be answered: is the mean weight of the sample of 20 rats 24 mg? n = 20, $\bar{x}$ = 21.0 mg, sd = 5.91, μ = 24.0 mg.
2. Null hypothesis: the mean weight of the rats is 24 mg; that is, the sample mean is equal to the population mean.
3. Test statistic: t with (n − 1) df.
4. Comparison with the theoretical value: if tabulated t(n − 1) > calculated t(n − 1), accept H0.
5. Inference.

t test for a single mean
Test statistic: n = 20, $\bar{x}$ = 21.0 mg, sd = 5.91, μ = 24.0 mg
$t = \frac{\bar{x} - \mu}{sd / \sqrt{n}} = \frac{21.0 - 24.0}{5.91 / \sqrt{20}} \approx -2.27$
$t_{.05,\,19} = 2.093$; accept H0 if $|t| \le 2.093$.
Inference: since $|t| \approx 2.27 > 2.093$, H0 is rejected. There is no evidence that the sample is taken from the population with mean weight of 24 mg.

T-test for difference in means
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_P^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad S_P^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}, \qquad df = n_1 + n_2 - 2$$
Here $\mu_1 - \mu_2$ is the hypothesized difference (usually zero when testing for equal means).
Recall that for single samples: $t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
For related samples: $t = \frac{\bar{D}}{S_D / \sqrt{n}}$, where $\bar{D} = \frac{\sum D}{n}$ and $S_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}$

Wilcoxon Rank Sum Test
When we test a hypothesis about the difference between two independent population means, we do so using the difference between two sample means. When the two sample variances are tested and found not to be equal, we cannot pool the sample variances, and thus we cannot use the t-test for independent samples. Instead, we use the Wilcoxon Rank Sum Test.

The Z test and the t test are "parametric tests": they answer a question about the difference between populations by comparing sample statistics (e.g., $\bar{X}_1$ and $\bar{X}_2$) and making an inference to the population parameters (μ1 and μ2). The Wilcoxon, in contrast, allows inferences about whole populations.

Wilcoxon Rank Sum Test
1. Wilcoxon with both n1 and n2 < 10
2. Wilcoxon with both n1 and n2 ≥ 10
3. Examples

[Figure: Wilcoxon] Note that distribution B is shifted to the right of distribution A.
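Returning to the single-mean test on the rat weights earlier in this section, the following minimal sketch recomputes the test statistic from the summary values quoted on the slide (n = 20, mean 21.0 mg, sd 5.91, population mean 24.0 mg) and compares it with the tabulated value 2.093. The function name `one_sample_t` is an illustrative choice, not something from the lecture.

```python
import math


def one_sample_t(sample_mean, pop_mean, sd, n):
    """t = (sample mean - population mean) / (sd / sqrt(n)), with n - 1 df."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))


# Summary statistics as quoted in the lecture's rat-weight example.
t_stat = one_sample_t(sample_mean=21.0, pop_mean=24.0, sd=5.91, n=20)

# Two-sided 5% critical value for 19 df, taken from the slide.
t_crit = 2.093
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 2), reject_h0)  # -2.27 True
```

Because |t| is larger than the tabulated value, the sketch reaches the same conclusion as the slide: the sample does not look like a draw from a population with mean 24 mg.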

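For the small-sample, independent-groups case, the rank sum statistic can be sketched as: pool the two samples, rank the pooled values (rank 1 for the smallest), and add up the ranks of the smaller sample. The data are made up, and giving tied values the average of their ranks is a common convention assumed here rather than something the slides state.

```python
def rank_sum_statistic(sample_a, sample_b):
    """Return the Wilcoxon rank sum T for the smaller of two samples."""
    pooled = sorted(sample_a + sample_b)

    # Map each distinct value to its rank; tied values share the
    # average of the ranks they occupy (assumed tie convention).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # If the sizes are equal, either sample may be used; take the first.
    smaller = sample_a if len(sample_a) <= len(sample_b) else sample_b
    return sum(rank_of[x] for x in smaller)


# Hypothetical measurements for two independent groups.
group_a = [1.2, 3.4, 2.2]
group_b = [2.9, 4.1, 5.0, 3.8]
t_small = rank_sum_statistic(group_a, group_b)
print(t_small)  # 7.0
```

The value of T would then be compared against tabulated critical values for the given n1 and n2, which are not reproduced here.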
Small samples, independent groups
Wilcoxon Rank Sum Test:
1. First, combine the two samples and rank order all the observations. The smallest number has rank 1; the largest number has rank N (= sum of n1 and n2).
2. Separate the samples and add up the ranks for the smaller sample. (If n1 = n2, choose either one.)
3. Test statistic: rank sum T for the smaller sample.

Wilcoxon for n1 ≥ 10 and n2 ≥ 10

b_i = PM_i − MM_i (baseline array), e_i = PM_i − MM_i (experiment array)

Significance analysis of microarrays (SAM)
SAM can be used to pick out significant genes based on differential expression between sets of samples.
SAM gives estimates of the False Discovery Rate (FDR), which is the proportion of genes likely to have been wrongly identified by chance as being significant.
It is a very interactive algorithm that allows users to dynamically change thresholds for significance (through the tuning parameter delta) after looking at the distribution of the test statistic.

SAM Two-Class Unpaired
1. Assign experiments to two groups, e.g., in the expression matrix, assign Experiments 1, 2 and 5 to group A, and Experiments 3, 4 and 6 to group B.
2. Question: is the mean expression level of a gene in group A significantly different from the mean expression level in group B?

SAM Two-Class Unpaired: permutation tests
i) For each gene, compute the d-value (analogous to the t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B respectively have the same number of elements as the original groups A and B. Compute the d-value for each randomized gene. [Figure: original grouping vs. randomized grouping]
iv) Rank the permuted d-values of the genes in ascending order.
v) Repeat steps iii) and iv) many times, so that each gene has many randomized d-values corresponding to its rank from the observed (unpermuted) d-value. Take the average of the randomized d-values for each gene. This is the expected d-value of that gene.
vi) Plot the observed d-values vs. the expected d-values.

Example
[Figure: observed vs. expected d-values] Significant positive genes (i.e., mean expression of group B > mean expression of group A); significant negative genes (i.e., mean expression of group A > mean expression of group B).
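Steps i) to v) above can be sketched in a few lines of standard-library Python. The slides do not spell out the d-value formula, so the form used here, a t-like statistic with a small constant `s0` added to the denominator, is an assumption modeled on the cited Tusher et al. paper. The expression values, the default `s0 = 0.1`, and the function names are all hypothetical.

```python
import math
import random


def d_value(values, in_b, s0=0.1):
    """t-like score (mean_B - mean_A) / (s + s0); s0 is an assumed constant."""
    a = [v for v, flag in zip(values, in_b) if not flag]
    b = [v for v, flag in zip(values, in_b) if flag]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((v - mean_a) ** 2 for v in a) + sum((v - mean_b) ** 2 for v in b)
    scale = (1 / len(a) + 1 / len(b)) / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / (math.sqrt(scale * ss) + s0)


def expected_d(matrix, in_b, n_perm=200, seed=0):
    """Average the k-th smallest permuted d-value over many label shuffles."""
    rng = random.Random(seed)
    labels = list(in_b)
    totals = [0.0] * len(matrix)
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes stay the same
        perm_d = sorted(d_value(row, labels) for row in matrix)
        for k, d in enumerate(perm_d):
            totals[k] += d
    return [t / n_perm for t in totals]


# Experiments 1, 2, 5 in group A; 3, 4, 6 in group B (as in the text).
in_b = [False, False, True, True, False, True]
matrix = [  # hypothetical expression values, one row per gene
    [1.0, 1.2, 5.0, 5.3, 0.9, 4.8],  # higher in B
    [6.0, 5.8, 2.0, 2.1, 6.2, 1.9],  # higher in A
    [3.0, 3.1, 2.9, 3.0, 3.2, 3.1],  # flat
    [2.0, 4.0, 3.0, 2.5, 3.5, 3.2],  # noisy
]

observed = sorted(d_value(row, in_b) for row in matrix)
expected = expected_d(matrix, in_b)
for obs, exp in zip(observed, expected):
    print(f"observed d = {obs:+.2f}   expected d = {exp:+.2f}")
```

Plotting `observed` against `expected` would give the scatter described in step vi); with only four genes this is purely illustrative.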

"Observed d = expected d" line
The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. Any gene beyond the first gene in the +ve or −ve direction on the x-axis (including the first gene) whose observed d-value exceeds the expected d-value by at least delta is considered significant.

SAM Two-Class Unpaired
For each permutation of the data, compute the number of positive and negative significant genes for a given delta, as explained in the previous slide. The median number of significant genes from these permutations is the median number of falsely significant genes. The rationale behind this is that any genes designated as significant from the randomized data are being picked up purely by chance (i.e., "falsely" discovered). Therefore, the median number picked up over many randomizations is a good estimate of the number of false discoveries.

Estimate FDR
t1 and t2 will be used as cutoffs.
Calculate the average number of genes that exceed these values in the permutations. (Very similar to the Gap Estimation algorithm for clustering, shown in a previous lecture.)
Estimate the number of falsely significant genes under H0, then divide by the number of genes called significant.

Example
Experimental design
[Figure: Gene expression measured by microarrays. Tusher V G et al. PNAS 2001;98:5116-5121]
Where are these genes?
How to choose Δ (delta)?
Omitting s0 caused higher FDR.

Testing SAM's validity
10 out of 34 genes found have been reported in the literature as part of the response to IR.
19 appear to be involved in the cell cycle.
4 play a role in DNA repair.
Northern blot performed: strong correlation found.
Artificial data sets: some genes induced, background noise.

Other methods: comparison
R-fold method: gene i is significant if r(i) > R or r(i) < 1/R. FDR 73%-84%: unacceptable.
Pairwise fold change: at least 12 out of 16 pairings satisfying the criteria. FDR 60%-71%: unacceptable.
Why doesn't it work?

Fold-change, SAM: validation

Summary
SAM is a method for identifying genes on a microarray with statistically significant changes in expression.
Developed in the context of an actual biological experiment.
Assigns a score to each gene and uses permutations to estimate the percentage of genes identified by chance.
Comparison to other methods.
Robust; can be adapted to a broad range of experimental situations.

Rank Products
Rank Product: RP = (3/10) * (1/10) * (2/10) * (5/10)
intuitive non-pa
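The rank product arithmetic shown above works out to 0.003. The sketch below reproduces it, on the assumption (the surrounding text is cut off) that the numerators 3, 1, 2 and 5 are one gene's ranks in four replicate comparisons and that 10 is the number of genes ranked in each. The function name is illustrative.

```python
from math import prod


def rank_product(ranks, n_genes):
    """Multiply rank / n_genes across replicates for a single gene."""
    return prod(r / n_genes for r in ranks)


# Ranks 3, 1, 2, 5 out of 10 genes, matching the numbers in the text.
rp = rank_product([3, 1, 2, 5], n_genes=10)
print(f"{rp:.4f}")  # 0.0030
```

A gene that ranks near the top in every replicate gets a small rank product, which is what makes the statistic useful for ordering genes.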
