Differential Expression Analysis of Microarray Data

Chunsheng Han, Ph.D.
Division of Bioinformatics, State Key Laboratory of Reproductive Biology, Institute of Zoology, CAS
April 20, 2011
Lecture 8, Bioinformatics

Outline
Microarray data: Warming up!
T-test and other tests
SAM: Significance Analysis of Microarrays
Rank Products and a real application

Data, data, data: the Affymetrix system
Scan the probe array: image data, .dat file created.
Compute cell intensity data from the image data: intensity data, .cel file created.
Analyze expression from the cell intensity data: expression probe analysis data, .chp file created.

.DAT file
.CEL file: raw data, not background corrected.
.CHP file: background subtraction; IM: Ideal MM.

Performance on corrupted data

Discrimination score R
$R = (PM - MM) / (PM + MM)$
If $PM \gg MM$, then $R \approx 1$.
If $PM = MM$, then $R = 0$.

Detection Call Significance
One-sided Wilcoxon's signed rank test, giving a p value.
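The discrimination score and the one-sided signed rank test above can be sketched in a few lines. This is a minimal illustration, not the Affymetrix implementation: the function names, the toy probe intensities, and the threshold `TAU = 0.015` are assumptions made for the example (the lecture does not state the threshold value), and ties are handled in a simplified way.

```python
def discrimination_scores(pm, mm):
    """R = (PM - MM) / (PM + MM) for each probe pair."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def signed_rank_p_value(diffs):
    """Exact one-sided p-value P(W+ >= observed) under H0: median = 0.

    Simplification (assumption): zero differences are dropped and ties are
    ranked in input order instead of receiving average ranks.
    """
    d = [x for x in diffs if x != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    # W+ is the sum of the ranks that belong to positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if d[i] > 0)
    # counts[s] = number of subsets of the ranks {1..n} whose sum equals s.
    counts = [1] + [0] * (n * (n + 1) // 2)
    for rank in range(1, n + 1):
        for s in range(len(counts) - 1, rank - 1, -1):
            counts[s] += counts[s - rank]
    return w_plus, sum(counts[w_plus:]) / 2 ** n


TAU = 0.015  # assumed threshold, not taken from the lecture
pm = [200, 220, 180, 250, 300, 210, 190, 230]  # hypothetical perfect-match intensities
mm = [100, 100, 100, 100, 100, 100, 100, 100]  # hypothetical mismatch intensities

scores = discrimination_scores(pm, mm)
w_plus, p_value = signed_rank_p_value([r - TAU for r in scores])
print(w_plus, p_value)
```

A small p value indicates that the scores sit consistently above the threshold, which is the evidence a detection call is based on.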
Where is the τ from?
R

Differential Expression: the Affymetrix report
How is it calculated?
$SPV_{i,j} = PV_{i,j} + \log_2(nf_i \cdot sf_i)$

Change Call
1. Decide the baseline array and the experiment array.
2. Calculate PM − MM for the two arrays.
3. Apply the one-sided Wilcoxon's signed rank test.
4. Generate a change p value.
5. Make a change call.

T-test and other tests

[Figure: Population 1 and Population 2, with means μ1 and μ2, each represented by a sample (Sample 1, Sample 2).]

The t-distribution
Founder: W. S. Gosset (1876 to 1937).
Wrote under the pseudonym "Student".
Mostly worked at tea (t) time? Hence known as Student's t test.
Preferable when n < 60, certainly if n < 30.

Is there a difference?
Between you ... means, who is meaner?

T-test
1. Test for a single mean: is the sample mean equal to the predefined population mean?
2. Test for a difference in means: is the CD4 level of patients taking treatment A equal to the CD4 level of patients taking treatment B?
3. Test for paired observations: did the treatment confer any significant benefit?

Setting Up the Hypothesis
Left tail: $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ OR $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$
Right tail: $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$ OR $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$
Two tail: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ OR $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$
Right tail ($H_1: \mu_1 > \mu_2$): mean systolic BP in nephritis is significantly higher than that of a normal person.
[Figure: sampling distribution over 100 to 140, with a rejection region of 0.05 in the upper tail.]

Two tail: mean systolic BP in nephritis is significantly different from that of a normal person.
[Figure: the same distribution, with rejection regions of 0.025 in each tail.]

Statistical Analysis
[Figure: control group mean and treatment group mean.] Is there a difference?

What does difference mean?
[Figure: the same pair of means drawn with medium, high, and low variability.]
The mean difference is the same for all three cases. Which one shows the greatest difference?
A statistical difference is a function of the difference between means relative to the variability.
A small difference between means with large variability could be due to chance.
It is like a signal-to-noise ratio, so the low-variability case shows the greatest difference.

So we estimate
$\frac{\text{signal}}{\text{noise}} = \frac{\text{difference between group means}}{\text{variability of groups}} = \frac{\bar{X}_T - \bar{X}_C}{SE(\bar{X}_T - \bar{X}_C)} = t\text{-value}$

Determining the p-Value
[Figure: density f(t) against t, centred at 0; the area between −1.96 and 1.96 is .95, leaving .025 in each tail.]

Assumptions
Normal distribution
Equal variance
Random sampling

t-Statistic
When the sampled population is normally distributed, the t statistic $t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ is Student t distributed with n − 1 degrees of freedom.

T-test for single mean
The following are the weights (mg) of each of 20 rats drawn at random from a large stock. Is it likely that the mean weight of these 20 rats is similar to the mean weight (24 mg) of the whole stock?
9 18 21 26 18 22 27 19 22 29 19 24 30 16 20 24 32

Steps for test for single mean
1. Question to be answered: is the mean weight of the sample of 20 rats 24 mg? n = 20, $\bar{X}$ = 21.0 mg, sd = 5.91, μ = 24.0 mg.
2. Null hypothesis: the mean weight of the rats is 24 mg. That is, the sample mean is equal to the population mean.
3. Test statistic: t with (n − 1) df.
4. Comparison with the theoretical value: if the tabulated t(n − 1) is greater than the calculated |t(n − 1)|, accept H0.
5. Inference.

t test for single mean
Test statistic: n = 20, $\bar{X}$ = 21.0 mg, sd = 5.91, μ = 24.0 mg
$t = \frac{\bar{X} - \mu}{sd / \sqrt{n}} = \frac{21.0 - 24.0}{5.91 / \sqrt{20}} \approx -2.27$
$t_{.05, 19} = 2.093$
Accept H0 if $|t| \le 2.093$.
Inference: |t| = 2.27 exceeds 2.093, so H0 is rejected at the 5% level. There is no evidence that the sample is taken from a population with mean weight of 24 mg.

T-test for difference in means
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_P \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, with $S_P^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$ and $df = n_1 + n_2 - 2$
Here $\mu_1 - \mu_2$ is the hypothesized difference (usually zero when testing for equal means).

Recall that for single samples: $t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
For related samples: $t = \frac{\bar{D}}{S_D / \sqrt{n}}$, where $\bar{D} = \frac{\sum D}{n}$ and $S_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}$, with D the difference within each pair.

Wilcoxon Rank Sum Test
When we test a hypothesis about the difference between two independent population means, we do so using the difference between two sample means. When the two sample variances are tested and found not to be equal, we cannot pool the sample variances, and so we cannot use the t-test for independent samples. Instead, we use the Wilcoxon Rank Sum Test.

The Z test and the t test are "parametric tests": they answer a question about the difference between populations by comparing sample statistics (e.g., $\bar{X}_1$ and $\bar{X}_2$) and making an inference to the population parameters ($\mu_1$ and $\mu_2$). The Wilcoxon test, in contrast, allows inferences about whole populations.

1. Wilcoxon with both n1 and n2 < 10
2. Wilcoxon with both n1 and n2 ≥ 10
3. Examples

Wilcoxon
[Figure: two distributions, A and B.] Note that distribution B is shifted to the right of distribution A.
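The single-mean rat example earlier in this section can be checked from its summary statistics alone. The sketch below is only an illustration: the function name `one_sample_t` is hypothetical, and the critical value 2.093 is the tabulated figure quoted in the lecture rather than something the code derives.

```python
import math


def one_sample_t(sample_mean, sd, n, mu0):
    """t = (sample mean - hypothesised mean) / standard error of the mean."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# Summary statistics from the lecture: n = 20, mean = 21.0 mg, sd = 5.91, mu = 24.0 mg
t_value = one_sample_t(21.0, 5.91, 20, 24.0)

T_CRITICAL = 2.093  # two-tailed 5% critical value for 19 df, as quoted in the lecture
reject_h0 = abs(t_value) > T_CRITICAL

print(round(t_value, 2), reject_h0)
```

The signal here is the 3 mg gap between the sample mean and the stock mean, and the noise is the standard error of about 1.32 mg, which is the signal-to-noise reading of the t-value described above.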
Small samples, independent groups
Wilcoxon Rank Sum Test:
1. First, combine the two samples and rank order all the observations.
2. The smallest number has rank 1, the largest number has rank N (= sum of n1 and n2).
3. Separate the samples and add up the ranks for the smaller sample. (If n1 = n2, choose either one.)
4. Test statistic: the rank sum T for the smaller sample.

Wilcoxon for n1 ≥ 10 and n2 ≥ 10

$b_i = PM_i - MM_i$ (baseline array), $e_i = PM_i - MM_i$ (experiment array)

SAM: Significance Analysis of Microarrays

Significance analysis of microarrays (SAM)
SAM can be used to pick out significant genes based on differential expression between sets of samples.
SAM gives estimates of the False Discovery Rate (FDR), which is the proportion of genes likely to have been wrongly identified by chance as being significant.
It is a very interactive algorithm that allows users to dynamically change thresholds for significance (through the tuning parameter delta) after looking at the distribution of the test statistic.

SAM Two-Class Unpaired
1. Assign experiments to two groups. For example, in an expression matrix with six experiments, assign Experiments 1, 2 and 5 to group A, and Experiments 3, 4 and 6 to group B.
2. Question: is the mean expression level of a gene in group A significantly different from the mean expression level in group B?

Permutation tests
i) For each gene, compute the d-value (analogous to the t-statistic). This is the observed d-value for that gene.
ii) Rank the genes in ascending order of their d-values.
iii) Randomly shuffle the values of the genes between groups A and B, such that the reshuffled groups A and B respectively have the same number of elements as the original groups A and B. Compute the d-value for each randomized gene.
[Figure: original grouping and randomized grouping.]
iv) Rank the permuted d-values of the genes in ascending order.
v) Repeat steps iii) and iv) many times, so that each gene has many randomized d-values corresponding to its rank from the observed (unpermuted) d-value. Take the average of the randomized d-values for each gene. This is the expected d-value of that gene.
vi) Plot the observed d-values vs. the expected d-values.

Example
[Figure: observed d-values plotted against expected d-values, with the "observed d = expected d" line.]
Significant positive genes (i.e., mean expression of group B > mean expression of group A).
Significant negative genes (i.e., mean expression of group A > mean expression of group B).
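Steps i) to v) above can be sketched as follows. This is a toy illustration under stated assumptions, not the SAM software: the function names, the expression matrix, the number of permutations, and the small constant `s0` added to the denominator are all choices made for this example, and whole experiment labels are shuffled so that every gene sees the same regrouping.

```python
import math
import random


def d_value(values, group_a, group_b, s0=0.1):
    """t-like score: difference in group means over (pooled standard error + s0)."""
    a = [values[i] for i in group_a]
    b = [values[i] for i in group_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    sum_sq = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    scale = (1 / len(a) + 1 / len(b)) / (len(a) + len(b) - 2)
    return (mean_b - mean_a) / (math.sqrt(scale * sum_sq) + s0)


def expected_d_values(matrix, group_a, group_b, n_perm=200, seed=0):
    """Average the rank-ordered permuted d-values over many label shuffles."""
    rng = random.Random(seed)
    columns = list(group_a) + list(group_b)
    totals = [0.0] * len(matrix)
    for _ in range(n_perm):
        rng.shuffle(columns)
        perm_a = columns[:len(group_a)]
        perm_b = columns[len(group_a):]
        permuted = sorted(d_value(row, perm_a, perm_b) for row in matrix)
        for rank, d in enumerate(permuted):
            totals[rank] += d
    return [total / n_perm for total in totals]


# Hypothetical expression matrix: 4 genes (rows) by 6 experiments (columns).
matrix = [
    [1.0, 1.2, 5.0, 5.3, 0.9, 5.1],  # higher in group B
    [4.0, 4.2, 1.0, 1.1, 3.9, 0.8],  # higher in group A
    [2.0, 2.1, 2.2, 1.9, 2.0, 2.1],  # essentially unchanged
    [3.0, 2.8, 3.1, 3.0, 2.9, 3.2],  # essentially unchanged
]
group_a = [0, 1, 4]  # Experiments 1, 2 and 5
group_b = [2, 3, 5]  # Experiments 3, 4 and 6

observed = sorted(d_value(row, group_a, group_b) for row in matrix)
expected = expected_d_values(matrix, group_a, group_b)
for obs, exp in zip(observed, expected):
    print(round(obs, 2), round(exp, 2))
```

Plotting `observed` against `expected` gives the kind of plot described in step vi); genes far from the diagonal are the candidates for significance.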
The more a gene deviates from the "observed = expected" line, the more likely it is to be significant. Any gene beyond the first gene in the +ve or −ve direction on the x-axis (including the first gene) whose observed d-value exceeds the expected d-value by at least delta is considered significant.

SAM Two-Class Unpaired
For each permutation of the data, compute the number of positive and negative significant genes for a given delta, as explained above. The median number of significant genes over these permutations estimates the number of falsely significant genes; dividing it by the number of genes called significant in the observed data gives the median False Discovery Rate.
The rationale is that any genes designated as significant in the randomized data are being picked up purely by chance (i.e., "falsely" discovered). Therefore, the median number picked up over many randomizations is a good estimate of the number of false discoveries.

Estimate FDR
t1 and t2 will be used as cutoffs.
Calculate the average number of genes that exceed these values in the permutations. This is very similar to the Gap Estimation algorithm for clustering, shown in a previous lecture.
Estimate the number of falsely significant genes under H0.
Divide by the number of genes called significant.

Example
Experimental design
Gene expression measured by microarrays.
Tusher V G et al. PNAS 2001;98:5116-5121
Where are these genes?
How to choose Δ?
Omitting s0 caused a higher FDR.

Test SAM's validity
10 out of 34 genes found have been reported in the literature as part of the response to IR.
19 appear to be involved in the cell cycle.
4 play a role in DNA repair.
Northern blots were performed: strong correlation found.
Artificial data sets: some genes induced, plus background noise.

Other Methods: Comparison
R-fold method: gene i is significant if r(i) > R or r(i) < 1/R. FDR 73%-84%: unacceptable.
Pairwise fold change: at least 12 out of 16 pairings satisfy the criteria. FDR 60%-71%: unacceptable.
Why doesn't it work?

Fold-change, SAM: Validation

Summary
SAM is a method for identifying genes on a microarray with statistically significant changes in expression.
Developed in the context of an actual biological experiment.
Assigns a score to each gene and uses permutations to estimate the percentage of genes identified by chance.
Compared with other methods.
Robust; can be adapted to a broad range of experimental situations.

Rank Products and a real application

Rank Products
Rank Product: RP = (3/10) * (1/10) * (2/10) * (5/10)
intuitive non-pa
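The rank product on the slide can be reproduced with a short sketch. It assumes that the gene was ranked 3rd, 1st, 2nd and 5th among 10 genes in four replicate comparisons, which is one plausible reading of the formula; the helper names and the two-replicate fold-change table are hypothetical.

```python
from math import prod


def rank_product(ranks, n_genes):
    """Product of (rank / number of genes) over the replicate comparisons."""
    return prod(rank / n_genes for rank in ranks)


def ranks_descending(values):
    """Rank 1 goes to the largest value (assumed: the most up-regulated gene)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


# The slide's example: ranks 3, 1, 2 and 5 out of 10 genes.
rp_slide = rank_product([3, 1, 2, 5], 10)
print(rp_slide)

# Hypothetical fold changes for 3 genes in 2 replicate comparisons.
replicates = [
    [2.0, 0.5, 1.2],
    [1.8, 0.7, 1.1],
]
per_replicate_ranks = [ranks_descending(rep) for rep in replicates]
rp_per_gene = [
    rank_product([ranks[g] for ranks in per_replicate_ranks], n_genes=3)
    for g in range(3)
]
print(rp_per_gene)
```

A small rank product means the gene sits near the top of the list in every replicate, which is unlikely to happen by chance.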