医学网络平台分析课件Walking the interactome for prioritization of candidate disease genes_第1页
医学网络平台分析课件Walking the interactome for prioritization of candidate disease genes_第2页
医学网络平台分析课件Walking the interactome for prioritization of candidate disease genes_第3页
医学网络平台分析课件Walking the interactome for prioritization of candidate disease genes_第4页
医学网络平台分析课件Walking the interactome for prioritization of candidate disease genes_第5页
已阅读5页,还剩24页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、The American Journal of Human Genetics 2008,Walking the Interactome for Prioritization of Candidate Disease Genes,Contents,Material and Methods,3,Abstract,In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for de

2、finition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming pr

3、evious methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend

4、beyond the disease proteins themselves.,Introduction,前人的工作:,Most current efforts at disease-gene identification involving linkage analysis or association studies result in a genomic interval of 0.510 cM containing up to 300 genes.Sequencing large numbers of candidate genes remains a time-consuming a

5、nd expensive task, Computational approaches based on functional annotation, gene-expression data, or sequence- based features. PPI. direct neighbors shortest path,本文的工作:,we have investigated the hypothesis that global network-similarity measures are better suited to capture relationships between dis

6、ease proteins than are algorithms based on direct interactions or shortest paths between disease genes.,Introduction,本文的工作:,We have defined 110 disease-gene families comprising genetically heterogeneous disorders, cancer syndromes, and complex (polygenic) diseases, and we have constructed an interac

7、tion network based on a total of 258,314 experimentally verified or predicted protein-protein interactions. We demonstrate that random walk and the related diffusion-kernel methodboth of which capture global relationships within an interaction networkare greatly superior to local distance measures w

8、ithin the interaction network and also outperform other previously published methods. We have made our algorithm freely available on the web, and we also provide predictions for 287 loci from 80 of the disease-gene families described in this work,Material and Methods,Disease-Gene Families Protein-Pr

9、otein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Material and Methods,Disease-Gene Families (1) genetically heterogeneous disorders(86个) (2) cancer syndromes(12个) (3) complex (polygenic) disorders(12个) The 110 families contained a tota

10、l of 783 genes with 665 distinct genes (Some genes were members of more than one disease family), whereby the largest family contained 41 genes and the smallest only three genes. On average, each family contained seven genes,Material and Methods,Protein-Protein Interaction Data five protein-protein

11、interaction datasets from human, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae were downloaded from Entrez Gene on the 1st of July 2007. These datasets comprise interactions extracted from HPRD, BIND, and BioGrid. Additional interactions were extracted f

12、rom IntACT, and DIP. Interactions from the four nonhuman species were mapped to homologous human genes identified by Inparanoid analysis with a threshold Inparalog score of 0.8 STRING:experiment,predicted by comparative genomics and text mining,we included interactions with a score of at least 0.4.,

13、Material and Methods,Disease-Gene Families Protein-Protein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Random Walk,Where W is the column-normalized adjacency matrix of the graph and pt is a vector in which the i-th element holds the pro

14、bability of being at node i at time step t.,performing the iteration until the change between pt and pt+1 (measured by the L1 norm) fell below 10-6.,Diffusion Kernel,The matrix L is the Laplacian of the graph, defined as D - A, where A is the adjacency matrix of the interaction graph and D is a diag

15、onal matrix containing the nodes degrees.,for each candidate gene j was assigned in accordance with its score defined as,For a sufficient small the diffusion kernel can be seen as a lazy random walk ,此时 :从当前结点转移到和它相邻的邻居结点的概率 1-di :仍然停留在当前结点的概率 (di being the degree of node i).,Other Methods,Direct in

16、teractions (DI) Shortest parth(SP) PROSPECT ENDEAVOUR,Performance Measurement,we defined the artificial linkage interval to be the set of genes containing the first 100 genes located nearest to the disease gene according to their genomic distance on the same chromosome. leave-one-out crossvalidation

17、 was used for each disease-gene family the formula is Enrichment=50/(rank) for an interval of 100 genes If a particular method assigns an identical score to more than one gene, we assume the worst case, in which the true disease gene is the last to be sequenced from the set of equally ranked genes R

18、OC,Results,Using the network containing all interactions (including text-mining data) and the RWR technique, we ranked all genes of 43 disease-gene families first (50-fold enrichment), On average, we achieved an enrichment score of 44-fold for all 783 disease genes. Leaving out text mining data, the

19、 RWR achieved a mean enrichment of 27-fold for all 110 disease families.,Without STRING text-mining data,mean rank,the true disease gene is the last,In the complete network (without text-mining data), genes have an average of 22.9 direct neighbors. There is a mean path length of 3.7 between random p

20、airs of genes. In 61% of the cases in which the DI method correctly identified the true disease gene, it additionally identified other unrelated genes with a direct interaction to a known disease gene. On the other hand, in only 1.4% of the cases in which the true disease gene was ranked in first pl

21、ace by the RWR method was another, unrelated gene also given the same score.,Results,without STRING text-mining data.,previous-knowledge bias,discovered in 2007 expect minimal publication bias,Red : (disease gene) RWR方法rank in first place Yellow: (non-disease gene) SP方法receive the highest rank,Discussion,为何这样选择100个gene? because of the tendency of similar genes to cluster in chromosomal neighborhoods.,Comparison of mean shortest-path distance from genes other than the disease gene within the 100-gene artificial interval with the corresponding mean distance among 100 randomly chosen g

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论