




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、The American Journal of Human Genetics 2008,Walking the Interactome for Prioritization of Candidate Disease Genes,Contents,Material and Methods,3,Abstract,In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for de
2、finition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming pr
3、evious methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend
4、beyond the disease proteins themselves.,Introduction,前人的工作:,Most current efforts at disease-gene identification involving linkage analysis or association studies result in a genomic interval of 0.510 cM containing up to 300 genes.Sequencing large numbers of candidate genes remains a time-consuming a
5、nd expensive task, Computational approaches based on functional annotation, gene-expression data, or sequence- based features. PPI. direct neighbors shortest path,本文的工作:,we have investigated the hypothesis that global network-similarity measures are better suited to capture relationships between dis
6、ease proteins than are algorithms based on direct interactions or shortest paths between disease genes.,Introduction,本文的工作:,We have defined 110 disease-gene families comprising genetically heterogeneous disorders, cancer syndromes, and complex (polygenic) diseases, and we have constructed an interac
7、tion network based on a total of 258,314 experimentally verified or predicted protein-protein interactions. We demonstrate that random walk and the related diffusion-kernel methodboth of which capture global relationships within an interaction networkare greatly superior to local distance measures w
8、ithin the interaction network and also outperform other previously published methods. We have made our algorithm freely available on the web, and we also provide predictions for 287 loci from 80 of the disease-gene families described in this work,Material and Methods,Disease-Gene Families Protein-Pr
9、otein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Material and Methods,Disease-Gene Families (1) genetically heterogeneous disorders(86个) (2) cancer syndromes(12个) (3) complex (polygenic) disorders(12个) The 110 families contained a tota
10、l of 783 genes with 665 distinct genes (Some genes were members of more than one disease family), whereby the largest family contained 41 genes and the smallest only three genes. On average, each family contained seven genes,Material and Methods,Protein-Protein Interaction Data five protein-protein
11、interaction datasets from human, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae were downloaded from Entrez Gene on the 1st of July 2007. These datasets comprise interactions extracted from HPRD, BIND, and BioGrid. Additional interactions were extracted f
12、rom IntACT, and DIP. Interactions from the four nonhuman species were mapped to homologous human genes identified by Inparanoid analysis with a threshold Inparalog score of 0.8 STRING:experiment,predicted by comparative genomics and text mining,we included interactions with a score of at least 0.4.,
13、Material and Methods,Disease-Gene Families Protein-Protein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Random Walk,Where W is the column-normalized adjacency matrix of the graph and pt is a vector in which the i-th element holds the pro
14、bability of being at node i at time step t.,performing the iteration until the change between pt and pt+1 (measured by the L1 norm) fell below 10-6.,Diffusion Kernel,The matrix L is the Laplacian of the graph, defined as D - A, where A is the adjacency matrix of the interaction graph and D is a diag
15、onal matrix containing the nodes degrees.,for each candidate gene j was assigned in accordance with its score defined as,For a sufficient small the diffusion kernel can be seen as a lazy random walk ,此时 :从当前结点转移到和它相邻的邻居结点的概率 1-di :仍然停留在当前结点的概率 (di being the degree of node i).,Other Methods,Direct in
16、teractions (DI) Shortest parth(SP) PROSPECT ENDEAVOUR,Performance Measurement,we defined the artificial linkage interval to be the set of genes containing the first 100 genes located nearest to the disease gene according to their genomic distance on the same chromosome. leave-one-out crossvalidation
17、 was used for each disease-gene family the formula is Enrichment=50/(rank) for an interval of 100 genes If a particular method assigns an identical score to more than one gene, we assume the worst case, in which the true disease gene is the last to be sequenced from the set of equally ranked genes R
18、OC,Results,Using the network containing all interactions (including text-mining data) and the RWR technique, we ranked all genes of 43 disease-gene families first (50-fold enrichment), On average, we achieved an enrichment score of 44-fold for all 783 disease genes. Leaving out text mining data, the
19、 RWR achieved a mean enrichment of 27-fold for all 110 disease families.,Without STRING text-mining data,mean rank,the true disease gene is the last,In the complete network (without text-mining data), genes have an average of 22.9 direct neighbors. There is a mean path length of 3.7 between random p
20、airs of genes. In 61% of the cases in which the DI method correctly identified the true disease gene, it additionally identified other unrelated genes with a direct interaction to a known disease gene. On the other hand, in only 1.4% of the cases in which the true disease gene was ranked in first pl
21、ace by the RWR method was another, unrelated gene also given the same score.,Results,without STRING text-mining data.,previous-knowledge bias,discovered in 2007 expect minimal publication bias,Red : (disease gene) RWR方法rank in first place Yellow: (non-disease gene) SP方法receive the highest rank,Discussion,为何这样选择100个gene? because of the tendency of similar genes to cluster in chromosomal neighborhoods.,Comparison of mean shortest-path distance from genes other than the disease gene within the 100-gene artificial interval with the corresponding mean distance among 100 randomly chosen g
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 行纪合同正规版4篇
- 新生儿窒息复苏考试试题及答案
- 旅游管理专业试题及答案
- 空中乘务专业试题及答案
- 诊所转让协议4篇
- 设备采购交货合同2篇
- 2025年口腔医学专业考试试卷及答案
- 2025年口腔医学事业单位招聘考试真题模拟卷及答案解析
- 预防接种知识考试试卷及答案含建卡核查流程要点
- 管理专业考研试题及答案
- 中城汽车(山东)有限公司审计报告
- 董事会基础知识培训总结课件
- 2025版煤矿安全规程宣贯培训课件
- (教科2024版)科学三年级上册2.1 水到哪里去了 课件(新教材)
- (2025秋新版)青岛版科学三年级上册全册教案
- 上锁挂牌管理培训课件
- 节能减排培训课件
- 葡萄冷藏保鲜技术规程
- 顾客联络服务 人工与智能客户服务协同要求 编制说明
- 以人为本的医院护理服务体系构建
- 新课标(水平三)体育与健康《篮球》大单元教学计划及配套教案(18课时)
评论
0/150
提交评论