




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、The American Journal of Human Genetics 2008,Walking the Interactome for Prioritization of Candidate Disease Genes,Contents,Material and Methods,3,Abstract,In this work, we present a method for prioritization of candidate genes by use of a global network distance measure, random walk analysis, for de
2、finition of similarities in protein-protein interaction networks. We tested our method on 110 disease-gene families with a total of 783 genes and achieved an area under the ROC curve of up to 98% on simulated linkage intervals of 100 genes surrounding the disease gene, significantly outperforming pr
3、evious methods based on local distance measures. Our results not only provide an improved tool for positional-cloning projects but also add weight to the assumption that phenotypically similar diseases are associated with disturbances of subnetworks within the larger protein interactome that extend
4、beyond the disease proteins themselves.,Introduction,前人的工作:,Most current efforts at disease-gene identification involving linkage analysis or association studies result in a genomic interval of 0.510 cM containing up to 300 genes.Sequencing large numbers of candidate genes remains a time-consuming a
5、nd expensive task, Computational approaches based on functional annotation, gene-expression data, or sequence- based features. PPI. direct neighbors shortest path,本文的工作:,we have investigated the hypothesis that global network-similarity measures are better suited to capture relationships between dis
6、ease proteins than are algorithms based on direct interactions or shortest paths between disease genes.,Introduction,本文的工作:,We have defined 110 disease-gene families comprising genetically heterogeneous disorders, cancer syndromes, and complex (polygenic) diseases, and we have constructed an interac
7、tion network based on a total of 258,314 experimentally verified or predicted protein-protein interactions. We demonstrate that random walk and the related diffusion-kernel methodboth of which capture global relationships within an interaction networkare greatly superior to local distance measures w
8、ithin the interaction network and also outperform other previously published methods. We have made our algorithm freely available on the web, and we also provide predictions for 287 loci from 80 of the disease-gene families described in this work,Material and Methods,Disease-Gene Families Protein-Pr
9、otein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Material and Methods,Disease-Gene Families (1) genetically heterogeneous disorders(86个) (2) cancer syndromes(12个) (3) complex (polygenic) disorders(12个) The 110 families contained a tota
10、l of 783 genes with 665 distinct genes (Some genes were members of more than one disease family), whereby the largest family contained 41 genes and the smallest only three genes. On average, each family contained seven genes,Material and Methods,Protein-Protein Interaction Data five protein-protein
11、interaction datasets from human, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae were downloaded from Entrez Gene on the 1st of July 2007. These datasets comprise interactions extracted from HPRD, BIND, and BioGrid. Additional interactions were extracted f
12、rom IntACT, and DIP. Interactions from the four nonhuman species were mapped to homologous human genes identified by Inparanoid analysis with a threshold Inparalog score of 0.8 STRING:experiment,predicted by comparative genomics and text mining,we included interactions with a score of at least 0.4.,
13、Material and Methods,Disease-Gene Families Protein-Protein Interaction Data Disease-Gene Prediction Random Walk Diffusion Kernel Other Methods Performance Measurement,Random Walk,Where W is the column-normalized adjacency matrix of the graph and pt is a vector in which the i-th element holds the pro
14、bability of being at node i at time step t.,performing the iteration until the change between pt and pt+1 (measured by the L1 norm) fell below 10-6.,Diffusion Kernel,The matrix L is the Laplacian of the graph, defined as D - A, where A is the adjacency matrix of the interaction graph and D is a diag
15、onal matrix containing the nodes degrees.,for each candidate gene j was assigned in accordance with its score defined as,For a sufficient small the diffusion kernel can be seen as a lazy random walk ,此时 :从当前结点转移到和它相邻的邻居结点的概率 1-di :仍然停留在当前结点的概率 (di being the degree of node i).,Other Methods,Direct in
16、teractions (DI) Shortest parth(SP) PROSPECT ENDEAVOUR,Performance Measurement,we defined the artificial linkage interval to be the set of genes containing the first 100 genes located nearest to the disease gene according to their genomic distance on the same chromosome. leave-one-out crossvalidation
17、 was used for each disease-gene family the formula is Enrichment=50/(rank) for an interval of 100 genes If a particular method assigns an identical score to more than one gene, we assume the worst case, in which the true disease gene is the last to be sequenced from the set of equally ranked genes R
18、OC,Results,Using the network containing all interactions (including text-mining data) and the RWR technique, we ranked all genes of 43 disease-gene families first (50-fold enrichment), On average, we achieved an enrichment score of 44-fold for all 783 disease genes. Leaving out text mining data, the
19、 RWR achieved a mean enrichment of 27-fold for all 110 disease families.,Without STRING text-mining data,mean rank,the true disease gene is the last,In the complete network (without text-mining data), genes have an average of 22.9 direct neighbors. There is a mean path length of 3.7 between random p
20、airs of genes. In 61% of the cases in which the DI method correctly identified the true disease gene, it additionally identified other unrelated genes with a direct interaction to a known disease gene. On the other hand, in only 1.4% of the cases in which the true disease gene was ranked in first pl
21、ace by the RWR method was another, unrelated gene also given the same score.,Results,without STRING text-mining data.,previous-knowledge bias,discovered in 2007 expect minimal publication bias,Red : (disease gene) RWR方法rank in first place Yellow: (non-disease gene) SP方法receive the highest rank,Discussion,为何这样选择100个gene? because of the tendency of similar genes to cluster in chromosomal neighborhoods.,Comparison of mean shortest-path distance from genes other than the disease gene within the 100-gene artificial interval with the corresponding mean distance among 100 randomly chosen g
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 加快医疗器械临床试验的审批流程和管理
- 2025至2030中国甲基丙烯酸正丙酯行业产业运行态势及投资规划深度研究报告
- 产学研合作推动时空数据挖掘教学与研究的互动发展
- 企业参与智慧城市公共服务的激励措施探讨
- 企业级网络安全事件响应流程研究
- 探索教育游戏化从理论到实践的综评
- 技术工人培训课件
- 教育心理学在校园欺凌预防中的作用
- 影楼销售技法培训课件
- 探索智能教学系统在职业技能培训中的应用
- 尾矿工安全培训
- 2025年中国工商银行招聘笔试备考题库(带答案详解)
- 2025年 云南省危险化学品经营单位安全管理人员考试练习题附答案
- 美发师五级试题及答案
- Q-GDW10250-2025 输变电工程建设安全文明施工规程
- 2024-2025学年四年级(下)期末数学试卷及答案西师大版2
- 2025-2030年中国钕铁硼永磁材料行业市场现状供需分析及投资评估规划分析研究报告
- 2025-2030年中国高导磁芯行业深度研究分析报告
- 宣城市宣州区“政聘企培”人才引进笔试真题2024
- 新课标(水平三)体育与健康《篮球》大单元教学计划及配套教案(18课时)
- 《生物安全培训》课件-2024鲜版
评论
0/150
提交评论