【生物课件】an algorithmic approach to peptide sequencing via tandem mass ..._第1页
【生物课件】an algorithmic approach to peptide sequencing via tandem mass ..._第2页
【生物课件】an algorithmic approach to peptide sequencing via tandem mass ..._第3页
【生物课件】an algorithmic approach to peptide sequencing via tandem mass ..._第4页
【生物课件】an algorithmic approach to peptide sequencing via tandem mass ..._第5页
已阅读5页,还剩70页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1,An Algorithmic Approach to Peptide Sequencing via Tandem Mass Spectrometry,Ming-Yang KaoDepartment of Computer ScienceNorthwestern UniversityEvanston, IllinoisU. S. A.,2,Collaborators of This Project,University of Southern CaliforniaTing Chen Harvard Medical SchoolGeorge M. ChurchJohn RushMatthew Tepel,3,Perspectives,A key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.,4,Perspectives,A key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.,this talks focus,5,Proteomics,Proteome: all proteins encoded within a genomehalf millions distinct proteins (temporal, spatial, modifications)30,000 human genesmRNA and protein expressions may not correlateProteomics: study of protein expression by biological systemsrelative abundance and stability; post-translational modificationsfluctuations as a response to environment and altered cellular needscorrelations between protein expression and disease stateprotein-protein interactions, protein complexesTechnologies:2D gel electrophoresis mass spectrometryyeast two-hybrid systemprotein chips,this talks focus,6,A Key Step of Proteomics,How to sequence proteins?How to sequence protein peptides? (this talks focus),7,Outline of This Talk,Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,8,Outline of This Talk (1),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,9,Protein Identification: HPLC-MS-MS,Mass/ChargeTandem Mass Spectrum,Mass/Charge,Proteins,Peptides,One Peptide,B-ions / Y-ions,10,Protein Identification: HPLC-MS-MS,Mass/ChargeTandem Mass Spectrum,Mass/Charge,Proteins,Peptides,One Peptide,B-ions / Y-ions,11,Peptide Fragmentation and Ionization,B-ion,Y-ion,Complementary: Mass(B-ion)+Mass(Y-ion) = Mass(peptide)+4H+O,12,B-ions and Y-ions,Fragmentation,13,Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,14,Raw Tandem Mass Spectrum,15,Prediction from Raw Tandem Mass Spectrum,16,Protein Database Search,Find the peptide sequences in a protein database that optimally fit the spectrum.It does not work if the target peptide sequence is not in the database.It does not work if there is an unknown modification at some amino acid.It is very slow because it must search the entire database.E.g., SEQUEST, Yates, Univ. of Washington.,17,De Novo Peptide Sequencing Problem,Input: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P.,Mass / Charge,Abundance (100%),50,100,274.112,361.121,Peptide Mass 429.212 Daltons,P = SWR,Mass(P) = 429.212,Ions(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.225,18,Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,Peptide Mass 429.212 Daltons,19,Amino Acid Mass Table,20,Feature 1,All B-ions form a forward mass ladder.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,Peptide Mass 429.212 Daltons,b1,b2,b3,1,21,Feature 2,All Y-ions form a reverse mass ladder.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,Peptide Mass 429.212 Daltons,y1,y2,y3,19,22,Basic Difficulty #1,It is unknown whether an ion is a B-ion or an Y-ion.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,Peptide Mass 429.212 Daltons,23,Basic Difficulty #2,There are missing ions.,Mass / Charge,Abundance (100%),200,50,100,400,274.112,361.121,Ion 1,Ion 2,Peptide Mass 429.212 Daltons,24,Feature 3 (to our Rescue),Complementary Ion Pairs: b1/y2 and b2/y1,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,Peptide Mass 429.212 Daltons,y1,y2,y3,b1,b2,b3,25,Outline of This Talk (2),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,26,Formulating the Computational Problem,T = an alphabet of 20 characters a1,a2,a20.two special characters: alpha and beta.the mass of alpha = 1, the mass of beta = 19, the mass of ai is mi.A peptide sequence is x1,x2,x3,xn-1,xn, where each xi is from T.A b-ion is x0,x1,x2,xi for some 1 = i = n, where x0 = alpha.A y-ion is xi,xn-2,xn-1,xn,xn+1 for some 1 = i = n, where xn+1 = beta.,27,De Novo Peptide Sequencing Problem,Input: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P.,Mass / Charge,Abundance (100%),50,100,274.112,361.121,Peptide Mass 429.212 Daltons,P = SWR,Mass(P) = 429.212,Ions(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.225,28,Amino Acid Mass Table,29,Outline of This Talk (3),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,30,peptide mass W,tandem mass spectrum S,NC-spectrum graph,Find feasible paths to order the masses in S to identify all the b-ions and y-ions consistent with S.,Basic Computing Scheme,Convert feasible paths into legal peptide sequences,31,NC-Spectrum Graph: Nodes (1),0,429.22,N0,C0,mass of this peptide,32,NC-Spectrum Graph: Nodes (2),mass of this peptide,0,429.22,N0,C0,174.11,273.11,mass( ) + mass( ) = mass(P) + 18,Ion # 1 (274.11),Assumption 1:If Ion 1 is an y-ionC1: a b-ion node,Assumption 2:If Ion 1 is a b-ionN1: a b-ion node,C1,N1,33,NC-Spectrum Graph: Nodes (3),0,429.22,N0,C0,174.11,273.11,mass( ) + mass( ) = mass(P) + 18,Ion # 2 (88.10),87.10,360.12,C1,N1,C2,N2,34,NC-Spectrum Graph: Edges (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,35,NC-Spectrum Graph: Edges (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,36,NC-Spectrum Graph: Edges (3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,S+W,Mass(S+W) = 273.29,37,NC-Spectrum Graph: Edges (4),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,S+W,Mass(S+W) = 273.29,R,Mass(R) = 156.19,38,NC-Spectrum Graph,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,39,NC-Spectrum Graph: Paths = Sequences,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,S,W,R,b-ions,40,NC-Spectrum Graph: A Feasible Path (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,a feasible path,S,W,R,b-ions,41,NC-Spectrum Graph: A Feasible Path (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,a feasible path,S,S,GVV,b-ions,y-ions,42,NC-Spectrum Graph: Not A Feasible Path (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:miss ion #2,43,NC-Spectrum Graph: Not A Feasible Path (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:(2) repeat ion #1,44,NC-Spectrum Graph: Not A Feasible Path (3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:miss ion #2repeat ion #1,45,Reformulating the De Novo Peptide Sequencing Problem,Input: an NC-spectrum graph G.Output: a feasible path from N0 to C0.,46,Observations,A longest path does not always go through exactly one of each pair of nodes.It is an NP-hard problem if the spectrum graph is a general directed graph.,47,Basic Algorithm,Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,48,Basic Algorithm (1),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,49,Compute the Nodes of the NC-Spectrum Graph,Step 2. Rename the nodes from left to right as X0, Xk,Yk,Y0,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Step 1. Compute the nodes and place them in the increasing order of masses.,Observation: Xi and Yi form a complementary pair of nodes Ni and Ci for ion i.Running Time: O(k), where k = # of masses in the spectrum.,50,Basic Algorithm (2),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G. inverse of each otherCompute a feasible path P in G.Convert P into a feasible sequence.,51,Compute the Edges of the NC-Spectrum Graph,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,Basic Question: Given a mass u, is there a protein sequence with that mass?Solution: dynamic programming via a Boolean array E( ).precision = 0.01.Boolean array length L = peptide mass W / precision.Boolean array E(u/0.01) = 1 if u is the mass of a peptide; otherwise 0. dynamic programming E(j) = 1 if only E(j mi) =1 for some amino acid mass mi.Running Time: (1) Computing E() takes O(L) time; or O(L/log L) via 4-Russian preprocessing. (2) Computing the edges takes O(k2) time.,52,Basic Algorithm (3),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,53,Compute a Feasible Path (1),0,429.22,X0,Y0,87.10,360.12,Y1,X1,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,Recursion: Use the feasible paths of X0, Xi,Yj,Y0 to computethe feasible paths of X0, Xi, Xi+1,Yj+1,Yj,Y0.Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk and M(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and M(k,j) = 1,54,Compute a Feasible Path (2),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk and E(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and E(k,j) = 1,X0,Y0,Yk,Xi,PL,PR,e,X0,Y0,Xk,PL,PR,e,Yj,55,Compute a Feasible Path (3),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Base Case: M(0,0), M(0,1), M(1,0).Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Idea: Extend PL and PR by one edge at a time.,56,Compute a Feasible Path (4),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.,X0,Y0,Yj-1,Xi,PL,PR,e,Xj,Yj,X0,Y0,Yj-1,Xi,PL,PR,e,Xj,Yj,57,Compute a Feasible Path (5),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Computational Complexity: O(k2).,58,Algorithmic Result #1: Finding a Feasible Path,Input: an NC-Spectrum Graph G=(V,E)Output: a feasible path in G.Computational Complexity: O(|V|2) time & O(|V|2) space.,59,Outline of This Talk (4),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,60,Algorithmic Result #2: Finding a Feasible Path (Improved),Input: an NC-spectrum graph G=(V,E).Output: A feasible path can be found in O(|V|+|E|) time.Idea: Speed up via pre-processing.,61,Amino Acid Modifications,A modification is an amino acid with slightly different atoms (and thus a different mass) from the typical molecule. Importance of modifications: Amino acid modifications are related to functions. For example, a protein is active when phosphorylated and inactive when de-phosphorylated.,62,Modification in the Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,100,400,S,W+d,R,R,W+d,S,63,Spectrum Graph: As Before,64,Spectrum Graph: A Modified Feasible Path,Idea: One mass change leads to one missing edge.,65,Algorithmic Result #3: Finding One Modification,A modification is an amino acid with slightly different atoms (and thus a different mass) from the typical molecule.Theorem: Finding the position of the modification takes O(|V|+|E|) space and O(|V| |E|) time.,66,Algorithmic Result #4: Noisy Data,Define a scoring function s():s(edge) = function(mass).s(node) = function(abundance).Redefine the problem: Find the maximum score paththat goes through at most one node for each ion. Solution: dynamic programming in O(|V|+|E|) space and O(|V| |E|) time.,67,Outline of This Talk (5),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,68,Further Difficulties for Tandem Mass Spectrum Interpretation,Each ion has a couple of isotopic forms.Other ions (a or z) may appear.Some ions may lose a water or an ammonia.Multiple ion charges.Noise.Amino acid modifications.,69,Further Research Directions,Efficient algorithms to deal with more modifications in conjunction with data noise.Efficient algorithms to combine de novo peptide sequencing with peptide database search.Efficient algorithms to assess statistical significance of feasible peptide sequences.Efficient algorithms to deal with multiple peptides.Practical implementation; speed-up via preprocessing.More ,70,Further Research Directions,Looking for top-rate graduate students for this project (and other projects).Immediate and expedited admission for the coming fall semester.,71,我劝一个草率结婚的朋友离婚。她平静的告诉我,如果说当初鲁莽结婚是个错误。那么,现在草率离婚是一错再错。这位朋友后来还是离婚了,大家一致认为她的行为很理性。同样的故事正在互联网搜索巨头谷歌身上发生,但是谷歌选择了草率“离婚”。饮鸩止渴由于急于抑制苹果iphone手机翻天覆地的产业冲击,谷歌采取急功近利的粗糙型开放策略。饮鸩止渴的策略一时取得了成果。市场研究公司尼尔森最近公布的数据显示,在通过Verizon Wireless、AT&T和Sprint Nextel三大运营商经销后,谷歌Android手机在美国市场上的销售量已经超过iPhone。另一家市场研究公司iSuppli甚至认为,全球范围内使用Android操作系统的手机数量将在 2012年超过苹果iPhone。 表面繁华的背后,是Android生态系统的一团糟,谷歌正在为自己的粗放型开放策略买单。用户对谷歌手机的态度从开始的好奇、后来的犹豫,变成强烈的批评。“大多Android手机程序都是垃圾,乱七八糟的”,一位手机发烧友迅速投奔了iphone的阵营:“同样的植物人大战僵尸游戏,在谷歌手机和iphone手机上的体验简直没法比”混乱,还是混乱。一切一切的乱象,折射出谷歌已经失去对Android生态系统的控制。这一切的根源,我的判断是开放策略初期过于宽松,导致失去控制权。混乱的生态系统表现在用户手机上,就是应用程序的混乱和粗燥。一错再错为此,谷歌开始采取对策。最近,有国内厂商称新的Android3.0开始关闭应用程序的API(应用程序编程接口),统一Android界面。这意味着,谷歌将放弃其初始开放策略,开始封闭管理。粗看之下,谷歌认识到自己的错误。既然是过度开放导致的错误,那么收紧开放尺度是很自然的逻辑,无懈可击。但我认为,谷歌仓促收紧开放策略仍然是个错误。打个比喻。如果过度开放的政策是草率结婚,那么草率的封闭就是草率离婚。这么判断的原因很简单,谷歌把Android开放出去的那一天,Android已经不属于谷歌。谷歌没有认识到这一点,还以为Android只是自己的。合作伙伴对谷歌封闭政策的反应加强了我的判断结论。经济观察报报道,国内第一家生产基于Android平台手机的设计公司创杨通信,近日已经被迫出售。创杨通信负责人给出的出售理由是,“因为不愿意甘当炮灰而选择放弃。”按照目前Android3.0将统一界面的想法,未来的手机市场将出现毫无差异化的产品。这对于企业来说,几乎意味着不可避免的价格战。利润空间的微薄,导致合作伙伴生存环境恶劣。于是大量退出几乎是一种必然。除了为合作伙伴找到新的利益空间,谷歌还将面临开放阵营精神层面的声讨,这对谷歌的挑战会更大。如果说谷歌为了自己竞争的私利利用了开放,赢得了名声。那么,谷歌不能一脚把开放踢开,他现在还需要为这种名声买单。如果只顾自己收网,谷歌会被面临铺天盖地的道德谴责。谷歌,希望你准备好了,三思而行。木桶效应就是指一个水桶无论有多高,它盛水的高度取决于其中最低的那块木板。这在选购手机的时候也同样适用。尤其是很多用户在购买手机的时候, 都会专注于某一个参数,比如要求处理器主频要高达1GHz,但一部手机的整机表现是由多个因素组成,所以在购买手机的时候一定要从整体的角度上来看一部手 机的性除了处理器主频以外,其实还有很多影响整机表现的元素,比如运行内存(RAM)、机身内存(ROM)、操作系统、厂商对系统的优化都会有所影 响。不过在很多用户眼中,这几项却远没有处理器主频重要。而如果忽略这几项的话,可能买到一个主频很高,但整机性能却仍然不令人满意的机型。用户在选机过程中切勿只关注一个硬件参数,这样很可能并不能买到一款理想的机型,就算买的手机主频再高,运行内存低的话仍然无法同时运行数个程 序、机身内存小的话则无法把太多的软件安装到机器上,这对整机耗电和运行速度方面都有所影响、而厂商优化更是决定着一部手机的稳定性和部分运行速度。综上 所述,大家在选购手机的时候一定要综合考虑一款手机的硬件规格。除此之外,也不要把硬件看的太过重要,就比如苹果iPhon

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论