




已阅读5页,还剩65页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1,An Algorithmic Approach to Peptide Sequencing via Tandem Mass Spectrometry,Ming-Yang KaoDepartment of Computer ScienceNorthwestern UniversityEvanston, IllinoisU. S. A.,2,Collaborators of This Project,University of Southern CaliforniaTing Chen Harvard Medical SchoolGeorge M. ChurchJohn RushMatthew Tepel,3,Perspectives,A key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.,4,Perspectives,A key goal of bioinformatics: To study biological systems based on global knowledge of genomes, transcriptomes, and proteomes.Genome: entire sets of materials in the chromosomes.Transcriptome: entire sets of gene transcripts.Proteome: entire sets of proteins.,this talks focus,5,Proteomics,Proteome: all proteins encoded within a genomehalf millions distinct proteins (temporal, spatial, modifications)30,000 human genesmRNA and protein expressions may not correlateProteomics: study of protein expression by biological systemsrelative abundance and stability; post-translational modificationsfluctuations as a response to environment and altered cellular needscorrelations between protein expression and disease stateprotein-protein interactions, protein complexesTechnologies:2D gel electrophoresis mass spectrometryyeast two-hybrid systemprotein chips,this talks focus,6,A Key Step of Proteomics,How to sequence proteins?How to sequence protein peptides? (this talks focus),7,Outline of This Talk,Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,8,Outline of This Talk (1),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,9,Protein Identification: HPLC-MS-MS,Mass/ChargeTandem Mass Spectrum,Mass/Charge,Proteins,Peptides,One Peptide,B-ions / Y-ions,10,Protein Identification: HPLC-MS-MS,Mass/ChargeTandem Mass Spectrum,Mass/Charge,Proteins,Peptides,One Peptide,B-ions / Y-ions,11,Peptide Fragmentation and Ionization,B-ion,Y-ion,Complementary: Mass(B-ion)+Mass(Y-ion) = Mass(peptide)+4H+O,12,B-ions and Y-ions,Fragmentation,13,Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,14,Raw Tandem Mass Spectrum,15,Prediction from Raw Tandem Mass Spectrum,16,Protein Database Search,Find the peptide sequences in a protein database that optimally fit the spectrum.It does not work if the target peptide sequence is not in the database.It does not work if there is an unknown modification at some amino acid.It is very slow because it must search the entire database.E.g., SEQUEST, Yates, Univ. of Washington.,17,De Novo Peptide Sequencing Problem,Input: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P.,Mass / Charge,Abundance (100%),50,100,274.112,361.121,Peptide Mass 429.212 Daltons,P = SWR,Mass(P) = 429.212,Ions(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.225,18,Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,Peptide Mass 429.212 Daltons,19,Amino Acid Mass Table,20,Feature 1,All B-ions form a forward mass ladder.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,Peptide Mass 429.212 Daltons,b1,b2,b3,1,21,Feature 2,All Y-ions form a reverse mass ladder.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,Peptide Mass 429.212 Daltons,y1,y2,y3,19,22,Basic Difficulty #1,It is unknown whether an ion is a B-ion or an Y-ion.,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,Peptide Mass 429.212 Daltons,23,Basic Difficulty #2,There are missing ions.,Mass / Charge,Abundance (100%),200,50,100,400,274.112,361.121,Ion 1,Ion 2,Peptide Mass 429.212 Daltons,24,Feature 3 (to our Rescue),Complementary Ion Pairs: b1/y2 and b2/y1,Mass / Charge,Abundance (100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,Peptide Mass 429.212 Daltons,y1,y2,y3,b1,b2,b3,25,Outline of This Talk (2),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,26,Formulating the Computational Problem,T = an alphabet of 20 characters a1,a2,a20.two special characters: alpha and beta.the mass of alpha = 1, the mass of beta = 19, the mass of ai is mi.A peptide sequence is x1,x2,x3,xn-1,xn, where each xi is from T.A b-ion is x0,x1,x2,xi for some 1 = i = n, where x0 = alpha.A y-ion is xi,xn-2,xn-1,xn,xn+1 for some 1 = i = n, where xn+1 = beta.,27,De Novo Peptide Sequencing Problem,Input: (1) the mass W of an unknown target peptide, and (2) a set S of the masses of some or all b-ions and y-ions of the peptide.Output: a peptide P such that (1) mass(P)=W and (2) S is a subset of all the ion masses of P.,Mass / Charge,Abundance (100%),50,100,274.112,361.121,Peptide Mass 429.212 Daltons,P = SWR,Mass(P) = 429.212,Ions(P) = 88.033, 175.113, 274.112, 361.121, 430.213, 448.225,28,Amino Acid Mass Table,29,Outline of This Talk (3),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,30,peptide mass W,tandem mass spectrum S,NC-spectrum graph,Find feasible paths to order the masses in S to identify all the b-ions and y-ions consistent with S.,Basic Computing Scheme,Convert feasible paths into legal peptide sequences,31,NC-Spectrum Graph: Nodes (1),0,429.22,N0,C0,mass of this peptide,32,NC-Spectrum Graph: Nodes (2),mass of this peptide,0,429.22,N0,C0,174.11,273.11,mass( ) + mass( ) = mass(P) + 18,Ion # 1 (274.11),Assumption 1:If Ion 1 is an y-ionC1: a b-ion node,Assumption 2:If Ion 1 is a b-ionN1: a b-ion node,C1,N1,33,NC-Spectrum Graph: Nodes (3),0,429.22,N0,C0,174.11,273.11,mass( ) + mass( ) = mass(P) + 18,Ion # 2 (88.10),87.10,360.12,C1,N1,C2,N2,34,NC-Spectrum Graph: Edges (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,35,NC-Spectrum Graph: Edges (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,36,NC-Spectrum Graph: Edges (3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,S+W,Mass(S+W) = 273.29,37,NC-Spectrum Graph: Edges (4),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S) = 87.08.,S,Mass(W) = 186.21,W,S+W,Mass(S+W) = 273.29,R,Mass(R) = 156.19,38,NC-Spectrum Graph,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,39,NC-Spectrum Graph: Paths = Sequences,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,S,W,R,b-ions,40,NC-Spectrum Graph: A Feasible Path (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,a feasible path,S,W,R,b-ions,41,NC-Spectrum Graph: A Feasible Path (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,a feasible path,S,S,GVV,b-ions,y-ions,42,NC-Spectrum Graph: Not A Feasible Path (1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:miss ion #2,43,NC-Spectrum Graph: Not A Feasible Path (2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:(2) repeat ion #1,44,NC-Spectrum Graph: Not A Feasible Path (3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition: A feasible path is a path from N0 to C0 that goes through exactly one node for each pair (either Nj or Cj).,not a feasible path:miss ion #2repeat ion #1,45,Reformulating the De Novo Peptide Sequencing Problem,Input: an NC-spectrum graph G.Output: a feasible path from N0 to C0.,46,Observations,A longest path does not always go through exactly one of each pair of nodes.It is an NP-hard problem if the spectrum graph is a general directed graph.,47,Basic Algorithm,Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,48,Basic Algorithm (1),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,49,Compute the Nodes of the NC-Spectrum Graph,Step 2. Rename the nodes from left to right as X0, Xk,Yk,Y0,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Step 1. Compute the nodes and place them in the increasing order of masses.,Observation: Xi and Yi form a complementary pair of nodes Ni and Ci for ion i.Running Time: O(k), where k = # of masses in the spectrum.,50,Basic Algorithm (2),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G. inverse of each otherCompute a feasible path P in G.Convert P into a feasible sequence.,51,Compute the Edges of the NC-Spectrum Graph,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,Basic Question: Given a mass u, is there a protein sequence with that mass?Solution: dynamic programming via a Boolean array E( ).precision = 0.01.Boolean array length L = peptide mass W / precision.Boolean array E(u/0.01) = 1 if u is the mass of a peptide; otherwise 0. dynamic programming E(j) = 1 if only E(j mi) =1 for some amino acid mass mi.Running Time: (1) Computing E() takes O(L) time; or O(L/log L) via 4-Russian preprocessing. (2) Computing the edges takes O(k2) time.,52,Basic Algorithm (3),Input: a peptide mass W and a tandem mass spectrum S.Output: a feasible peptide sequence.Steps:Compute the nodes of the NC-spectrum graph G.Compute the edges of G.Compute a feasible path P in G.Convert P into a feasible sequence.,53,Compute a Feasible Path (1),0,429.22,X0,Y0,87.10,360.12,Y1,X1,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,Recursion: Use the feasible paths of X0, Xi,Yj,Y0 to computethe feasible paths of X0, Xi, Xi+1,Yj+1,Yj,Y0.Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk and M(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and M(k,j) = 1,54,Compute a Feasible Path (2),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Observation: There is a feasible path if and only if (1) for some i and k, there is an edge e from Xi to Yk and E(i,k) = 1, or(2) for some k and j, there is an edge e from Xk to Yj and E(k,j) = 1,X0,Y0,Yk,Xi,PL,PR,e,X0,Y0,Xk,PL,PR,e,Yj,55,Compute a Feasible Path (3),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Base Case: M(0,0), M(0,1), M(1,0).Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Idea: Extend PL and PR by one edge at a time.,56,Compute a Feasible Path (4),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.,X0,Y0,Yj-1,Xi,PL,PR,e,Xj,Yj,X0,Y0,Yj-1,Xi,PL,PR,e,Xj,Yj,57,Compute a Feasible Path (5),Dynamic Programming: M(i,j) = 1 if there exist a path PL from X0 to Xi and a path PR from Yj to Y0 such that PL and PR together contain exactly one of Xq and Yq for each q = 0, , maxi,j.Recurrence: If M(i,j-1) = 1 and edge(Xi, Xj) = 1, then M(j,j-1) = 1.If M(i,j-1) = 1 and edge(Yj, Yj-1) = 1, then M(i,j) = 1.If M(j-1,i) = 1 and edge(Xj-1, Xj) = 1, then M(j,i) = 1.If M(j-1,i) = 1 and edge(Yj, Yi) = 1, then M(j-1,j) = 1.Computational Complexity: O(k2).,58,Algorithmic Result #1: Finding a Feasible Path,Input: an NC-Spectrum Graph G=(V,E)Output: a feasible path in G.Computational Complexity: O(|V|2) time & O(|V|2) space.,59,Outline of This Talk (4),Problem Formulation (Biology)Problem Formulation (Computer Science)Basic Computational TechniquesImproved Computational Complexity and More Robust AlgorithmsConclusions,60,Algorithmic Result #2: Finding a Feasible Path (Improved),Input: an NC-spectrum graph G=(V,E).Output: A feasible path can be found in O(|V|+|E|) time.Idea: Speed up via pre-processing.,61,Amino Acid Modifications,A modification is an amino acid with slightly different atoms (and thus a different mass) from the typical molecule. Importance of modifications: Amino acid modifications are related to functions. For example, a protein is active when phosphorylated and inactive when de-phosphorylated.,62,Modification in the Tandem Mass Spectrum,Mass / Charge,Abundance (100%),200,50,100,400,S,W+d,R,R,W+d,S,63,Spectrum Graph: As Before,64,Spectrum Graph: A Modified Feasible Path,Idea:
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 美术表情创意课件
- 2025年电商内容营销策略优化:种草经济下的品牌战略研究报告
- 2025年事业单位工勤技能-湖北-湖北广播电视天线工五级(初级工)历年参考题库含答案解析
- 2025年事业单位工勤技能-海南-海南铸造工四级(中级工)历年参考题库含答案解析
- 2025年零售门店数字化智能化门店导购系统技术应用与用户体验优化案例研究报告
- 2025年事业单位工勤技能-河南-河南计量检定工二级(技师)历年参考题库典型考点含答案解析
- 2025年事业单位工勤技能-江苏-江苏工程测量员四级(中级工)历年参考题库含答案解析(5套)
- 2025年事业单位工勤技能-广西-广西政务服务办事员一级(高级技师)历年参考题库含答案解析
- 2025年事业单位工勤技能-广东-广东防疫员四级(中级工)历年参考题库典型考点含答案解析
- 2025年事业单位工勤技能-广东-广东机械冷加工三级(高级工)历年参考题库典型考点含答案解析
- 2025年度运输业安全生产知识竞赛试题(附答案)
- 光伏居间的合同8篇
- GB/T 45418-2025配电网通用技术导则
- 医疗风险防控培训课件
- 机械设计部绩效考核制度
- 诊疗规范培训课件
- 《KANO模型培训》课件
- 复苏室患者的交接流程
- DB21-T 2523-2015矿山地质环境恢复治理规程
- 新能源集控中心建设方案
- 《中国老年糖尿病诊疗指南(2024版)》解读课件
评论
0/150
提交评论