SequencingviaTandemMass.ppt_第1页
SequencingviaTandemMass.ppt_第2页
SequencingviaTandemMass.ppt_第3页
SequencingviaTandemMass.ppt_第4页
SequencingviaTandemMass.ppt_第5页
已阅读5页,还剩65页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1,AnAlgorithmicApproachtoPeptideSequencingviaTandemMassSpectrometry,Ming-YangKaoDepartmentofComputerScienceNorthwesternUniversityEvanston,IllinoisU.S.A.,2,CollaboratorsofThisProject,UniversityofSouthernCaliforniaTingChenHarvardMedicalSchoolGeorgeM.ChurchJohnRushMatthewTepel,3,Perspectives,Akeygoalofbioinformatics:Tostudybiologicalsystemsbasedonglobalknowledgeofgenomes,transcriptomes,andproteomes.Genome:entiresetsofmaterialsinthechromosomes.Transcriptome:entiresetsofgenetranscripts.Proteome:entiresetsofproteins.,4,Perspectives,Akeygoalofbioinformatics:Tostudybiologicalsystemsbasedonglobalknowledgeofgenomes,transcriptomes,andproteomes.Genome:entiresetsofmaterialsinthechromosomes.Transcriptome:entiresetsofgenetranscripts.Proteome:entiresetsofproteins.,thistalksfocus,5,Proteomics,Proteome:allproteinsencodedwithinagenomehalfmillionsdistinctproteins(temporal,spatial,modifications)30,000humangenesmRNAandproteinexpressionsmaynotcorrelateProteomics:studyofproteinexpressionbybiologicalsystemsrelativeabundanceandstability;post-translationalmodificationsfluctuationsasaresponsetoenvironmentandalteredcellularneedscorrelationsbetweenproteinexpressionanddiseasestateprotein-proteininteractions,proteincomplexesTechnologies:2Dgelelectrophoresismassspectrometryyeasttwo-hybridsystemproteinchips,thistalksfocus,6,AKeyStepofProteomics,Howtosequenceproteins?Howtosequenceproteinpeptides?(thistalksfocus),7,OutlineofThisTalk,ProblemFormulation(Biology)ProblemFormulation(ComputerScience)BasicComputationalTechniquesImprovedComputationalComplexityandMoreRobustAlgorithmsConclusions,8,OutlineofThisTalk(1),ProblemFormulation(Biology)ProblemFormulation(ComputerScience)BasicComputationalTechniquesImprovedComputationalComplexityandMoreRobustAlgorithmsConclusions,9,ProteinIdentification:HPLC-MS-MS,Mass/ChargeTandemMassSpectrum,Mass/Charge,Proteins,Peptides,OnePeptide,B-ions/Y-ions,10,ProteinIdentification:HPLC-MS-MS,Mass/ChargeTandemMassSpectrum,Mass/Charge,Proteins,Peptides,OnePeptide,B-ions/Y-ions,11,PeptideFragmentationandIonization,B-ion,Y-ion,Complementary:Mass(B-ion)+Mass(Y-ion)=Mass(peptide)+4H+O,12,B-ionsandY-ions,Fragmentation,13,TandemMassSpectrum,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,14,RawTandemMassSpectrum,15,PredictionfromRawTandemMassSpectrum,16,ProteinDatabaseSearch,Findthepeptidesequencesinaproteindatabasethatoptimallyfitthespectrum.Itdoesnotworkifthetargetpeptidesequenceisnotinthedatabase.Itdoesnotworkifthereisanunknownmodificationatsomeaminoacid.Itisveryslowbecauseitmustsearchtheentiredatabase.E.g.,SEQUEST,Yates,Univ.ofWashington.,17,DeNovoPeptideSequencingProblem,Input:(1)themassWofanunknowntargetpeptide,and(2)asetSofthemassesofsomeorallb-ionsandy-ionsofthepeptide.Output:apeptidePsuchthat(1)mass(P)=Wand(2)SisasubsetofalltheionmassesofP.,Mass/Charge,Abundance(100%),50,100,274.112,361.121,PeptideMass429.212Daltons,P=SWR,Mass(P)=429.212,Ions(P)=88.033,175.113,274.112,361.121,430.213,448.225,18,TandemMassSpectrum,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,PeptideMass429.212Daltons,19,AminoAcidMassTable,20,Feature1,AllB-ionsformaforwardmassladder.,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,PeptideMass429.212Daltons,b1,b2,b3,1,21,Feature2,AllY-ionsformareversemassladder.,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,PeptideMass429.212Daltons,y1,y2,y3,19,22,BasicDifficulty#1,ItisunknownwhetheranionisaB-ionoranY-ion.,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,PeptideMass429.212Daltons,23,BasicDifficulty#2,Therearemissingions.,Mass/Charge,Abundance(100%),200,50,100,400,274.112,361.121,Ion1,Ion2,PeptideMass429.212Daltons,24,Feature3(toourRescue),ComplementaryIonPairs:b1/y2andb2/y1,Mass/Charge,Abundance(100%),200,50,88.033,100,400,175.113,274.112,361.121,430.213,448.225,S,W,R,R,W,S,PeptideMass429.212Daltons,y1,y2,y3,b1,b2,b3,25,OutlineofThisTalk(2),ProblemFormulation(Biology)ProblemFormulation(ComputerScience)BasicComputationalTechniquesImprovedComputationalComplexityandMoreRobustAlgorithmsConclusions,26,FormulatingtheComputationalProblem,T=analphabetof20charactersa1,a2,a20.twospecialcharacters:alphaandbeta.themassofalpha=1,themassofbeta=19,themassofaiismi.Apeptidesequenceisx1,x2,x3,xn-1,xn,whereeachxiisfromT.Ab-ionisx0,x1,x2,xiforsome1=i=n,wherex0=alpha.Ay-ionisxi,xn-2,xn-1,xn,xn+1forsome1=i=n,wherexn+1=beta.,27,DeNovoPeptideSequencingProblem,Input:(1)themassWofanunknowntargetpeptide,and(2)asetSofthemassesofsomeorallb-ionsandy-ionsofthepeptide.Output:apeptidePsuchthat(1)mass(P)=Wand(2)SisasubsetofalltheionmassesofP.,Mass/Charge,Abundance(100%),50,100,274.112,361.121,PeptideMass429.212Daltons,P=SWR,Mass(P)=429.212,Ions(P)=88.033,175.113,274.112,361.121,430.213,448.225,28,AminoAcidMassTable,29,OutlineofThisTalk(3),ProblemFormulation(Biology)ProblemFormulation(ComputerScience)BasicComputationalTechniquesImprovedComputationalComplexityandMoreRobustAlgorithmsConclusions,30,peptidemassW,tandemmassspectrumS,NC-spectrumgraph,FindfeasiblepathstoorderthemassesinStoidentifyalltheb-ionsandy-ionsconsistentwithS.,BasicComputingScheme,Convertfeasiblepathsintolegalpeptidesequences,31,NC-SpectrumGraph:Nodes(1),0,429.22,N0,C0,massofthispeptide,32,NC-SpectrumGraph:Nodes(2),massofthispeptide,0,429.22,N0,C0,174.11,273.11,mass()+mass()=mass(P)+18,Ion#1(274.11),Assumption1:IfIon1isany-ionC1:ab-ionnode,Assumption2:IfIon1isab-ionN1:ab-ionnode,C1,N1,33,NC-SpectrumGraph:Nodes(3),0,429.22,N0,C0,174.11,273.11,mass()+mass()=mass(P)+18,Ion#2(88.10),87.10,360.12,C1,N1,C2,N2,34,NC-SpectrumGraph:Edges(1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S)=87.08.,S,35,NC-SpectrumGraph:Edges(2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S)=87.08.,S,Mass(W)=186.21,W,36,NC-SpectrumGraph:Edges(3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S)=87.08.,S,Mass(W)=186.21,W,S+W,Mass(S+W)=273.29,37,NC-SpectrumGraph:Edges(4),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Mass(S)=87.08.,S,Mass(W)=186.21,W,S+W,Mass(S+W)=273.29,R,Mass(R)=156.19,38,NC-SpectrumGraph,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,39,NC-SpectrumGraph:Paths=Sequences,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,S,W,R,b-ions,40,NC-SpectrumGraph:AFeasiblePath(1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition:AfeasiblepathisapathfromN0toC0thatgoesthroughexactlyonenodeforeachpair(eitherNjorCj).,afeasiblepath,S,W,R,b-ions,41,NC-SpectrumGraph:AFeasiblePath(2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition:AfeasiblepathisapathfromN0toC0thatgoesthroughexactlyonenodeforeachpair(eitherNjorCj).,afeasiblepath,S,S,GVV,b-ions,y-ions,42,NC-SpectrumGraph:NotAFeasiblePath(1),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition:AfeasiblepathisapathfromN0toC0thatgoesthroughexactlyonenodeforeachpair(eitherNjorCj).,notafeasiblepath:mission#2,43,NC-SpectrumGraph:NotAFeasiblePath(2),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition:AfeasiblepathisapathfromN0toC0thatgoesthroughexactlyonenodeforeachpair(eitherNjorCj).,notafeasiblepath:(2)repeation#1,44,NC-SpectrumGraph:NotAFeasiblePath(3),0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Definition:AfeasiblepathisapathfromN0toC0thatgoesthroughexactlyonenodeforeachpair(eitherNjorCj).,notafeasiblepath:mission#2repeation#1,45,ReformulatingtheDeNovoPeptideSequencingProblem,Input:anNC-spectrumgraphG.Output:afeasiblepathfromN0toC0.,46,Observations,Alongestpathdoesnotalwaysgothroughexactlyoneofeachpairofnodes.ItisanNP-hardproblemifthespectrumgraphisageneraldirectedgraph.,47,BasicAlgorithm,Input:apeptidemassWandatandemmassspectrumS.Output:afeasiblepeptidesequence.Steps:ComputethenodesoftheNC-spectrumgraphG.ComputetheedgesofG.ComputeafeasiblepathPinG.ConvertPintoafeasiblesequence.,48,BasicAlgorithm(1),Input:apeptidemassWandatandemmassspectrumS.Output:afeasiblepeptidesequence.Steps:ComputethenodesoftheNC-spectrumgraphG.ComputetheedgesofG.ComputeafeasiblepathPinG.ConvertPintoafeasiblesequence.,49,ComputetheNodesoftheNC-SpectrumGraph,Step2.RenamethenodesfromlefttorightasX0,Xk,Yk,Y0,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,0,429.22,N0,C0,174.11,273.11,87.10,360.12,C1,N1,C2,N2,Step1.Computethenodesandplacethemintheincreasingorderofmasses.,Observation:XiandYiformacomplementarypairofnodesNiandCiforioni.RunningTime:O(k),wherek=#ofmassesinthespectrum.,50,BasicAlgorithm(2),Input:apeptidemassWandatandemmassspectrumS.Output:afeasiblepeptidesequence.Steps:ComputethenodesoftheNC-spectrumgraphG.ComputetheedgesofG.inverseofeachotherComputeafeasiblepathPinG.ConvertPintoafeasiblesequence.,51,ComputetheEdgesoftheNC-SpectrumGraph,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,BasicQuestion:Givenamassu,isthereaproteinsequencewiththatmass?Solution:dynamicprogrammingviaaBooleanarrayE().precision=0.01.BooleanarraylengthL=peptidemassW/precision.BooleanarrayE(u/0.01)=1ifuisthemassofapeptide;otherwise0.dynamicprogrammingE(j)=1ifonlyE(jmi)=1forsomeaminoacidmassmi.RunningTime:(1)ComputingE()takesO(L)time;orO(L/logL)via4-Russianpreprocessing.(2)ComputingtheedgestakesO(k2)time.,52,BasicAlgorithm(3),Input:apeptidemassWandatandemmassspectrumS.Output:afeasiblepeptidesequence.Steps:ComputethenodesoftheNC-spectrumgraphG.ComputetheedgesofG.ComputeafeasiblepathPinG.ConvertPintoafeasiblesequence.,53,ComputeaFeasiblePath(1),0,429.22,X0,Y0,87.10,360.12,Y1,X1,0,429.22,X0,Y0,174.11,273.11,87.10,360.12,X2,Y2,Y1,X1,Recursion:UsethefeasiblepathsofX0,Xi,Yj,Y0tocomputethefeasiblepathsofX0,Xi,Xi+1,Yj+1,Yj,Y0.DynamicProgramming:M(i,j)=1ifthereexistapathPLfromX0toXiandapathPRfromYjtoY0suchthatPLandPRtogethercontainexactlyoneofXqandYqforeachq=0,maxi,j.Observation:Thereisafeasiblepathifandonlyif(1)forsomeiandk,thereisanedgeefromXitoYkandM(i,k)=1,or(2)forsomekandj,thereisanedgeefromXktoYjandM(k,j)=1,54,ComputeaFeasiblePath(2),DynamicProgramming:M(i,j)=1ifthereexistapathPLfromX0toXiandapathPRfromYjtoY0suchthatPLandPRtogethercontainexactlyoneofXqandYqforeachq=0,maxi,j.Observation:Thereisafeasiblepathifandonlyif(1)forsomeiandk,thereisanedgeefromXitoYkandE(i,k)=1,or(2)forsomekandj,thereisanedgeefromXktoYjandE(k,j)=1,X0,Y0,Yk,Xi,PL,PR,e,X0,Y0,Xk,PL,PR,e,Yj,55,ComputeaFeasiblePath(3),DynamicProgramming:M(i,j)=1ifthereexistapathPLfromX0toXiandapathPRfromYjtoY0suchthatPLandPRtogethercontainexactlyoneofXqandYqforeachq=0,maxi,j.BaseCase:M(0,0),M(0,1),M(1,0).Recurrence:IfM(i,j-1)=1andedge(Xi,Xj)=1,thenM(j,j-1)=1.IfM(i,j-1)=1andedge(Yj,Yj-1)=1,thenM(i,j)=1.IfM(j-1,i)=1andedge(Xj-1,Xj)=1,thenM(j,i)=1.IfM(j-1,i)=1a

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论