Dynamic-Programming-Sequence-alignment:动态规划序列比对_第1页
Dynamic-Programming-Sequence-alignment:动态规划序列比对_第2页
Dynamic-Programming-Sequence-alignment:动态规划序列比对_第3页
Dynamic-Programming-Sequence-alignment:动态规划序列比对_第4页
Dynamic-Programming-Sequence-alignment:动态规划序列比对_第5页
已阅读5页,还剩43页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

DynamicProgramming:SequencealignmentCS466SaurabhSinhaDNASequenceComparison:FirstSuccessStoryFindingsequencesimilaritieswithgenesofknownfunctionisacommonapproachtoinferanewlysequencedgene’sfunctionIn1984RussellDoolittleandcolleaguesfoundsimilaritiesbetweencancer-causinggeneandnormalgrowthfactor(PDGF)geneAnormalgrowthgeneswitchedonatthewrongtimecausescancer!CysticFibrosisCysticfibrosis(CF)isachronicandfrequentlyfatalgeneticdiseaseofthebody'smucusglands.CFprimarilyaffectstherespiratorysystemsinchildren.SearchfortheCFgenewasnarrowedto~1Mbp,andtheregionwassequenced.Scannedadatabaseformatchestoknowngenes.AsegmentinthisregionmatchedthegeneforsomeATPbindingprotein(s).Theseproteinsarepartoftheiontransportchannel,andCFinvolvessweatsecretionswithabnormalsodiumcontent!RoleforBioinformaticsGenesimilaritiesbetweentwogeneswithknownandunknownfunctionalertbiologiststosomepossibilitiesComputingasimilarityscorebetweentwogenestellshowlikelyitisthattheyhavesimilarfunctionsDynamicprogrammingisatechniqueforrevealingsimilaritiesbetweengenesMotivatingDynamicProgrammingDynamicprogrammingexample:

ManhattanTouristProblemImagineseekingapath(fromsourcetosink)totravel(onlyeastwardandsouthward)withthemostnumberofattractions(*)intheManhattangridSink***********Source*Imagineseekingapath(fromsourcetosink)totravel(onlyeastwardandsouthward)withthemostnumberofattractions(*)intheManhattangridSink***********Source*Dynamicprogrammingexample:

ManhattanTouristProblemManhattanTouristProblem:FormulationGoal:Findthelongestpathinaweightedgrid.Input:AweightedgridGwithtwodistinctvertices,onelabeled“source”andtheotherlabeled“sink”Output:AlongestpathinG

from“source”to“sink”MTP:AnExample32407333013244564655822501230123jcoordinateicoordinate13sourcesink432401024331122241995152302034MTP:GreedyAlgorithmIsNotOptimal1252152340005303501035512promisingstart,butleadstobadchoices!sourcesink1822MTP:SimpleRecursiveProgramMT(n,m)

if

n=0orm=0

return

MT(n,m)x

MT(n-1,m)+

lengthoftheedgefrom(n-1,m)to(n,m)y

MT(n,m-1)+lengthoftheedgefrom(n,m-1)to(n,m)

return

max{x,y}What’swrongwiththisapproach?Here’swhat’swrongM(n,m)needsM(n,m-1)andM(n-1,m)BothoftheseneedM(n-1,m-1)SoM(n-1,m-1)willbecomputedatleasttwiceDynamicprogramming:thesameideaasthisrecursivealgorithm,butkeepallintermediateresultsinatableandreuse150101isource15S1,0

=5S0,1

=1CalculateoptimalpathscoreforeachvertexinthegraphEachvertex’sscoreisthemaximumofthepriorverticesscoreplustheweightoftherespectiveedgeinbetweenMTP:DynamicProgrammingjMTP:DynamicProgramming(cont’d)1253012012source13584S2,0=8iS1,1

=4S0,2=33-5jMTP:DynamicProgramming(cont’d)125301230123isource1358840581035-59131-5S3,0=8S2,1=9S1,2=13S3,0=8jMTP:DynamicProgramming(cont’d)greedyalg.fails!125-51-5-53053035010-3-501230123isource13858849138912S3,1=9S2,2=12S1,3=8jMTP:DynamicProgramming(cont’d)125-51-5-5330053035010-3-5-5201230123isource13858849138129159jS3,2=9S2,3=15MTP:DynamicProgramming(cont’d)125-51-5-5330053035010-3-5-5201230123isource13858849138129159j0116S3,3=16(showingallback-traces)Done!MTP:RecurrenceComputingthescoreforapoint(i,j)bytherecurrencerelation:si,j=maxsi-1,j+weightoftheedgebetween(i-1,j)and(i,j)si,j-1+weightoftheedgebetween(i,j-1)and(i,j)Therunningtimeisnxmforanbym

grid(n=#ofrows,m=#ofcolumns)

ManhattanIsNotAPerfectGridWhataboutdiagonals?ThescoreatpointBisgivenby:sB=maxofsA1+weightoftheedge(A1,B)sA2+weightoftheedge(A2,B)sA3+weightoftheedge(A3,B)BA3A1A2ManhattanIsNotAPerfectGrid(cont’d)Computingthescoreforpointxisgivenbytherecurrencerelation:sx=maxofsy+weightofvertex(y,x)where

yєPredecessors(x)

Predecessors(x)–setofverticesthathaveedgesleadingtox

TherunningtimeforagraphG(V,E)(VisthesetofallverticesandE

isthesetofalledges)isO(E)sinceeachedgeisevaluatedonceTravelingintheGridBythetimethevertexxisanalyzed,thevaluessy

forallitspredecessorsyshouldbecomputed–otherwiseweareintrouble.WeneedtotraversetheverticesinsomeorderForagrid,cantraverseverticesrowbyrow,columnbycolumn,ordiagonalbydiagonalTraversingtheManhattanGrid3differentstrategies:a)Columnbycolumnb)Rowbyrowc)Alongdiagonalsa)b)c)TraversingaDAGAnumberingofverticesofthegraphiscalledtopologicalorderingoftheDAGifeveryedgeoftheDAGconnectsavertexwithasmallerlabeltoavertexwithalargerlabelHowtoobtainatopologicalordering?AlignmentAligningDNASequencesV=ATCTGATGW=TGCATACn=8m=7ATCTGATGTGCATACV

Wmatchdeletioninsertionmismatchindels4122matches

mismatches

insertions

deletions

Alignment:2xkmatrix(k

m,n)LongestCommonSubsequence(LCS)–AlignmentwithoutMismatches

Giventwosequencesv=v1

v2…vm

andw=w1

w2…wn

TheLCSofv

andwisasequenceofpositionsin

v:1<i1<i2<…<it

<mandasequenceofpositionsin

w:1<j1<j2<…<jt

<nsuchthatit-thletterofvequalstojt-letterofw

andtismaximalLCS:ExampleAT--CTGATC--TGCT--A--Celementsofvelementsofw--A12012233435455666778jcoords:icoords:Matchesshownin

redpositionsinv:positionsinw:2<3<4<6<81<3<5<6<7Everycommonsubsequenceisapathin2-Dgrid00(0,0)

(1,0)(2,1)(2,2)(3,3)(3,4)(4,5)(5,5)(6,6)(7,6)(8,7)ComputingLCSLetvi=prefixofvoflengthi:v1…viandwj

=prefixofwoflengthj:w1…wjThelengthofLCS(vi,wj)iscomputedby:si,j=maxsi-1,jsi,j-1si-1,j-1+1ifvi=wj

LCSProblemasManhattanTouristProblemTGCATAC12345670iATCTGATC012345678jEditGraphforLCSProblemTGCATAC12345670iATCTGATC012345678jEditGraphforLCSProblemTGCATAC12345670iATCTGATC012345678jEverypathisacommonsubsequence.EverydiagonaledgeaddsanextraelementtocommonsubsequenceLCSProblem:FindapathwithmaximumnumberofdiagonaledgesBacktrackingsi,jallowsustocomputethelengthofLCSforviandwjsm,ngivesusthelengthoftheLCSforvandwHowdoweprinttheactualLCS?Ateachstep,wechoseanoptimaldecisionsi,j=max(…)RecordwhichofthechoiceswasmadeinordertoobtainthismaxComputingLCSLetvi=prefixofvoflengthi:v1…viandwj

=prefixofwoflengthj:w1…wjThelengthofLCS(vi,wj)iscomputedby:si,j=maxsi-1,jsi,j-1si-1,j-1+1ifvi=wj

PrintingLCS:BacktrackingPrintLCS(b,v,i,j)

if

i=0orj=0

return

if

bi,j=““

PrintLCS(b,v,i-1,j-1)

print

vi

else

if

bi,j=““

PrintLCS(b,v,i-1,j)

else

PrintLCS(b,v,i,j-1)FromLCStoAlignmentTheLongestCommonSubsequenceproblem—thesimplestformofsequencealignment–allowsonlyinsertionsanddeletions(nomismatches).IntheLCSProblem,wescored1formatchesand0forindelsConsiderpenalizingindelsandmismatcheswithnegativescoresSimplestscoringscheme:

+1:matchpremium

-μ:mismatchpenalty

-σ:indelpenaltySimpleScoringWhenmismatchesarepenalizedby–μ,indelsarepenalizedby–σ,andmatchesarerewardedwith+1,theresultingscoreis:

#matches–μ(#mismatches)–σ(#indels)TheGlobalAlignmentProblemFindthebestalignmentbetweentwostringsunderagivenscoringschemaInput:StringsvandwandascoringschemaOutput:Alignmentofmaximumscore

→=-σ

=1ifmatch=-µifmismatch

si-1,j-1+1ifvi=wjsi,j

=maxsi-1,j-1-µifvi≠wj

si-1,j-σ

si,j-1-σ

{m:mismatchpenaltyσ:indelpenaltyScoringMatricesTogeneralizescoring,considera(4+1)x(4+1)scoringmatrix

δ.Inthecaseofanaminoacidsequencealignment,thescoringmatrixwouldbea(20+1)x(20+1)size.Theadditionof1istoincludethescoreforcomparisonofagapcharacter“-”.Thiswillsimplifythealgorithmasfollows:

si-1,j-1+δ

(vi,wj)si,j=maxsi-1,j+δ

(vi,-)

si,j-1+δ

(-,wj){MakingaScoringMatrixScoringmatricesarecreatedbasedonbiologicalevidence.Alignmentscanbethoughtofastwosequencesthatdifferduetomutations.Someofthesemutationshavelittleeffectontheprotein’sfunction,thereforesomepenalties,δ(vi,wj),willbelessharshthanothers.ScoringMatrix:ExampleAKRANR-1+(-1)+(-2)+5+7+3=11ARNKA5-2-1-1R-7-13N--70K---6NoticethatalthoughRandKaredifferentaminoacids,theyhaveapositivescore.Why?Theyarebothpositivelychargedaminoacidswillnotgreatlychangefunctionofprotein.ConservationAminoacidchangesthattendtopreservethephysico-chemicalpropertiesoftheoriginalresiduePolartopolaraspartateglutamateNonpolartononpolaralaninevalineSimilarlybehavingresiduesleucinetoisoleucineLocalvs.GlobalAlignmentTheGlobalAlignmentProblemtriestofindthelongestpathbetweenvertices(0,0)and(n,m)intheeditgraph.TheLocalAlignmentProblemtriestofindthelongestpathamongpathsbetweenarbitraryvertices(i,j)and(i’,j’)intheeditgraph.Intheeditgraphwithnegatively-scorededges,LocalAlignmentmayscorehigherthanGlobalAlignmentLocalAlignment:ExampleGlobalalignmentLocalalignmentComputea“mini”GlobalAlignmenttogetLocalAlignmentLocalAlignments:Why?Twogenesindifferentspeciesmaybesimilarovershortcon

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论