版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
DynamicProgramming:SequencealignmentCS466SaurabhSinhaDNASequenceComparison:FirstSuccessStoryFindingsequencesimilaritieswithgenesofknownfunctionisacommonapproachtoinferanewlysequencedgene’sfunctionIn1984RussellDoolittleandcolleaguesfoundsimilaritiesbetweencancer-causinggeneandnormalgrowthfactor(PDGF)geneAnormalgrowthgeneswitchedonatthewrongtimecausescancer!CysticFibrosisCysticfibrosis(CF)isachronicandfrequentlyfatalgeneticdiseaseofthebody'smucusglands.CFprimarilyaffectstherespiratorysystemsinchildren.SearchfortheCFgenewasnarrowedto~1Mbp,andtheregionwassequenced.Scannedadatabaseformatchestoknowngenes.AsegmentinthisregionmatchedthegeneforsomeATPbindingprotein(s).Theseproteinsarepartoftheiontransportchannel,andCFinvolvessweatsecretionswithabnormalsodiumcontent!RoleforBioinformaticsGenesimilaritiesbetweentwogeneswithknownandunknownfunctionalertbiologiststosomepossibilitiesComputingasimilarityscorebetweentwogenestellshowlikelyitisthattheyhavesimilarfunctionsDynamicprogrammingisatechniqueforrevealingsimilaritiesbetweengenesMotivatingDynamicProgrammingDynamicprogrammingexample:
ManhattanTouristProblemImagineseekingapath(fromsourcetosink)totravel(onlyeastwardandsouthward)withthemostnumberofattractions(*)intheManhattangridSink***********Source*Imagineseekingapath(fromsourcetosink)totravel(onlyeastwardandsouthward)withthemostnumberofattractions(*)intheManhattangridSink***********Source*Dynamicprogrammingexample:
ManhattanTouristProblemManhattanTouristProblem:FormulationGoal:Findthelongestpathinaweightedgrid.Input:AweightedgridGwithtwodistinctvertices,onelabeled“source”andtheotherlabeled“sink”Output:AlongestpathinG
from“source”to“sink”MTP:AnExample32407333013244564655822501230123jcoordinateicoordinate13sourcesink432401024331122241995152302034MTP:GreedyAlgorithmIsNotOptimal1252152340005303501035512promisingstart,butleadstobadchoices!sourcesink1822MTP:SimpleRecursiveProgramMT(n,m)
if
n=0orm=0
return
MT(n,m)x
MT(n-1,m)+
lengthoftheedgefrom(n-1,m)to(n,m)y
MT(n,m-1)+lengthoftheedgefrom(n,m-1)to(n,m)
return
max{x,y}What’swrongwiththisapproach?Here’swhat’swrongM(n,m)needsM(n,m-1)andM(n-1,m)BothoftheseneedM(n-1,m-1)SoM(n-1,m-1)willbecomputedatleasttwiceDynamicprogramming:thesameideaasthisrecursivealgorithm,butkeepallintermediateresultsinatableandreuse150101isource15S1,0
=5S0,1
=1CalculateoptimalpathscoreforeachvertexinthegraphEachvertex’sscoreisthemaximumofthepriorverticesscoreplustheweightoftherespectiveedgeinbetweenMTP:DynamicProgrammingjMTP:DynamicProgramming(cont’d)1253012012source13584S2,0=8iS1,1
=4S0,2=33-5jMTP:DynamicProgramming(cont’d)125301230123isource1358840581035-59131-5S3,0=8S2,1=9S1,2=13S3,0=8jMTP:DynamicProgramming(cont’d)greedyalg.fails!125-51-5-53053035010-3-501230123isource13858849138912S3,1=9S2,2=12S1,3=8jMTP:DynamicProgramming(cont’d)125-51-5-5330053035010-3-5-5201230123isource13858849138129159jS3,2=9S2,3=15MTP:DynamicProgramming(cont’d)125-51-5-5330053035010-3-5-5201230123isource13858849138129159j0116S3,3=16(showingallback-traces)Done!MTP:RecurrenceComputingthescoreforapoint(i,j)bytherecurrencerelation:si,j=maxsi-1,j+weightoftheedgebetween(i-1,j)and(i,j)si,j-1+weightoftheedgebetween(i,j-1)and(i,j)Therunningtimeisnxmforanbym
grid(n=#ofrows,m=#ofcolumns)
ManhattanIsNotAPerfectGridWhataboutdiagonals?ThescoreatpointBisgivenby:sB=maxofsA1+weightoftheedge(A1,B)sA2+weightoftheedge(A2,B)sA3+weightoftheedge(A3,B)BA3A1A2ManhattanIsNotAPerfectGrid(cont’d)Computingthescoreforpointxisgivenbytherecurrencerelation:sx=maxofsy+weightofvertex(y,x)where
yєPredecessors(x)
Predecessors(x)–setofverticesthathaveedgesleadingtox
TherunningtimeforagraphG(V,E)(VisthesetofallverticesandE
isthesetofalledges)isO(E)sinceeachedgeisevaluatedonceTravelingintheGridBythetimethevertexxisanalyzed,thevaluessy
forallitspredecessorsyshouldbecomputed–otherwiseweareintrouble.WeneedtotraversetheverticesinsomeorderForagrid,cantraverseverticesrowbyrow,columnbycolumn,ordiagonalbydiagonalTraversingtheManhattanGrid3differentstrategies:a)Columnbycolumnb)Rowbyrowc)Alongdiagonalsa)b)c)TraversingaDAGAnumberingofverticesofthegraphiscalledtopologicalorderingoftheDAGifeveryedgeoftheDAGconnectsavertexwithasmallerlabeltoavertexwithalargerlabelHowtoobtainatopologicalordering?AlignmentAligningDNASequencesV=ATCTGATGW=TGCATACn=8m=7ATCTGATGTGCATACV
Wmatchdeletioninsertionmismatchindels4122matches
mismatches
insertions
deletions
Alignment:2xkmatrix(k
m,n)LongestCommonSubsequence(LCS)–AlignmentwithoutMismatches
Giventwosequencesv=v1
v2…vm
andw=w1
w2…wn
TheLCSofv
andwisasequenceofpositionsin
v:1<i1<i2<…<it
<mandasequenceofpositionsin
w:1<j1<j2<…<jt
<nsuchthatit-thletterofvequalstojt-letterofw
andtismaximalLCS:ExampleAT--CTGATC--TGCT--A--Celementsofvelementsofw--A12012233435455666778jcoords:icoords:Matchesshownin
redpositionsinv:positionsinw:2<3<4<6<81<3<5<6<7Everycommonsubsequenceisapathin2-Dgrid00(0,0)
(1,0)(2,1)(2,2)(3,3)(3,4)(4,5)(5,5)(6,6)(7,6)(8,7)ComputingLCSLetvi=prefixofvoflengthi:v1…viandwj
=prefixofwoflengthj:w1…wjThelengthofLCS(vi,wj)iscomputedby:si,j=maxsi-1,jsi,j-1si-1,j-1+1ifvi=wj
LCSProblemasManhattanTouristProblemTGCATAC12345670iATCTGATC012345678jEditGraphforLCSProblemTGCATAC12345670iATCTGATC012345678jEditGraphforLCSProblemTGCATAC12345670iATCTGATC012345678jEverypathisacommonsubsequence.EverydiagonaledgeaddsanextraelementtocommonsubsequenceLCSProblem:FindapathwithmaximumnumberofdiagonaledgesBacktrackingsi,jallowsustocomputethelengthofLCSforviandwjsm,ngivesusthelengthoftheLCSforvandwHowdoweprinttheactualLCS?Ateachstep,wechoseanoptimaldecisionsi,j=max(…)RecordwhichofthechoiceswasmadeinordertoobtainthismaxComputingLCSLetvi=prefixofvoflengthi:v1…viandwj
=prefixofwoflengthj:w1…wjThelengthofLCS(vi,wj)iscomputedby:si,j=maxsi-1,jsi,j-1si-1,j-1+1ifvi=wj
PrintingLCS:BacktrackingPrintLCS(b,v,i,j)
if
i=0orj=0
return
if
bi,j=““
PrintLCS(b,v,i-1,j-1)
vi
else
if
bi,j=““
PrintLCS(b,v,i-1,j)
else
PrintLCS(b,v,i,j-1)FromLCStoAlignmentTheLongestCommonSubsequenceproblem—thesimplestformofsequencealignment–allowsonlyinsertionsanddeletions(nomismatches).IntheLCSProblem,wescored1formatchesand0forindelsConsiderpenalizingindelsandmismatcheswithnegativescoresSimplestscoringscheme:
+1:matchpremium
-μ:mismatchpenalty
-σ:indelpenaltySimpleScoringWhenmismatchesarepenalizedby–μ,indelsarepenalizedby–σ,andmatchesarerewardedwith+1,theresultingscoreis:
#matches–μ(#mismatches)–σ(#indels)TheGlobalAlignmentProblemFindthebestalignmentbetweentwostringsunderagivenscoringschemaInput:StringsvandwandascoringschemaOutput:Alignmentofmaximumscore
→=-σ
=1ifmatch=-µifmismatch
si-1,j-1+1ifvi=wjsi,j
=maxsi-1,j-1-µifvi≠wj
si-1,j-σ
si,j-1-σ
{m:mismatchpenaltyσ:indelpenaltyScoringMatricesTogeneralizescoring,considera(4+1)x(4+1)scoringmatrix
δ.Inthecaseofanaminoacidsequencealignment,thescoringmatrixwouldbea(20+1)x(20+1)size.Theadditionof1istoincludethescoreforcomparisonofagapcharacter“-”.Thiswillsimplifythealgorithmasfollows:
si-1,j-1+δ
(vi,wj)si,j=maxsi-1,j+δ
(vi,-)
si,j-1+δ
(-,wj){MakingaScoringMatrixScoringmatricesarecreatedbasedonbiologicalevidence.Alignmentscanbethoughtofastwosequencesthatdifferduetomutations.Someofthesemutationshavelittleeffectontheprotein’sfunction,thereforesomepenalties,δ(vi,wj),willbelessharshthanothers.ScoringMatrix:ExampleAKRANR-1+(-1)+(-2)+5+7+3=11ARNKA5-2-1-1R-7-13N--70K---6NoticethatalthoughRandKaredifferentaminoacids,theyhaveapositivescore.Why?Theyarebothpositivelychargedaminoacidswillnotgreatlychangefunctionofprotein.ConservationAminoacidchangesthattendtopreservethephysico-chemicalpropertiesoftheoriginalresiduePolartopolaraspartateglutamateNonpolartononpolaralaninevalineSimilarlybehavingresiduesleucinetoisoleucineLocalvs.GlobalAlignmentTheGlobalAlignmentProblemtriestofindthelongestpathbetweenvertices(0,0)and(n,m)intheeditgraph.TheLocalAlignmentProblemtriestofindthelongestpathamongpathsbetweenarbitraryvertices(i,j)and(i’,j’)intheeditgraph.Intheeditgraphwithnegatively-scorededges,LocalAlignmentmayscorehigherthanGlobalAlignmentLocalAlignment:ExampleGlobalalignmentLocalalignmentComputea“mini”GlobalAlignmenttogetLocalAlignmentLocalAlignments:Why?Twogenesindifferentspeciesmaybesimilarovershortcon
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2026年财务智能伦理道德与数据隐私保护
- 2026年医保定点医药机构管理总结
- 2026年银行业金融机构营业场所安全防范
- 2026年消防应急包配置与使用培训
- 2026年养老院老年人营养配餐培训
- 练习7 《分析小说的情节》同步练习 (含答案解析)2027年高考一轮总复习
- 2026年医用射线装置辐射安全许可证办理
- 2026年ISO9001质量管理体系内审员培训
- 摄影摄像行业合作协议2026
- 2026年生态养殖合作社运营与管理
- 中国抗癌协会脑胶质瘤整合诊疗指南2025版
- 智慧港口等级评价指南集装箱码头(T-CPHA9-2022)
- 2025年肿瘤随访登记培训试题有答案
- 前置胎盘伴出血护理个案
- 高空坠物安全知识培训
- 2025年自然资源局公务员面试技巧与模拟题详解
- 医学人工智能导论
- 2025年银行考试-中信银行运营管理资质认证考试历年参考题库含答案解析(5套典型考题)
- 2025年贵州省中考理科综合(物理化学)试卷真题(含答案详解)
- 药品新品上市管理制度
- DB4403T 508-2024《生产经营单位锂离子电池存储使用安全规范》
评论
0/150
提交评论