




已阅读5页,还剩50页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1 /55,回顾,Dynamic Programming Edit Distance(编辑距离) Alignment(比对) Directed Acyclic Graph Edit Graph Backtracking,- T G C A T - A - C,A T - C - T G A T C,2 /55,习题,4,求两条序列的最长共同子序列。【作业】,v = TACGGGTAT,w = GGACGTACG,3 /55,G,G,A,C,G,T,A,C,G,T,A,C,G,G,G,T,A,T,4,Sequence Alignment,5 /55,Outline,Global Alignment Scoring Matrices Local Alignment Alignment with Affine Gap Penalties,6 /55,From LCS to Alignment: Change up the Scoring,The Longest Common Subsequence (LCS) problemthe simplest form of sequence alignment allows only insertions and deletions (no mismatches). In the LCS Problem, we scored 1 for matches and 0 for indels Consider penalizing indels and mismatches with negative scores Simplest scoring schema: +1 : match premium - : mismatch penalty - : indel penalty,- T G C A T - A - C,A T - C - T G A T C,7 /55,Simple Scoring,When mismatches are penalized by , indels are penalized by , and matches are rewarded with +1, the resulting score is: #matches (#mismatches) (#indels),8 /55,The Global Alignment Problem,Find the best alignment between two strings under a given scoring schema Input : Strings v and w and a scoring schema Output : Alignment of maximum score,m : mismatch penalty : indel penalty,9 /55,Scoring Matrices,To generalize scoring, consider a (4+1) x(4+1) scoring matrix . In the case of an amino acid sequence alignment, the scoring matrix would be a (20+1)x(20+1) size. The addition of 1 is to include the score for comparison of a gap character “-”. This will simplify the algorithm as follows:,10 /55,The Blosum62 Scoring Matrix,11 /55,Measuring Similarity,Measuring the extent of similarity between two sequences Based on percent sequence identity Based on conservation,12 /55,Percent Sequence Identity,The extent to which two nucleotide or amino acid sequences are invariant,A C C T G A G A G A C G T G G C A G,70% identical,mismatch,indel,13 /55,Making a Scoring Matrix,Scoring matrices are created based on biological evidence. Alignments can be thought of as two sequences that differ due to mutations. Some of these mutations have little effect on the proteins function, therefore some penalties, (vi , wj), will be less harsh than others.,14 /55,Scoring Matrix: Example,Notice that although R and K are different amino acids, they have a positive score. Why? They are both positively charged amino acids will not greatly change function of protein.,15 /55,Conservation,Amino acid changes that tend to preserve the physico-chemical properties of the original residue Polar to polar aspartate glutamate Nonpolar to nonpolar alanine valine Similarly behaving residues leucine to isoleucine,16 /55,Scoring matrices,Amino acid substitution matrices PAM BLOSUM DNA substitution matrices DNA is less conserved than protein sequences Less effective to compare coding regions at nucleotide level,17 /55,PAM,Point Accepted Mutation (Dayhoff et al.) 1 PAM = PAM1 = 1% average change of all amino acid positions After 100 PAMs of evolution, not every residue will have changed some residues may have mutated several times some residues may have returned to their original state some residues may not changed at all,18 /55,PAMX,PAMx = PAM1x PAM250 = PAM1250 PAM250 is a widely used scoring matrix:,Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys . A R N D C Q E G H I L K . Ala A 13 6 9 9 5 8 9 12 6 8 6 7 . Arg R 3 17 4 3 2 5 3 2 6 3 2 9 Asn N 4 4 6 7 2 5 6 4 6 3 2 5 Asp D 5 4 8 11 1 7 10 5 6 3 2 5 Cys C 2 1 1 1 52 1 1 2 2 2 1 1 Gln Q 3 5 5 6 1 10 7 3 7 2 3 5 . Trp W 0 2 0 0 0 0 0 0 1 0 1 0 Tyr Y 1 1 2 1 3 1 1 1 3 2 2 1 Val V 7 4 4 4 4 4 4 4 5 4 15 10,19 /55,BLOSUM,Blocks Substitution Matrix Scores derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins Matrix name indicates evolutionary distance BLOSUM62 was created using sequences sharing no more than 62% identity,20 /55,The Blosum62 Scoring Matrix,BLOSUM 62,21 /55,相似度越低的序列,在比对的时候,采用PAM矩阵时,后面的数字越大,采用BLOSUM矩阵时,后面的数字越小。,22 /55,Local vs. Global Alignment,The Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph. The Local Alignment Problem tries to find the longest path among paths between arbitrary vertices (i,j) and (i, j) in the edit graph.,23 /55,Local vs. Global Alignment,The Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph. The Local Alignment Problem tries to find the longest path among paths between arbitrary vertices (i,j) and (i, j) in the edit graph. In the edit graph with negatively-scored edges, Local Alignmet may score higher than Global Alignment,24 /55,Local vs. Global Alignment (contd),Global Alignment Local Alignmentbetter alignment to find conserved segment,-T-CC-C-AGT-TATGT-CAGGGGACACGA-GCATGCAGA-GAC,| | | | | | | | | | | | | | |,AATTGCCGCC-GTCGT-T-TTCAG-CA-GTTATGT-CAGAT-C,tccCAGTTATGTCAGgggacacgagcatgcagagac,|,aattgccgccgtcgttttcagCAGTTATGTCAGatc,25 /55,Local Alignment: Example,Global alignment,Local alignment,26 /55,Local Alignments: Why?,Two genes in different species may be similar over short conserved regions and dissimilar over remaining regions. Example: Homeobox genes have a short region called the homeodomain that is highly conserved between species. A global alignment would not find the homeodomain because it would try to align the ENTIRE sequence,27 /55,The Local Alignment Problem,Goal: Find the best local alignment between two strings Input : Strings v, w and scoring matrix Output : Alignment of substrings of v and w whose alignment score is maximum among all possible alignment of all possible substrings,28 /55,The Problem with this Problem,Long run time O(n4): - In the grid of size n x n there are n2 vertices (i,j) that may serve as a source. - For each such vertex computing alignments from (i,j) to (i,j) takes O(n2) time. This can be remedied by giving free rides,29 /55,Local Alignment: Example,Global alignment,Local alignment,30 /55,Local Alignment: Example,31 /55,Local Alignment: Example,32 /55,Local Alignment: Example,33 /55,Local Alignment: Example,34 /55,Local Alignment: Example,35 /55,Local Alignment: Running Time,Long run time O(n4): - In the grid of size n x n there are n2 vertices (i,j) that may serve as a source. - For each such vertex computing alignments from (i,j) to (i,j) takes O(n2) time. This can be remedied by giving free rides,36 /55,Local Alignment: Free Rides,Vertex (0,0),The dashed edges represent the free rides from (0,0) to every other node.,Yeah, a free ride!,37 /55,The Local Alignment Recurrence,The largest value of si,j over the whole edit graph is the score of the best local alignment. The recurrence:,0 si,j = max si-1,j-1 + (vi, wj) s i-1,j + (vi, -) s i,j-1 + (-, wj),38 /55,The Local Alignment Recurrence,The largest value of si,j over the whole edit graph is the score of the best local alignment. The recurrence:,0 si,j = max si-1,j-1 + (vi, wj) s i-1,j + (vi, -) s i,j-1 + (-, wj),39 /55,/Blast.cgi NP_006735 NP_000945,40 /55,41 /55,习题,考虑序列v=TACGGGTAT和w= GGACGTACG。假设匹配奖励+1,错配和插缺罚分均为-1. 【作业】 填写序列v和w之间的全局联配的动态规划表(编辑图或相似度矩阵)。在各单元画出箭头以存储返回信息。全局最优联配的得分是多少?这个得分对应的联配又是什么? 填写序列v和w之间的局部联配的动态规划表。在各单元画出箭头以存储返回信息。在这种情形下,局部最优联配的得分是多少?这个得分对应的联配又是什么?,42 /55,G,G,A,C,G,T,A,C,G,T,A,C,G,G,G,T,A,T,全局比对,43 /55,局部比对,G,G,A,C,G,T,A,C,G,T,A,C,G,G,G,T,A,T,44 /55,Scoring Indels: Naive Approach,A fixed penalty is given to every indel: - for 1 indel, -2 for 2 consecutive indels -3 for 3 consecutive indels, etc. Can be too severe penalty for a series of 100 consecutive indels,45 /55,Affine Gap Penalties,In nature, a series of k indels often come as a single event rather than a series of k single nucleotide events:,ATA_GC ATATTGC,ATAG_GC AT_GTGC,Normal scoring would give the same score for both alignments,46 /55,Accounting for Gaps,Gaps- contiguous sequence of spaces in one of the rows Score for a gap of length x is: -( + x) where 0 is the penalty for introducing a gap: gap opening penalty will be large relative to : gap extension penalty because you do not want to add too much of a penalty for extending the gap.,47 /55,Affine Gap Penalties,Gap penalties: - when there is 1 indel -2 when there are 2 indels -3 when there are 3 indels, etc. - x (-gap opening - x gap extensions) Somehow reduced penalties (as compared to nave scoring) are given to runs of horizontal and vertical edges,48 /55,Affine Gap Penalties and Edit Graph,To reflect affine gap penalties we have to add “long” horizontal and vertical edges to the edit graph. Each such edge of length x should have weight - - x *,49 /55,Adding “Affine Penalty” Edges to the Edit Graph,There are many such edges! Adding them to the graph increases the running time of the alignment algorithm by a factor of n (where n is the number of vertices) So the complexity increases from O(n2) to O(n3),50 /55,Manhattan in 3 Layers,51 /55,Affine Gap Penalties and 3 Layer Manhattan Grid,The three recurrences for the scoring algorithm creates a 3-layered graph. The top level creates/extends gaps in the sequence w. The bottom level creates/extends gaps in sequence v. The midd
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 衡水金卷四省(四川云南)高三联考9月联考历史(含答案)
- 2025租赁合同终止协议书范文
- 企业安全培训账号密码课件
- 氢气制备与储存优化-洞察及研究
- 出入口保安培训课件
- 2025电视剧版权购买合同范本
- 2025合同范本合同协议书模板管理规程
- 2025年版融法合同违约诉状范本
- 2025管理技能合同风险评估与控制方法
- 2025《上海市机动车驾驶培训服务合同(示范文本)》
- 2023年中元节烧包袱禁忌 中元节烧包袱是单数还是双数(3篇)
- 幼儿文学课件完整版
- DB6101T3128-2022养老服务规范 助餐服务
- 临时用地复垦与方案
- 语言学纲要课件
- 地下室开槽引流方案
- 电子课件-《市场营销》-A45-2298完整版教学课件全书电子讲义(最新)
- 新苏教版科学六年级上册教学计划含进度表
- 2021年新苏教版科学六年级上册知识点整理
- 美的观念(玛丽艳)
- 农药学原理课件--作用机制研究的思路和方法
评论
0/150
提交评论