chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门

上传人：咸*** IP属地：江苏上传时间：2018-01-09 格式：PPT 页数：60 大小：7.95MB 积分：20 举报 版权申诉

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门_第2页

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门_第3页

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门_第4页

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门_第5页

已阅读5页，还剩55页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

Chapter 8,Dynamic Programming,Copyright 2007 Pearson Addison-Wesley. All rights reserved.,Dynamic Programming,Dynamic Programming is a general algorithm design technique for solving problems defined by or formulated as recurrences with overlapping subinstances Invented by American mathematician Richard Bellman in the 1950s to solve optimization problems and later assimilated by CS “Programming” here means “planning” Main idea:set up a recurrence relating a solution to a larger instance to solutions of some smaller instances- solve smaller instances oncerecord solutions in a table extract solution to the initial instance from that table,Example: Fibonacci numbers,Recall definition of Fibonacci numbers:F(n) = F(n-1) + F(n-2)F(0) = 0F(1) = 1 Computing the nth Fibonacci number recursively (top-down): F(n) F(n-1) + F(n-2)F(n-2) + F(n-3) F(n-3) + F(n-4) .,Example: Fibonacci numbers (cont.),Computing the nth Fibonacci number using bottom-up iteration and recording results: F(0) = 0 F(1) = 1 F(2) = 1+0 = 1 F(n-2) = F(n-1) = F(n) = F(n-1) + F(n-2) Efficiency: - time - space,n n,What if we solve it recursively?,Examples of DP algorithms,Computing a binomial coefficient Longest common subsequence Warshalls algorithm for transitive closure Floyds algorithm for all-pairs shortest paths Constructing an optimal binary search tree Some instances of difficult discrete optimization problems: - traveling salesman - knapsack,Computing a binomial coefficient by DP,Binomial coefficients are coefficients of the binomial formula:(a + b)n = C(n,0)anb0 + . . . + C(n,k)an-kbk + . . . + C(n,n)a0bn Recurrence: C(n,k) = C(n-1,k) + C(n-1,k-1) for n k 0 C(n,0) = 1, C(n,n) = 1 for n 0 Value of C(n,k) can be computed by filling a table: 0 1 2 . . . k-1 k 0 1 1 1 1 . . . n-1 C(n-1,k-1) C(n-1,k) n C(n,k),Computing C(n,k): pseudocode and analysis,Time efficiency: (nk)Space efficiency: (nk),Knapsack Problem by DP,Given n items of integer weights: w1 w2 wn values: v1 v2 vn a knapsack of integer capacity W find most valuable subset of the items that fit into the knapsackConsider instance defined by first i items and capacity j (j W).Let Vi,j be optimal value of such an instance. Then max Vi-1,j, vi + Vi-1,j- wi if j- wi 0Vi,j = Vi-1,j if j- wi Vi-1,j thenVi,j := vi + Vi-1,j-wi; Pi,j := j-wielse Vi,j := Vi-1,j; Pi,j := jreturn Vn,W and the optimal subset by backtracing,Running time and space: O(nW).,Longest Common Subsequence (LCS),A subsequence of a sequence/string S is obtained by deleting zero or more symbols from S. For example, the following are some subsequences of “president”: pred, sdn, predent. In other words, the letters of a subsequence of S appear in order in S, but they are not required to be consecutive.The longest common subsequence problem is to find a maximum length common subsequence between two sequences.,LCS,For instance, Sequence 1: president Sequence 2: providence Its LCS is priden.,presidentprovidence,LCS,Another example: Sequence 1: algorithm Sequence 2: alignmentOne of its LCS is algm.,a l g o r i t h ma l i g n m e n t,How to compute LCS?,Let A=a1a2am and B=b1b2bn .len(i, j): the length of an LCS between a1a2ai and b1b2bjWith proper initializations, len(i, j) can be computed as follows.,Running time and memory: O(mn) and O(mn).,The backtracing algorithm,The backtracing algorithm,C-TTAACTCGGATCA-T,Pairwise Alignment,Sequence A: CTTAACTSequence B: CGGATCATAn alignment of A and B:,Sequence A,Sequence B,Pairwise Alignment,Sequence A: CTTAACTSequence B: CGGATCATAn alignment of A and B:,Alignment (or Edit) Graph,Sequence A: CTTAACTSequence B: CGGATCAT,C G G A T C A T,CTTAACT,C-TTAACTCGGATCA-T,A simple scoring scheme,Match: +8 (w(x, y) = 8, if x = y)Mismatch: -5 (w(x, y) = -5, if x y)Each gap symbol: -3 (w(-,x)=w(x,-)=-3),C - - - T T A A C TC G G A T C A - - T,+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12,alignment score,(i.e. space),Scoring Matrices,Amino acid substitution matricesPAMBLOSUMDNA substitution matricesDNA is less conserved than protein sequencesLess effective to compare coding regions at nucleotide level,PAM,Point Accepted Mutation (Dayhoff, et al.)1 PAM = PAM1 = 1% average change of all amino acid positionsAfter 100 PAMs of evolution, not every residue will have changedsome residues may have mutated several timessome residues may have returned to their original statesome residues may not have changed at all,PAMX,PAMx = PAM1x E.g. PAM250 = PAM1250PAM250 is a widely used scoring matrix.,Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys . A R N D C Q E G H I L K .Ala A 13 6 9 9 5 8 9 12 6 8 6 7 .Arg R 3 17 4 3 2 5 3 2 6 3 2 9Asn N 4 4 6 7 2 5 6 4 6 3 2 5Asp D 5 4 8 11 1 7 10 5 6 3 2 5Cys C 2 1 1 1 52 1 1 2 2 2 1 1Gln Q 3 5 5 6 1 10 7 3 7 2 3 5.Trp W 0 2 0 0 0 0 0 0 1 0 1 0Tyr Y 1 1 2 1 3 1 1 1 3 2 2 1Val V 7 4 4 4 4 4 4 4 5 4 15 10,BLOSUM,Blocks Substitution Matrix Scores derived from observations of the frequencies of substitutions in blocks of local alignments in related proteinsMatrix name indicates evolutionary distanceBLOSUM62 was created using sequences sharing no more than 62% identity,The Blosum50 Scoring Matrix,An optimal alignment- an alignment of maximum score,Let A=a1a2am and B=b1b2bn .Si,j: the score of an optimal alignment between a1a2ai and b1b2bjWith proper initializations, Si,j can be computedas follows.,Computing Si,j,Initialization,C G G A T C A T,CTTAACT,S3,5 = ？,C G G A T C A T,CTTAACT,S3,5 = ？,C G G A T C A T,CTTAACT,optimal score,C T T A A C TC G G A T C A T,C G G A T C A T,CTTAACT,8 5 5 +8 -5 +8 -3 +8 = 14,Global Alignment vs. Local Alignment,global alignment:local alignment:,An optimal local alignment,Si,j: the score of an optimal local alignment ending at ai and bjWith proper initializations, Si,j can be computedas follows.,local alignment,C G G A T C A T,CTTAACT,Match: 8Mismatch: -5Gap symbol: -3,local alignment,C G G A T C A T,CTTAACT,Match: 8Mismatch: -5Gap symbol: -3,the best score,C G G A T C A T,CTTAACT,the best score,A C - TA T C A T8-3+8-3+8 = 18,(not always at the corner),Affine gap penalties,Match: +8 (w(x, y) = 8, if x = y)Mismatch: -5 (w(x, y) = -5, if x y)Each gap symbol: -3 (w(-,x)=w(x,-)=-3)E.g. each gap is charged an extra gap-open penalty: -4.In general, a gap of length k should have penalty g(k),C - - - T T A A C TC G G A T C A - - T,+8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12,-4,-4,alignment score: 12 4 4 = 4,Affine gap penalties,A gap of length k is penalized x + ky.,Three cases for alignment endings:.x.x.x.-.-.x,an aligned pair,a deletion,an insertion,Affine gap penalties,Let D(i, j) denote the maximum score of any alignment between a1a2ai and b1b2bj ending with a deletion.Let I(i, j) denote the maximum score of any alignment between a1a2ai and b1b2bj ending with an insertion.Let S(i, j) denote the maximum score of any alignment between a1a2ai and b1b2bj.,Affine gap penalties,Affine gap penalties (Gotohs algorithm),S,I,D,-y,-x-y,-x-y,-y,w(ai,bj),Warshalls Algorithm: Transitive Closure,Computes the transitive closure of a relation Alternatively: existence of all nontrivial paths in a digraph Example of transitive closure:,0 0 1 01 0 0 10 0 0 00 1 0 0,0 0 1 01 1 1 10 0 0 01 1 1 1,Warshalls Algorithm,Constructs transitive closure T as the last matrix in the sequence of n-by-n matrices R(0), , R(k), , R(n) whereR(k)i,j = 1 iff there is nontrivial path from i to j with only the first k vertices allowed as intermediate Note that R(0) = A (adjacency matrix), R(n) = T (transitive closure),R(0)0 0 1 01 0 0 10 0 0 00 1 0 0,R(1)0 0 1 01 0 1 10 0 0 00 1 0 0,R(2)0 0 1 01 0 1 10 0 0 01 1 1 1,R(3)0 0 1 01 0 1 10 0 0 01 1 1 1,R(4)0 0 1 01 1 1 10 0 0 01 1 1 1,Warshalls Algorithm (recurrence),On the k-th iteration, the algorithm determines for every pair of vertices i, j if a path exists from i and j with just vertices 1,k allowed as intermediateR(k-1)i,j (path using just 1 ,k-1) R(k)i,j = or R(k-1)i,k and R(k-1)k,j (path from i to k and from k to j using just 1 ,k-1),i,j,k,Initial condition?,Warshalls Algorithm (matrix generation),Recurrence relating elements R(k) to elements of R(k-1) is: R(k)i,j = R(k-1)i,j or (R(k-1)i,k and R(k-1)k,j),It implies the following rules for generating R(k) from R(k-1):Rule 1 If an element in row i and column j is 1 in R(k-1), it remains 1 in R(k)Rule 2 If an element in row i and column j is 0 in R(k-1), it has to be changed to 1 in R(k) if and only if the element in its row i and column k and the element in its column j and row k are both 1s in R(k-1),Warshalls Algorithm (example),0 0 1 01 0 0 10 0 0 00 1 0 0,R(0) =,0 0 1 01 0 1 10 0 0 00 1 0 0,R(1) =,0 0 1 01 0 1 10 0 0 01 1 1 1,R(2) =,0 0 1 01 0 1 10 0 0 01 1 1 1,R(3) =,0 0 1 01 1 1 10 0 0 01 1 1 1,R(4) =,Warshalls Algorithm (pseudocode and analysis),Time efficiency: (n3)Space efficiency: Matrices can be written over their predecessors (with some care), so its (n2).,Floyds Algorithm: All pairs shortest paths,Problem: In a weighted (di)graph, find shortest paths between every pair of verticesSame idea: construct solution through series of matrices D(0), , D (n) using increasing subsets of the vertices allowed as intermediateExample:,0 4 1 0 4 3 0 6 5 1 0,Floyds Algorithm (matrix generation),On the k-th iteration, the algorithm determines shortest paths between every pair of vertices i, j that use only vertices among 1,k as intermediate D(k)i,j = min D(k-1)i,j, D(k-1)i,k + D(k-1)k,j,i,j,k,D(k-1)i,j,D(k-1)i,k,D(k-1)k,j,Initial condition?,Floyds Algorithm (example),0 3 2 0 7 0 16 0,D(0) =,0 3 2 0 5 7 0 16 9 0,D(1) =,0 3 2 0 5 9 7 0 16 9 0,D(2) =,0 10 3 42 0 5 69 7 0 16 16 9 0,D(3) =,0 10 3 42 0 5 67 7 0 16 16 9 0,D(4) =,Floyds Algorithm (pseudocode and analysis),Time efficiency: (n3)Space efficiency: Matrices can be written over their predecessorsNote: Works on graphs with negative edges but without negative cycles. Shortest paths themselves can be found, too. How?,If Di,k + Dk,j Di,j then Pi,j k,Since the superscripts k or k-1 make no difference to Di,k and Dk,j.,Optimal Binary Search Trees,Problem: Given n keys a1 an and probabilities p1, , pn searching for them, find a BST with a minimum average number of comparisons in successful search.Since total number of BSTs with n nodes is given by C(2n,n)/(n+1), which grows exponentially, brute force is hopeless.,Example: What is an optimal BST for keys A, B, C, and D with search probabilities 0.1, 0.2, 0.4, and 0.3, respectively?,Average # of compariso

人人文库> 全部分类> 应用文书 > 项目管理

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门

文档简介

温馨提示

最新文档

评论

chapter 8 dynamic programming - bioinformatics division ：8章动态规划-生物信息学部门

文档简介

温馨提示

最新文档

评论

相关文档