资料结构与演算法ppt课件_第1页
资料结构与演算法ppt课件_第2页
资料结构与演算法ppt课件_第3页
资料结构与演算法ppt课件_第4页
资料结构与演算法ppt课件_第5页
已阅读5页,还剩93页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、台大資工系 呂學一.tw/hil/algo/1;2字尾樹、後綴樹an extremely powerful data structure for string algorithms;3Input: P and S.Output: All occurrences of P in S.Time: O(|P| + |S|)Technique: Z values of PS.Z(i + |P|) |P| iff P = Sii + |P| 1. PS;4Preprocessing PGusfieldBoyer-MooreKnuth-Morris-PrattPreprocessi

2、ng SSuffix tree;5;6S = b b a b b a a bS18= b b a b b a a bS28= b a b b a a bS38= a b b a a bS48= b b a a bS58= b a a bS68= a a bS78= a bS88= b 1st suffix2nd suffix3rd suffix4th suffix5th suffix6th suffix7th suffix8th suffix;7S = b b a b b a a bS18= b b a b b a a bS28= b a b b a a bS38= a b b a a bS4

3、8= b b a a bS58= b a a bS68= a a bS78= a bS88= b 1st suffix2nd suffix3rd suffix4th suffix5th suffix6th suffix7th suffix8th suffix;8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;9The following statements are equivalent.P occurrs in S.P is a prefix of a suffix of S.P corresp

4、onds to a path of T starting from the root of T.;10b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b P occurs in S!;11b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b P doesnt occur in S!;12b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b

5、 a b b P doesnt occur in S!;13;14844441111155522222676333731 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Output: 3;15Question: If we are given a suffix trie of S, what is the time complexity for finding an occurrence of P in S?Answer:O(|P|) time.;16Q: Time co

6、mplexity for constructing the suffix trie T of S?Q(|S|)Q(|S| log |S|)Q(|S|2)Q(|S|3);17844441111155522222676333731 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;18How to establish a lower bound?Answer: Showing an example which takes W(|S|2) time for any algorit

7、hm.;19;20Suffix trie is good in solving Substring Problem, but may require W(|S|2) time and space.Question: is there a compact representation of suffix trie that needs only O(|S|) time and space?;21A compact representation of suffix trie;22T has at most |S| leaves.Why?T has at most |S| 1 branching n

8、odes.Why?;23Keeping leaves and branching nodes pact representation of edge labels5,85,85,85,84,81,12,23,3;245,85,85,85,84,81,12,23,3;25;261,12,34,87,84,87,84,87,83,33,33,31,13,32,37,84,87,84,87,84,8;273,31,13,32,37,84,87,84,87,84,8;28The space complexity of suffix treeO(|S|)O(|S| log |S|)O(|S|2)O(|S

9、|3)Why?Number of nodes = O(|S|).Number of edges = O(|S|).Space required by each edge = O(1).;29Constructing Suffix Tree in Linear Time;30Weiner, IEEE FOCS 1973Linear time but expensive in space.D. E. Knuth: “the algorithm of 1973.McCreight, J. ACM 1976Linear time and quadratic space.Ukkonen, Algorit

10、hmica 1995Linear time and linear space.Much better readability.;31;32Let T(k) denote the suffix tree of S1k.Frameworkcompute T(1);for k = 2 to |S| do compute T(k) from T(k-1);;33S = b b a b b a a bS18= b b a b b a a bS28= b a b b a a bS38= a b b a a bS48= b b a a bS58= b a a bS68= a a bS78= a bS88=

11、b ;34b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;35S =b b a b b a a b1,12,31,23,33,31,12,43,43,42,53,53,52,63,63,63,37,74,73,37,74,72,37,74,77,84,87,84,87,84,8Observation:The tree structure does not change very often, i.e., only O(|S|) times.;36At the beginning of the k-

12、th iteration, there are exactly k “growing points(生長點), all with different heights.The k-th iteration iteratively grows k edges with label Sk at those k growing points.;37b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;38Three cases while growing trieCase 1: growing an edge

13、at a leaf.Case 2: growing a new branch of leaf.Case 3: does not change the tree structure.;391 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事;40Those k steps in the k-th iteration have the following pattern:some (at least on

14、e) Case-1 steps, followed by some (could be zero) Case-2 steps, followed by some (could be zero) Case-3 steps.;411 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事;42在同一個iteration之中,長此以往之前(下)的step一定也是長此以往。At the end of each it

15、eration, if a growing point is a leaf, then all previous (lower) growing points are also leaves. Why?;43b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;44The i-th growing point = the end of the i-th suffix Sik of the current prefix S1k.Argument:the i-th (i 1) growing point i

16、s a leaf.Neither Sika nor Si.kb is a substring of S1k.Neither Si1ka nor Si1.kb is a substring of S1k.The (i 1)-st growing point is a leaf.;45b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;46At the beginning of the current iteration, its corresponding growing point is a leaf

17、.;47b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;48At the end of each iteration, if a growing point is a leaf, then all previous (lower) growing points are also leaves. Therefore, in the next iteration, if a step is a “長此以往, then all previous steps are also “長此以往.;49在同一個i

18、teration之中,假设無其事之後(上)的step一定也是假设無其事。At the end of each iteration, if a growing point is an internal node, then all latter (higher) growing points are also internal.Why?;50b b a b b a b a b b a a b b a b b a b a a ;51The i-th growing point = the end of the i-th suffix Sik of the current prefix S1k.Ar

19、gument:the i-th (i k) growing point is internal. Sika or Si.kb is a substring of S1k.Si+1ka or Si+1.kb is a substring of S1k.The (i+1)-st growing point is internal.;52b b a b b a b a b b a a b b a b b a b a a ;53At the end of the current iteration, its corresponding growing point is still an interna

20、l node.;54b b a b b a b a b b a a b b a b b a b a a ;55At the end of each iteration, if a growing point is an internal node, then all latter (higher) growing points are also internal.Therefore, in “this iteration, if a step is “假设無其事, then all its following steps are also “假设無其事.;56Those k steps in

21、the k-th iteration have the following pattern:some (at least one) Case-1 steps, followed by some (could be zero) Case-2 steps, followed by some (could be zero) Case-3 steps.This is about the status within the same iteration;57葉子的宿命 一日為葉子, 終身為葉子;58b b a b b a a b b a b b a a b a b b a a b b b a a b b

22、 a a b a a b a b b Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事;59Only O(|S|) Case-2 steps throughout the execution.Why?;60n 葉子生長點之所對應的edge labeln 斷頭指標n ;61Case-1 Step;62Always occurs at a leaf (growing point).At the beginning of iteration k, the path above a leaf has label ?, k1.Each Case-1 step in iterat

23、ion k simply changes the label ?, k1 to ?, k.?, k1?, k;63Use ?, - to label the path above each leaf.Thus, no need to do anything for each case-1 step.?, -;64S =b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,11 2111;65We can simply ignore all Case-1 steps.Recall that the number of Case-2 st

24、eps is at most |S|.Q: Is this good enough?Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;66Case-2 Steps: 節外生枝;67Always occurs at a growing point that is not a leaf, which is not necessarily an internal node.When the growin

25、g point is an internal node Label = k, -, where k is the current iteration index.k, -;68When the growing point is not an internal node k, -ti, ji+t, ji, i+t-1;69Case-3 Steps: 假设無其事;70Always occurs at a growing point that is not a leaf, which is not necessarily an internal node. When the new position

26、 of the growing point is an internal node ti, j0;71When the new position of the growing point is not an internal node ti, jt + 1;72Give a sequence S such that the number of Case-3 steps throughout the process of growing its suffix trie (or suffix tree) is W(|S|2).;73Completely ignore them以假设無其事的態度處理

27、假设無其事的狀況;74ti, j0ti, jt + 1;751.For correctly maintaining the position of each growing point. (Why?)2.For correctly running Case-2 steps. (By what?)k, -ti, ji+t, ji, i+t-1;76Saving the book keeping efforts in all Case-3 steps ;77Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a

28、 b b a a b a b b a a b b b a a b b a a b a a b a b b ;78Just keep one current growing point throughout the execution.Deriving the new position of the current growing point from its previous position (with the helpusing suffix links (斷頭指標);79Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a

29、 b b a a b a b b a a b b b a a b b a a b a a b a b b The challenges: How do we derive the position of the current growing point?Vertically (case 2)Horizontally (case 3)Q: Which one is easier?;80Moving from iteration k 1 to iteration k.The growing point does not move!This is the easier case.Case 2: 節

30、外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;811 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事;82Moving from Step i to Step i+1 in the same iteration.The growing po

31、int moves dramatically.This is the tougher case.Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;831 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b Case 1: 長此以往Case 2: 節外生枝Case 3: 假设無其事;84前人種樹後人涼的哲學

32、;85每次千辛萬苦找到vertical movement的目的時, 把這個movement的起點與終點用一個link記錄下來.下回遇到這個起點時, 就可以直接走到終點去,不用再重新找一次.這些link就叫做suffix link (斷頭指標).;86終點所對應的字串,是起點所對應之字串的斷頭字串(second suffix)Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;87每個斷頭指標的起點一定是個internal

33、node, 不會葉子不會是某個suffix tree edge的中間Why?Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;88每個internal node一定是某個斷頭指標的起點Why?Case 2: 節外生枝Case 3: 假设無其事1 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;89S

34、 =b b a b b a a b1,-2,-3,-3,-3,37,-4,-3,37,-4,-2,37,-4,-11,1121111 2 3 4 5 6 7 8b b a b b a a b b a b b a a b a b b a a b b b a a b b a a b a a b a b b ;90Going up to a closest internal node (whose suffix link must be available). Suppose this upward traversal passes through t characters.Following the suffix link that starts from this internal node.ti, j;91Going down by matching the t-character substring Si, i + t 1 of S.ti,

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

最新文档

评论

0/150

提交评论