suffixtreeandsuffixarray内容摘要:
i $ $ Ex: 4 7 10 2 5 8 4 5 6 7 8 9 10 11 12 s124 = [ 3 2 1 5 5 4 ] s4 = [ i s s i p p i $ $ ] 1 4 7 10 2 5 8 1 2 3 4 5 6 7 8 9 10 11 12 s121 = [ 3 3 2 1 5 5 4 ] s1 = [ i s s i s s i p p i $ $ ] S12i S12j = si sj s124 s121 s4 s1 Case 2: i ≠ j mod 3 1 4 7 10 2 5 8 0 1 2 3 4 5 6 7 8 9 10 11 12 s12 = [ 3 3 2 1 5 5 4 ] s = m i s s i s s i p p i $ $ Ex: 4 7 10 2 5 8 4 5 6 7 8 9 10 11 12 s124 = [ 3 2 1 5 5 4 ] s4 = [ i s s i p p i $ $ ] 5 8 5 6 7 8 9 10 s125 = [ 5 4 ] s5 =[ s s i p p i ] S12i S12j = si sj s124 s125 s4 s5 Step 2: SA= 0 = sort the suffixes starting at position i = 0 mod 3. The rank of sj among {sk | k ≠ 0 mod 3 } was determined in Step1 for all j ≠ 0 mod 3. SA=0 = radix sort { (s[i], Si+1 ) | i = 0 mod 3 }. 0: (m, ississippi) 3: (s, issippi) 6: (s, ippi) 9: (p, i) 0: (m, ississippi) 9: (p, i) 6: (s, ippi) 3: (s, issippi) Radix sort 0 1 2 3 4 5 6 7 8 9 10 s = m i s s i s s i p p i (s[i], Si+1 ) 9: (p, i) 6: (s, ippi) 3: (s, issippi) 0: (m, ississippi) Step 1 Step 3: SA = merge SA= 0 and SA≠ 0 . SA= 0 = [s0 s9 s6 s3] SA≠0 = [s10 s7 s4 s1 s8 s5 s2] SA = merge SA= 0 and SA≠0 =[s10 s7 s4 s1 s0 s9 s8 s6 s3 s5 s2] = [10 7 4 1 0 9 8 6 3 5 2] It is in time O(n) if we can determine the relative order of Si SA= 0 and Sj SA≠0 in constant time. Time plexity analysis Step1: O(n) + T(2n/3) Step2: O(n) Step3: O(n) T(n) = O(n) + T(2n/3) = O(n) Exact matching using a Suffix Array A B A A B B A B B A C SA = 2 0 3 6 9 1 5 8 4 7 10 Basic Idea: 2 binary searches in SA Search for leftmost position Search for rightmost position SUFFIX ARRAY SA: A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 Search for leftmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 Search for leftmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 BB BA Continue binary search in the right (larger) half of SA Search for leftmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 BB = BB More occurences of BB left of this one possible! Search for leftmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 BB BA leftmost position of BB is pointed to by SA[8] Search for rightmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 BB = BA More occurences of BB right of this one possible! Search for rightmost occurence of: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 BB C rightmost position of BB is pointed to by SA[9] Results of search for: B B 0 1 2 3 4 5 6 7 8 9 10 A B A A B B A B B A C 2 0 3 6 9 1 5 8 4 7 10 rightmost position of BB is pointed to by SA[9] leftmost position of BB is pointed to by SA[8] =All occurences of the pattern BB are pointed to by SA[8..9] Important Properties for |SA| = n and m = length of pattern: • Size : 1 Pointer per Letter (4 Byte if n 4Gb) • Speed of exact matching : • O(log n) binary search steps • of pared chars is O(mlogn) can be reduced to O(m + log n) Longest mon prefixes Definition: lcp(i,j) is the length of the longest mon prefix of the suffixes beginning at SA[i] and SA[j]. Mississippi Example SA[2] = 4 (issippi) SA[3] = 1 (ississippi) lcp(2, 3) = 4 SA = [10 7 4 1 0 9 8 6 3 5 2] s = m i s s i s s i p p i Example Let S = mississippi i ippi issippi ississippi mississippi pi 7 4 1 0 9。suffixtreeandsuffixarray
阅读剩余 0%
本站所有文章资讯、展示的图片素材等内容均为注册用户上传(部分报媒/平媒内容转载自网络合作媒体),仅供学习参考。
用户通过本站上传、发布的任何内容的知识产权归属用户或原始著作权人所有。如有侵犯您的版权,请联系我们反馈本站将在三个工作日内改正。