外文文献 Factorizing words over an ordered alphabet_第1页
外文文献 Factorizing words over an ordered alphabet_第2页
外文文献 Factorizing words over an ordered alphabet_第3页
外文文献 Factorizing words over an ordered alphabet_第4页
外文文献 Factorizing words over an ordered alphabet_第5页
已阅读5页,还剩14页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

JOURNAL OF ALGORITHMS 4, 363-381 (1983)Factorizing Words over an Ordered AlphabetJEAN PIERRE DUVALLaboratoire dInformatique, Faculte des Sciences, Universite de Rouen B.P. 67, 76130 Mont Saint Aignan FranceReceived May 15,1982An efficient algorithm to obtain a factorization of words over an ordered alphabet known as Lyndon factorization is presented. Applications of this algorithm are given to the computation of the least suffix of a word and the least circular shift of a word.INTRODUCTIONA number of algorithms deal with linear arrays of symbols,also called strings or words. This contains for example string-matching algorithms and, more generally, the algorithms related to the theory of automata.A special case occurs when the set of symbols is an ordered set. This leads of course to the vast domain of sorting algorithms. The purpose of this paper is to show how a computation of a special decomposition of a string allows the simultaneous solution of several special sorting problems such as finding the lexicographically least circular substring, finding the minimum suffix of a string. Tlie first problem has recently been studied by K. S. Booth 2 and by V. Siloach 8. The second one is usually treated by the method of the so-called “position tree” (see 1】,for instance).The decomposition of words that we study here is the following: any word w can be written uniquelyw = ww2. ,where(1) the sequence w” w2,, vv is nonincreasing for lexicographic order,(2) each wt is strictly less than any of its proper circular shift.This decomposition is due to Chen, Fox, and Lyndon 3 and it has been introduced for the purpose of computing a basis of the free Lie algebras. It is a particular case of a factorization of free monoids (see 7).3630196-6774/83 $3.00Copyright 1983 by Academic Press, Inc.All rights of reproduction in any form reserved.364 JEAN PIERRE DUVALIn the first section, we introduce the necessary background to handle Lyndon words. Some of the material is of common knowledge to the specialists of this field. Some other is not standard; such is the characterization of the left factors of Lyndon words which is used as a basis for the algorithms presented further. A complete study of this type of property appears in 5.The second section presents the algorithms of factorization of a word into Lyndon words. Two variants of this algorithm are described. They differ by the use in the second of an auxiliary table which plays a role which is similar to the “failure function” used in the Knuth-Morris-Pratt algorithm 6. A preliminary version of these algorithms has been presented in 4】.The third section presents three applications of the factorization algorithm; the first one is the computation of the minimum suffix of a word; the second, of the maximum suffix. The third one is the computation of the least circular shift of a word. This problem has been already considered in 2 and also in 8.I. FACTORIZING INTO LYNDON WORDSWe consider a fixed alphabet A which is a totally ordered set. We denote by the free monoid over the set A. A + is A* minus the empty word. The free monoid A* is ordered by lexicographic order. Therefore, for u,v A* one hasu .If u is not a prefix of t, since u has the form v = uvff with t/ a proper suffix of v. Therefore, v 1, ukvk, e L.366 JEAN PIERRE DUVALIt is a particular case of the above property that:For w L,a Ayk 1 and w 2).If there is no maximum letter in A, Pf = P.One hasL c ?.LEMMA 1.6. Let w = (uavf)ku with M, vf e A*, a k 1 and uavf e L. The following propositions hold:(1) For af A and a / - L.(3) For ar A and a a waf PProof, Let af A with a For h e A*, we have that (uavf)kuafh r e A*, a A, k 1 and uavr G L. By Lemma 1.6, wa e P Therefore, w is a prefix of a word in P Then w e P This proves S c P.Let u G L; we have v = (uv)u, where u is the empty word. Therefore, t? e 5. This proves L and uxaxv e L. Then uav)kxua is in Pf L, and according to Lemma 1.6, we have that al = a. Therefore, we have that w has the form (uv)ku, where M, vy k has the respective value, either u = uxax, v - v, A: = A:j if v is nonempty, or u is the empty word v = uxaxvfv /c = + 1 ifv is empty. In both cases we have that uv e L, v A+ and k L Therefore w e S.Now then, we have Pf c S. DTherefore, to recognize if a word is a possible prefix of a Lyndon word we have to recognize if it is in S. Lemma 1.6 gives us a method to do this by reading the word from left to right. This the basic idea used in Section II.The proof of the following result can be found in 7.THEOREM 1.8 (CHEN, FOX,LYNDON). Any word w A* admits a unique factorization.368 JEAN PIERRE DUVALsuch that each (1 ( i ( m) is a Lyndon word and w2 (1.4)The decomposition (1.3) will be called in the sequel the standard factorization of the word w. We denote it byCFL(w) = w,w2. wm.It is not difficult to see that such a factorization exists. In fact, in any factorization of w into Lyndon words which does not satisfies (1.2) we can group two consecutive factors u, v with u Then w/ w. . Therefore wm is the minimum suffix of w. This proves (1).Assume that s is strictly longer than wm. Then i vm 1. Therefore, wj vy, v. Since wxw2 L, it follows from Pro-FACTORIZING WORDS OVER AN ORDERED ALPHABET 369position 1.2 that w, w2. Therefore CFL(w) = MCFIVV). PROPOSITION 1.11. Let w = (uavf)quafh with u, v h e a, af e A, q and uavr G L. Letw = w2 = * * * = = uavf.We have thatCFL(uavf)qu) = wxw2. wCFL(w)and, for a arCFL( w) = . wg CFL( uarh) whatever h e A*.Proof, Since a Lyndon word has no proper prefix and suffix, the longest prefix of (uavf)qu which is in L is a prefix of uav Therefore it is wm/. According to Proposition 1.10, we have thatCFL(wat/)9w) = CFL(wat/)9_1w)and, by induction for q 2,CFL(uavf)qu) = CFL(w).Assume a a according to Lemma 1,6, (uav)quah 史 P (whatever h e A*). Therefore, the longest prefix of w which is in L is a prefix of (uavf)qu. The same arguments as above lead us toCFL(v) = wqCFL(uafh). Now then, Proposition 1.11 and Lemma 1.6 give a method for factorizing a word into Lyndon words by reading it from left to right. This is done in Section II. But, before we do it, let us see the problem of minimum suffix. It is obviously related; according to Proposition 1.9 the minimum suffix of a word is the last term of the factorization.We denote by minsuf(w) the minimum nonempty suffix of w A+.PROPOSITION 1.12. Let iw G L with uv A+. Then, minsuf(M) is a prefix of M.Proof. Let s = minsuf(M). We have that s u. If s u and s is not a prefix of u, then sv l and uav e L. Let sa be the minimum suffix of ua. The following370 JEAN PIERRE DUVALpropositions holds:(1) If a a rmnsni(uavf)guar) = minsuf(/i) whatever h G A*.Proof. For a af, according to Proposition 1.11,the last term in the standard factorization of w is the last term in the factorization of uafh. Therefore minsuf(um/)今wflA) = minsuf(i/a/i). Let be a suffix of u longer than s. We have that sa (or j = n 1). We have that al. is the first factor in the standard factorization of a. an. The other factors are obtained by factorization a/_r+1. an. The variable 众 is introduced (k = / - V) in order that the remaining suffix is ak+l. an. The prefix ax. ak is no more considered.FACTORIZING WORDS OVER AN ORDERED ALPHABET 371At this step the corresponding property for (2.1) isak-)r - - ai- = ak+j-i+ - - - aj- ak+ aj-i 各 L.(2.10And, for k /, the process may restart at the beginning of the remaining suffix ak+ anby taking / = A: 4- 1 and j = k 2. That is the method for Algorithm 2.1. But this implies a rescanning from ak+l to ay. Such a rescanning is avoided in Algorithm 2.2 by introducing a table /, such that if A: V we have that (2.T) holds for the values i = fc + ff - k andj =/*Then, Algorithm 2,1 is as follows:Algorithm 2.1Input: a word string ax. anof letters over A.Output: the sequence FACT = (ku , , km) such that, for 1 一 1 * * a灸丨 _ a众丨 + 】 . . - , , + 1 rnone hasCFL(fl!an) = WMwm. (2.2)The method is described as follows:(FACT: sequence of integers): begin FACT: = the empty sequence; k:-0 while k a ory = /j + 1: (repeat k: = k (j /);add * in FACTendcaseendwhileenduntil k i)Let us prove the correctness of the above program.Let k i and f be the respective values of fc, and j at the first time we enter the while loop with k 0. Thus, and f are the values of i and j at the first time we have that at aj or y = + L372 JEAN PIERRE DUVALSince k = Q from y = 2 to this step, we may ignore the variable A: in a first analysis. And show that (2.1) holds from y = 2 to f.For i = 1 and j = 2, a at_j and a) are the empty word and a. ai is reduced to one letter ax. Therefore, (2.1) holds.Then assume that (2.1) holds at step j. The comparison is between the letters ai and aj.For at = (2.1) holds with the new respective values/ + 1 and y + 1 for i and j, (2.3)This is quite clear. It is the justification for case 2 statement in the program.For the current values i and j we define q(i, j), r(i, j), (/, j), a(i, j), and t/(/,j_) as follows:/- 1a ar andmod(7 - 1, ; - /), 1/ = + 2w.Since (2.1) holds we have thatuav ay. a】G L a =u = ax. ar,(2-4)and . = (uavf)qu.(2.5)Then, we have thatFor ai ar (or / = n + 1) andCFL(a,. a) = a)_ (2.8)Proof We assume the notation (2.4) for i = V and j = /. Then we havethat a,. a = (uavf)quar. an.Since a ay (or jf = n 1) and uavf e L, it follows from Proposition1.11 that CFL(a. an) = xw2. wq CFL(way an).Since kq = qf - i), uaf.an = akq+This proves (2.8). Now then, we are in case 3 statement with ir = j The repeat statementconsists to add in FACT the successive ku k2,., kq as defined in (2,7).When the repeat statement ends the variable k has the value kf = kq. Then,we enter the while loop with k = kf it is the first time that k 0 whenentering the while loop.At this step the situation is as follows:We have in FACT the kvk2, ., kq corresponding tothe first q factors in (2.2). The remaining factors arethose of the standard factorization of ak,+x. an.A+i ay- *= A. a“ with r =/ - 1 - kf.Then, ax. ak is no more considered, and the process continues forakJrX. an. The corresponding property to (2.1) is now (with k = kf)ak,+1 * * -1 ak+j-i+ - aj and ak+i - - - ak+j_i /: array 2n + 1/2J of integersbegin: FACT: = the empty sequence; k = 0; j: = 2,/2; = 1; while 众 a ory = n + 1: (repeat k: k j - /); addkin FACTuntil k i; if k =j - then j: -y H- 1)endcaseendwhileend (“if y +l. ar_x = a. ar, and / satisfies (2.11) at step k i / it is true thatFor j = k( 2to/, (2.T) holds for i = kf fj kf. (2.IT)Then, we have to process the suffix aA,+1. an and according to (2.11) we may continue with i = kf + fj k and j = / as the new values for i and j The table / is still correct for j = kf + 2 to f. This is the method in Algorithm 2.2. And there is no more rescanning of +dy_vNote that the conditional “/y - 1 a ory = + 1. A new factor is obtained. If the new value k is j - 1, then charge the letter af, in this case j is increased by one and a is no more charge. If the new value k is such that k 2.376 JEAN PIERRE DUVALThe total charge is: n for the letters a2,., ananJrX9 and, since the factors of length 1 are never charged, no more than n/2 for the factors. Then 3n/2. III. APPLICATIONSA first application concerns the computation of the minimum suffix of a word. Since the minimum suffix of a word is the last factor in the standard factorization of this word (Proposition 1.9), Algorithm 2.1 may be used to compute it, using only three indices and no more than In comparisons between two letters. Algorithm 2.2 may be used too.But we can do better and compute for a given word a. an the table M defined byFor 1 :(k: = Mi + j i if k = j - then begin Mj: = k j: =y + 1 end)endcaseendwhileendLet V and j. be the respective values of / and ) at the first time we find a aj. From 7 = 2 to j Algorithm 3.1 is just the same as Algorithm 2.2.FACTORIZING WORDS OVER AN ORDERED ALPHABET 377Thus we have that (2.1) holds and with the notation (2.4).uaif = aE L, ci = au and ax. a_x = (uavf)qu.For at ay. Let sa be the minimum suffix of ua. Note that according to Proposition 1.13 it is the minimum suffix of uavf)qxua which is av. According to Proposition 1.13 we have that mmsuf(a!.,. ah) = rmnsxxiisah) whatever h G. A*. Let kf = M(if) + f .Since aM(r)+1. = aM(r)+rr+. af_x = we have thatminsuaj.ajfh) = minsuf(+1. cijh) whatever h e A*.This is the justification for “k: = M(i) +j - iin case 3 statement. But what about the / corresponding to akf+l. ay for the new value kf of klAccording to Proposition 1.12, since uavf e L we have that 似 is a prefix of ua. Therefore ak, + x. a_, = a“. ar_ k. Then the table/is correct for akf+t u when we have it relative to the basis index k. That is truly what is done. Therefore, the process may continue and same arguments as above still holds.The case kf = j 1, is a particular one since we have that aWJr =af. In such case the word reduced to a single letter is its own minimum suffix, and we process the next j. This is the justification for uif k =j - then begin M(j): = k j: = 7 + 1 end” in case 3 statement.THEOREM 3,1. Algorithm 3.1 computes for the word ax. an the table M. n such thataW0)+i- a j = minsuf(a,. a7) for 1 . It is not difficult to see thatFor w, i? e u 2 if there is a maximum letter c in 4; otherwise Pr = P.We have that Pf c Pff.Proof. Since, L c Pf, and each prefix of a word of P“ is in 尸,we have that P c P,f. It is clear that or a A and A: 2, we have that ak P. Then Pf c Pf,. PROPOSITION 3.2.尸 =Proof. Let w e Pn and v/ be the longest prefix of w which is in P According to Proposition 1.7, we have that has the form wf = uavr)qu with u, vf e A*, a A, q 1, and uavf e L.Suppose that w wf. Let af A, h A*, be such that w = wfafh. According to Lemma 1.6, since wfaf Pr we have that ar ay and k takes the new value kf3 for Algorithm 3.1 or kr2 for Algorithm 2.2. Since the process at this step consists to consider now akf+,. instead of A,. a, with kf = fc3 or k2, kl ay.Let sa be the minimum suffix of ua. According to Proposition 1.12, sa is the minimum suffix of mvf)qua. Let m be 啟 suffix of (uavf)qua longer than380JEAN PIERRE DUVALsa. We have that sa 1.It is quite clear that each circular shift of a word is a factor of ww. Therefore, to recognize it, we may search in vvvv a factor of length |v| which is a power of a Lyndon word. Then, the least circular shift uq of w appears when factorizing ww into Lyndon words.PROPOSITION. Let a“. anbe a word. When factorizing ax. anax an into Lyndon words by either algorithm 2.1 or algorithm 2.2, the least circular shift is ak+x, a-_x when for the first time we have that j l k = n and j i divides n.Proof. For simplification of notations we assume that ax, an is a primitive word (i.e., is not a power of another word). Then, the least circular shift w = m ol an = uv, is a Lyndon word. We have that a. anax. an = uwfv. Since M

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论