c++ - How does this code for obtaining LCP from a Suffix Array work?
Can anyone tell me how this code works to produce the LCP from a suffix array? suffixArr[] is an array such that suffixArr[i] holds the index in the string of the suffix with rank i.
void LCPconstruct()
{
    int i, C[1001], l;
    C[suffixArr[0]] = n;
    for (i = 1; i < n; i++)
        C[suffixArr[i]] = suffixArr[i-1];
    l = 0;
    for (i = 0; i < n; i++) {
        if (C[i] == n)
            LCPadj[i] = 0;
        else {
            while (i+l < n && C[i]+l < n && s[i+l] == s[C[i]+l])
                l++;
            LCPadj[i] = l;
            l = max(l-1, 0);
        }
    }
    for (i = 0; i < n; i++)
        cout << LCPadj[suffixArr[i]] << "\n";
}
First, it is important to realize that the algorithm processes the suffixes in their original order, i.e. the order in which they appear in the input string, not in lexicographical order.
So if the input string is abxabc, it first considers abxabc, then bxabc, then xabc, and so forth.
For each suffix it considers in this order, it determines the position of the suffix that is its lexicographical predecessor (*). (Here, and only here, it uses the concept of lexicographical order.) For the first suffix abxabc, the lexicographical predecessor, i.e. the suffix that appears directly before it in the lexicographical ordering of the suffixes, is abc. The code finds this with an O(1) lookup in the array C, which has been prepared specifically for this purpose.
The inner loop compares abxabc and abc character by character and determines that the first 2 characters of these two suffixes are identical. This count is the variable l in your code, and it means that the LCP entry for the suffix abxabc must be 2, so we set LCPadj[i] = l. Note that i here refers to the position of the suffix in the input string, not to its position in the suffix array. So LCPadj is not the LCP array (yet); it is an auxiliary data structure.
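The character-by-character comparison described above can be sketched in isolation. This is only an illustration: commonPrefixLength is a hypothetical helper that is not part of the question's code, and it takes the string as a parameter instead of using the globals s and n.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length of the longest common prefix of the suffixes of s that start at
// positions i and j, found by direct comparison, starting from zero.
// (Hypothetical helper for illustration; not part of the question's code.)
std::size_t commonPrefixLength(const std::string& s, std::size_t i, std::size_t j) {
    assert(i <= s.size() && j <= s.size());
    std::size_t l = 0;
    while (i + l < s.size() && j + l < s.size() && s[i + l] == s[j + l]) {
        ++l;
    }
    return l;
}
```

For the input abxabc, comparing the suffixes at positions 0 (abxabc) and 3 (abc) stops at the third character, so the helper returns 2.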
Then it moves on to the next suffix, which is bxabc. Again it uses C to find the lexicographical predecessor of that suffix, which is bc, and then determines how many prefix characters the two share. And here comes the trick: it can be sure that this number is at least the number from the previous step (i.e. 2) minus 1. Why? Because the suffix we consider now, bxabc, is a suffix of the string considered previously (abxabc), so the lexicographical predecessor of that string (abc) must also have a suffix that is 1 character shorter (bc). That shorter suffix must be somewhere in the suffix array, it sorts before the currently considered suffix, and it shares the previous common prefix with it, minus the first character. Moreover, any suffix that sorts between that shorter suffix and the currently considered one must share at least as long a prefix with it, so the true predecessor cannot match fewer characters. This is quite logical if you think about how lexicographical sorting works, but there are also formal proofs of it (for example Lemma 5.10 in Kärkkäinen's lecture notes). That is the main principle at work here.
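To make the "at least the previous value minus 1" property concrete, here is a small sketch that computes the per-position values naively (restarting from zero each time) and then checks the property. The names naiveLcpByPosition and dropsByAtMostOne are hypothetical, and the suffix array is built with a plain sort, which is only reasonable for short strings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// For each text position i: the number of leading characters that the suffix
// at i shares with its immediate lexicographical predecessor (0 for the
// smallest suffix). Every comparison restarts from zero.
std::vector<int> naiveLcpByPosition(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort: compare the suffixes as strings.
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    std::vector<int> lcp(n, 0);
    for (int r = 1; r < n; ++r) {
        const int i = sa[r];
        const int p = sa[r - 1];
        assert(i != p);  // different ranks hold different suffix positions
        int l = 0;
        while (i + l < n && p + l < n && s[i + l] == s[p + l]) ++l;
        lcp[i] = l;
    }
    return lcp;
}

// True if each value is at least the previous value minus 1.
bool dropsByAtMostOne(const std::vector<int>& lcp) {
    for (std::size_t i = 1; i < lcp.size(); ++i) {
        if (lcp[i] < lcp[i - 1] - 1) return false;
    }
    return true;
}
```

For abxabc the per-position values are 2, 1, 0, 0, 0, 0, so each step drops by at most one. That is why the question's code can continue from max(l-1, 0) instead of starting at zero.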
A few notes on the role of the variables in your code:
- As explained, C is an auxiliary array (n integers in length) that stores, for each suffix of the input string, the position of the other suffix that is its immediate lexicographical predecessor. This array is constructed not by going through the input string from left to right, but (wisely) by going through the suffix array from left to right, because that makes it easy to determine the lexicographical predecessor of any suffix: the immediate lexicographical predecessor of the suffix starting at position suffixArr[i] must start at position suffixArr[i-1]. The smallest suffix has no predecessor, which the code marks with the value n. Confirm in your code that this is how C is defined.
- As mentioned above, LCPadj stores the LCP values for the suffixes in the order in which they appear in the input string, not the order in which they appear in the suffix array. This is why, at output time, LCPadj is not printed from left to right. Instead, the code goes through the suffix array from left to right and prints LCPadj[suffixArr[i]] in that order. Confirm that this is the case.
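Putting the pieces together, the following is a minimal self-contained sketch of the same procedure. It is not the question's code: it takes the string as a parameter instead of using globals, uses std::vector instead of fixed-size arrays, and renames C and LCPadj to pred and lcpByPos for readability. buildSuffixArray is an assumed naive helper; any correct suffix array construction could replace it.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort the start positions by comparing whole suffixes.
// Only meant for small inputs.
std::vector<int> buildSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// LCP array in suffix-array order: result[r] is the common prefix length of
// the suffixes with ranks r and r - 1 (0 for r == 0).
std::vector<int> buildLcp(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    assert(static_cast<int>(sa.size()) == n);
    if (n == 0) return {};

    // pred[i]: start of the lexicographical predecessor of the suffix at i,
    // or n if there is none (the role of C in the question).
    std::vector<int> pred(n);
    pred[sa[0]] = n;
    for (int r = 1; r < n; ++r) pred[sa[r]] = sa[r - 1];

    // lcpByPos[i]: value for the suffix at text position i (the role of LCPadj).
    std::vector<int> lcpByPos(n, 0);
    int l = 0;
    for (int i = 0; i < n; ++i) {
        if (pred[i] == n) {
            l = 0;  // smallest suffix: nothing to compare against
            continue;
        }
        while (i + l < n && pred[i] + l < n && s[i + l] == s[pred[i] + l]) ++l;
        lcpByPos[i] = l;
        l = std::max(l - 1, 0);  // the next suffix shares at least l - 1 characters
    }

    // Reorder from text order to suffix-array order.
    std::vector<int> lcp(n);
    for (int r = 0; r < n; ++r) lcp[r] = lcpByPos[sa[r]];
    return lcp;
}
```

For abxabc the suffix array is 3, 0, 4, 1, 5, 2 (abc, abxabc, bc, bxabc, c, xabc) and the resulting LCP array is 0, 2, 0, 1, 0, 0. The final loop plays the same role as the print loop in the question, which reads LCPadj through suffixArr.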
I hope this helps. Let me know if it doesn't.
(*) By lexicographical predecessor I mean the immediate predecessor of the suffix in the lexicographically sorted list of suffixes, i.e. the suffix to its immediate left in the suffix array.