Monday, July 22, 2013

Codility Hydrogenium 2013

I am trying to post exciting problems, together with as complete and convincing proofs as I can, and this is my first attempt.This is my first blog post. I apologize in advance if my explanation is not clear as it should be. I'll try to get back and clarify whenever I can.

Link: http://codility.com/cert/start/helium2013/

You are given a string. The problem is to find the length of the longest possible substring which satisfies the following conditions:

Is prefix (starts from the first character of the string)
Is suffix (finishes at the last character of the string)
Occurs atleast once in the middle of the string (neither a suffix nor a prefix)
None of the 3 occurances mentioned above overlap

The intended solution is to have a linear time complexity O(n), where n is the length of the provided string.

A prerequisite to the solution is to know the Z-Algorithm. The Z-Algorithm constructs a linear array (size n, where n is the size of the original string). Such that:

z[i] = length of longest prefix that equals a substring starting from position i

Note that, however, this does not guarantee non-overlapping. Which means in this case for example:
a  a  a  a  a  a
1 2  4  3  2  1

This issue can be easily solved by modifying the z value, so after calculating it, we modify each z[i] to be:

z[i]=min(z[i],i)

Which means the length of the largest non-overlapping prefix that equals a substring starting from position i.

Now that we have this, we immediately realise that the problem becomes quite easily solvable if it was only for prefix\suffix without needing another reptition in between.

If a substring starting at position i is a suffix that equals a prefix, it will have to satisfy the equality:
i+z[i]== n (where n is the length of the string)

So, this begins the idea of the basic algorithm:
For each suffix that is a prefix AND is repeated somewhere in between, pick the max of those

The question now becomes: How to know if I have a suffix that I know occurs as prefix as well, whether or not its repeated somewhere in the middle.

To solve this issue, we would construct an auxillary array to the z[] array.

This array would have the following property:
dp[i]==length of longest possible prefix that equals a substring starting from i AND does not overlap with a potential suffix that covers all or part of it.

To calculate dp[i] for each i, we first note the following:
Suppose f ==length of longest possible prefix that equals a substring starting from position i == min(z[i],i)

Number of characters after this substring will be: (n-1)-(i+f-1)

Now, this is the crux of the algorithm:
if number of characters after the longest non-overlapping prefix from it is >= length of that prefix:
dp[i]=f
Why ?
First, we can not make it > f
But how about < f ? Do we lose anything by making it equal to exactly f ?
Well, the idea is that if there exists a suffix after me, it MUST be within the characters after me. And if there are more characters after me than my length, it must be there. Otherwise its length will be > f (proof by contradiction).
Otherwise:
dp[i]=floor((n-i)/2.0)
Why ? Well, there are less characters after me than my length. This means I occupy > half the remaining characters from my start position till the end of the string.

The maximum suffix that can co-exist with me is one that occupies half whats left and not anymore.

What we have to do after that is a simple iteration from backwards. For each valid suffix (that occurs as a prefix as well), we check the maximum value of dp[] BEFORE i (can be easily calulated in O(1) using an auxillary array). If its >= current suffix length, then res=max(res,n-i). Then we return res in the end.

Similar problem (overlap allowed): http://codeforces.com/contest/126/problem/B

No comments:

Post a Comment