Manacher Algorithm

This post tries to explain the idea behind Manacher’s algorithm. It is written mainly for my own future reference so it might lack some clarity. I might improve upon it in the future when I have the time.

Introduction

Manacher’s algorithm is a O(N) solution for finding the longest palindromic substring of a given string.

A palindrome is a sequence of symbols or elements that reads the same forward or reversed.

It is a dynamic programming problem. We let $dp[i]$ be the longest palindrome length for the palindrome that is centered at character i. We use a bottom up approach and fill the table dp. After that we find the maximum value in dp and retrieve the substring.

The transformation (preprocessing)

A clever trick to avoid splitting cases for odd and even as well as boundary checks is to transform the given string into one with delimiters added in between each character.

$abacabb \mapsto \#a\#b\#a\#c\#a\#b\#b\#$

This will help us easily identify our palindrome because in our transformed string, the longest palindrome must start and end on a delimiter, and this maps nicely back to the index of the original string, providing an easy way to extract the substring. For example, to extract the palindrome bb, we note that it is a palindrome about index 12, and the left and right limits are 10 and 14 respectively. Dividing it by 2 gives us the indices of the original string, 5 and 7. And we can extract the substring by using s.substring(5,7).

The crux

The crux of the algorithm is noting that after calculating some $dp[k]$, we can save calculations of $dp[n]$ for $k < n \le k + dp[k]$.

To see why. Let us denote the indices of left limit of the palindrome as L, right limit as R, and the center as C. Let T be the transformed string. Then let $C = T[k], L = T[k-m], R = T[k+m]$ where $m$ is the range on either side of the palindrome centered at C.

Suppose we are now calculating $dp[i]$ where i is between C and R. There are two cases for this palindrome that is centered at i. Either it extends past R or it doesn’t. Suppose it doesn’t extend past R. Then by looking at the reflected point about C, that is dp[2C – i], we know that the palindrome centered at that point must not extend past L either. Therefore, $dp[i] = dp[2C-i]$. In the other case, the palindrome extends beyond R so we have to manually calculate the length of this palindrome centered at i. However, we can speed up this calculation by noting that since this extends past R, it’s length must be at least R-i, so we can start comparing characters after that distance.

This translate into code by setting dp[i] = min(dp[2C-i], R-i) and then updating dp[i] by trying to extend the palindrome.

Functional programming languages

When I first came across the functional paradigm, I thought that it sounded very cool and hip because other paradigms were too mainstream.

But when asked in several occasions to explain it, I found difficulty in conveying the essence of what it means to be a functional language. So I have decided to sit down and examine this issue.

While mainly emphasized in academia (standard ml, haskell), there have been many uses in the real world. Jane Street uses OCaml, some financial institutions use q. Many non-functional languages even support functional paradigm. In C++11 for instance, lambdas are introduced.

So what is a functional language? Firstly, a functional language is mostly pure. That is, given the same input, the function will always return the same output. This is similar to the mathematical definition of a function. Other forms of programming paradigm sometimes use function to mean a subroutine – jumping out of sequential execution to execute a block of code – and this may not be pure because the block might have some side effects such as modifying a global variable. Such pureness can lead to optimisations such as running code in parallel and memoization, which are attractive now that Moore’s law is arguably no longer true and people are looking at other ways to improve performance.

Secondly, a functional language will most likely have first class functions which means that functions are treated as values which can be returned by a function or passed around (to other functions). This is supported in most scripting languages like javascript, python, etc. Personally, I think this is a sensible way programme. Compare

`for (int i = 0; i < Thing.length; i++) {doStuff(i)}`

with

`Thing.forEach(doStuff)`

However, this comes at a cost because extra indirection is needed to treat functions as values.

I am not a big advocate of functional languages, but I think that a functional paradigm is inherently very elegant and neat (also readable) and this should be encouraged if it is the most natural way to write code for a use case.