Data Structures, Algorithms, & Applications in C++
Chapter 19, Exercise 21
Let lcs(a,b) denote a longest common subsequence
of the strings a and b.
We make the following observations:
-
If
a1 = b1, then
lcs(a,b) = a1 + lcs(a2...an,
b2...bm), where +
denotes string concatenation.
-
If
a1 != b1, then
lcs(a,b) = lcs(a1...an,
b2...bm) or
lcs(a,b) = lcs(a2...an,
b1...bm), depending on which is longer.
These observations result in the following dynamic programming
recurrence:
l(i,j) = 0, i = 0 or j = 0
l(i,j) = 1 + l(i-1, j-1), if ai = bj
l(i,j) = max{l(i-1, j), l(i, j-1)}, otherwise
We may determine a longest common subsequence as follows:
- Step 1
-
Set
l(0,j) = 0 for 0 <= j <= m
Set l(i,0) = 0 for 1 <= i <= n
This takes O(n+m) time.
- Step 2
-
Compute
l(i,j) for
i = 1, 2, ..., n, in that order,
using the equation for l(i,j)
developed above. For each i,
compute l(i,j) for
j = 1, 2, ..., m, in that order.
It takes O(mn) time to compute all of these
l(i,j) values.
- Step 3
-
l(n,m) gives the length of
lcs(a,b).
The longest common subsequence is determined by using a traceback procedure
for l(n,m).
The traceback for l(i,j) is recursive.
When l(i,j) = 0, there are no more characters
in the lcs, and we are done.
If l(i,j) = 1 + l(i-1, j-1), then
ai (equivalently,
bj) is the next character
in the lcs; the remaining characters are obtained by performing a traceback
for l(i-1, j-1).
If l(i,j) = l(i-1, j), then
the remaining characters are obtained by performing a traceback
for l(i-1, j).
Otherwise,
the remaining characters are obtained by performing a traceback
for l(i, j-1).
The traceback takes O(min{m,n}) time.
From the above description and analysis, we see that the overall complexity
is O(mn).