simshadows

Dynamic Programming - CSES Problem Set Solutions

I solved all problems on this page without any hints/spoilers (not even reading the CSES recommended book).

Visit https://cses.fi/problemset/ for the full problem set.

I don't provide the full problem specifications on this page due to possible copyright issues. Links to the original problem specs are provided below along with the date accessed, which should allow you to use Internet Archive if the original URL hosting a problem specification ever meaningfully changes.

Dice Combinations [Spec] (2024-03-13)

The subproblem is simply a Fibonacci-esque recursive function:

F<0=0F0=1Fn=Fn1+Fn2+Fn3+Fn4+Fn5+Fn6forn>0\begin{gather*} F_{<0} = 0 \qquad F_0 = 1 \\ F_n = F_{n-1} + F_{n-2} + F_{n-3} + F_{n-4} + F_{n-5} + F_{n-6} \quad\text{for}\quad n > 0 \end{gather*}

With DP, this runs in O(n)O(n) time. Considering how the input bounds set n106n \le {10}^6, a test case can in theory run a hotspot 6×1066 \times {10}^6 times.

We choose Fn1F_{n-1} through to Fn6F_{n-6} since we know each dice roll has to be integers 1-6.

My first attempt was a top-down implementation, though it comes to no surprise that it causes runtime errors (presumably stack overflow):

Setting a higher setrecursionlimit fixed the runtime errors, but instead TLE’s. Not too surprising. Implementation:

Switching to a simple top-down approach passed all tests (worst-case runtime in CPython and PyPy3 was 1.00s and 0.22s, respectively):

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [1]
for _ in range(n):
    window_size = min(len(tab), 6)
    tab.append(sum(tab[-window_size:]) % MOD)
print(tab[-1])

Switching to only storing a fixed-size window reduced the worst test case runtimes to 0.44s and 0.13s (CPython and PyPy3, respectively):

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [0, 0, 0, 0, 0, 1]
for i in range(n):
    tab[i % 6] = sum(tab) % MOD
print(tab[(n - 1) % 6])

Making it so we don’t re-add all six elements of tab anymore turned out to perform worse:

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [0, 0, 0, 0, 0, 1]
this_sum = 1
for i in range(n):
    i %= 6
    old = tab[i]
    tab[i] = this_sum
    this_sum = ((this_sum * 2) - old) % MOD
print(tab[(n - 1) % 6])

Minimizing Coins [Spec] (2024-03-31)

The first subproblem that comes to mind is, for every integer from 00 to xx, we find the fewest number of coins possible. Runtime is O(xn)O(xn):

C={c0,,cn1}=set of all coin values C = \braces{c_0, \dots, c_{n-1}} = \text{set of all coin values}

F<0=F0=0Fx=mincC(Fxc)+1forx>0\begin{gather*} F_{<0} = \infty \qquad F_0 = 0 \\ F_x = \min_{c \in C}\parens{ F_{x - c} } + 1 \quad\text{for}\quad x > 0 \end{gather*}

Considering how the input bounds set n100n \le 100 and x106x \le {10}^6, a test case can in theory run a hotspot 108{10}^8 times. Assuming a similar style of implementation to my first Dice Combinations passed solution which had a worst runtime of 0.22s, it doesn’t sound like Minimizing Coins in Python should pass since we get a hotspot running at least 10x as many times.

For fun, we start with a Python top-down solution, as well as bottom-up, and I also tried an optimization that precomputes some trivial table entries. The logic appears correct, but they TLE as expected:

Implementing the bottom-up approach in C++ passed all tests (with a longest runtime of 0.15s):

#include <iostream>
#include <vector>
#include <limits>

constexpr long LONG_MAX = std::numeric_limits<long>::max();

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, LONG_MAX - 1);
    tab[0] = 0;

    for (long i = 1; i <= targetSum; ++i) {
        for (const auto coin : coins) {
            if (coin <= i) {
                tab[i] = std::min(tab[i], tab[i - coin]);
            }
        }
        ++tab[i];
    }

    std::cout << ((tab[targetSum] == LONG_MAX) ? -1 : tab[targetSum]) << std::endl;
    return 0;
}

Coin Combinations I [Spec] (2024-04-01)

Different orderings of the same set of coins count as different combinations, which makes subproblem selection easier. My chosen subproblem is to find the number of combinations for each possible target sum 00 to xx. For nn coins, the runtime is O(xn)O(xn):

C={c0,,cn1}=set of all coin values C = \braces{c_0, \dots, c_{n-1}} = \text{set of all coin values}

F<0=0F0=1Fx=cCFxcforx>0\begin{gather*} F_{<0} = 0 \qquad F_0 = 1 \\ F_x = \sum_{c \in C}{ F_{x - c} } \quad\text{for}\quad x > 0 \end{gather*}

Input bounds are similar to Minimizing Coins, so a hotspot possibly runs 108{10}^8 times, and is similarly unlikely to pass using Python.

But let’s write a top-down Python implementation anyway! As expected, it TLE’s:

Implementing bottom-up in C++ passes all tests:

#include <iostream>
#include <vector>

constexpr long MOD = 1e9 + 7;

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, 0);
    tab[0] = 1;

    for (long i = 1; i <= targetSum; ++i) {
        for (const auto coin : coins) {
            if (coin <= i) tab[i] = (tab[i] + tab[i - coin]) % MOD;
        }
    }

    std::cout << tab[targetSum] << std::endl;
    return 0;
}

Coin Combinations II [Spec] (2024-04-01)

If we used the strategy from Coin Combinations I, we run into the problem of needing to avoid double-counting combinations. To deal with this, maybe we need a two-dimensional table: one dimension for coin value, and another dimension for coin sum. The following subproblem comes to mind, which may naively run in O(x2n)O(x^2 n) time:

C={c1,,cn}=set of all coin values C = \braces{c_1, \dots, c_n} = \text{set of all coin values}

F[0,x]={1x=00otherwiseF[i,x]=j=0F[i1,xjci]fori>0\begin{gather*} F\brackets{0, x} = \begin{cases} 1 \quad& x = 0 \\ 0 \quad& \text{otherwise} \end{cases} \\ F\brackets{i, x} = \sum_{j = 0}^{\infty}{ F\brackets{i - 1, x - j c_i} } \quad\text{for}\quad i > 0 \end{gather*}

I say “naively” because there may be a way to cut that runtime to O(xn)O(xn) with a clever optimization. I’ll get to my idea for that later.

I am very certain O(x2n)O(x^2 n) will TLE, but I made a top-down Python implementation anyway just for fun. It passes some tests, but TLE’s everything else. Good to know that we’re not getting any wrong answers! Implementation:

Let’s have a look at my O(xn)O(xn) idea now. I’d use the same subproblem and implement it bottom-up, but I’d fill up the table with the help of an accumulator, which should let me avoid doing the whole j=0\sum_{j=0}^{\infty} summation in O(x)O(x) time each time. Instead, the accumulator calculates this summation in O(1)O(1) on average.

It’s probably easier to just see how this works in code.

My first implementation works perfectly fine on my computer, but causes weird RUNTIME ERRORs on some test cases when I submit to CSES. I have a screenshot included in the spoiler block below. My code provides the correct answers when I run them on my computer, and none of them TLE, so… I’m not sure what’s going on. Full implementation:

(If anyone knows why I’m getting RUNTIME ERROR, I’d love to know!)

I improved on this solution by changing to a more space-efficient 1D vector, which did pass all test cases:

#include <iostream>
#include <vector>

constexpr long MOD = 1e9 + 7;

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, 0);
    tab[0] = 1;

    for (const auto coin : coins) {
        for (long offset = 0; offset < coin; ++offset) {
            long acc = 0;
            for (long i = offset; i <= targetSum; i += coin) {
                acc = (acc + tab[i]) % MOD;
                tab[i] = acc;
            }
        }
    }

    std::cout << tab[targetSum] << std::endl;
    return 0;
}
TODO: Were the RUNTIME ERRORs to do with the way I implemented the 2D table as 2D vectors?

Removing Digits [Spec] (2024-04-03)

The first subproblem that comes to mind finds the minimum number of steps for each number from 00 to nn. It runs in O(nlog10n)O(n \log_{10} n) since there are O(n)O(n) entries in the 1D table, and at each entry, you check every decimal digit (which is O(log10n)O(\log_{10} n)). The subproblem written out:

n=input number n = \text{input number}

F0=0Fn=mindD(n)(Fnd)+1forx>0\begin{gather*} F_0 = 0 \\ F_n = \min_{d \in D\parens{n}}\parens{ F_{n - d} } + 1 \quad\text{for}\quad x > 0 \end{gather*}

where D(n)=all decimal digits of n except 0 \text{where } D\parens{n} = \text{all decimal digits of } n \text{ except } 0

For fun, I started with a Python top-down implementation, which TLE’s as expected, but also didn’t give any wrong answers:

My bottom-up implementation in Python passed all tests:

#!/usr/bin/env python3

n = int(input())

tab = [0]*(n + 1)
for v in range(1, n + 1):
    solution = n  # arbitrary large number
    v2 = v
    while v2 > 0:
        digit = v2 % 10
        if digit:
            solution = min(solution, tab[v - digit])
        v2 //= 10
    tab[v] = 1 + solution

print(tab[-1])

It also passed all tests when I used string conversion to get the digits:

#!/usr/bin/env python3

n = int(input())

tab = [0]*(n + 1)
for v in range(1, n + 1):
    tab[v] = 1 + min(tab[v - int(c)] for c in str(v) if (c != '0'))

print(tab[-1])

Grid Paths [Spec] (2024-04-03)

The fact that paths can only move in one direction makes this problem so much easier since you don’t have to worry about paths backtracking. The first subproblem that comes to mind is to, at each cell in the grid, you add up how many paths can come from above with how many paths can come from the left. It runs in O(n2)O(n^2) (or more usefully, O(N)O(N) for NN total cells in the grid):

G[i,j]{EMPTY,TRAP}\begin{gather*} G\brackets{i, j} \in \braces{\texttt{EMPTY}, \texttt{TRAP}} \end{gather*}

F[i,j]={0(i<0)(j<0)(G[i,j]=TRAP)F[i1,j]+F[i,j1]otherwise\begin{gather*} F\brackets{i, j} = \begin{cases} 0 \quad& \parens{i < 0} \lor \parens{j < 0} \lor \parens{G\brackets{i, j} = \texttt{TRAP}} \\ F\brackets{i - 1, j} + F\brackets{i, j - 1} \quad& \text{otherwise} \end{cases} \end{gather*}

Surprisingly, a Python top-down implementation using @functools.cache (and running on CPython3) almost passed, but TLE’d on only 2 out of the 15 test cases:

It took 0.99s for one of the passed 1000×10001000 \times 1000 test cases, which indicates to me that not only should Python be feasible, but Python top-down is likely to be feasible as well if I can lower the overhead. And indeed it did pass, although the code isn’t quite that nice since I’m implementing my own memoization:

#!/usr/bin/env python3

from sys import stdin, setrecursionlimit

setrecursionlimit(int(2e6))

MOD = int(1e9 + 7)

n = int(stdin.readline())
grid = [s.strip() for s in stdin.readlines()]

memo = [[None]*n for _ in range(n)]
memo[0][0] = 1

def dp(i, j):
    if (i < 0) or (j < 0) or grid[i][j] == '*':
        return 0
    elif memo[i][j] is None:
        memo[i][j] = (dp(i - 1, j) + dp(i, j - 1)) % MOD
    return memo[i][j]

print(dp(n - 1, n - 1))

My bottom-up implementation is also a bit obscured since I used some tricks to avoid adding if-statements to the main loop (and therefore unnecessary overhead). Here, I pad the grid with an extra row and column (on the top and left sides) so that we automatically get zeroes for paths that try to come from outside of the grid. Implementation:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

n = int(stdin.readline())
grid = ["*"*(n + 1), *("*" + s.strip() for s in stdin.readlines())]

tab = [[0]*(n + 1) for _ in range(n + 1)]
tab[0][1] = 1  # intentionally initialized the cell adjacent to the starting cell

for i in range(1, n + 1):
    for j in range(1, n + 1):
        if grid[i][j] == '.':
            tab[i][j] = (tab[i - 1][j] + tab[i][j - 1]) % MOD

print(tab[-1][-1])

Book Shop [Spec] (2024-04-03)

This is the classic 0-1 knapsack problem, which I’ve already done a very detailed writeup on (here), so unfortunately, I’ve already been very thoroughly spoiled on this problem.

The subproblem has two dimensions: “the current book”, and current max cost of currently selected books (from 00 to xx). At each “current book” (and corresponding current max cost), we find out what the best possible number of pages is whether or not we purchase the current book. It runs in O(nx)O(nx):

{(c1,p1),,(cn,pn)}=costs and number of pages \braces{\parens{c_1, p_1}, \dots, \parens{c_n, p_n}} = \text{costs and number of pages}

F[i,x]={x<00(x0)(i=0)max(F[i1,xci]+pi,F[i1,x])otherwise F\brackets{i, x} = \begin{cases} {\displaystyle \vphantom{\sum} -\infty } \quad& x < 0 \\ {\displaystyle \vphantom{\sum} 0 } \quad& \parens{x \ge 0} \land \parens{i = 0} \\ {\displaystyle \max\parens{ \vphantom{\sum} F\brackets{i - 1, x - c_i} + p_i, F\brackets{i - 1, x} } } \quad& \text{otherwise} \end{cases}

My first top-down Python implementation TLE’d as I expected:

Implementing bottom-up Python with a full O(nx)O(nx)-sized table caused many RUNTIME ERROR results, but they otherwise gave me correct answers on my computer (albeit very slowly):

My more optimized bottom-up Python implementation that instead uses a O(x)O(x)-sized table TLE’d, even on PyPy3:

Reimplementing this optimized bottom-up solution in C++ passes all tests with a worst runtime of only 0.15s:

#include <iostream>
#include <vector>

int main() {
    long n, maxSum;
    std::cin >> n >> maxSum;

    std::vector<long> costs {0};  // padded with an extra dummy element so we can 1-index
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        costs.push_back(tmp);
    }

    std::vector<long> pages {0};  // padded with an extra dummy element so we can 1-index
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        pages.push_back(tmp);
    }

    std::vector<long> realTab0(maxSum + 1, 0);
    std::vector<long> realTab1(maxSum + 1, 0);
    std::vector<long> *tab0 = &realTab0;
    std::vector<long> *tab1 = &realTab1;
    for (long i = 1; i <= n; ++i) {
        for (long curMaxSum = 1; curMaxSum <= maxSum; ++curMaxSum) {
            if (curMaxSum < costs[i]) {
                (*tab1)[curMaxSum] = (*tab0)[curMaxSum];
            } else {
                (*tab1)[curMaxSum] = std::max(
                    (*tab0)[curMaxSum - costs[i]] + pages[i],
                    (*tab0)[curMaxSum]
                );
            }
        }
        std::swap(tab0, tab1);
    }

    std::cout << (*tab0)[maxSum] << std::endl;
    return 0;
}

Array Description [Spec] (2024-04-03)

The first subproblem that comes to mind has two dimensions: current position in the array description, and candidate value for the position. At each position, we simply count the number of possible “paths” coming from the start of the array. The runtime is O(nm)O(nm).

What I mean by “path” is, let’s pretend that the beginning of the array (index 0) is known to be the value 5. From index 0, we might be able to “walk” to three different positions in index 1: {index 1, value 4}, {index 1, value 5}, and {index 1, value 6}. We say that there are valid “paths” from the beginning of the array to these three positions. These paths may extend all the way through the array, constrained by the known array values.

TODO: I should illustrate what I mean by a “path”.

The subproblem written out:

m=upper bound for array values{a0,,an1}=the array description\begin{gather*} m = \text{upper bound for array values} \\ \braces{a_0, \dots, a_{n - 1}} = \text{the array description} \end{gather*}

F[i,v]={AA1(0<vm)(a0{0,v})(i=0)F[i1,v]+F[i1,v1]+F[i1,v+1](0<vm)(a0{0,v})(i>0)AA0otherwise F\brackets{i, v} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 1 \end{array} \quad& \parens{0 < v \le m} \land \parens{a_0 \in \braces{0, v}} \land \parens{i = 0} \\ \displaystyle \begin{array}{l} F\brackets{i - 1, v} \\ \quad + F\brackets{i - 1, v - 1} \\ \quad + F\brackets{i - 1, v + 1} \end{array} \quad& \parens{0 < v \le m} \land \parens{a_0 \in \braces{0, v}} \land \parens{i > 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 0 \end{array} \quad& \text{otherwise} \end{cases}

solution=v=1mF[n1,v] \text{solution} = \sum_{v = 1}^m F\brackets{n - 1, v}

A top-down implementation in Python TLE’d, but otherwise gave no wrong answers:

Changing to a bottom-up with a full O(nm)O(nm) sized table passed all tests with a PyPy3 worst-case runtime of 0.33s:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

(_, max_v) = [int(s) for s in stdin.readline().strip().split()]
desc = [int(s) for s in stdin.readline().strip().split()]

tab = [[0]*(max_v + 2) for _ in range(len(desc))]

if desc[0] == 0:
    tab[0] = [0, *(1 for _ in range(max_v)), 0]
else:
    tab[0][desc[0]] = 1

for i, known_v in enumerate(desc[1:], start=1):
    it = range(1, max_v + 1) if (known_v == 0) else (known_v,)
    for v in it:
        tab[i][v] = (tab[i - 1][v] + tab[i - 1][v - 1] + tab[i - 1][v + 1]) % MOD

print(sum(tab[-1]) % MOD)

Improving on the bottom-up solution by switching to an O(m)O(m) table also passed, but now with a PyPy3 worst-case runtime of 0.13s:

Counting Towers [Spec] (2024-04-04)

This one’s a hard one! But also a fun one! Let’s start by enumerating some simple examples to get a feel for how this works:

n=1  --> solution=2

AB  AA
n=2  --> solution=8

CD       AB  BA  AB
AB       AC  CA  AB

AA       AA  BC  AA
AA       BC  AA  BB

The first idea that comes to mind is to try a subproblem with two dimensions: the xx and yy coordinates of the top-right of the tower, where the bottom-left corner is at coordinates (0,0)\parens{0, 0}. Let’s call this subproblem F[x,y]F\brackets{x, y}. xx is the column index while yy is the row index. However, it doesn’t work well:

The next idea that comes to mind is also two dimensions, with one dimension for height of the left side, and one for the right side. It naively sounds like it should perform poorly at O(n2)O(n^2) where n106n \le {10}^6, but maybe we can apply the same trick as Coin Combinations II where we use an accumulator?

Let’s start by visualizing what’s going on then formalizing the subproblem:

Case 1: Both sides same level
        We arbitrarily pick the top-right block to be fixed.

   F[3,3] = F[3,2] + F[3,1] + F[3,0] + F[2,2] + F[1,1] + F[0,0]

     ??       ?        ?        ?
     ??   =   ??   +   ?    +   ?    +   ??   +        +
     ??       ??       ??       ?        ??       ??

Case 2: Left side is taller
        We pick the tallest block to be fixed.

   F[3,2] = F[2,2] + F[1,2] + F[0,2]

     ?
     ??   =   ??   +    ?   +    ?
     ??       ??       ??        ?

Case 3: Right side is taller
        We just mirror it.

   F[2,3] = F[3,2]

      ?       ?
     ??   =   ??
     ??       ??

Writing out the subproblem:

F[a,b]={AA1(a=b=0)AAx=0b1(F[a,x]+F[x,x])(a=b0)AAF[b,a](a<b0)AAx=0a1F[x,b](b<a0) F\brackets{a, b} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 1 \end{array} \quad& \parens{a = b = 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle \sum_{x = 0}^{b - 1}\parens{\vphantom{\sum} F\brackets{a, x} + F\brackets{x, x}} \end{array} \quad& \parens{a = b \ne 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} F\brackets{b, a} \end{array} \quad& \parens{a < b \ne 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle \sum_{x = 0}^{a - 1}{F\brackets{x, b}} \end{array} \quad& \parens{b < a \ne 0} \end{cases}

I made a naive top-down Python implementation and ran it against the example and it passed! However, it was really, really, terribly slow. Implementation:

In an attempt to optimize, I tried expanding some calculations to get a feel for what we might build accumulators for. However, I found nothing. Here’s what I tried:

F[1,1]=F[1,0]+F[0,0]\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

F[2,2]=2F[1,1]+2F[0,0]+2F[1,0]\begin{align*} F\brackets{2, 2} &= 2 F\brackets{1, 1} + 2 F\brackets{0, 0} + 2 F\brackets{1, 0} \end{align*}

F[3,3]=2F[2,2]+2F[2,0]+4F[1,1]+4F[1,0]+2F[0,0]\begin{align*} F\brackets{3, 3} &= 2 F\brackets{2, 2} + 2 F\brackets{2, 0} \\ &\qquad + 4 F\brackets{1, 1} + 4 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

F[4,4]=2F[3,3]+2F[3,0]+4F[2,2]+4F[2,0]+10F[1,1]+10F[1,0]+2F[0,0]\begin{align*} F\brackets{4, 4} &= 2 F\brackets{3, 3} + 2 F\brackets{3, 0} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 10 F\brackets{1, 1} + 10 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

I also tried expanding out the F[x,0]F\brackets{x, 0}’s:

F[2,2]=2F[1,1]+4F[0,0]F[3,3]=2F[2,2]+4F[1,1]+10F[0,0]F[4,4]=2F[3,3]+4F[2,2]+10F[1,1]+28F[0,0]\begin{align*} F\brackets{2, 2} &= 2 F\brackets{1, 1} + 4 F\brackets{0, 0} \\ F\brackets{3, 3} &= 2 F\brackets{2, 2} + 4 F\brackets{1, 1} + 10 F\brackets{0, 0} \\ F\brackets{4, 4} &= 2 F\brackets{3, 3} + 4 F\brackets{2, 2} + 10 F\brackets{1, 1} + 28 F\brackets{0, 0} \end{align*}

I also tried expanding the F[x,x]F\brackets{x, x}’s:

F[2,2]=4F[0,0]+4F[1,0]F[3,3]=2F[2,0]+16F[1,0]+14F[0,0]F[4,4]=2F[3,0]+36F[2,0]+68F[1,0]+28F[0,0]\begin{align*} F\brackets{2, 2} &= 4 F\brackets{0, 0} + 4 F\brackets{1, 0} \\ F\brackets{3, 3} &= 2 F\brackets{2, 0} + 16 F\brackets{1, 0} + 14 F\brackets{0, 0} \\ F\brackets{4, 4} &= 2 F\brackets{3, 0} + 36 F\brackets{2, 0} + 68 F\brackets{1, 0} + 28 F\brackets{0, 0} \end{align*}

So let’s try a different approach. I believe that the following sums should be easy to build accumulators for:

x=0nF[x,x]x=0nF[x,0] \sum_{x = 0}^n F\brackets{x, x} \qquad\qquad \sum_{x = 0}^n F\brackets{x, 0}

But this sum is giving me headaches right now:

S[b]x=0b1F[x,b]for b>0 S\brackets{b} \coloneqq \sum_{x = 0}^{b - 1} F\brackets{x, b} \quad\text{for } b > 0

That sum is an important part of the second line of the subproblem. So is it possible to build an accumulator for it?

S[b]=x=0b1F[b,x]=x=0b1y=0b1F[y,x]\begin{align*} S\brackets{b} &= \sum_{x = 0}^{b - 1} F\brackets{b, x} = \sum_{x = 0}^{b - 1} \sum_{y = 0}^{b - 1} F\brackets{y, x} \end{align*}

Now, let’s visualize what’s happening. For example, S[4]S\brackets{4} is the sum of all of these:

F[0,3]F[1,3]F[2,3]F[3,3]F[0,2]F[1,2]F[2,2]F[3,2]F[0,1]F[1,1]F[2,1]F[3,1]F[0,0]F[1,0]F[2,0]F[3,0] \begin{array}{cccc} F\brackets{0, 3} & F\brackets{1, 3} & F\brackets{2, 3} & \xmemphR{F\brackets{3, 3}} \\ F\brackets{0, 2} & F\brackets{1, 2} & \xmemphR{F\brackets{2, 2}} & F\brackets{3, 2} \\ F\brackets{0, 1} & \xmemphR{F\brackets{1, 1}} & F\brackets{2, 1} & F\brackets{3, 1} \\ \xmemphR{F\brackets{0, 0}} & F\brackets{1, 0} & F\brackets{2, 0} & F\brackets{3, 0} \end{array}

Using the third case of the subproblem, we can reduce this to:

F[3,3]F[2,2]2F[3,2]F[1,1]2F[2,1]2F[3,1]F[0,0]2F[1,0]2F[2,0]2F[3,0] \begin{array}{cccc} {} & {} & {} & \xmemphR{F\brackets{3, 3}} \\ {} & {} & \xmemphR{F\brackets{2, 2}} & 2 F\brackets{3, 2} \\ {} & \xmemphR{F\brackets{1, 1}} & 2 F\brackets{2, 1} & 2 F\brackets{3, 1} \\ \xmemphR{F\brackets{0, 0}} & 2 F\brackets{1, 0} & 2 F\brackets{2, 0} & 2 F\brackets{3, 0} \end{array}

Based on this pattern, S[b]S\brackets{b} is reducible to:

S[b]=x=0b1F[x,x]+2x=1b1S[x]for b>0 S\brackets{b} = \sum_{x = 0}^{b - 1} F\brackets{x, x} + 2 \sum_{x = 1}^{b - 1} S\brackets{x} \qquad\text{for } b > 0

Therefore, we can calculate S[b]S\brackets{b} using accumulators!

Let’s now construct an alternative subproblem derived from this, where F[n]F'\brackets{n} is the solution for the number of combinations to construct an nn-sized tower:

F[n]F[n,n]=x=0n1(F[n,x]+F[x,x])=x=0n1F[n,x]+x=0n1F[x,x]=S[n]+x=0n1F[x]for n>0\begin{align*} F'\brackets{n} &\coloneqq F\brackets{n, n} = \sum_{x=0}^{n - 1}\parens{F\brackets{n, x} + F\brackets{x, x}} \\ &= \sum_{x=0}^{n - 1}F\brackets{n, x} + \sum_{x=0}^{n - 1}F\brackets{x, x} \\ &= S\brackets{n} + \sum_{x=0}^{n - 1}F'\brackets{x} \qquad\text{for } n > 0 \end{align*}

F[0]=1 F'\brackets{0} = 1

S[n]=x=0n1F[x]+2x=1n1S[x]for n>0 S\brackets{n} = \sum_{x = 0}^{n - 1} F'\brackets{x} + 2 \sum_{x = 1}^{n - 1} S\brackets{x} \qquad\text{for } n > 0

Attempting to simplify seems… too complicated for me to bother right now.

My Python top-down implementation passed the first two test cases containing queries for n100n \le 100, but we get RUNTIME ERROR for the other much bigger test cases, so my code probably goes over the memory usage limit of the auto-judge. Implementation:

Reimplementing as bottom-up passed all tests:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

stdin.readline()
tests = [int(s) for s in stdin.readlines()]

tab = [(1, 1, None, None), (2, 3, 1, 1)]  # [(f_val, f_cum, s_val, s_cum), ...]
for _ in range(max(tests) - 1):
    (f_val, f_cum, s_val, s_cum) = tab[-1]
    s_val_new = (f_cum + (2 * s_cum)) % MOD
    f_val_new = (s_val_new + f_cum) % MOD
    tab.append((
        f_val_new,
        (f_val_new + f_cum) % MOD,
        s_val_new,
        (s_val_new + s_cum) % MOD,
    ))

for height in tests:
    print(tab[height][0])

Edit Distance [Spec] (2024-04-08)

Thinking through some edge cases, it was tempting to conclude that you’d never need to remove characters from the shorter word, but I believe this would break that assumption:

Test case:
   xxABCDEFGH
   ABCDEFGHxxx

The actual edit distance is 5:
   xxABCDEFGH   (0)
   xABCDEFGH    (1)
   ABCDEFGH     (2)
   ABCDEFGHx    (3)
   ABCDEFGHxx   (4)
   ABCDEFGHxxx  (5)

If we disallowed removing characters:
   xxABCDEFGH   (0)
   AxABCDEFGH   (1)
   ABABCDEFGH   (2)
   ...
   ABCDEFGHGH   (8)
   ABCDEFGHxH   (9)
   ABCDEFGHxx   (10)
   ABCDEFGHxxx  (11)

The first subproblem that comes to mind is two-dimensional, each dimension is an index into each string. Subproblem F[i,j]F\brackets{i, j} is the edit distance of substrings string1[0..i] and string2[0..j] (i and j being inclusive).

I jumped straight into coding a solution first. Here’s my Python top-down solution that passes many test cases but only fails due to RUNTIME ERROR (almost certainly due to memory usage):

Writing out the subproblem:

s1=string 1s2=string 2 s_1 = \text{string 1} \qquad\qquad s_2 = \text{string 2}

F[i,j]={AA0(i<0)(j<0)AAj+1(i<0)(j0)AAi+1(i0)(j<0)AAF[i1,j1](i0)(j0)(s1[i]=s2[j])AA1+min(F[i1,j1],F[i1,j],F[i,j1])otherwise F\brackets{i, j} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 0 \end{array} \quad& \parens{i < 0} \land \parens{j < 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} j + 1 \end{array} \quad& \parens{i < 0} \land \parens{j \ge 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} i + 1 \end{array} \quad& \parens{i \ge 0} \land \parens{j < 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} F\brackets{i - 1, j - 1} \end{array} \quad& \parens{i \ge 0} \land \parens{j \ge 0} \land \parens{s_1\brackets{i} = s_2\brackets{j}} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle 1 + \min(F\brackets{i - 1, j - 1}, \\ \qquad F\brackets{i - 1, j}, F\brackets{i, j - 1}) \end{array} \quad& \text{otherwise} \end{cases}

The first case is simply the case of two empty substrings. Naturally, empty substrings are already equivalent.

The second and third cases are the case where you compare an empty substring with a non-empty substring. Naturally, the solution is just the length of the non-empty substring.

My top-down Python solution simplifies these first three cases down with essentially:

assert (i >= -1) and (j >= -1)
if (i < 0) or (j < 0):
    return i + j + 2

The fourth case is the case where the two indices point to an equivalent character. This means no change needs to be made at that character.

The final case is where the two indices don’t point to an equivalent character. There’s a lot going on in that one line, so to see how this works, let’s consider this example:

              v i=4
string 1: ...AB...
string 2: ..XY.....
             ^ j=3

The dots represent characters that we don’t care about. Index i points to that B character in string 1, while index j points to that Y character in string 2.

To make it easier to understand, let’s write this situation down in two different but conceptually identical ways:

EditDistance("...AB","..XY") \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..XY"}}

F[4,3] F\brackets{4, 3}

Recall that the problem specification allows these operations:

  1. Add one character to the string.
  2. Remove one character from the string.
  3. Replace one character in the string.

Let’s start with operation 3 because it’s the simplest. Let’s suppose we change the B in string 1 to Y:

EditDistance31("...AB","..XY")=1+EditDistance("...AY","..XY")=1+EditDistance("...A","..X")\begin{align*} \text{EditDistance31}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...A\xmemphRC{Y}"}, \texttt{"..XY"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..X"}} \end{align*}

F31[4,3]=1+F[3,2] F_{31}\brackets{4, 3} = 1 + F\brackets{3, 2}

The math works out identically even if we instead changed the Y in string 2 to B:

EditDistance32("...AB","..XY")=1+EditDistance("...AB","..XB")=1+EditDistance("...A","..X")\begin{align*} \text{EditDistance32}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X\xmemphRC{B}"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..X"}} \end{align*}

F32[4,3]=1+F[3,2] F_{32}\brackets{4, 3} = 1 + F\brackets{3, 2}

Now, let’s try operation 2. If we remove B from string 1:

EditDistance21("...AB","..XY")=1+EditDistance("...A","..XY") \text{EditDistance21}\parens{\texttt{"...AB"}, \texttt{"..XY"}} = 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..XY"}}

F21[4,3]=1+F[3,3] F_{21}\brackets{4, 3} = 1 + F\brackets{3, 3}

Or if we remove Y from string 2 instead:

EditDistance22("...AB","..XY")=1+EditDistance("...AB","..X") \text{EditDistance22}\parens{\texttt{"...AB"}, \texttt{"..XY"}} = 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X"}}

F22[4,3]=1+F[4,2] F_{22}\brackets{4, 3} = 1 + F\brackets{4, 2}

And finally, let’s try operation 1. If we insert Y to string 1:

EditDistance11("...AB","..XY")=1+EditDistance("...ABY","..XY")=1+EditDistance("...AB","..X")\begin{align*} \text{EditDistance11}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB\xmemphRC{Y}"}, \texttt{"..XY"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X"}} \end{align*}

F11[4,3]=1+F[4,2] F_{11}\brackets{4, 3} = 1 + F\brackets{4, 2}

And if we instead insert B into string 2:

EditDistance12("...AB","..XY")=1+EditDistance("...AB","..XYB")=1+EditDistance("...A","..XY")\begin{align*} \text{EditDistance12}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..XY\xmemphRC{B}"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..XY"}} \end{align*}

F12[4,3]=1+F[3,3] F_{12}\brackets{4, 3} = 1 + F\brackets{3, 3}

Having gone through all six possible operations, it should be fairly straightforward to see how we get 1+min(F[i1,j1],F[i1,j],F[i,j1])1 + \min\parens{F\brackets{i - 1, j - 1}, F\brackets{i - 1, j}, F\brackets{i, j - 1}} as the final case of the subproblem.

I think it’s also useful to point out that the equivalence of F21=F12F_{21} = F_{12} and F22=F11F_{22} = F_{11} can also be examined more intuitively.

Let’s consider these two strings:

AXXX
XXX

To make them equivalent, you can either delete the A from string 1:

XXX
XXX

or, you can add an A to string 2:

AXXX
AXXX

Either way, we incurred a cost of 11 operation.

My Python bottom-up implementation passes all tests:

#!/usr/bin/env python3

(s1, s2) = (input(), input())

tab = [[i]*(len(s2) + 1) for i in range(1, len(s1) + 1)]
# extra row and column, useful for negative indexing
tab.append(list(range(1, len(s2) + 2)))
tab[-1][-1] = 0

for i in range(len(s1)):
    for j in range(len(s2)):
        if s1[i] == s2[j]:
            tab[i][j] = tab[i - 1][j - 1]
        else:
            tab[i][j] = 1 + min(tab[i - 1][j - 1], tab[i - 1][j], tab[i][j - 1])

print(tab[-2][-2])