Dynamic Programming - CSES Problem Set Solutions

I solved all problems on this page without any hints/spoilers (not even reading the CSES recommended book).

Visit https://cses.fi/problemset/ for the full problem set.

I don't provide the full problem specifications on this page due to possible copyright issues. Links to the original problem specs are provided below along with the date accessed, which should allow you to use Internet Archive if the original URL hosting a problem specification ever meaningfully changes.

Dice Combinations [Spec] (2024-03-13)

The subproblem is simply a Fibonacci-esque recursive function:

\begin{gather*} F_{<0} = 0 \qquad F_0 = 1 \\ F_n = F_{n-1} + F_{n-2} + F_{n-3} + F_{n-4} + F_{n-5} + F_{n-6} \quad\text{for}\quad n > 0 \end{gather*}

With DP, this runs in $O(n)$ time. Considering how the input bounds set $n \le {10}^6$ , a test case can in theory run a hotspot $6 \times {10}^6$ times.

We choose $F_{n-1}$ through to $F_{n-6}$ since we know each dice roll has to be integers 1-6.

My first attempt was a top-down implementation, though it comes to no surprise that it causes runtime errors (presumably stack overflow):

#!/usr/bin/env python3 from functools import cache MOD = int(1e9 + 7) n = int(input()) @cache def dp(v): if v < 0: return 0 elif v < 2: return 1 return sum(dp(v - i) for i in range(1, 7)) % MOD print(dp(n))

Setting a higher setrecursionlimit fixed the runtime errors, but instead TLE’s. Not too surprising. Implementation:

#!/usr/bin/env python3 from sys import setrecursionlimit from functools import cache MOD = int(1e9 + 7) setrecursionlimit(int(2e6)) n = int(input()) @cache def dp(v): if v < 0: return 0 elif v < 2: return 1 return sum(dp(v - i) for i in range(1, 7)) % MOD print(dp(n))

Switching to a simple top-down approach passed all tests (worst-case runtime in CPython and PyPy3 was 1.00s and 0.22s, respectively):

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [1]
for _ in range(n):
    window_size = min(len(tab), 6)
    tab.append(sum(tab[-window_size:]) % MOD)
print(tab[-1])

Switching to only storing a fixed-size window reduced the worst test case runtimes to 0.44s and 0.13s (CPython and PyPy3, respectively):

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [0, 0, 0, 0, 0, 1]
for i in range(n):
    tab[i % 6] = sum(tab) % MOD
print(tab[(n - 1) % 6])

Making it so we don’t re-add all six elements of tab anymore turned out to perform worse:

#!/usr/bin/env python3

MOD = int(1e9 + 7)

n = int(input())

tab = [0, 0, 0, 0, 0, 1]
this_sum = 1
for i in range(n):
    i %= 6
    old = tab[i]
    tab[i] = this_sum
    this_sum = ((this_sum * 2) - old) % MOD
print(tab[(n - 1) % 6])

Minimizing Coins [Spec] (2024-03-31)

The first subproblem that comes to mind is, for every integer from $0$ to $x$ , we find the fewest number of coins possible. Runtime is $O(xn)$ :

C = \braces{c_0, \dots, c_{n-1}} = \text{set of all coin values}

\begin{gather*} F_{<0} = \infty \qquad F_0 = 0 \\ F_x = \min_{c \in C}\parens{ F_{x - c} } + 1 \quad\text{for}\quad x > 0 \end{gather*}

Considering how the input bounds set $n \le 100$ and $x \le {10}^6$ , a test case can in theory run a hotspot ${10}^8$ times. Assuming a similar style of implementation to my first Dice Combinations passed solution which had a worst runtime of 0.22s, it doesn’t sound like Minimizing Coins in Python should pass since we get a hotspot running at least 10x as many times.

For fun, we start with a Python top-down solution, as well as bottom-up, and I also tried an optimization that precomputes some trivial table entries. The logic appears correct, but they TLE as expected:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache from math import inf, isinf setrecursionlimit(int(2e6)) (_, target) = [int(s) for s in stdin.readline().strip().split()] coins = [int(s) for s in stdin.readline().strip().split()] @cache def dp(v): if v < 0: return inf elif v == 0: return 0 return 1 + min(dp(v - coin) for coin in coins) solution = dp(target) print(-1 if isinf(solution) else solution)

#!/usr/bin/env python3 from sys import stdin from math import inf, isinf (_, target) = [int(s) for s in stdin.readline().strip().split()] coins = [int(s) for s in stdin.readline().strip().split()] tab = [0]*(target + 1) for i in range(1, target + 1): tab[i] = 1 + min((tab[i - coin] for coin in coins if (coin <= i)), default=inf) print(-1 if isinf(tab[target]) else tab[target])

#!/usr/bin/env python3 from sys import stdin from math import inf, isinf (_, target) = [int(s) for s in stdin.readline().strip().split()] coins = sorted(int(s) for s in stdin.readline().strip().split()) tab = [inf]*(target + 1) tab[0] = 0 # We attempt to quickly precompute a bunch of trivial table entries if len(coins) > 1: base_val = 0 base_coins = 0 while base_val <= target: tab[base_val] = base_coins for coin in coins[:-1]: if base_val + coin > target: break tab[base_val + coin] = base_coins + 1 base_val += coins[-1] base_coins += 1 # And use the DP subproblem to fill in the gaps if isinf(tab[-1]): for i in range(1, target + 1): if isinf(tab[i]): tab[i] = 1 + min((tab[i - coin] for coin in coins if (coin <= i)), default=inf) print(-1 if isinf(tab[-1]) else tab[-1])

Implementing the bottom-up approach in C++ passed all tests (with a longest runtime of 0.15s):

#include <iostream>
#include <vector>
#include <limits>

constexpr long LONG_MAX = std::numeric_limits<long>::max();

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, LONG_MAX - 1);
    tab[0] = 0;

    for (long i = 1; i <= targetSum; ++i) {
        for (const auto coin : coins) {
            if (coin <= i) {
                tab[i] = std::min(tab[i], tab[i - coin]);
            }
        }
        ++tab[i];
    }

    std::cout << ((tab[targetSum] == LONG_MAX) ? -1 : tab[targetSum]) << std::endl;
    return 0;
}

Coin Combinations I [Spec] (2024-04-01)

Different orderings of the same set of coins count as different combinations, which makes subproblem selection easier. My chosen subproblem is to find the number of combinations for each possible target sum $0$ to $x$ . For $n$ coins, the runtime is $O(xn)$ :

C = \braces{c_0, \dots, c_{n-1}} = \text{set of all coin values}

\begin{gather*} F_{<0} = 0 \qquad F_0 = 1 \\ F_x = \sum_{c \in C}{ F_{x - c} } \quad\text{for}\quad x > 0 \end{gather*}

Input bounds are similar to Minimizing Coins, so a hotspot possibly runs ${10}^8$ times, and is similarly unlikely to pass using Python.

But let’s write a top-down Python implementation anyway! As expected, it TLE’s:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) (_, target_sum) = [int(s) for s in stdin.readline().strip().split()] coins = [int(s) for s in stdin.readline().strip().split()] @cache def dp(v): if v < 0: return 0 elif v == 0: return 1 return sum(dp(v - coin) for coin in coins) % MOD print(dp(target_sum))

Implementing bottom-up in C++ passes all tests:

#include <iostream>
#include <vector>

constexpr long MOD = 1e9 + 7;

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, 0);
    tab[0] = 1;

    for (long i = 1; i <= targetSum; ++i) {
        for (const auto coin : coins) {
            if (coin <= i) tab[i] = (tab[i] + tab[i - coin]) % MOD;
        }
    }

    std::cout << tab[targetSum] << std::endl;
    return 0;
}

Coin Combinations II [Spec] (2024-04-01)

If we used the strategy from Coin Combinations I, we run into the problem of needing to avoid double-counting combinations. To deal with this, maybe we need a two-dimensional table: one dimension for coin value, and another dimension for coin sum. The following subproblem comes to mind, which may naively run in $O(x^2 n)$ time:

C = \braces{c_1, \dots, c_n} = \text{set of all coin values}

\begin{gather*} F\brackets{0, x} = \begin{cases} 1 \quad& x = 0 \\ 0 \quad& \text{otherwise} \end{cases} \\ F\brackets{i, x} = \sum_{j = 0}^{\infty}{ F\brackets{i - 1, x - j c_i} } \quad\text{for}\quad i > 0 \end{gather*}

I say “naively” because there may be a way to cut that runtime to $O(xn)$ with a clever optimization. I’ll get to my idea for that later.

I am very certain $O(x^2 n)$ will TLE, but I made a top-down Python implementation anyway just for fun. It passes some tests, but TLE’s everything else. Good to know that we’re not getting any wrong answers! Implementation:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) (_, target_sum) = [int(s) for s in stdin.readline().strip().split()] coins = [int(s) for s in stdin.readline().strip().split()] @cache def dp(i, x): if i == -1: return 1 if (x == 0) else 0 elif x < 0: return 0 j_max = x // coins[i] return sum(dp(i - 1, x - (j * coins[i])) for j in range(j_max + 1)) % MOD print(dp(len(coins) - 1, target_sum))

Let’s have a look at my $O(xn)$ idea now. I’d use the same subproblem and implement it bottom-up, but I’d fill up the table with the help of an accumulator, which should let me avoid doing the whole $\sum_{j=0}^{\infty}$ summation in $O(x)$ time each time. Instead, the accumulator calculates this summation in $O(1)$ on average.

It’s probably easier to just see how this works in code.

My first implementation works perfectly fine on my computer, but causes weird RUNTIME ERRORs on some test cases when I submit to CSES. I have a screenshot included in the spoiler block below. My code provides the correct answers when I run them on my computer, and none of them TLE, so… I’m not sure what’s going on. Full implementation:

#include <iostream> #include <vector> constexpr long MOD = 1e9 + 7; int main() { long n, targetSum; std::cin >> n >> targetSum; std::vector<long> coins; for (long i = 0; i < n; ++i) { long tmp; std::cin >> tmp; coins.push_back(tmp); } std::vector<std::vector<long>> tab(coins.size() + 1, std::vector<long>(targetSum + 1, 0)); tab[0][0] = 1; for (unsigned long i = 1; i <= coins.size(); ++i) { const long coin = coins[i - 1]; for (long offset = 0; offset < coin; ++offset) { long acc = 0; for (long j = offset; j <= targetSum; j += coin) { acc = (acc + tab[i - 1][j]) % MOD; tab[i][j] = acc; } } } std::cout << tab[coins.size()][targetSum] << std::endl; return 0; }

(If anyone knows why I’m getting RUNTIME ERROR, I’d love to know!)

I improved on this solution by changing to a more space-efficient 1D vector, which did pass all test cases:

#include <iostream>
#include <vector>

constexpr long MOD = 1e9 + 7;

int main() {
    long n, targetSum;
    std::cin >> n >> targetSum;

    std::vector<long> coins;
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        coins.push_back(tmp);
    }

    std::vector<long> tab(targetSum + 1, 0);
    tab[0] = 1;

    for (const auto coin : coins) {
        for (long offset = 0; offset < coin; ++offset) {
            long acc = 0;
            for (long i = offset; i <= targetSum; i += coin) {
                acc = (acc + tab[i]) % MOD;
                tab[i] = acc;
            }
        }
    }

    std::cout << tab[targetSum] << std::endl;
    return 0;
}

TODO: Were the RUNTIME ERRORs to do with the way I implemented the 2D table as 2D vectors?

Removing Digits [Spec] (2024-04-03)

The first subproblem that comes to mind finds the minimum number of steps for each number from $0$ to $n$ . It runs in $O(n \log_{10} n)$ since there are $O(n)$ entries in the 1D table, and at each entry, you check every decimal digit (which is $O(\log_{10} n)$ ). The subproblem written out:

n = \text{input number}

\begin{gather*} F_0 = 0 \\ F_n = \min_{d \in D\parens{n}}\parens{ F_{n - d} } + 1 \quad\text{for}\quad x > 0 \end{gather*}

\text{where } D\parens{n} = \text{all decimal digits of } n \text{ except } 0

For fun, I started with a Python top-down implementation, which TLE’s as expected, but also didn’t give any wrong answers:

#!/usr/bin/env python3 from sys import setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) n = int(input()) @cache def dp(v): if v == 0: return 0 solution = n # arbitrary large number v2 = v while v2 > 0: digit = v2 % 10 if digit: solution = min(solution, dp(v - digit)) v2 //= 10 return 1 + solution print(dp(n))

My bottom-up implementation in Python passed all tests:

#!/usr/bin/env python3

n = int(input())

tab = [0]*(n + 1)
for v in range(1, n + 1):
    solution = n  # arbitrary large number
    v2 = v
    while v2 > 0:
        digit = v2 % 10
        if digit:
            solution = min(solution, tab[v - digit])
        v2 //= 10
    tab[v] = 1 + solution

print(tab[-1])

It also passed all tests when I used string conversion to get the digits:

#!/usr/bin/env python3

n = int(input())

tab = [0]*(n + 1)
for v in range(1, n + 1):
    tab[v] = 1 + min(tab[v - int(c)] for c in str(v) if (c != '0'))

print(tab[-1])

Grid Paths [Spec] (2024-04-03)

The fact that paths can only move in one direction makes this problem so much easier since you don’t have to worry about paths backtracking. The first subproblem that comes to mind is to, at each cell in the grid, you add up how many paths can come from above with how many paths can come from the left. It runs in $O(n^2)$ (or more usefully, $O(N)$ for $N$ total cells in the grid):

\begin{gather*} G\brackets{i, j} \in \braces{\texttt{EMPTY}, \texttt{TRAP}} \end{gather*}

\begin{gather*} F\brackets{i, j} = \begin{cases} 0 \quad& \parens{i < 0} \lor \parens{j < 0} \lor \parens{G\brackets{i, j} = \texttt{TRAP}} \\ F\brackets{i - 1, j} + F\brackets{i, j - 1} \quad& \text{otherwise} \end{cases} \end{gather*}

Surprisingly, a Python top-down implementation using @functools.cache (and running on CPython3) almost passed, but TLE’d on only 2 out of the 15 test cases:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) n = int(stdin.readline()) grid = [s.strip() for s in stdin.readlines()] @cache def dp(i, j): if (i < 0) or (j < 0) or grid[i][j] == '*': return 0 elif (i == 0) and (j == 0): return 1 return (dp(i - 1, j) + dp(i, j - 1)) % MOD print(dp(n - 1, n - 1))

It took 0.99s for one of the passed $1000 \times 1000$ test cases, which indicates to me that not only should Python be feasible, but Python top-down is likely to be feasible as well if I can lower the overhead. And indeed it did pass, although the code isn’t quite that nice since I’m implementing my own memoization:

#!/usr/bin/env python3

from sys import stdin, setrecursionlimit

setrecursionlimit(int(2e6))

MOD = int(1e9 + 7)

n = int(stdin.readline())
grid = [s.strip() for s in stdin.readlines()]

memo = [[None]*n for _ in range(n)]
memo[0][0] = 1

def dp(i, j):
    if (i < 0) or (j < 0) or grid[i][j] == '*':
        return 0
    elif memo[i][j] is None:
        memo[i][j] = (dp(i - 1, j) + dp(i, j - 1)) % MOD
    return memo[i][j]

print(dp(n - 1, n - 1))

My bottom-up implementation is also a bit obscured since I used some tricks to avoid adding if-statements to the main loop (and therefore unnecessary overhead). Here, I pad the grid with an extra row and column (on the top and left sides) so that we automatically get zeroes for paths that try to come from outside of the grid. Implementation:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

n = int(stdin.readline())
grid = ["*"*(n + 1), *("*" + s.strip() for s in stdin.readlines())]

tab = [[0]*(n + 1) for _ in range(n + 1)]
tab[0][1] = 1  # intentionally initialized the cell adjacent to the starting cell

for i in range(1, n + 1):
    for j in range(1, n + 1):
        if grid[i][j] == '.':
            tab[i][j] = (tab[i - 1][j] + tab[i][j - 1]) % MOD

print(tab[-1][-1])

Book Shop [Spec] (2024-04-03)

This is the classic 0-1 knapsack problem, which I’ve already done a very detailed writeup on (here), so unfortunately, I’ve already been very thoroughly spoiled on this problem.

The subproblem has two dimensions: “the current book”, and current max cost of currently selected books (from $0$ to $x$ ). At each “current book” (and corresponding current max cost), we find out what the best possible number of pages is whether or not we purchase the current book. It runs in $O(nx)$ :

\braces{\parens{c_1, p_1}, \dots, \parens{c_n, p_n}} = \text{costs and number of pages}

F\brackets{i, x} = \begin{cases} {\displaystyle \vphantom{\sum} -\infty } \quad& x < 0 \\ {\displaystyle \vphantom{\sum} 0 } \quad& \parens{x \ge 0} \land \parens{i = 0} \\ {\displaystyle \max\parens{ \vphantom{\sum} F\brackets{i - 1, x - c_i} + p_i, F\brackets{i - 1, x} } } \quad& \text{otherwise} \end{cases}

My first top-down Python implementation TLE’d as I expected:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) (_, max_sum) = [int(s) for s in stdin.readline().strip().split()] costs = [int(s) for s in stdin.readline().strip().split()] pages = [int(s) for s in stdin.readline().strip().split()] @cache def dp(i, cur_max_sum): if cur_max_sum < 0: return -1e6 # arbitrary small number elif i == -1: return 0 return max( dp(i - 1, cur_max_sum - costs[i]) + pages[i], dp(i - 1, cur_max_sum), ) print(dp(len(costs) - 1, max_sum))

Implementing bottom-up Python with a full $O(nx)$ -sized table caused many RUNTIME ERROR results, but they otherwise gave me correct answers on my computer (albeit very slowly):

#!/usr/bin/env python3 from sys import stdin (_, max_sum) = [int(s) for s in stdin.readline().strip().split()] costs = [None, *(int(s) for s in stdin.readline().strip().split())] pages = [None, *(int(s) for s in stdin.readline().strip().split())] tab = [[0]*(max_sum + 1) for _ in range(len(costs))] for i in range(1, len(costs)): for cur_max_sum in range(max_sum + 1): if cur_max_sum < costs[i]: tab[i][cur_max_sum] = tab[i - 1][cur_max_sum] else: tab[i][cur_max_sum] = max( tab[i - 1][cur_max_sum - costs[i]] + pages[i], tab[i - 1][cur_max_sum], ) print(tab[-1][-1])

My more optimized bottom-up Python implementation that instead uses a $O(x)$ -sized table TLE’d, even on PyPy3:

#!/usr/bin/env python3 from sys import stdin (_, max_sum) = [int(s) for s in stdin.readline().strip().split()] costs = [None, *(int(s) for s in stdin.readline().strip().split())] pages = [None, *(int(s) for s in stdin.readline().strip().split())] tab0 = [0]*(max_sum + 1) tab1 = [0]*(max_sum + 1) for i in range(1, len(costs)): for cur_max_sum in range(max_sum + 1): if cur_max_sum < costs[i]: tab1[cur_max_sum] = tab0[cur_max_sum] else: tab1[cur_max_sum] = max( tab0[cur_max_sum - costs[i]] + pages[i], tab0[cur_max_sum], ) (tab0, tab1) = (tab1, tab0) print(tab0[-1])

Reimplementing this optimized bottom-up solution in C++ passes all tests with a worst runtime of only 0.15s:

#include <iostream>
#include <vector>

int main() {
    long n, maxSum;
    std::cin >> n >> maxSum;

    std::vector<long> costs {0};  // padded with an extra dummy element so we can 1-index
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        costs.push_back(tmp);
    }

    std::vector<long> pages {0};  // padded with an extra dummy element so we can 1-index
    for (long i = 0; i < n; ++i) {
        long tmp;
        std::cin >> tmp;
        pages.push_back(tmp);
    }

    std::vector<long> realTab0(maxSum + 1, 0);
    std::vector<long> realTab1(maxSum + 1, 0);
    std::vector<long> *tab0 = &realTab0;
    std::vector<long> *tab1 = &realTab1;
    for (long i = 1; i <= n; ++i) {
        for (long curMaxSum = 1; curMaxSum <= maxSum; ++curMaxSum) {
            if (curMaxSum < costs[i]) {
                (*tab1)[curMaxSum] = (*tab0)[curMaxSum];
            } else {
                (*tab1)[curMaxSum] = std::max(
                    (*tab0)[curMaxSum - costs[i]] + pages[i],
                    (*tab0)[curMaxSum]
                );
            }
        }
        std::swap(tab0, tab1);
    }

    std::cout << (*tab0)[maxSum] << std::endl;
    return 0;
}

Array Description [Spec] (2024-04-03)

The first subproblem that comes to mind has two dimensions: current position in the array description, and candidate value for the position. At each position, we simply count the number of possible “paths” coming from the start of the array. The runtime is $O(nm)$ .

What I mean by “path” is, let’s pretend that the beginning of the array (index 0) is known to be the value 5. From index 0, we might be able to “walk” to three different positions in index 1: {index 1, value 4}, {index 1, value 5}, and {index 1, value 6}. We say that there are valid “paths” from the beginning of the array to these three positions. These paths may extend all the way through the array, constrained by the known array values.

TODO: I should illustrate what I mean by a “path”.

The subproblem written out:

\begin{gather*} m = \text{upper bound for array values} \\ \braces{a_0, \dots, a_{n - 1}} = \text{the array description} \end{gather*}

F\brackets{i, v} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 1 \end{array} \quad& \parens{0 < v \le m} \land \parens{a_0 \in \braces{0, v}} \land \parens{i = 0} \\ \displaystyle \begin{array}{l} F\brackets{i - 1, v} \\ \quad + F\brackets{i - 1, v - 1} \\ \quad + F\brackets{i - 1, v + 1} \end{array} \quad& \parens{0 < v \le m} \land \parens{a_0 \in \braces{0, v}} \land \parens{i > 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 0 \end{array} \quad& \text{otherwise} \end{cases}

\text{solution} = \sum_{v = 1}^m F\brackets{n - 1, v}

A top-down implementation in Python TLE’d, but otherwise gave no wrong answers:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) (_, max_v) = [int(s) for s in stdin.readline().strip().split()] desc = [int(s) for s in stdin.readline().strip().split()] @cache def dp(i, v): if (v <= 0) or (v > max_v) or ((desc[i] != 0) and (desc[i] != v)): return 0 return 1 if (i == 0) else ((dp(i - 1, v) + dp(i - 1, v - 1) + dp(i - 1, v + 1)) % MOD) print(sum(dp(len(desc) - 1, v) for v in range(1, max_v + 1)) % MOD)

Changing to a bottom-up with a full $O(nm)$ sized table passed all tests with a PyPy3 worst-case runtime of 0.33s:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

(_, max_v) = [int(s) for s in stdin.readline().strip().split()]
desc = [int(s) for s in stdin.readline().strip().split()]

tab = [[0]*(max_v + 2) for _ in range(len(desc))]

if desc[0] == 0:
    tab[0] = [0, *(1 for _ in range(max_v)), 0]
else:
    tab[0][desc[0]] = 1

for i, known_v in enumerate(desc[1:], start=1):
    it = range(1, max_v + 1) if (known_v == 0) else (known_v,)
    for v in it:
        tab[i][v] = (tab[i - 1][v] + tab[i - 1][v - 1] + tab[i - 1][v + 1]) % MOD

print(sum(tab[-1]) % MOD)

Improving on the bottom-up solution by switching to an $O(m)$ table also passed, but now with a PyPy3 worst-case runtime of 0.13s:

#!/usr/bin/env python3 from sys import stdin MOD = int(1e9 + 7) (_, max_v) = [int(s) for s in stdin.readline().strip().split()] desc = [int(s) for s in stdin.readline().strip().split()] #tab0 = ASSIGNED LATER tab1 = [0]*(max_v + 2) if desc[0] == 0: tab0 = [0, *(1 for _ in range(max_v)), 0] else: tab0 = [0]*(max_v + 2) tab0[desc[0]] = 1 for i, known_v in enumerate(desc[1:], start=1): if known_v == 0: for v in range(1, max_v + 1): tab1[v] = (tab0[v] + tab0[v - 1] + tab0[v + 1]) % MOD else: tab1 = [0]*(max_v + 2) tab1[known_v] = (tab0[known_v] + tab0[known_v - 1] + tab0[known_v + 1]) % MOD (tab0, tab1) = (tab1, tab0) print(sum(tab0) % MOD)

Counting Towers [Spec] (2024-04-04)

This one’s a hard one! But also a fun one! Let’s start by enumerating some simple examples to get a feel for how this works:

n=1  --> solution=2

AB  AA

n=2  --> solution=8

CD       AB  BA  AB
AB       AC  CA  AB

AA       AA  BC  AA
AA       BC  AA  BB

The first idea that comes to mind is to try a subproblem with two dimensions: the $x$ and $y$ coordinates of the top-right of the tower, where the bottom-left corner is at coordinates $\parens{0, 0}$ . Let’s call this subproblem $F\brackets{x, y}$ . $x$ is the column index while $y$ is the row index. However, it doesn’t work well:

We expect since the only possible tower is a single block.We expect since it’s the case. Perhaps we should try calculating every possible block that contains coordinates ? Let’s try sketching it out:(Apologies if this is hard to understand, but hopefully it'll be clear by the time you also see the F[1, 1] example.) There are two cases of blocks that contain (1, 0): ?A AA The first case can only be the tower: BA Therefore, the first case shows one possible tower. The second case is trivially just the entire tower being a single block, therefore it also shows only one tower. Therefore, there are two possible towers for F[1, 0].Let’s also try , which is the case:There are four cases to consider: ?A AA ?A AA ?? ?? ?A AA Case 1 has 3 combinations: BA BA BA CD BD DD Case 2 has 2 combinations: AA AA BC BB Case 3 has 2 combinations: BA BA CA BA Case 4 trivially only has the one combination: AA AA Therefore, there are 8 possible towers for F[1, 1].And let’s also try :n=3 case 1 has 1 combination: XX XX XX case 2 has 2 combinations: XX XX XX XX --> XX XX ?? AB AA case 3 has 8 combinations: XX XX XX XX XX XX XX XX XX (These are all similar ?? --> CD AB BA AB AA AA BC AA to n=1.) ?? AB AC CA AB AA BC AA BB case 4 has 4 combinations: ?X AX AX CX AX ?X --> BX AX AX AX ?X CX CX AX AX case 5 has 6 combinations: ?X AX AX BX BX AX AX ?X --> BX AX AX CX AX AX ?? CD CD AD AA BB AB case 6 has 12 combinations: ?X YX YX YX YX YX YX YX YX (This row is all ?? --> CD AB BA AB AA AA BC AA similar to case 3.) ?? AB AC CA AB AA BC AA BB YX YX YX YX YD YB YA YX AB AA BA YA Therefore, there are 33 possible towers for F[1, 2].So how might we calculate each cases of ?The first thing that comes to mind is to try breaking it down into three calculations. Let’s try Case 6 as an example. Case 6 might be seen as a combination of three cases: , , and :F[0, 2] | F[1, 1] | F[0, 1] | | ? | | ? | ?? | ? ? | ?? | ? | | 4 combs. | 8 combs. | 2 combs.

\begin{align*} F\brackets{0, 2} + F\brackets{1, 1} - F\brackets{0, 1} &= 4 + 8 - 2 \\ &= 10 \ne 12 \end{align*}

I also tried out:

\begin{align*} F\brackets{0, 2} + F\brackets{1, 1} &= 4 + 8 \\ &= 12 \end{align*}

But this doesn’t generalize well. For example:The case to consider: ?A ?? This would be F[0, 1] + F[1, 0] = 2 + 2 = 4 But we have already seen that this case should only have 3 combinations.I’m still not sure how to make this idea work, so let’s move on to something different.

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) stdin.readline() tests = [int(s) for s in stdin.readlines()] @cache def dp(col, row): # --> (this count, cumulative count) if (col == 0) and (row == 0): return 1 elif (col < 0) or (row < 0): return 0 solution = 0 for row2 in range(0, row + 1): for col2 in range(0, col + 1): if (row2 == 0) and (col2 == 0): solution += 1 else: solution += dp(col, row2 - 1) + dp(col2 - 1, row) - dp(col2 - 1, row2 - 1) return solution % MOD print("test 1, expecting 2") print(dp(1, 0)) print("test 2, expecting 8") print(dp(1, 1)) print("test 3") print(dp(1, 2)) print() print(0, 0, dp(0, 0)) print(1, 0, dp(1, 0)) print(0, 1, dp(0, 1)) print(1, 1, dp(1, 1)) print() print(0, 2, dp(0, 2)) print(1, 1, dp(1, 1)) print(0, 1, dp(0, 1)) print() for height in tests: print(dp(1, height - 1))

The next idea that comes to mind is also two dimensions, with one dimension for height of the left side, and one for the right side. It naively sounds like it should perform poorly at $O(n^2)$ where $n \le {10}^6$ , but maybe we can apply the same trick as Coin Combinations II where we use an accumulator?

Let’s start by visualizing what’s going on then formalizing the subproblem:

Case 1: Both sides same level
        We arbitrarily pick the top-right block to be fixed.

   F[3,3] = F[3,2] + F[3,1] + F[3,0] + F[2,2] + F[1,1] + F[0,0]

     ??       ?        ?        ?
     ??   =   ??   +   ?    +   ?    +   ??   +        +
     ??       ??       ??       ?        ??       ??

Case 2: Left side is taller
        We pick the tallest block to be fixed.

   F[3,2] = F[2,2] + F[1,2] + F[0,2]

     ?
     ??   =   ??   +    ?   +    ?
     ??       ??       ??        ?

Case 3: Right side is taller
        We just mirror it.

   F[2,3] = F[3,2]

      ?       ?
     ??   =   ??
     ??       ??

Writing out the subproblem:

F\brackets{a, b} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 1 \end{array} \quad& \parens{a = b = 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle \sum_{x = 0}^{b - 1}\parens{\vphantom{\sum} F\brackets{a, x} + F\brackets{x, x}} \end{array} \quad& \parens{a = b \ne 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} F\brackets{b, a} \end{array} \quad& \parens{a < b \ne 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle \sum_{x = 0}^{a - 1}{F\brackets{x, b}} \end{array} \quad& \parens{b < a \ne 0} \end{cases}

I made a naive top-down Python implementation and ran it against the example and it passed! However, it was really, really, terribly slow. Implementation:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) stdin.readline() tests = [int(s) for s in stdin.readlines()] @cache def dp(a, b): if (a == 0) and (b == 0): return 1 elif (a < 0) or (b < 0): raise RuntimeError() solution = 0 if a == b: for x in range(a): solution += dp(a, x) + dp(x, x) elif b < a: for x in range(a): solution += dp(x, b) else: return dp(b, a) return solution % MOD for height in tests: print(dp(height, height))

In an attempt to optimize, I tried expanding some calculations to get a feel for what we might build accumulators for. However, I found nothing. Here’s what I tried:

\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{2, 2} &= F\brackets{2, 1} + F\brackets{1, 1} \\ &\qquad + F\brackets{2, 0} + F\brackets{0, 0} \\ &= \parens{F\brackets{1, 1} + F\brackets{0, 1}} + F\brackets{1, 1} \\ &\qquad + \parens{F\brackets{1, 0} + F\brackets{0, 0}} + F\brackets{0, 0} \\ &= 2 F\brackets{1, 1} + 2 F\brackets{0, 0} + 2 F\brackets{1, 0} \end{align*}

\begin{align*} F\brackets{3, 3} &= F\brackets{3, 2} + F\brackets{2, 2} \\ &\qquad + F\brackets{3, 1} + F\brackets{1, 1} \\ &\qquad + F\brackets{3, 0} + F\brackets{0, 0} \\ &= \parens{F\brackets{2, 2} + F\brackets{1, 2} + F\brackets{0, 2}} + F\brackets{2, 2} \\ &\qquad + \parens{F\brackets{2, 1} + F\brackets{1, 1} + F\brackets{0, 1}} + F\brackets{1, 1} \\ &\qquad + \parens{F\brackets{2, 0} + F\brackets{1, 0} + F\brackets{0, 0}} + F\brackets{0, 0} \\ &= 2 F\brackets{2, 1} + 2 F\brackets{2, 2} + 2 F\brackets{2, 0} \\ &\qquad + 2 F\brackets{1, 1} + 2 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 \parens{F\brackets{1, 1} + F\brackets{0, 1}} + 2 F\brackets{2, 2} + 2 F\brackets{2, 0} \\ &\qquad + 2 F\brackets{1, 1} + 2 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{2, 2} + 2 F\brackets{2, 0} \\ &\qquad + 4 F\brackets{1, 1} + 4 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{4, 4} &= F\brackets{4, 3} + F\brackets{3, 3} \\ &\qquad + F\brackets{4, 2} + F\brackets{2, 2} \\ &\qquad + F\brackets{4, 1} + F\brackets{1, 1} \\ &\qquad + F\brackets{4, 0} + F\brackets{0, 0} \\ &= \parens{F\brackets{3, 3} + F\brackets{2, 3} + F\brackets{1, 3} + F\brackets{0, 3}} + F\brackets{3, 3} \\ &\qquad + \parens{F\brackets{3, 2} + F\brackets{2, 2} + F\brackets{1, 2} + F\brackets{0, 2}} + F\brackets{2, 2} \\ &\qquad + \parens{F\brackets{3, 1} + F\brackets{2, 1} + F\brackets{1, 1} + F\brackets{0, 1}} + F\brackets{1, 1} \\ &\qquad + \parens{F\brackets{3, 0} + F\brackets{2, 0} + F\brackets{1, 0} + F\brackets{0, 0}} + F\brackets{0, 0} \\ &= \parens{F\brackets{3, 2} + F\brackets{3, 1} + F\brackets{3, 0}} + 2 F\brackets{3, 3} \\ &\qquad + \parens{F\brackets{3, 2} + F\brackets{2, 1} + F\brackets{2, 0}} + 2 F\brackets{2, 2} \\ &\qquad + \parens{F\brackets{3, 1} + F\brackets{2, 1} + F\brackets{1, 0}} + 2 F\brackets{1, 1} \\ &\qquad + \parens{F\brackets{3, 0} + F\brackets{2, 0} + F\brackets{1, 0}} + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 2 F\brackets{3, 2} + 2 F\brackets{3, 1} + 2 F\brackets{3, 0} \\ &\qquad + 2 F\brackets{2, 2} + 2 F\brackets{2, 1} + 2 F\brackets{2, 0} \\ &\qquad + 2 F\brackets{1, 1} + 2 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 2 \parens{F\brackets{2, 2} + F\brackets{2, 1} + F\brackets{2, 0}} + 2 \parens{F\brackets{2, 1} + F\brackets{1, 1} + F\brackets{1, 0}} + 2 F\brackets{3, 0} \\ &\qquad + 2 F\brackets{2, 2} + 2 \parens{F\brackets{1, 1} + F\brackets{1, 0}} + 2 F\brackets{2, 0} \\ &\qquad + 2 F\brackets{1, 1} + 2 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 4 \parens{F\brackets{2, 1}} + 2 F\brackets{3, 0} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 6 F\brackets{1, 1} + 6 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 4 \parens{F\brackets{1, 1} + F\brackets{1, 0}} + 2 F\brackets{3, 0} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 6 F\brackets{1, 1} + 6 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 2 F\brackets{3, 0} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 10 F\brackets{1, 1} + 10 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{2, 2} &= 2 F\brackets{1, 1} + 2 F\brackets{0, 0} + 2 F\brackets{1, 0} \end{align*}

\begin{align*} F\brackets{3, 3} &= 2 F\brackets{2, 2} + 2 F\brackets{2, 0} \\ &\qquad + 4 F\brackets{1, 1} + 4 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{4, 4} &= 2 F\brackets{3, 3} + 2 F\brackets{3, 0} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 10 F\brackets{1, 1} + 10 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \end{align*}

I also tried expanding out the $F\brackets{x, 0}$ ‘s:

\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{3, 3} &= 2 F\brackets{2, 2} + 2 \parens{F\brackets{1, 0} + F\brackets{0, 0}} \\ &\qquad + 4 F\brackets{1, 1} + 4 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{2, 2} \\ &\qquad + 4 F\brackets{1, 1} + 6 F\brackets{1, 0} \\ &\qquad + 4 F\brackets{0, 0} \\ &= 2 F\brackets{2, 2} + 4 F\brackets{1, 1} + 10 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{4, 4} &= 2 F\brackets{3, 3} + 2 \parens{F\brackets{2, 0} + F\brackets{1, 0} + F\brackets{0, 0}} \\ &\qquad + 4 F\brackets{2, 2} + 4 F\brackets{2, 0} \\ &\qquad + 10 F\brackets{1, 1} + 10 F\brackets{1, 0} \\ &\qquad + 2 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} \\ &\qquad + 4 F\brackets{2, 2} + 6 F\brackets{2, 0} \\ &\qquad + 10 F\brackets{1, 1} + 12 F\brackets{1, 0} \\ &\qquad + 4 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} \\ &\qquad + 4 F\brackets{2, 2} + 6 \parens{F\brackets{0, 0} + F\brackets{0, 0}} \\ &\qquad + 10 F\brackets{1, 1} + 12 F\brackets{0, 0} \\ &\qquad + 4 F\brackets{0, 0} \\ &= 2 F\brackets{3, 3} + 4 F\brackets{2, 2} + 10 F\brackets{1, 1} + 28 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{2, 2} &= 2 F\brackets{1, 1} + 4 F\brackets{0, 0} \\ F\brackets{3, 3} &= 2 F\brackets{2, 2} + 4 F\brackets{1, 1} + 10 F\brackets{0, 0} \\ F\brackets{4, 4} &= 2 F\brackets{3, 3} + 4 F\brackets{2, 2} + 10 F\brackets{1, 1} + 28 F\brackets{0, 0} \end{align*}

I also tried expanding the $F\brackets{x, x}$ ‘s:

\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{3, 3} &= 2 \parens{4 F\brackets{0, 0} + 4 F\brackets{1, 0}} + 2 F\brackets{2, 0} \\ &\qquad + 8 F\brackets{1, 0} \\ &\qquad + 6 F\brackets{0, 0} \\ &= 2 F\brackets{2, 0} + 16 F\brackets{1, 0} + 14 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{4, 4} &= 2 \parens{2 F\brackets{2, 0} + 16 F\brackets{1, 0} + 14 F\brackets{0, 0}} + 2 F\brackets{3, 0} \\ &\qquad + 4 \parens{4 F\brackets{0, 0} + 4 F\brackets{1, 0}} + 4 F\brackets{2, 0} \\ &\qquad + 20 F\brackets{1, 0} \\ &\qquad + 12 F\brackets{0, 0} \\ &= 2 F\brackets{3, 0} + 36 F\brackets{2, 0} + 68 F\brackets{1, 0} + 28 F\brackets{0, 0} \end{align*}

\begin{align*} F\brackets{2, 2} &= 4 F\brackets{0, 0} + 4 F\brackets{1, 0} \\ F\brackets{3, 3} &= 2 F\brackets{2, 0} + 16 F\brackets{1, 0} + 14 F\brackets{0, 0} \\ F\brackets{4, 4} &= 2 F\brackets{3, 0} + 36 F\brackets{2, 0} + 68 F\brackets{1, 0} + 28 F\brackets{0, 0} \end{align*}

So let’s try a different approach. I believe that the following sums should be easy to build accumulators for:

\sum_{x = 0}^n F\brackets{x, x} \qquad\qquad \sum_{x = 0}^n F\brackets{x, 0}

But this sum is giving me headaches right now:

S\brackets{b} \coloneqq \sum_{x = 0}^{b - 1} F\brackets{x, b} \quad\text{for } b > 0

That sum is an important part of the second line of the subproblem. So is it possible to build an accumulator for it?

\begin{align*} S\brackets{b} &= \sum_{x = 0}^{b - 1} F\brackets{b, x} = \sum_{x = 0}^{b - 1} \sum_{y = 0}^{b - 1} F\brackets{y, x} \end{align*}

Now, let’s visualize what’s happening. For example, $S\brackets{4}$ is the sum of all of these:

\begin{array}{cccc} F\brackets{0, 3} & F\brackets{1, 3} & F\brackets{2, 3} & \xmemphR{F\brackets{3, 3}} \\ F\brackets{0, 2} & F\brackets{1, 2} & \xmemphR{F\brackets{2, 2}} & F\brackets{3, 2} \\ F\brackets{0, 1} & \xmemphR{F\brackets{1, 1}} & F\brackets{2, 1} & F\brackets{3, 1} \\ \xmemphR{F\brackets{0, 0}} & F\brackets{1, 0} & F\brackets{2, 0} & F\brackets{3, 0} \end{array}

Using the third case of the subproblem, we can reduce this to:

\begin{array}{cccc} {} & {} & {} & \xmemphR{F\brackets{3, 3}} \\ {} & {} & \xmemphR{F\brackets{2, 2}} & 2 F\brackets{3, 2} \\ {} & \xmemphR{F\brackets{1, 1}} & 2 F\brackets{2, 1} & 2 F\brackets{3, 1} \\ \xmemphR{F\brackets{0, 0}} & 2 F\brackets{1, 0} & 2 F\brackets{2, 0} & 2 F\brackets{3, 0} \end{array}

Based on this pattern, $S\brackets{b}$ is reducible to:

S\brackets{b} = \sum_{x = 0}^{b - 1} F\brackets{x, x} + 2 \sum_{x = 1}^{b - 1} S\brackets{x} \qquad\text{for } b > 0

Therefore, we can calculate $S\brackets{b}$ using accumulators!

Let’s now construct an alternative subproblem derived from this, where $F'\brackets{n}$ is the solution for the number of combinations to construct an $n$ -sized tower:

\begin{align*} F'\brackets{n} &\coloneqq F\brackets{n, n} = \sum_{x=0}^{n - 1}\parens{F\brackets{n, x} + F\brackets{x, x}} \\ &= \sum_{x=0}^{n - 1}F\brackets{n, x} + \sum_{x=0}^{n - 1}F\brackets{x, x} \\ &= S\brackets{n} + \sum_{x=0}^{n - 1}F'\brackets{x} \qquad\text{for } n > 0 \end{align*}

F'\brackets{0} = 1

S\brackets{n} = \sum_{x = 0}^{n - 1} F'\brackets{x} + 2 \sum_{x = 1}^{n - 1} S\brackets{x} \qquad\text{for } n > 0

Attempting to simplify seems… too complicated for me to bother right now.

\begin{align*} F\brackets{1, 1} &= F\brackets{1, 0} + F\brackets{0, 0} \end{align*}

My Python top-down implementation passed the first two test cases containing queries for $n \le 100$ , but we get RUNTIME ERROR for the other much bigger test cases, so my code probably goes over the memory usage limit of the auto-judge. Implementation:

#!/usr/bin/env python3 from sys import stdin, setrecursionlimit from functools import cache setrecursionlimit(int(2e6)) MOD = int(1e9 + 7) stdin.readline() tests = [int(s) for s in stdin.readlines()] @cache def f_val(n): assert n >= 0 if n <= 1: return 2 if n else 1 return (s_val(n) + f_cum(n - 1)) % MOD @cache def f_cum(n): assert n >= 0 return ((f_cum(n - 1) + f_val(n)) if n else f_val(0)) % MOD @cache def s_val(n): assert n > 0 if n == 1: return 1 return (f_cum(n - 1) + (2 * s_cum(n - 1))) % MOD @cache def s_cum(n): assert n > 0 return (s_val(1) if (n == 1) else (s_cum(n - 1) + s_val(n))) % MOD for height in tests: print(f_val(height))

Reimplementing as bottom-up passed all tests:

#!/usr/bin/env python3

from sys import stdin

MOD = int(1e9 + 7)

stdin.readline()
tests = [int(s) for s in stdin.readlines()]

tab = [(1, 1, None, None), (2, 3, 1, 1)]  # [(f_val, f_cum, s_val, s_cum), ...]
for _ in range(max(tests) - 1):
    (f_val, f_cum, s_val, s_cum) = tab[-1]
    s_val_new = (f_cum + (2 * s_cum)) % MOD
    f_val_new = (s_val_new + f_cum) % MOD
    tab.append((
        f_val_new,
        (f_val_new + f_cum) % MOD,
        s_val_new,
        (s_val_new + s_cum) % MOD,
    ))

for height in tests:
    print(tab[height][0])

Edit Distance [Spec] (2024-04-08)

Thinking through some edge cases, it was tempting to conclude that you’d never need to remove characters from the shorter word, but I believe this would break that assumption:

Test case:
   xxABCDEFGH
   ABCDEFGHxxx

The actual edit distance is 5:
   xxABCDEFGH   (0)
   xABCDEFGH    (1)
   ABCDEFGH     (2)
   ABCDEFGHx    (3)
   ABCDEFGHxx   (4)
   ABCDEFGHxxx  (5)

If we disallowed removing characters:
   xxABCDEFGH   (0)
   AxABCDEFGH   (1)
   ABABCDEFGH   (2)
   ...
   ABCDEFGHGH   (8)
   ABCDEFGHxH   (9)
   ABCDEFGHxx   (10)
   ABCDEFGHxxx  (11)

The first subproblem that comes to mind is two-dimensional, each dimension is an index into each string. Subproblem $F\brackets{i, j}$ is the edit distance of substrings string1[0..i] and string2[0..j] (i and j being inclusive).

I jumped straight into coding a solution first. Here’s my Python top-down solution that passes many test cases but only fails due to RUNTIME ERROR (almost certainly due to memory usage):

#!/usr/bin/env python3 from functools import cache (s1, s2) = (input(), input()) @cache def dp(i, j): if (i < 0) or (j < 0): return i + j + 2 elif s1[i] == s2[j]: return dp(i - 1, j - 1) return 1 + min(dp(i - 1, j - 1), dp(i - 1, j), dp(i, j - 1)) print(dp(len(s1) - 1, len(s2) - 1))

Writing out the subproblem:

s_1 = \text{string 1} \qquad\qquad s_2 = \text{string 2}

F\brackets{i, j} = \begin{cases} \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} 0 \end{array} \quad& \parens{i < 0} \land \parens{j < 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} j + 1 \end{array} \quad& \parens{i < 0} \land \parens{j \ge 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} i + 1 \end{array} \quad& \parens{i \ge 0} \land \parens{j < 0} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} F\brackets{i - 1, j - 1} \end{array} \quad& \parens{i \ge 0} \land \parens{j \ge 0} \land \parens{s_1\brackets{i} = s_2\brackets{j}} \\ \displaystyle \vphantom{\sum_{A}^A} \begin{array}{l} \displaystyle 1 + \min(F\brackets{i - 1, j - 1}, \\ \qquad F\brackets{i - 1, j}, F\brackets{i, j - 1}) \end{array} \quad& \text{otherwise} \end{cases}

The first case is simply the case of two empty substrings. Naturally, empty substrings are already equivalent.

The second and third cases are the case where you compare an empty substring with a non-empty substring. Naturally, the solution is just the length of the non-empty substring.

My top-down Python solution simplifies these first three cases down with essentially:

assert (i >= -1) and (j >= -1)
if (i < 0) or (j < 0):
    return i + j + 2

The fourth case is the case where the two indices point to an equivalent character. This means no change needs to be made at that character.

The final case is where the two indices don’t point to an equivalent character. There’s a lot going on in that one line, so to see how this works, let’s consider this example:

              v i=4
string 1: ...AB...
string 2: ..XY.....
             ^ j=3

The dots represent characters that we don’t care about. Index i points to that B character in string 1, while index j points to that Y character in string 2.

To make it easier to understand, let’s write this situation down in two different but conceptually identical ways:

\text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..XY"}}

F\brackets{4, 3}

Recall that the problem specification allows these operations:

Add one character to the string.
Remove one character from the string.
Replace one character in the string.

Let’s start with operation 3 because it’s the simplest. Let’s suppose we change the B in string 1 to Y:

\begin{align*} \text{EditDistance31}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...A\xmemphRC{Y}"}, \texttt{"..XY"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..X"}} \end{align*}

F_{31}\brackets{4, 3} = 1 + F\brackets{3, 2}

The math works out identically even if we instead changed the Y in string 2 to B:

\begin{align*} \text{EditDistance32}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X\xmemphRC{B}"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..X"}} \end{align*}

F_{32}\brackets{4, 3} = 1 + F\brackets{3, 2}

Now, let’s try operation 2. If we remove B from string 1:

\text{EditDistance21}\parens{\texttt{"...AB"}, \texttt{"..XY"}} = 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..XY"}}

F_{21}\brackets{4, 3} = 1 + F\brackets{3, 3}

Or if we remove Y from string 2 instead:

\text{EditDistance22}\parens{\texttt{"...AB"}, \texttt{"..XY"}} = 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X"}}

F_{22}\brackets{4, 3} = 1 + F\brackets{4, 2}

And finally, let’s try operation 1. If we insert Y to string 1:

\begin{align*} \text{EditDistance11}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB\xmemphRC{Y}"}, \texttt{"..XY"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..X"}} \end{align*}

F_{11}\brackets{4, 3} = 1 + F\brackets{4, 2}

And if we instead insert B into string 2:

\begin{align*} \text{EditDistance12}\parens{\texttt{"...AB"}, \texttt{"..XY"}} &= 1 + \text{EditDistance}\parens{\texttt{"...AB"}, \texttt{"..XY\xmemphRC{B}"}} \\ &= 1 + \text{EditDistance}\parens{\texttt{"...A"}, \texttt{"..XY"}} \end{align*}

F_{12}\brackets{4, 3} = 1 + F\brackets{3, 3}

Having gone through all six possible operations, it should be fairly straightforward to see how we get $1 + \min\parens{F\brackets{i - 1, j - 1}, F\brackets{i - 1, j}, F\brackets{i, j - 1}}$ as the final case of the subproblem.

I think it’s also useful to point out that the equivalence of $F_{21} = F_{12}$ and $F_{22} = F_{11}$ can also be examined more intuitively.

Let’s consider these two strings:

AXXX
XXX

To make them equivalent, you can either delete the A from string 1:

XXX
XXX

or, you can add an A to string 2:

AXXX
AXXX

Either way, we incurred a cost of $1$ operation.

My Python bottom-up implementation passes all tests:

#!/usr/bin/env python3

(s1, s2) = (input(), input())

tab = [[i]*(len(s2) + 1) for i in range(1, len(s1) + 1)]
# extra row and column, useful for negative indexing
tab.append(list(range(1, len(s2) + 2)))
tab[-1][-1] = 0

for i in range(len(s1)):
    for j in range(len(s2)):
        if s1[i] == s2[j]:
            tab[i][j] = tab[i - 1][j - 1]
        else:
            tab[i][j] = 1 + min(tab[i - 1][j - 1], tab[i - 1][j], tab[i][j - 1])

print(tab[-2][-2])