Simple Compile-Time Dynamic Programming in Modern C++


By Andrew Drakeford

Overload, 33(188):9-11, August 2025


Compile-time code can be very efficient. Andrew Drakeford demonstrates how to find an efficient evaluation order for chains of matrix multiplications.

Modern C++ enables us to solve mathematical optimisation problems at compile time. With the expanded constexpr capabilities [Fertig21, Turner18, Turner19, Wu24], we can now write clear and efficient optimisation logic that runs during compilation. Fixed-size containers such as std::array fit naturally into these routines. Even standard algorithms, such as std::sort and std::lower_bound, are now constexpr, enabling more straightforward code and more powerful compile-time computations. Additionally, compile-time optimisation generates constant results, which enables the compiler to create even more efficient code. We will use the matrix chain multiplication problem as our worked example.
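For instance, here is a minimal sketch (our illustration, not from the article) of a standard algorithm running at compile time: std::sort has been constexpr since C++20, so the sorted array is a compile-time constant that we can check with static_assert.

#include <algorithm>
#include <array>

constexpr std::array<int, 5> sorted_digits()
{
  std::array<int, 5> a{ 3, 1, 4, 1, 5 };
  std::sort(a.begin(), a.end()); // constexpr since C++20
  return a;
}
static_assert(sorted_digits().front() == 1);
static_assert(sorted_digits().back() == 5);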

Matrix chain multiplication

Matrix chain multiplication is a classic dynamic programming problem [Cormen22, Das19, Mount]. It aims to determine the most efficient method for multiplying a sequence of matrices. Since matrix multiplication is associative, the order of grouping does not affect the result. However, the number of scalar multiplications involved can vary depending on the grouping.

Consider the three matrices A₁ (10×100), A₂ (100×5), and A₃ (5×50), multiplied in a chain, A₁ × A₂ × A₃.

There are two ways to multiply them:

  1. Grouping as (A₁ × A₂) × A₃ first computes a 10×5 matrix, then multiplies that with A₃. This results in 5,000 operations for the first multiplication, and another 2,500 for the second – a total of 7,500 scalar multiplications.
  2. Grouping as A₁ × (A₂ × A₃) first multiplies A₂ and A₃, then A₁. This results in 25,000 operations for the first step and 50,000 for the second – a total of 75,000, which is clearly worse.
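The cost model here is that multiplying a p×q matrix by a q×r matrix takes p·q·r scalar multiplications. The two totals above can be checked at compile time with a small sketch (the helper cost is our own, for illustration):

constexpr long cost(long p, long q, long r)
{
  return p * q * r; // scalar multiplications for (p x q) * (q x r)
}
static_assert(cost(10, 100, 5) + cost(10, 5, 50) == 7'500);    // (A1 x A2) x A3
static_assert(cost(100, 5, 50) + cost(10, 100, 50) == 75'000); // A1 x (A2 x A3)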

Setting up the dynamic programming problem

We define the cost of computing the matrix chain as the total number of scalar multiplications. To solve the dynamic programming problem, we need to find the grouping which minimises the cost. Let us consider the matrix chain shown in Figure 1.

Figure 1

If we split the matrix chain in half, we obtain the two sub-chains as shown in Figure 2.

Figure 2

On each side of the split, we multiply the matrices together (shown in blue: the upper, smaller boxes) to produce the two intermediate matrices (shown in red, as ‘result’). To complete the chain multiplication, we multiply the intermediate results together. The total cost for the chain is:

Total cost = left chain cost + right chain cost + split point cost

The split point cost is the cost of multiplying the intermediate (red, ‘result’) matrices together to complete the chain. By pre-calculating the minimum cost for both left and right chains, we can quickly determine the lowest cost for the entire chain by evaluating each possible split and selecting the least expensive option.
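In the notation of the code that follows, where matrix i has dimensions dims[i] × dims[i+1], this gives the standard recurrence:

  cost[i][j] = min over i ≤ k < j of
    cost[i][k] + cost[k+1][j] + dims[i] * dims[k+1] * dims[j+1]

with cost[i][i] = 0, since a single matrix requires no multiplication.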

Solving it with dynamic programming: the algorithm

We approach this by using bottom-up dynamic programming, i.e., solving the smallest subproblems first. Consequently, we work over chains of increasing length, finding their optimal split position and minimum cost. For each chain, we calculate the cost at each possible split point and then identify the optimal split point. We store the optimal costs of each chain (subsequence) in a square matrix, where the row and column indices define the start and end points of the chain, respectively. This enables us to retrieve the optimal costs for both left and right chains, avoiding recalculation. We use an additional array, the split matrix, to store the optimal split position for each chain. Figure 3 shows the dynamic programming matrix.

Figure 3 (the line shows the chains of length 2)

The code

Listing 1 shows the function matrix_chain_dp. It takes a fixed-size std::array containing the matrix dimensions of the chain and returns a square array of the optimal split points for all sub-chains. The function has two local variables, dp and split, which are square arrays used to store the optimal cost and split point for each sub-chain considered.

#include <array>
#include <cstddef>
#include <limits>

// Function to compute MCM using Bottom-Up DP
// (constexpr)
template <std::size_t N>
constexpr std::array<std::array<int, N>, N>
  matrix_chain_dp(const std::array<int, N + 1>& dims)
{
  std::array<std::array<int, N>, N> dp{};
  std::array<std::array<int, N>, N> split{};
  // Initialise the dp table for chains of length 1:
  // a single matrix needs no multiplication, so zero cost
  for (std::size_t i = 0; i < N; ++i)
  {
    dp[i][i] = 0;
  }
  // Bottom-up DP computation
  for (std::size_t len = 2; len <= N; ++len)
  { // iterate over chain length
    for (std::size_t i = 0; i < N - len + 1; ++i)
    { // iterate over chain start position
      std::size_t j = i + len - 1; // chain end position
      dp[i][j] = std::numeric_limits<int>::max();
      for (std::size_t k = i; k < j; ++k)
      { // iterate over split position
        // cost = cost of left chain + cost of right chain
        //      + cost of multiplying the two results
        int cost = dp[i][k] + dp[k + 1][j]
          + dims[i] * dims[k + 1] * dims[j + 1];
        if (cost < dp[i][j])
        {
          dp[i][j] = cost;                   // store lowest cost
          split[i][j] = static_cast<int>(k); // store optimal split
        }
      }
    }
  }
  return split;
}
Listing 1

The function solves the dynamic programming problem in a bottom-up manner: the code iterates through chain length, start position, and split point in nested loops. At each split, it evaluates the cost function and stores the minimum cost and its split point in the dp and split matrices, respectively. Finally, the function returns the split matrix. Listing 2 shows an example of running the optimisation at compile time.

#include <array>
#include <cstddef>
#include <iostream>

int main()
{
  // Given matrix chain: 6 matrices (7 dimensions)
  constexpr std::array<int, 7> dims = {
    10, 100, 5, 50, 10, 100, 5 };
  constexpr std::size_t N = dims.size() - 1;
  // Compute the split table at compile time
  constexpr auto split = matrix_chain_dp<N>(dims);
  // Print the optimal parenthesization of the whole
  // chain, i.e. matrices 0 to N - 1 inclusive
  std::cout << "Optimal Parenthesization: ";
  print_optimal_parenthesization(split, 0, N - 1);
  std::cout << "\n";
  return 0;
}
Listing 2

The function main defines dims, a constexpr array initialised with the matrix chain dimensions. It is passed into the constexpr function matrix_chain_dp, which initialises the constexpr variable split with the optimisation results.

The result, split, is an array of arrays representing a square matrix, which contains the optimal split point for each sub-chain. The rows and columns index the start and end points of a sub-chain, respectively.

To extract the optimal chain sequence, we recursively unpack the split matrix, starting from the top-right cell of the first row, which yields the split point for the entire chain (from 0 to N-1). This point divides the chain into left and right sub-chains, whose optimal splits can also be found in the split matrix.
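The article does not list the helper that performs this unpacking, so here is a minimal sketch of one plausible implementation, assuming the split table returned by Listing 1. For the sub-chain [i, j], the stored split point k divides it into [i, k] and [k + 1, j].

#include <array>
#include <cstddef>
#include <iostream>

template <std::size_t N>
void print_optimal_parenthesization(
  const std::array<std::array<int, N>, N>& split,
  std::size_t i, std::size_t j)
{
  if (i == j)
  {
    std::cout << "M" << i + 1; // a single matrix: no brackets needed
    return;
  }
  // Recurse into the two sub-chains either side of the split
  auto k = static_cast<std::size_t>(split[i][j]);
  std::cout << "(";
  print_optimal_parenthesization(split, i, k);
  std::cout << " x ";
  print_optimal_parenthesization(split, k + 1, j);
  std::cout << ")";
}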

The function print_optimal_parenthesization retrieves the optimal evaluation order and prints it out to give the result shown below:

  Optimal Parenthesization: ((M1 x M2) x ((M3 x M4) x (M5 x M6)))

This is our first Godbolt example, available at https://godbolt.org/z/9a9TP6aos.

However, to use something like this in real code, we need to execute the matrix chain calculation using the results given by the split matrix. For this, we use the split matrix to build a tree structure of expression templates for the matrix multiplications. Our second Godbolt example illustrates this: https://godbolt.org/z/b3x3ao14K
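To show the idea without the expression-template machinery, here is a simplified runtime sketch (our own; the Godbolt example builds an expression tree instead). It evaluates the chain recursively in the order dictated by the split table, using a deliberately minimal, hypothetical Matrix type:

#include <array>
#include <cstddef>
#include <vector>

struct Matrix // hypothetical minimal dense matrix, row-major
{
  std::size_t rows{}, cols{};
  std::vector<double> data;
  double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

Matrix multiply(const Matrix& a, const Matrix& b)
{
  Matrix c{ a.rows, b.cols, std::vector<double>(a.rows * b.cols) };
  for (std::size_t i = 0; i < a.rows; ++i)
    for (std::size_t k = 0; k < a.cols; ++k)
      for (std::size_t j = 0; j < b.cols; ++j)
        c.at(i, j) += a.at(i, k) * b.at(k, j);
  return c;
}

// Evaluate the sub-chain [i, j] in the order given by the split table
template <std::size_t N>
Matrix evaluate_chain(const std::vector<Matrix>& ms,
  const std::array<std::array<int, N>, N>& split,
  std::size_t i, std::size_t j)
{
  if (i == j)
    return ms[i];
  auto k = static_cast<std::size_t>(split[i][j]);
  return multiply(evaluate_chain(ms, split, i, k),
                  evaluate_chain(ms, split, k + 1, j));
}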

The Godbolt example creates a chain of twelve matrices. It performs both a naïve (in declaration order) matrix chain evaluation and an optimised multiplication, recording the times taken to calculate the results.

Figure 4 compares the run time of a matrix chain multiplication for naïve and optimal cases.

Figure 4

Conclusion

Modern C++ provides the necessary features to develop concise code that can solve optimisation problems at compile time. The values resulting from the optimisation process are constant, which enables the compiler to generate even more efficient executables. The optimisation results could even define a structure, such as an expression tree, for the optimal execution of a custom domain-specific language.

It is not necessary to limit our optimisation approach to dynamic programming; alternative techniques, such as graph algorithms or linear programming, may also be employed. The range of potential optimisation problems that we could tackle at compile time is vast; we need to make sensible choices about when this sort of mundane wizardry is appropriate.

References

[Cormen22] Thomas Cormen, Charles Leiserson, Ronald Rivest and Clifford Stein (2022) Introduction to Algorithms, 4th Edition. MIT Press. ISBN-13: 978-0262046305

[Das19] Avik Das (2019). Dynamic programming deep-dive: Chain Matrix Multiplication. medium.com. Retrieved from https://medium.com/@avik.das/dynamic-programming-deep-dive-chain-matrix-multiplication-a3b8e207b201

[Fertig21] Andreas Fertig (2021) Programming with C++20: Concepts, Coroutines, Ranges, and more. Fertig Publications.

[Mount] Dave Mount (n.d.). ‘Dynamic Programming: Chain Matrix Multiplication’ Retrieved July 2025, from https://www.cs.umd.edu/class/spring2025/cmsc451-0101/Lects/lect10-dp-mat-mult.pdf

[Turner18] Jason Turner (2018) ‘Practical constexpr’, Meeting C++. Retrieved from https://isocpp.org/blog/2018/02/practical-constexpr-jason-turner

[Turner19] Jason Turner (2019) ‘Applied constexpr: Doing More Work At Compile Time’, CppCon 2019. Retrieved from https://cppcon.org/class-2019-constexpr/

[Wu24] Wu Yongwei (2024) ‘C++ Compile-Time Programming’, Overload 183, available at https://accu.org/journals/overload/32/183/wu/

Andrew Drakeford holds a PhD in physics and has dedicated the past few decades to developing high-performance financial libraries and applications using C++. He has a keen interest in quantitative finance, machine learning, vectorisation, and high-performance computing (HPC). Additionally, he is a member of the UK C++ panel.





