C++ Compile-Time Programming

C++ Compile-Time Programming

By Wu Yongwei

Overload, 32(183):7-13, October 2024


Programming at compile time has been possible in C++ for a long time. Wu Yongwei considers its past, present and future.

Compile-time programming is a key feature of C++. It enables writing high-performance code often unattainable in other languages. This article explores its past, present, and future applications, highlighting the diverse possibilities in C++. We’ll briefly cover template metaprogramming, constexpr, variadic templates, static reflection, and more.1

Introduction

Compile-time programming is vastly different from run-time programming. The code runs during compilation, but the results can be used at run time.

Some believe compile-time programming is merely a trick, unused in real-world engineering. To them, I ask: do you use the C++ Standard Library? The mainstream implementations rely heavily on various programming techniques, including compile-time programming.

‘I don’t write the standard library’ – this might be a possible response. But consider this: the standard library is just one tool, a standard weapon. Is it enough to use only standard tools? That’s the real question.

The abundance of excellent open-source libraries suggests otherwise. A skilled programmer crafts tools for themselves and their team. If your work feels tedious, perhaps it’s time to build a bulldozer to tackle problems.

Compile-time programming offers a way to build such tools.

A bit of history

In traditional C++ programming, source code is compiled into executable code, which runs when executed. However, code can also be executed at compile time. This concept dates back over 30 years.

Bjarne Stroustrup proposed templates in 1988 [Stroustrup88]. It was implemented in 1989 and fully described in the 1990 Annotated C++ Reference Manual [Ellis90]. The initial purpose of template design was to express parameterized container classes. However, in order to implement types similar to arrays, non-type template parameters (NTTP) were introduced:

  template<class E, int size> class buffer;
  buffer<char, 1024> x;

Around the same time, Alex Stepanov and David Musser introduced generic programming. Alex realized C++ templates perfectly suited his needs and used them to implement the Standard Template Library (STL), showcasing their real-world potential.

Alex Stepanov remarked on the capabilities of the C++ language [Stroustrup94]:

C++ is a powerful enough language – the first such language in our experience – to allow the construction of generic programming components that combine mathematical precision, beauty, and abstractness with the efficiency of non-generic hand-crafted code.

Please allow me to put STL aside for now, and discuss a side effect of templates. In 1994, at a C++ committee meeting, the first recorded instance of templates being ‘abused’ for compile-time programming occurred. The power of C++ templates exceeded the expectations of both its creator and the father of the STL. Erwin Unruh demonstrated the code in Listing 1 [Unruh94]. This is no longer valid C++, and does not work with today’s compilers.

template <int i> struct D {
  D(void*); operator int();
  };

template <int p, int i> struct is_prime {
  enum {
    prim = (p%i) && 
      is_prime<(i > 2 ? p : 0), i -1> :: prim };
  };

template < int i > struct Prime_print {
  Prime_print<i-1> a;
  enum { prim = is_prime<i, i-1>::prim };
  void f() { D<i> d = prim; }
  };

struct is_prime<0,0> { enum {prim=1}; };
struct is_prime<0,1> { enum {prim=1}; };
struct Prime_print<2> { enum {prim = 1};
  void f() { D<2> d = prim; } };

#ifndef LAST
#define LAST 10
#endif

main () {
  Prime_print<LAST> a;
}
Listing 1

A ‘modern’ working version is given in Listing 2.

template <int p, int i> struct is_prime {
  enum {
    prim = (p==2) ||
           (p%i) && is_prime<(i>2?p:0),
                             i-1> :: prim };
};

template<>
struct is_prime<0,0> { enum {prim=1}; };

template<>
struct is_prime<0,1> { enum {prim=1}; };

template <int i> struct D { D(void*); };

template <int i> struct Prime_print {
  Prime_print<i-1> a;
  enum { prim = is_prime<i, i-1>::prim };
  void f() { D<i> d = prim ? 1 : 0; a.f();}
};

template<> struct Prime_print<1> {
  enum {prim=0};
  void f() { D<1> d = prim ? 1 : 0; };
};

int main() {
  Prime_print<18> a;
  a.f();
}
Listing 2

This code fails to compile in an intriguing way. Some compilers produce error messages like the following (see https://godbolt.org/z/f7zjM6Ysz):

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<17>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<13>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<11>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<7>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<5>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<3>’

unruh.cpp:20:19: error: no viable conversion from ‘int’ to ‘D<2>’

Simple examples

Factorial

The previous example was a bit weird and complex. Let’s consider a simpler one – factorials. The mathematical definition of factorial can be expressed recursively as:

n! = n (n – 1)!

0! = 1

It corresponds to the following C++ code:

  template <int N>
  struct factorial {
    static const int value =
      N * factorial<N - 1>::value;
  };
  
  template <>
  struct factorial<0> {
    static const int value = 1;
  };

This almost aligns perfectly.

Let’s try some numbers.

  • Evaluating factorial<0>::value yields 1.
  • For factorial<1>::value, it computes 1 * factorial<0>::value, resulting in 1.
  • For factorial<2>::value, it computes 2 * factorial<1>::value, resulting in 2.
  • For factorial<3>::value, it computes 3 * factorial<2>::value, resulting in 6.

And so on.

This approach resembles functional programming, which is atypical for C++:

  1. A class template represents a ‘function’.
  2. Instantiating a template is akin to a ‘function call’, with a unique result and no side effects.
  3. A ‘variable’ (the static member variable in the class template) can be assigned once and not modified later.
  4. The result of instantiation is remembered during subsequent compilation, similar to memoization.

Conditionals and loops

Similarly, we can create a generic compile-time conditional construct with class templates and specialization (Listing 3).

template <bool Condition,
          typename Then, typename Else>
struct conditional;

template <typename Then, typename Else>
struct conditional<true, Then, Else> {
  using type = Then;
};

template <typename Then, typename Else>
struct conditional<false, Then, Else> {
  using type = Else;
};
Listing 3

A compile-time loop is more complex (Listing 4).

template <bool Condition, typename Body>
struct loop_result;

template <typename Body>
struct loop_result<true, Body> {
  using type = typename loop_result<
    Body::next_type::condition,
    typename Body::next_type>::type;
};

template <typename Body>
struct loop_result<false, Body> {
  using type = typename Body::type;
};

template <typename Body>
struct loop {
  using type =
    typename loop_result<Body::condition,
                         Body>::type;
};
Listing 4

When the result of a class template equals a similarly named result on the right side of an expression, we can express it more succinctly with inheritance (Listing 5).

template <bool Condition, typename Body>
struct loop_result;

template <typename Body>
struct loop_result<true, Body>
  : loop_result<Body::next_type::condition,
                typename Body::next_type> {};

template <typename Body>
struct loop_result<false, Body> : Body {};

template <typename Body>
struct loop
  : loop_result<Body::condition, Body> {};
Listing 5

In template metaprogramming, using direct recursion typically enhances readability. Using the loop construct, as shown in Listing 5, might make the result harder to understand (Listing 6).

template <int N, int Last, int Result>
struct factorial_loop {
  static const bool condition = (N <= Last);
  using type = integral_constant<int, Result>;
  using next_type =
    factorial_loop<N + 1, Last, Result * N>;
};

template <int N>
struct factorial
  : loop<factorial_loop<1, N, 1>>::type {};
Listing 6

Regardless of how it is written, we can obtain the result directly at compile time. For code like the following:

  printf("%d\n", factorial<10>::value);

The value 3628800 will appear in the assembly code, with no trace of factorial at all (for details, see https://godbolt.org/z/j59ErdnTj and https://godbolt.org/z/cxdfn97x6).

In the examples above, we follow a C++ standard library convention: using a class template’s member type type for a result type and its static member variable value for a result value.

Prime sieve example

Functional programming in template metaprogramming

Let’s return to prime numbers, without showing the result in error messages, which is not that interesting. We will compute at compile time a list of primes, which can be used at run time as well. To do this, we need some tools.

First, we need a tool to convert between values and types. The standard library provides this functionality, which looks like this:

  template <typename T, T Val>
  struct integral_constant {
    static const T value = Val;is_nonstatic
    typedef T value_type;
    typedef integral_constant type;
  };

Although templates can have non-type template parameters, having a fixed type is not flexible. When the value type is not unique, representing a compile-time constant with a type is common. The class template above can represent a constant of any integer type, like int, size_t, or bool. The value member retrieves the template’s value parameter, value_type retrieves the type parameter, and type points to itself.

Here are some examples:

  // Types
  integral_constant<int, 42>
  integral_constant<bool, true>
  // Values
  integral_constant<int, 42>::value    // 42
  integral_constant<bool, true>::value // true

Next, we need a tool similar to a list in functional programming. I’ll use a functional, C++98-compatible definition:

  struct nil {};
  template <typename Head, typename Tail = nil>
  struct list {};

Those familiar with functional programming will recognize the pattern immediately. For others, you can think of it as a singly linked list:

  struct list {
    any head;
    list* tail{nullptr};
  };

Our algorithm for finding prime numbers is a simple sieve, represented in Haskell as:

  primesTo n = sieve [2..n]
               where 
               sieve (x:xs) =
                 x:(sieve $
                    filter (\a -> a 'mod' x /= 0)
                           xs)
               sieve [] = []

We aim to generate a list from 2 to n, then perform the following operations:

  1. Take the first element.
  2. Filter out elements divisible by it from the remaining list.
  3. Recursively apply the sieve to the rest and combine the result with the first element.
  4. Recursion stops at an empty list, yielding an empty list.

Currently, we don’t have a ‘filter’ in our toolbox. It’s defined in Listing 7. This template is a bit complex, so let me explain briefly.

template <template <typename> class Pred,
          typename List>
struct filter;

template <template <typename> class Pred,
          typename Head, typename Tail>
struct filter<Pred, list<Head, Tail>> {
  typedef typename conditional<
    Pred<Head>::value,
    list<Head,
         typename filter<Pred, Tail>::type>,
    typename filter<Pred, Tail>::type>::type
    type;
};

template <template <typename> class Pred>
struct filter<Pred, nil> {
  typedef nil type;
};
Listing 7

In the first section, we declare the ‘prototype’ of this template. It has two template parameters: the first is a class template used as the ‘predicate’ for filtering, and the second is a normal type representing the list to filter.

In the second section, we implement the main ‘overload’ of this ‘metafunction’ with partial specialization. We require the second parameter to match the form list<Head, Tail>, naturally separating the ‘head’ and ‘tail’ of the list. We then apply the predicate to the ‘head’ to check the condition. If the condition is met, our result type is Head concatenated with the filtered result of Tail; otherwise, Head is discarded, and the result type is just the filtered result of Tail.

The third section is the recursion termination condition for this ‘metafunction’. When we reach nil, i.e. the list is empty, the result type is also this nil marker, representing an empty list.

Similarly, we need a tool to generate sequences:

  template <int First, int Last>
  struct range {
    typedef list<
      integral_constant<int, First>,
      typename range<First + 1, Last>::type>
      type;
  };
  
  template <int Last>
  struct range<Last, Last> {
    typedef nil type;
  };

We can now write down the metaprogramming algorithm to find primes (Listing 8).

template <typename T>
struct sieve_prime;

template <typename Head, typename Tail>
struct sieve_prime<list<Head, Tail>> {
  template <typename T>
  struct is_not_divisible
    : integral_constant<
        bool, (T::value % Head::value) != 0> {};

  typedef list<
    Head,
    typename sieve_prime<typename filter<
      is_not_divisible, Tail>::type>::type>
    type;
};

template <>
struct sieve_prime<nil> {
  typedef nil type;
};
Listing 8

If we want the code to compiler under C++98, the intuitive alias templates can’t be used. But we can simulate generating the final result type with inheritance:

  template <int N>
  struct primes_to
    : sieve_prime<
        typename range<2, N + 1>::type>::type {};

You can output or further process this result type as needed. More details are available in my online code at https://godbolt.org/z/xE7MKca4s.

After seeing the power of template metaprogramming, you might naturally wonder: How much can be done with template metaprogramming?

The answer, as you might guess, is almost anything. C++ templates are Turing complete [Veldhuizen03], meaning you can theoretically perform any computation through template metaprogramming.

Of course, there’s always a difference between theory and practice. There are things we don’t want or can’t do at compile time. Even if we want to and can, some tasks aren’t convenient with template metaprogramming.

Template metaprogramming doesn’t allow any side effects. We can’t have input/output access in template code. It can only handle types and compile-time constants. Even if you could access input/output, it wouldn’t make sense: we want to display a user interface, accept user input, and read/write databases when the program is running – not when it’s compiled.

As mentioned earlier, template metaprogramming is a form of ‘functional’ programming. I have done a lot just to generate something equivalent to the Racket/Scheme code in Listing 9.

(define (sieve-prime lst)
  (cond
    [(null? lst) '()]
    [else (let ([n (car lst)])
          (let ([is-not-divisible
                 (lambda (m)
                   (not
                    (= (remainder m n) 0)))])
          (cons n (sieve-prime
                   (filter is-not-divisible
                           (cdr lst))))))]))

(define (primes-to n)
  (sieve-prime (range 2 (+ n 1))))
Listing 9

Template metaprogramming is a primitive form of functional programming, and C++ compilers aren’t optimized for it. In fact, the Haskell and Scheme code I have shown runs faster than compiling the previous C++ code. Besides the fact that many algorithms are better suited to imperative styles, template metaprogramming is not ideal for compile-time tasks. We can easily stress C++ compilers to their limits, and complicated template metaprograms may work on one compiler and fail on another. Additionally, poorly written template code can even crash the compiler or system, as issues that occur at run time may arise at compile time now. Due to the undecidability of the halting problem, compilers just can’t catch every issue in compile-time code. ☺

constexpr

Now let’s discuss the new constexpr feature introduced in C++11, which allows compile-time programming without using template metaprogramming.

First, while the code I wrote earlier works, there is a minor syntax issue. In the following code (see https://godbolt.org/z/TW4GTKPYc):

  template <typename T>
  void print_value(const T& value)
  {
    …
  }
  print_value(factorial<10>::value);

the final link step will fail, which can be frustrating. In the era of C++98, the workaround was to use enum instead of static const int, an inelegant hack. The proper ‘modern C++’ solution is constexpr. Apart from the C++17 improvement on definition rules, a constexpr ‘variable’ explicitly indicates a compile-time constant. This is a key use of constexpr (https://godbolt.org/z/c3Gh13eWb).

Another key use of constexpr is in constexpr functions. Here is a recursive version that works in C++11:

  constexpr int factorial(int n)
  {
    return n == 0 ? 1 : n * factorial(n - 1);
  }

In C++14, one can write a more conventional iterative version (see https://godbolt.org/z/3PE43PTn4):

  constexpr int factorial(int n)
  {
    int result = 1;
    for (int i = 2; i <= n; ++i)
      result *= i;
    return result;
  }

C++17 further loosened restrictions, allowing many functions, including most member functions of array, to be marked constexpr for compile-time use. I would like to emphasize that array is an important compile-time tool, as we can get the size of this container and access all its elements at compile time.

Now we can really try the standard sieve algorithm:

  template <int N>
  constexpr auto sieve_prime()
  {
    array<bool, N + 1> sieve{};
    for (int i = 2; i <= N; ++i)
      sieve[i] = true;
    for (int p = 2; p * p <= N; p++)
      if (sieve[p])
        for (int i = p * p; i <= N; i += p)
          sieve[i] = false;
    return sieve;
  }

This function generates a sieve. To find the number of primes up to N, we need to count them. In C++20, we can use the standard count algorithm, but, in earlier versions, we need to implement it ourselves:

  template <size_t N>
  constexpr size_t
  prime_count(const array<bool, N>& sieve)
  {
    size_t count = 0;
    for (size_t i = 2; i < sieve.size(); ++i)
      if (sieve[i])
        ++count;
    return count;
  }

Converting the final result into an array is now simple:

  template <int N>
  constexpr auto get_prime_array()
  {
    constexpr auto sieve = sieve_prime<N>();
    array<int, prime_count(sieve)> result{};
    for (size_t i = 2, j = 0; i < sieve.size();
         ++i)
      if (sieve[i]) {
        result[j] = i;
        ++j;
      }
    return result;
  }

See also https://godbolt.org/z/hjezWEf7v.

For almost any C++ programmer, this kind of code is likely easier to understand than template metaprogramming. It is also easier for compilers. MSVC and Apple Clang failed on my template metaprogramming code for calculating primes with an N of 1000, while GCC took over a second to compile it. In contrast, the above code handles N of 10000 happily across all three compilers. GCC compiles it in just 0.7 seconds for N at 10000. Compiling the template code for N at 10000 with GCC takes over two minutes and uses several gigabytes of memory. The time complexity of template metaprogramming is not linear.

The only ‘unnatural’ part is that N cannot be passed as a regular function parameter, but only as a template parameter. Even in constexpr or consteval (C++20) functions, function parameters aren’t considered compile-time constants and can’t be used where compile-time constants are required.

For compile-time evaluation, the array variable must be initialized immediately upon declaration to avoid indeterminate values. As of C++20, this requirement is relaxed; the array still can’t contain indeterminate values, but you can declare it without {} and initialize it later.

C++20 brings significant improvements for compile-time programming, such as:

  1. Using vector and string at compile time
  2. Using strings and custom-type objects as template parameters

These features offer greater flexibility, removing the need to pass lengths as template parameters. However, there is a big limitation: the vector or string results can’t be directly used at run time. I won’t go into detail, but interested readers can check out my online code example at https://godbolt.org/z/6c833fE4r.

Variadic templates

Another major improvement in compile-time programming is variadic templates, introduced in C++11 and enhanced in later versions.

Variadic templates have two main uses:

  • Forwarding a variable number of arguments to other functions, often with forwarding references
  • Iterating over arguments using recursion or fold expressions

The second use is particularly important for compile-time programming. Here is a simple example to check if any provided arguments are null pointers:

  template <typename... Args>
  constexpr bool is_any_null(const Args&... args)
  {
    return (... || (args == nullptr));
  }

The significance of this approach is that, regardless of the number or types of arguments, the compiler can expand and usually inline them. This simplifies repetitive code and offers new automation possibilities.

By using macro techniques, we can inject metadata into structs. Then, with fold expressions and compile-time programming, we can achieve static reflection capabilities (see Mozi [Mozi] for a complete implementation). For instance, Listing 10 is a generic function to iterate over struct fields.

template <typename T, typename F, size_t... Is>
constexpr void
for_each_impl(T&& obj, F&& f,
              std::index_sequence<Is...>)
{
  using DT = std::decay_t<T>;
  (void(std::forward<F>(f)(
     typename DT::template _field<T, Is>(obj)
       .name(),
     typename DT::template _field<T, Is>(obj)
       .value())),
   ...);
}

template <
  typename T, typename F,
  std::enable_if_t<
    is_reflected_struct_v<std::decay_t<T>>,
    int> = 0>

constexpr void for_each(T&& obj, F&& f)
{
  using DT = std::decay_t<T>;
  for_each_impl(
    std::forward<T>(obj), std::forward<F>(f),
    std::make_index_sequence<DT::_size>{});
}
Listing 10

With tools like for_each, implementing more features becomes easy. For example, the function in Listing 11 can be used to print all fields in a struct. (See https://godbolt.org/z/saejMW6Pq.)

template <typename T>
void print(const T& obj,
           std::ostream& os = std::cout,
           const char* fieldName = "",
           int depth = 0)
{
  if constexpr (is_reflected_struct_v<T>) {
    os << indent(depth) << fieldName
       << (*fieldName ? ": {\n" : "{\n");
    for_each(
      obj, [depth, &os](const char* fieldName,
                        const auto& value) {
        print(value, os, fieldName, depth + 1);
      });
    os << indent(depth) << "}"
       << (depth == 0 ? "\n" : ",\n");
  } else {
    os << indent(depth) << fieldName << ": "
       << obj << ",\n";
  }
}
Listing 11

This function uses the for_each function, generic lambda expressions, and compile-time conditional statements (if constexpr) to ‘iterate’ over different fields of a reflected struct, which can have different types. It can correctly handle nested reflected structs.

With a similar method, we can generically implement comparison, copying, and other functions for reflected structs. Writing such code isn’t easy, but the result is impressive, with performance far exceeding reflection features in languages like Java. This is because the compiler can statically expand the struct when compiling functions like for_each, making the generated code equivalent to manually written code that processes each field individually!

Static reflection under standardization

The code works, but we must use special macros to define structs for it to function. With standardized static reflection, we’d be able to write a generic print function template (see Listing 12) without special definition forms or the for_each function (using the syntax from P2996 [P2996r5]).

template <typename T>
void print(const T& obj,
           ostream& os = cout,
           const char* name = "", int depth = 0)
{
  if constexpr (is_class_v<T>) {
    os << indent(depth) << name
       << (*name ? ": {\n" : "{\n");
    template for (constexpr meta::info member :
        meta::nonstatic_data_members_of(^T)) {
      print(obj.[:member:], os,
            meta::name_of(member),
            depth + 1);
    }
    os << indent(depth) << "}"
       << (depth == 0 ? "\n" : ",\n");
  } else {
    os << indent(depth) << name << ": " << obj
       << ",\n";
  }
}
Listing 12

While there is an experimental implementation for P2996 [clang], an online implementation of an early proposal, P2320 [P2320r0], is conveniently available in Compiler Explorer. And I’ll use it to demonstrate print actually works at https://cppx.godbolt.org/z/c7a4jfo74.

Here are some key points (notice that the syntax is subject to changes):

  • ^T is the proposed reflection syntax to get a compile-time reflection object.
  • [:expr:] reverses the reflection, converting it back to a C++ type or expression; [:^T:] gets us back T.
  • template for is a compile-time loop for iterating over objects during compilation [P1306R2], eliminating the need for generic lambdas and for_each.
  • The std::meta namespace provides tools for compile-time processing:
    • info is a general reflection object.
    • members_of retrieves all members of a type.
    • nonstatic_data_members_of extracts non-static data members.
    • name_of gets the member’s name.

Put together, the outcome is:

  • For class types (including structs), it outputs a left brace, recursively calls print for all members, and outputs a right brace.
  • Otherwise, it simply outputs the field name and the content of non-static data members.

Unfortunately, this will only be available in C++26 (if not later).

References

[clang] clang-p2996: https://github.com/bloomberg/clang-p2996

[Ellis90] Margaret A. Ellis and Bjarne Stroustrup (1990) The Annotated C++ Reference Manual, Addison-Wesley.

[Mozi] Mozi: https://github.com/adah1972/mozi

[P1306R2] Andrew Sutton et al., ‘Expansion statements (revision 2)’, May 2024, http://wg21.link/p1306r2

[P2320r0] Andrew Sutton et al., ‘The Syntax of Static Reflection’, 2021, http://wg21.link/p2320r0

[P2996r5] Wyatt Childers et al., ‘Reflection for C++26’ (revision 5), August 2024, http://wg21.link/p2996r5

[Stroustrup88] Bjarne Stroustrup, ‘Parameterized Types for C++’, October 1988, https://www.usenix.org/legacy/publications/compsystems/1989/win_stroustrup.pdf

[Stroustrup94] Bjarne Stroustrup (1994) The Design and Evolution of C++, Addison-Wesley.

[Unruh94] Erwin Unruh, ‘Primzahlen’, 1994. http://www.erwin-unruh.de/primorig.html

[Veldhuizen03] Todd L. Veldhuizen, ‘C++ Templates are Turing Complete’, 2003, https://www.researchgate.net/publication/2475343_C_Templates_are_Turing_Complete

Footnotes

  1. Links to godbolt.org are to online examples, supplementing this article.

Wu Yongwei Having been a programmer and software architect, Yongwei is currently a consultant and trainer on modern C++. He has nearly 30 years’ experience in systems programming and architecture in C and C++. His focus is on the C++ language, software architecture, performance tuning, design patterns, and code reuse. He has a programming page at http://wyw.dcweb.cn/.






Your Privacy

By clicking "Accept Non-Essential Cookies" you agree ACCU can store non-essential cookies on your device and disclose information in accordance with our Privacy Policy and Cookie Policy.

Current Setting: Non-Essential Cookies REJECTED


By clicking "Include Third Party Content" you agree ACCU can forward your IP address to third-party sites (such as YouTube) to enhance the information presented on this site, and that third-party sites may store cookies on your device.

Current Setting: Third Party Content EXCLUDED



Settings can be changed at any time from the Cookie Policy page.