Compile-time strings have been used in many projects over the years. Wu Yongwei summarises his experience.
std::string is mostly unsuitable for compile-time string manipulations. There are several reasons:
- Before C++20, one could not use strings at all at compile time. In addition, the major compilers didn’t start to support compile-time strings until quite late. MSVC [MSVC] was the front runner in this regard, GCC [GCC] came second with GCC 12, and Clang [Clang] came last with Clang 15 (released a short while ago).
- With C++20 one can use strings at compile time, but there are still a lot of inconveniences, the most obvious being that strings generated at compile time cannot be used at run time. Besides, a string cannot, in general, be declared constexpr.
- A string cannot be used as a template argument.
So we have to give up this apparent choice and explore other possibilities. The candidates are:
- const char pointer, which is what a string literal naturally decays to
- string_view, a powerful tool added by C++17: it has similar member functions to those of string, but they are mostly marked as constexpr!
- array, with which we can generate brand-new strings
We will try these types in the following discussion.
Functions commonly needed
Getting the string length
One of the most basic functions on a string is getting its length. Here we cannot use the C function strlen, as it is not constexpr.
We will try several different ways to implement it.
First, we can implement strlen manually, and mark the function constexpr (see Listing 1).
    namespace strtools {

    constexpr size_t length(const char* str) {
        size_t count = 0;
        while (*str != '\0') {
            ++str;
            ++count;
        }
        return count;
    }

    } // namespace strtools
Listing 1
However, is there an existing mechanism to retrieve the length of a string in the standard library? The answer is a definite yes. The standard library does support getting the length of a string of any of the standard character types, like char, wchar_t, etc. With the most common character type char, we can write:
    constexpr size_t length(const char* str) {
        return char_traits<char>::length(str);
    }
It’s been possible to use char_traits methods at compile time since C++17. (However, you may encounter problems with older compiler versions, like GCC 8.)
Assuming you can use C++17, string_view is definitely worth a try:
    constexpr size_t length(string_view sv) {
        return sv.size();
    }
Regardless of the approach used, now we can use the following code to verify that we can indeed check the length of a string at compile time:
    static_assert(strtools::length("Hi") == 2);
At present, the string_view implementation seems the most convenient.
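To compare the three approaches side by side, here is a minimal sketch that puts them into one translation unit. The namespace strtools_demo and the suffixed function names are hypothetical, chosen only so that the three versions can coexist; the sketch assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <string>       // std::char_traits
#include <string_view>

namespace strtools_demo {  // hypothetical namespace, for illustration only

// Manual loop, as in Listing 1
constexpr std::size_t length_manual(const char* str) {
    std::size_t count = 0;
    while (*str != '\0') {
        ++str;
        ++count;
    }
    return count;
}

// Via char_traits, whose length() is constexpr since C++17
constexpr std::size_t length_traits(const char* str) {
    return std::char_traits<char>::length(str);
}

// Via string_view, which is constructed implicitly from the literal
constexpr std::size_t length_sv(std::string_view sv) {
    return sv.size();
}

}  // namespace strtools_demo

// All three agree, and all three are evaluated by the compiler
static_assert(strtools_demo::length_manual("Hi") == 2);
static_assert(strtools_demo::length_traits("Hi") == 2);
static_assert(strtools_demo::length_sv("Hi") == 2);
```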
Finding a character
Finding a specific character is also quite often needed. We can’t use strchr, but again, we can choose from a few different implementations. The code is pretty simple, whether implemented with char_traits or with string_view.
Here is the version with char_traits:
    constexpr const char* find(const char* str, char ch) {
        return char_traits<char>::find(str, length(str), ch);
    }
Here is the version with string_view:
    constexpr string_view::size_type find(string_view sv, char ch) {
        return sv.find(ch);
    }
I am not going to show the manual lookup code this time. (Unless you have to use an old compiler, simpler is better.)
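For readers who are stuck with such an old compiler, a manual lookup could be sketched roughly as follows. The name find_manual is hypothetical and not part of the article's code; like the char_traits version, it returns nullptr when the character is absent and does not treat the terminating '\0' as searchable.

```cpp
// Hypothetical manual fallback, needing only C++14-style constexpr loops
constexpr const char* find_manual(const char* str, char ch) {
    for (; *str != '\0'; ++str) {
        if (*str == ch) {
            return str;  // pointer to the first occurrence
        }
    }
    return nullptr;      // not found
}

// A named array gives the pointer comparison a well-defined object
constexpr const char path[] = "/usr/local";
static_assert(find_manual(path, 'l') == path + 5);
static_assert(find_manual(path, 'x') == nullptr);
```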
Comparing strings
The next functions are string comparisons. Here string_view wins hands down: string_view supports the standard comparisons directly, and you do not need to write any code.
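As a small illustration, the following checks are all evaluated at compile time with nothing but the standard string_view operators. The strings are arbitrary examples, and C++17 is assumed.

```cpp
#include <string_view>

using namespace std::literals;  // enables the sv suffix

static_assert("local"sv == "local"sv);          // equality
static_assert("bin"sv < "local"sv);             // lexicographical ordering
static_assert(std::string_view("Hi") != "Ha");  // mixed with a string literal
// compare() can also look at just a part of the string
static_assert("/usr/local"sv.compare(0, 5, "/usr/"sv) == 0);
```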
Getting substrings
It seems that string_views are very convenient, and we should use string_views wherever possible. However, is string_view::substr suitable for getting substrings? This is difficult to answer without an actual usage scenario. One real scenario I encountered in projects was that the __FILE__ macro may contain the full path at compile time, resulting in different binaries when compiling under different paths. We wanted to truncate the path completely so that the absolute paths would not show up in binaries.
My tests showed that string_view::substr could not handle this job. With the following code:
    puts("/usr/local"sv.substr(5).data());
we will see assembly output like the following from the compiler on [Godbolt] (at https://godbolt.org/z/1dssd96vz):
    .LC0:
            .string "/usr/local"
            …
            mov     edi, OFFSET FLAT:.LC0+5
            call    puts
The complete string is still stored in the binary; the call merely passes a pointer five characters into it. We have to find another way.
Let’s try array. It’s easy to think of code like the following:
    constexpr auto substr(string_view sv, size_t offset, size_t count) {
        array<char, count + 1> result{};
        copy_n(&sv[offset], count, result.data());
        return result;
    }
The intention of the code should be very clear: generate a brand-new character array of the requested size and zero it out (variables in a constexpr function had to be initialized on declaration before C++20); copy what we need; and then return the result. Unfortunately, the code won’t compile.
There are two problems in the code:
- Function parameters are not constexpr, and cannot be used as template arguments.
- copy_n was not constexpr before C++20, and cannot be used in compile-time programming under earlier standards.
The second problem is easy to fix: a manual loop will do. We shall focus on the first problem.
A constexpr function can be evaluated at compile time or at run time, so its function arguments are not treated as compile-time constants, and cannot be used in places where compile-time constants are required, such as template arguments.
Furthermore, this problem still exists with the C++20 consteval function, where the function is only invoked at compile time. The main issue is that if we allowed function parameters to be used as compile-time constants, then we could write a function whose arguments of different values (but the same type) produce return values of different types. For example (currently illegal):
    consteval auto make_constant(int n) {
        return integral_constant<int, n>{};
    }
This is unacceptable in the current type system: we still require that the return values of a function have a unique type. If we want a value to be used as a template argument inside a function, it must be passed to the function template as a template argument (rather than as a function argument to a non-template function). In this case, each distinct template argument implies a different template specialization, so the issue of a multiple-return-type function does not occur.
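A minimal sketch of the legal counterpart may make the rule concrete: once the value is a template argument, every value selects its own specialization, and each specialization has exactly one return type. The name make_constant_tpl is hypothetical.

```cpp
#include <type_traits>

// The value travels as a template argument, not as a function argument
template <int N>
constexpr auto make_constant_tpl() {
    return std::integral_constant<int, N>{};
}

// Each specialization has a single, fixed return type...
static_assert(std::is_same_v<decltype(make_constant_tpl<2>()),
                             std::integral_constant<int, 2>>);
// ...and different values simply name different specializations
static_assert(!std::is_same_v<decltype(make_constant_tpl<2>()),
                              decltype(make_constant_tpl<3>())>);
```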
By the way, a standard proposal P1045 [Stone19] tried to solve this problem, but its progress seems stalled. As there are workarounds (to be discussed below), we are still able to achieve the desired effect.
Let’s now return to the substr function and convert the count parameter into a template parameter. Listing 2 is the result.
    template <size_t Count>
    constexpr auto substr(string_view sv, size_t offset = 0) {
        array<char, Count + 1> result{};
        for (size_t i = 0; i < Count; ++i) {
            result[i] = sv[offset + i];
        }
        return result;
    }
Listing 2
This time the code really works. With:
    puts(substr<5>("/usr/local", 5).data());
we no longer see "/usr/" in the compiler output.
Regrettably, we now see how compilers are challenged by abstractions: with the latest versions of GCC (12.2) and MSVC (19.33) on Godbolt, this version of substr does not generate the optimal output. There are also some compatibility issues with older compiler versions. So, purely from a practical point of view, I recommend the implementation in Listing 3, which does not use string_view.
    template <size_t Count>
    constexpr auto substr(const char* str, size_t offset = 0) {
        array<char, Count + 1> result{};
        for (size_t i = 0; i < Count; ++i) {
            result[i] = str[offset + i];
        }
        return result;
    }
Listing 3
If you are interested, you can compare the assembly outputs of these two different versions of the code.
Only Clang is able to generate the same efficient assembly code with both versions:
    mov     word ptr [rsp + 4], 108
    mov     dword ptr [rsp], 1633906540
    mov     rdi, rsp
    call    puts
If you don’t understand why the numbers 108 and 1633906540 are there, let me remind you that the hexadecimal representations of these two numbers are 0x6C and 0x61636F6C, respectively. Check the ASCII table and you should be able to understand.
Since we have stopped using string_view in the function parameters, the parameter offset has become much less useful. Hence, I will get rid of this parameter, and rename the function to copy_str (Listing 4).
    template <size_t Count>
    constexpr auto copy_str(const char* str) {
        array<char, Count + 1> result{};
        for (size_t i = 0; i < Count; ++i) {
            result[i] = str[i];
        }
        return result;
    }
Listing 4
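As a usage sketch, the offset that used to be a parameter can now be expressed with plain pointer arithmetic at the call site. The variable name tail is hypothetical; copy_str is repeated from Listing 4 so that the snippet stands on its own.

```cpp
#include <array>
#include <cstddef>

// copy_str as in Listing 4, with explicit std:: qualification
template <std::size_t Count>
constexpr auto copy_str(const char* str) {
    std::array<char, Count + 1> result{};
    for (std::size_t i = 0; i < Count; ++i) {
        result[i] = str[i];
    }
    return result;
}

// Skip the first five characters ("/usr/") and copy the remaining five
constexpr auto tail = copy_str<5>("/usr/local" + 5);

static_assert(tail.size() == 6);  // five characters plus the terminator
static_assert(tail[0] == 'l' && tail[4] == 'l' && tail[5] == '\0');
```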
Passing arguments at compile time
When you try composing the compile-time functions together, you will find something lacking. For example, if you wanted to remove the first segment of a path automatically (like from "/usr/local" to "local"), you might try some code like Listing 5.
    constexpr auto remove_head(const char* path) {
        if (*path == '/') {
            ++path;
        }
        auto start = find(path, '/');
        if (start == nullptr) {
            return copy_str<length(path)>(path);
        } else {
            return copy_str<length(start + 1)>(start + 1);
        }
    }
Listing 5
The problem is still that it won’t compile. And did you notice that this code violates exactly the constraint I mentioned above that the return type of a function must be consistent and unique?
I have adopted a solution described by Michael Park [Park17]: using lambda expressions to encapsulate ‘compile-time arguments’. I have defined three macros for convenience and readability:
    #define CARG typename
    #define CARG_WRAP(x) [] { return (x); }
    #define CARG_UNWRAP(x) (x)()
CARG means ‘constexpr argument’, a compile-time constant argument. We can now make make_constant really work:
    template <CARG Int>
    constexpr auto make_constant(Int cn) {
        constexpr int n = CARG_UNWRAP(cn);
        return integral_constant<int, n>{};
    }
And it is easy to verify that it works:
    auto result = make_constant(CARG_WRAP(2));
    static_assert(
        std::is_same_v<integral_constant<int, 2>, decltype(result)>);
A few explanations follow. In the template parameter, I use CARG (instead of typename) for code readability: it indicates the intention that the template parameter is essentially a type wrapper for compile-time constants. Int is the name of this special type. We will not provide this type when instantiating the function template, but instead let the compiler deduce it.
When calling the ‘function’ (make_constant(CARG_WRAP(2))), we provide a lambda expression ([] { return (2); }), which encapsulates the constant we need. When we need to use this parameter, we use CARG_UNWRAP (evaluate [] { return (2); }()) to get the constant back.
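To see what the compiler actually receives, here is the same code with the three macros expanded by hand. This is only a sketch, assuming C++17, where the call operator of a captureless lambda can be used in constant expressions; the variable names two and three are hypothetical.

```cpp
#include <type_traits>

template <typename Int>          // `template <CARG Int>` after expansion
constexpr auto make_constant(Int cn) {
    constexpr int n = (cn)();    // CARG_UNWRAP(cn) after expansion
    return std::integral_constant<int, n>{};
}

// make_constant(CARG_WRAP(2)) after expansion
constexpr auto two = make_constant([] { return (2); });
// A different constant needs a different lambda, hence a different Int
constexpr auto three = make_constant([] { return (3); });

static_assert(std::is_same_v<std::remove_const_t<decltype(two)>,
                             std::integral_constant<int, 2>>);
static_assert(!std::is_same_v<decltype(two), decltype(three)>);
```

The value never passes through a run-time object: it is recovered from the type of the lambda alone, which is why it may be used as a template argument inside the function.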
Now we can rewrite the remove_head function (Listing 6).
    template <CARG Str>
    constexpr auto remove_head(Str cpath) {
        constexpr auto path = CARG_UNWRAP(cpath);
        constexpr int skip = (*path == '/') ? 1 : 0;
        constexpr auto pos = path + skip;
        constexpr auto start = find(pos, '/');
        if constexpr (start == nullptr) {
            return copy_str<length(pos)>(pos);
        } else {
            return copy_str<length(start + 1)>(start + 1);
        }
    }
Listing 6
This function is similar in structure to the previous version, but there are many detail changes. In order to pass the result to copy_str as a template argument, we have to use constexpr all the way along. So we have to give up mutability, and write code in a quite functional style.
Does it really work? Let’s put the following statement into the main function:
    puts(strtools::remove_head(CARG_WRAP("/usr/local")).data());
And here is the optimized assembly output from GCC on x86-64 (see https://godbolt.org/z/Mv5YanPvq):
    main:
            sub     rsp, 24
            mov     eax, DWORD PTR .LC0[rip]
            lea     rdi, [rsp+8]
            mov     DWORD PTR [rsp+8], eax
            mov     eax, 108
            mov     WORD PTR [rsp+12], ax
            call    puts
            xor     eax, eax
            add     rsp, 24
            ret
    .LC0:
            .byte   108
            .byte   111
            .byte   99
            .byte   97
As you can see clearly, the compiler puts the ASCII codes for "local" on the stack, assigns its starting address to the rdi register, and then calls the puts function. There is absolutely no trace of "/usr/" in the output. In fact, there is no difference between the output of the puts statement above and that of puts(substr<5>("/usr/local", 5).data()).
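The result can also be checked without reading assembly. The sketch below assembles the pieces from the earlier listings into one file and verifies the result with static_assert. As an assumption made for portability, length and find are written here as manual loops rather than with char_traits; the variable name result is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

#define CARG typename
#define CARG_WRAP(x) [] { return (x); }
#define CARG_UNWRAP(x) (x)()

namespace strtools {

// Manual loops: an assumption of this sketch, to avoid library differences
constexpr std::size_t length(const char* str) {
    std::size_t count = 0;
    while (str[count] != '\0') {
        ++count;
    }
    return count;
}

constexpr const char* find(const char* str, char ch) {
    for (; *str != '\0'; ++str) {
        if (*str == ch) {
            return str;
        }
    }
    return nullptr;
}

template <std::size_t Count>
constexpr auto copy_str(const char* str) {
    std::array<char, Count + 1> result{};
    for (std::size_t i = 0; i < Count; ++i) {
        result[i] = str[i];
    }
    return result;
}

// remove_head as in Listing 6
template <CARG Str>
constexpr auto remove_head(Str cpath) {
    constexpr auto path = CARG_UNWRAP(cpath);
    constexpr int skip = (*path == '/') ? 1 : 0;
    constexpr auto pos = path + skip;
    constexpr auto start = find(pos, '/');
    if constexpr (start == nullptr) {
        return copy_str<length(pos)>(pos);
    } else {
        return copy_str<length(start + 1)>(start + 1);
    }
}

}  // namespace strtools

// The whole computation happens during compilation
constexpr auto result = strtools::remove_head(CARG_WRAP("/usr/local"));
static_assert(result.size() == 6);
static_assert(std::string_view(result.data()) == "local");
```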
I would like to remind you that it is safe to pass and store the character array, but it is not safe to store the pointer obtained from its data() method. It is possible to use such a pointer immediately in calling other functions (like puts, above), as the lifetime of the array will extend till the current statement finishes execution. However, if you saved this pointer, it would become dangling after the current statement, and dereferencing it would then be undefined behaviour.
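The difference can be sketched as follows. The names kept and dangling are hypothetical, the unsafe line is deliberately left commented out, and copy_str is repeated from Listing 4 so that the snippet stands on its own.

```cpp
#include <array>
#include <cstddef>

template <std::size_t Count>
constexpr auto copy_str(const char* str) {
    std::array<char, Count + 1> result{};
    for (std::size_t i = 0; i < Count; ++i) {
        result[i] = str[i];
    }
    return result;
}

// Safe: the array itself is stored, so kept.data() stays valid
// for as long as `kept` exists.
constexpr auto kept = copy_str<5>("local");
static_assert(kept[0] == 'l' && kept[5] == '\0');

// Unsafe (do not do this): the temporary array is destroyed at the end
// of the statement, so the stored pointer would dangle immediately.
// const char* dangling = copy_str<5>("local").data();
```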
String template parameters
We have tried turning strings into types (via lambda expressions) for compile-time argument passing, but unlike integers and integral_constants, there is no one-to-one correspondence between the two. This is often inconvenient: for two integral_constants, we can directly use is_same to determine whether they are the same; for strings represented as lambda expressions, we cannot do the same – two lambda expressions always have different types.
Direct use of string literals as non-type template arguments is not allowed in C++, because strings may appear repeatedly in different translation units, and they do not have proper comparison semantics – comparing two strings is just a comparison of two pointers, which cannot achieve what users generally expect. To use string literals as template arguments, we need to find a way to pass the string as a sequence of characters to the template. We have two methods available:
- The non-standard GNU extension used by GCC and Clang (which can be used prior to C++20)
- The C++20 approach suitable for any conformant compiler (including GCC and Clang)
Let’s look at them one by one.
The GNU extension
GCC and Clang have implemented the standard proposal N3599 [Smith13], which allows us to use strings as template arguments. The compiler will expand the string into characters, and the rest is standard C++. Listing 7 is an example.
    template <char... Cs>
    struct compile_time_string {
        static constexpr char value[]{Cs..., '\0'};
    };

    template <typename T, T... Cs>
    constexpr compile_time_string<Cs...> operator""_cts() {
        return {};
    }
Listing 7
The definition of the class template is standard C++, so that:
    compile_time_string<'H', 'i'>
is a valid type and, at the same time, by taking the value member of this type, we can get "Hi". The GNU extension is the string literal operator template – we can now write "Hi"_cts to get an object of type compile_time_string<'H', 'i'>. The following code will compile with the above definitions:
    constexpr auto a = "Hi"_cts;
    constexpr auto b = "Hi"_cts;
    static_assert(
        is_same_v<decltype(a), decltype(b)>);
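The standard part of Listing 7 can be exercised without the extension by spelling out the characters. The sketch below does just that, assuming C++17; the aliases hi and ha are hypothetical. It shows that the characters are part of the type itself, so different strings give different types.

```cpp
#include <type_traits>

// Class template as in Listing 7 (standard C++)
template <char... Cs>
struct compile_time_string {
    static constexpr char value[]{Cs..., '\0'};
};

using hi = compile_time_string<'H', 'i'>;  // what "Hi"_cts would produce
using ha = compile_time_string<'H', 'a'>;  // what "Ha"_cts would produce

// The value member gives the characters back, with a terminator
static_assert(hi::value[0] == 'H' && hi::value[1] == 'i' &&
              hi::value[2] == '\0');
// Different strings are different types
static_assert(!std::is_same_v<hi, ha>);
```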
The C++20 approach
Though the above method is simple and effective, it failed to reach consensus in the C++ standards committee and did not become part of the standard. However, with C++20, we can use more types in non-type template parameters. In particular, user-defined literal types are amongst them. Listing 8 is an example.
    template <size_t N>
    struct compile_time_string {
        constexpr compile_time_string(const char (&str)[N]) {
            copy_n(str, N, value);
        }
        char value[N]{};
    };

    template <compile_time_string cts>
    constexpr auto operator""_cts() {
        return cts;
    }
Listing 8
Again, the first class template is not special, but allowing this compile_time_string to be used as the type of a non-type template parameter (quite a mouthful ☺), as well as the string literal operator template, is a C++20 improvement. We can now write "Hi"_cts to generate a compile_time_string object. Note, however, that this object is of type compile_time_string<3>, so "Hi"_cts and "Ha"_cts are of the same type – which is very different from the results of the GNU extension. However, the important thing is that compile_time_string can now be used as the type of a template parameter, so we can just add another layer:
    template <compile_time_string cts>
    struct cts_wrapper {
        static constexpr compile_time_string str{cts};
    };
Corresponding to the previous compile-time string type comparison, we now need to write:
    auto a = cts_wrapper<"Hi"_cts>{};
    auto b = cts_wrapper<"Hi"_cts>{};
    static_assert(
        is_same_v<decltype(a), decltype(b)>);
Or we can further simplify it to (as compile_time_string has a non-explicit constructor):
    auto a = cts_wrapper<"Hi">{};
    auto b = cts_wrapper<"Hi">{};
    static_assert(
        is_same_v<decltype(a), decltype(b)>);
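The contrast between the two layers can be sketched as follows. Two assumptions are made here that are not in the listings: the code is guarded by the feature-test macro __cpp_nontype_template_args, so that a compiler without C++20 class-type template parameters simply skips it, and the constructor copies with a manual loop instead of copy_n.

```cpp
#include <cstddef>
#include <type_traits>

#if __cpp_nontype_template_args >= 201911L  // class-type template parameters

template <std::size_t N>
struct compile_time_string {
    constexpr compile_time_string(const char (&str)[N]) {
        for (std::size_t i = 0; i < N; ++i) {
            value[i] = str[i];  // manual copy; Listing 8 uses copy_n
        }
    }
    char value[N]{};
};

template <compile_time_string cts>
struct cts_wrapper {
    static constexpr compile_time_string str{cts};
};

// Equal lengths give the same compile_time_string specialization...
static_assert(std::is_same_v<decltype(compile_time_string{"Hi"}),
                             decltype(compile_time_string{"Ha"})>);
// ...but the wrapper types are distinguished by the characters
static_assert(std::is_same_v<cts_wrapper<"Hi">, cts_wrapper<"Hi">>);
static_assert(!std::is_same_v<cts_wrapper<"Hi">, cts_wrapper<"Ha">>);

#endif
```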
These techniques have proved to be useful in my real projects, and I hope they will help you too.
References
[Clang] https://clang.llvm.org/
[GCC] https://gcc.gnu.org/
[Godbolt] Matt Godbolt, Compiler Explorer, https://godbolt.org/
[MSVC] https://visualstudio.microsoft.com/
[Park17] Michael Park, ‘constexpr function parameters’, May 2017, https://mpark.github.io/programming/2017/05/26/constexpr-function-parameters/
[Smith13] Richard Smith, ‘N3599: Literal operator templates for strings’, March 2013, http://wg21.link/n3599
[Stone19] David Stone, ‘P1045R1: constexpr Function Parameters’, September 2019, https://wg21.link/p1045r1
Having been a programmer and software architect, Yongwei is currently a consultant and trainer on modern C++. He has nearly 30 years’ experience in systems programming and architecture in C and C++. His focus is on the C++ language, software architecture, performance tuning, design patterns, and code reuse. He has a programming page at http://wyw.dcweb.cn/