std::format allows us to format values quickly and safely. Spencer Collyer demonstrates how to provide formatting for a simple user-defined class.
In a previous article [Collyer21], I gave an introduction to the std::format library, which brings modern text formatting capabilities to C++.
That article concentrated on the output functions in the library and how they could be used to write the fundamental types and the various string types that the standard provides.
Being a modern C++ library, std::format also makes it relatively easy to output user-defined types, and this series of articles will show you how to write the code that does this.
There are three articles in this series. This article describes the basics of setting up the formatting for a simple user-defined class. The second article will describe how this can be extended to classes that hold objects whose type is specified by the user of your class, such as containers. The third article will show you how to create format wrappers, special purpose classes that allow you to apply specific formatting to objects of existing classes.
A note on the code listings: The code listings in this article have lines labelled with comments like // 1. Where these lines are referred to in the text of this article, it will be as ‘line 1’ for instance, rather than ‘the line labelled // 1’.
Interface changes
Since my previous article was first published, based on the draft C++20 standard, the paper [P2216] was published, which changes the interface of the format, format_to, format_to_n, and formatted_size functions. They no longer take a std::string_view as the format string, but instead a std::format_string (or, for the wide-character overloads, std::wformat_string). This forces the format string to be a constant at compile time. This has the major advantage that compile-time checks can be carried out to ensure it is valid.
The interfaces of the equivalent functions prefixed with v (e.g. vformat) have not changed, and they can still take runtime-defined format specs.
One effect of this is that if you need to determine the format spec at runtime then you have to use the v-prefixed functions and pass the arguments as an argument pack created with make_format_args or make_wformat_args. This will impact you if, for instance, you want to make your program available in multiple languages, where you would read the format spec from some kind of localization database.
Another effect is on error reporting in the functions that parse the format spec. We will deal with this when describing the parse function of the formatter classes described in this article.
C++26 and runtime_format
Forcing the use of the v-prefixed functions for non-constant format specs is not ideal, and can introduce some problems. The original P2216 paper mentioned possible use of a runtime_format to allow non-constant format specs but did not add any changes to enable that. A new proposal [P2918] does add such a function, and once again allows non-constant format specs in the various format functions. This paper has been accepted into C++26, and the libstdc++ library that comes with GCC should have it implemented by the time you read this article, if you want to try it out.
Creating a formatter for a user-defined type
To enable formatting for a user-defined type, you need to create a specialisation of the struct template formatter. The standard defines this as:
template<class T, class charT = char> struct formatter;
where T is the type you are defining formatting for, and charT is the character type your formatter will be writing.
Each formatter needs to declare two functions, parse and format, that are called by the formatting functions in std::format. The purpose and design of each function is described briefly in the following sections.
Inheriting existing behaviour
Before we dive into the details of the parse and format functions, it is worth noting that in many cases you can get away with re-using existing formatters by inheriting from them. Normally, you would do this if the standard format spec does everything you want, so you can just use the inherited parse function and write your own format function that ultimately calls the one on the parent class to do the actual formatting.
For instance, you may have a class that wraps an int to provide some special facilities, like clamping the value to be between min and max values, but when outputting the value you are happy to have the standard formatting for int. In this case you can just inherit from std::formatter<int> and simply override the format function to call the one on that formatter, passing the appropriate values to it. An example of doing this is given in Listing 1.
#include <format>
#include <iostream>
#include <type_traits>

class MyInt
{
public:
  MyInt(int i) : m_i(i) {}
  int value() const { return m_i; }
private:
  int m_i;
};

template<>
struct std::formatter<MyInt> : public std::formatter<int>
{
  using Parent = std::formatter<int>;
  // parse is inherited; format forwards the wrapped int to the parent
  auto format(const MyInt& mi, std::format_context& format_ctx) const
  {
    return Parent::format(mi.value(), format_ctx);
  }
};

int main()
{
  MyInt mi{1};
  std::cout << std::format("{0} [{0}]\n", mi);
}
Listing 1
Or you may be happy for your formatter to produce a string representation of your class and use the standard string formatting to output that string. You would inherit from std::formatter<std::string> and just override the format function to generate your string representation and then call the parent format function to actually output the value.
The parse function
The parse function does the work of reading the format specification (format-spec) for the type.
It should store any formatting information from the format-spec in the formatter object itself (see the footnote at the end of this article).
As a reminder of what is actually being parsed, my previous article had the following for the general format of a replacement field:
‘{’ [arg-id] [‘:’ format-spec] ‘}’
so the format-spec is everything after the : character, up to but not including the terminating }.
Assume we have a type alias PC defined as follows:
using PC = basic_format_parse_context<charT>;
where charT is the template argument passed to the formatter template. Then the parse function prototype looks like the following:
constexpr PC::iterator parse(PC& pc);
The function is declared constexpr so it can be called at compile time.
The standard defines specialisations of the basic_format_parse_context template called format_parse_context and wformat_parse_context, with charT being char and wchar_t respectively.
On entry to the function, pc.begin() points to the start of the format-spec for the replacement field being formatted. The value of pc.end() is such as to allow the parse function to read the entire format-spec. Note that the standard specifies that an empty format-spec can be indicated by either pc.begin() == pc.end() or *pc.begin() == '}', so your code needs to check for both conditions.
The parse function should process the whole format-spec. If it encounters a character it doesn’t understand, other than the } character that indicates the format-spec is complete, it should report an error. The way to do this is complicated by the need to allow the function to be called at compile time. Before that change was made, it would be normal to throw a std::format_error exception. You can still do this, with the proviso that the compiler will report an error if the throw is reached while the function is being evaluated at compile time, as throw cannot be used in that context. Until such time as a workaround has been found for this problem, it is probably best to just throw the exception and allow the compiler to complain. That is the solution used in the code presented in this article.
If the whole format-spec is processed with no errors, the function should return an iterator pointing to the terminating } character. This is an important point – the } is not part of the format-spec and should not be consumed, otherwise the formatting functions themselves will throw an error.
Format specification mini-language
The format-spec for your type is written in a mini-language which you design. It does not have to look like the one for the standard format-specs defined by std::format. There are no rules for the mini-language, as long as you can write a parse function that will process it.
An example of a specialist mini-language is that defined by std::chrono for its formatters, given for instance at [CppRef]. Further examples are given in the code samples that make up the bulk of this series of articles. There are some simple guidelines to creating a mini-language in the appendix at the end of this article: ‘Simple Mini-Language Guidelines’.
The format function
The format function does the work of actually outputting the value of the argument for the replacement field, taking account of the format-spec that the parse function has processed.
Assume we have a type alias FC defined as follows:
using FC = basic_format_context<Out, charT>;
where Out is an output iterator and charT is the template argument passed to the formatter template. Then the format function prototype looks like the following:
FC::iterator format(const T& t, FC& fc) const;
where T is the template argument passed to the formatter template.
Note that the format function should be const-qualified. This is because the standard specifies that it can be called on a const object.
The standard defines specialisations of the basic_format_context template called format_context and wformat_context, with charT being char and wchar_t respectively.
The function should format the value t passed to it, using the formatting information stored by parse, and the locale returned by fc.locale() if it is locale-dependent. The output should be written starting at fc.out(), and on return the function should return the iterator just past the last output character.
If you just want to output a single character, the easiest way is to write something like the following, assuming iter is the output iterator and c is the character you want to write:
*iter++ = c;
If you need more complex formatting than just writing one or two characters, the easiest way to create the output is to use the formatting functions already defined by std::format, as they correctly maintain the output iterator.
The most useful function to use is std::format_to, as that takes the iterator returned by fc.out() and writes directly to it, returning the updated iterator as its result. Or if you want to limit the amount of data written, you can use std::format_to_n.
Using the std::format function itself has a couple of disadvantages. Firstly, it returns a string which you would then have to send to the output. Secondly, because it has the same name as the function in formatter, you have to use a std namespace qualifier on it, even if you have a using namespace std; line in your code, as otherwise function name resolution will pick up the format function from the formatter rather than the std::format one.
Formatting a simple object
For our first example we are going to create a formatter for a simple Point class, defined in Listing 2.
class Point
{
public:
  Point() {}
  Point(int x, int y) : m_x(x), m_y(y) {}
  // Accessors return by value, so a const-qualified return type adds nothing
  int x() const { return m_x; }
  int y() const { return m_y; }
private:
  int m_x = 0;
  int m_y = 0;
};
Listing 2
Default formatting
Listing 3 shows the first iteration of the formatter for Point. This just allows default formatting of the object.
#include "Point.hpp"
#include <format>
#include <iostream>
#include <type_traits>

template<>
struct std::formatter<Point>
{
  constexpr auto parse(std::format_parse_context& parse_ctx)
  {
    auto iter = parse_ctx.begin();
    auto get_char = [&]() {
      return iter != parse_ctx.end() ? *iter : 0;
    }; // 1
    char c = get_char();
    if (c != 0 && c != '}') // 2
    {
      throw std::format_error(
        "Point only allows default formatting");
    }
    return iter;
  }
  auto format(const Point& p, std::format_context& format_ctx) const
  {
    return std::format_to(std::move(format_ctx.out()),
      "{},{}", p.x(), p.y());
  }
};

int main()
{
  Point p;
  std::cout << std::format("{0} [{0}]\n", p);
  try
  {
    std::cout << std::vformat("{0:s}\n",
      std::make_format_args(p));
  }
  catch (std::format_error& fe)
  {
    std::cout << "Caught format_error : " << fe.what() << "\n";
  }
}
Listing 3
In the parse function, the lambda get_char defined in line 1 acts as a convenience function for getting either the next character from the format-spec, or else indicating the format-spec has no more characters by returning zero. It is not strictly necessary in this function as it is only called once, but will be useful as we extend the format-spec later.
The if-statement in line 2 checks that we have no format-spec defined. The value 0 will be returned from the call to get_char if the begin and end calls on parse_ctx return the same value.
The format function has very little to do – it just returns the result of calling format_to with the appropriate output iterator, format string, and details from the Point object. The only notable thing to point out is that we wrap the format_ctx.out() call which gets the output iterator in std::move. This is in case the user is using an output that has move-only iterators.
Adding a separator character and width specification
Now we have seen how easy it is to add default formatting for a class, let’s extend the format specification to allow some customisation of the output.
The format specification we will use has the following form:
[sep] [width]
where sep is a single character to be used as the separator between the two values in the Point output, and width is the minimum width of each of the two values. Both elements are optional. The sep character can be any character other than } or a decimal digit.
The code for this example is in Listing 4.
#include "Point.hpp"
#include <cctype>
#include <format>
#include <iostream>
using namespace std;

template<>
struct std::formatter<Point>
{
  constexpr auto parse(format_parse_context& parse_ctx)
  {
    auto iter = parse_ctx.begin();
    auto get_char = [&]() {
      return iter != parse_ctx.end() ? *iter : 0;
    };
    char c = get_char();
    if (c == 0 || c == '}') // 1
    {
      return iter;
    }
    auto IsDigit = [](unsigned char uc) {
      return isdigit(uc);
    }; // 2
    if (!IsDigit(c)) // 3
    {
      m_sep = c;
      ++iter;
      if ((c = get_char()) == 0 || c == '}') // 4
      {
        return iter;
      }
    }
    auto get_int = [&]() { // 5
      int val = 0;
      char c;
      while (IsDigit(c = get_char())) // 6
      {
        val = val*10 + c-'0';
        ++iter;
      }
      return val;
    };
    if (!IsDigit(c)) // 7
    {
      throw format_error("Invalid format "
        "specification for Point");
    }
    m_width = get_int(); // 8
    m_width_type = WidthType::Literal;
    if ((c = get_char()) != '}') // 9
    {
      throw format_error("Invalid format "
        "specification for Point");
    }
    return iter;
  }
  auto format(const Point& p, format_context& format_ctx) const
  {
    if (m_width_type == WidthType::None)
    {
      return format_to(std::move(format_ctx.out()),
        "{0}{2}{1}", p.x(), p.y(), m_sep);
    }
    return format_to(std::move(format_ctx.out()),
      "{0:{2}}{3}{1:{2}}", p.x(), p.y(), m_width, m_sep);
  }
private:
  char m_sep = ','; // 10
  enum WidthType { None, Literal };
  WidthType m_width_type = WidthType::None;
  int m_width = 0;
};

int main()
{
  Point p1(1, 2);
  cout << format("[{0}] [{0:/}] [{0:4}]"
    "[{0:/4}]\n", p1);
}
Listing 4
Member variables
The first point to note is that we now have to store information derived from the format-spec by the parse function so the format function can do its job. So we have a set of member variables in the formatter defined from line 10 onwards.
The default values of these member variables are set so that if no format-spec is given, a valid default output will still be generated. It is a good idea to follow the same principle when defining your own formatters.
The parse function
The parse function has expanded somewhat to allow parsing of the new format-spec. Line 1 gives a short-circuit if there is no format-spec defined, leaving the formatting as the default.
In the code following the check above we need to check if the character we have is a decimal digit. The normal way to do this is to use std::isdigit, but because this function has undefined behaviour if the value passed cannot be represented as an unsigned char, we define lambda IsDigit at line 2 as a wrapper which ensures the value passed to isdigit is an unsigned char.
As mentioned above, any character that is not } or a decimal digit is taken as being the separator. The case of } has been dealt with by line 1 already. The if-statement at line 3 checks for the second case. If we don’t have a decimal digit character, the value in c is stored in the m_sep member variable. We need to increment iter before calling get_char in line 4 because get_char itself doesn’t touch the value of iter.
Line 4 checks to see if we have reached the end of the format-spec after reading the separator character. Note that we check for the case where get_char returns 0, which indicates we have reached the end of the format string, as well as the } character that indicates the end of the format-spec. This copes with any problems where the user forgets to terminate the replacement field correctly. The std::format functions will detect such an invalid condition and throw a std::format_error exception.
The get_int lambda function defined starting at line 5 attempts to read a decimal number from the format-spec. On entry iter should be pointing to the start of the number. The while-loop controlled by line 6 keeps reading characters until a non-decimal digit is found. In the normal case this would be the } that terminates the format-spec. We don’t check in this function for which character it was, as that is done later. Note that as written, the get_int function has undefined behaviour if a user uses a value that overflows an int – a more robust version could be written if you want to check against users trying to define width values greater than the maximum value of an int.
The check in line 7 ensures we have a width value. Note that the checks in lines 3 and 4 will have caused the function to return if we just have a sep element.
The width is read and stored in line 8, with the following line indicating we have a width given.
Finally, line 9 checks that we have correctly read all the format-spec. This is not strictly necessary, as the std::format functions will detect any failure to do so and throw a std::format_error exception, but doing it here allows us to provide a more informative error message.
The format function
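The format function has changed to use the sep and width elements specified. If no width was given, it writes the two values with the separator between them. Otherwise it passes m_width as an extra argument and uses it as a nested replacement field for the width of each value.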
Specifying width at runtime
In this final example we will allow the width element to be specified at runtime. We do this by allowing a nested replacement field to be used, specified as in the standard format specification with either {} or {n}, where n is an argument index.
The format specification for this example is identical to the one above, with the addition of allowing the width to be specified at runtime.
The code for this example is in Listing 5. When labelling the lines in this listing, corresponding lines in Listing 4 and Listing 5 have had the same labels applied. This does mean that some labels are not used in Listing 5 if there is nothing additional to say about those lines compared to Listing 4. We use uppercase letters for new labels introduced in Listing 5.
#include "Point.hpp"
#include <cctype>
#include <format>
#include <iostream>
#include <iterator>
#include <limits>
#include <string>
#include <type_traits>
using namespace std;

template<>
struct std::formatter<Point>
{
  constexpr auto parse(format_parse_context& parse_ctx)
  {
    auto iter = parse_ctx.begin();
    auto get_char = [&]() {
      return iter != parse_ctx.end() ? *iter : 0;
    };
    char c = get_char();
    if (c == 0 || c == '}')
    {
      return iter;
    }
    auto IsDigit = [](unsigned char uc) {
      return isdigit(uc);
    };
    if (c != '{' && !IsDigit(c)) // 3
    {
      m_sep = c;
      ++iter;
      if ((c = get_char()) == 0 || c == '}')
      {
        return iter;
      }
    }
    auto get_int = [&]() {
      int val = 0;
      char c;
      while (IsDigit(c = get_char()))
      {
        val = val*10 + c-'0';
        ++iter;
      }
      return val;
    };
    if (!IsDigit(c) && c != '{') // 7
    {
      throw format_error("Invalid format "
        "specification for Point");
    }
    if (c == '{') // A
    {
      m_width_type = WidthType::Arg; // B
      ++iter;
      if ((c = get_char()) == '}') // C
      {
        m_width = parse_ctx.next_arg_id();
      }
      else // D
      {
        m_width = get_int();
        parse_ctx.check_arg_id(m_width);
      }
      ++iter;
    }
    else // E
    {
      m_width = get_int(); // 8
      m_width_type = WidthType::Literal;
    }
    if ((c = get_char()) != '}')
    {
      throw format_error("Invalid format "
        "specification for Point");
    }
    return iter;
  }
  auto format(const Point& p, format_context& format_ctx) const
  {
    if (m_width_type == WidthType::None)
    {
      return format_to(std::move(format_ctx.out()),
        "{0}{2}{1}", p.x(), p.y(), m_sep);
    }
    if (m_width_type == WidthType::Arg) // F
    {
      m_width = get_arg_value(format_ctx, m_width);
    }
    return format_to(std::move(format_ctx.out()),
      "{0:{2}}{3}{1:{2}}", p.x(), p.y(), m_width, m_sep);
  }
private:
  int get_arg_value(format_context& format_ctx,
    int arg_num) const // G
  {
    auto arg = format_ctx.arg(arg_num); // H
    if (!arg)
    {
      string err;
      back_insert_iterator<string> out(err);
      format_to(out, "Argument with id {} not "
        "found for Point", arg_num);
      throw format_error(err);
    }
    int width = visit_format_arg([] (auto value) -> int { // I
      if constexpr (!is_integral_v<decltype(value)>)
      {
        throw format_error("Width is not "
          "integral for Point");
      }
      else if (value < 0 ||
        value > numeric_limits<int>::max())
      {
        throw format_error("Invalid width for "
          "Point");
      }
      else
      {
        return value;
      }
    }, arg);
    return width;
  }
private:
  mutable char m_sep = ',';
  enum WidthType { None, Literal, Arg };
  mutable WidthType m_width_type = WidthType::None;
  mutable int m_width = 0;
};

int main()
{
  Point p1(1, 2);
  cout << format(
    "[{0}] [{0:-}] [{0:4}] [{0:{1}}]\n", p1, 4);
  cout << format(
    "With automatic indexing: [{:{}}]\n", p1, 4);
  try
  {
    // make_format_args takes lvalues, so the width is a named variable
    int width = 4;
    cout << vformat("[{0:{2}}]\n",
      std::make_format_args(p1, width));
  }
  catch (format_error& fe)
  {
    cout << format("Caught exception: {}\n", fe.what());
  }
}
Listing 5
Nested replacement fields
The standard format-spec allows you to use nested replacement fields for the width and prec fields. If your format-spec also allows nested replacement fields, the basic_format_parse_context class has a couple of functions to support their use: next_arg_id and check_arg_id. They are used in the parse function for Listing 5, and what they do is described in the next section.
The parse function
The first change in the parse function is on line 3. As can be seen, in the new version, it has to check for the { character as well as for a digit when checking if a width has been specified. This is because the dynamic width is specified using a nested replacement field, which starts with a { character.
The next difference is in line 7, where we again need to check for a { character as well as a digit to make sure we have a width specified.
The major change to this function starts at line A. This if-statement checks if the next character is a {, which indicates we have a nested replacement field. If the test passes, line B marks that we need to read the width from an argument, and then we proceed to work out what the argument index is.
The if-statement in line C checks if the next character is a }, which means we are using automatic indexing mode. If the test passes, we call the next_arg_id function on parse_ctx to get the argument number. That function first checks if manual indexing mode is in effect, and if it is it throws a format_error exception, as you cannot mix manual and automatic indexing. Otherwise, it enters automatic indexing mode and returns the next argument index, which in this case is assigned to the m_width variable.
If the check in line C fails, we enter the else-block at line D to do manual indexing. We get the argument number by calling get_int, and then we call the check_arg_id function on parse_ctx. The function checks if automatic indexing mode is in effect, and if so it throws a format_error exception. If automatic indexing mode is not in effect then check_arg_id enters manual indexing mode.
The else-block starting at line E just handles the case where we have a literal width specified in the format-spec, and is identical to the code starting at line 8 in Listing 4.
Note that when used at compile time, next_arg_id or check_arg_id check that the argument id returned (for next_arg_id) or supplied (for check_arg_id) is within the range of the arguments, and if not will fail to compile. However, this is not done when called at runtime.
The format function
The changes to the format function are just the addition of the if-statement starting at line F. This checks if we need to read the width value from an argument, and if so it calls the get_arg_value function to get the value and assign it to the m_width variable, so the format_to call following can use it. Because format is const-qualified, the member variables in Listing 5 are declared mutable to allow this assignment.
The get_arg_value function
The get_arg_value function, defined starting at line G, does the work of actually fetching the width value from the argument list.
Line H tries to fetch the argument from the argument list. If the argument number does not represent an argument in the list, it returns a default constructed value. The following if-statement checks for this, and reports the error if required. Note that in your own code you might want to disable or remove any such checks from production builds, but have them in debug/testing builds.
If the argument is picked up correctly, line I uses the visit_format_arg function to apply the lambda function to the argument value picked up in line H. The visit_format_arg function is part of the std::format API. The lambda function checks that the value passed is of the correct type – in this case, an integral type – and that its value is in the allowed range. Failure in either case results in a format_error exception. Otherwise, the lambda returns the value passed in, which is used as the width.
Summary
We have seen how to add a formatter for a user-defined class, and gone as far as allowing the user to specify certain behaviour (in our case the width) at runtime. We will stop at this point as we’ve demonstrated what is required, but there is no reason why a real-life Point class couldn’t have further formatting abilities added.
In the next article in the series, we will explain how you can write a formatter for a container class, or any other class where the types of some elements of the class can be specified by the user.
Appendix: Simple mini-language guidelines
As noted when initially describing the parse function of the formatters, the format-spec you parse is created using a mini-language, the design of which you have full control over. This appendix offers some simple guidelines to the design of your mini-language.
Before giving the guidelines, I’d like to introduce some terminology. These are not ‘official’ terms but hopefully will make sense.
- An element of a mini-language is a self-contained set of characters that perform a single function. In the standard format-spec most elements are single characters, except for the width and prec values, and the combination of fill and align.
- An introducer is a character that says the following characters make up a particular element. In the standard format-spec the ‘.’ at the start of the prec element is an introducer.
Remember, the following are guidelines, not rules. Feel free to bend or break them if you think you have a good reason for doing so.
Enable a sensible default
It should be possible to use an empty format-spec and obtain sensible output for your type. Then the user can just write {} in the format string and get valid output. Effectively this means that every element of your mini-language should be optional, and have a sensible default.
Shorter is better
Your users are going to be using the mini-language each time they want to do non-default outputting of your type. Using single characters for the elements of the language is going to be a lot easier to use than having to type whole words.
Keep it simple
Similar to the above, avoid having complicated constructions or interactions between different elements in your mini-language. A simple interaction, like in the standard format-spec where giving an align element causes any subsequent ‘0’ to be ignored, is fine, but having multiple elements interacting or controlling others is going to lead to confusion.
Make it single pass
It should be possible to parse the mini-language in a single pass. Don’t have any constructions which necessitate going over the format-spec more than once. This should be helped by following the guideline above to ‘Keep it simple’. This is as much for ease of programming the parse function as it is for ease of writing format-specs.
Avoid ambiguity
If it is possible for two elements in your mini-language to look alike then you have an ambiguity. If you cannot avoid this, you need a way to make the second element distinguishable from the first.
For instance, in the standard format-spec, the width and prec elements are both integer numbers, but the prec element has ‘.’ as an introducer so you can always tell what it is, even if no width is specified.
Use nested-replacement fields like the standard ones
If it makes sense to allow some elements (or parts of elements) to be specified at run-time, use nested replacement fields that look like the ones in the standard format-spec to specify them, i.e. { and } around an optional number.
Avoid braces
Other than in nested replacement fields, avoid using braces (`{` and `}`) in your mini-language, except in special circumstances.
References
[Collyer21] Spencer Collyer (2021) ‘C++20 Text Formatting – An Introduction’ in Overload 166, December 2021, available at: https://accu.org/journals/overload/29/166/collyer/
[CppRef] std::formatter<std::chrono::sys_time>: https://en.cppreference.com/w/cpp/chrono/system_clock/formatter
[P2216] P2216R3 – std::format improvements, Victor Zverovich, 5 Feb 2021, https://wg21.link/P2216
[P2918] P2918R2 – Runtime format strings II, Victor Zverovich, 7 Nov 2023, https://wg21.link/P2918
Footnote
- There is nothing stopping you storing the formatting information in a class variable or even a global variable, but the standard specifies that the output of the format function in the formatter should only depend on the input value, the locale, and the format-spec as parsed by the last call to parse. Given these constraints, it is simpler to just store the formatting information in the formatter object itself.
Spencer has been programming for more years than he cares to remember, mostly in the financial sector, although in his younger years he worked on projects as diverse as monitoring water treatment works on the one hand, and television programme scheduling on the other.