User-Defined Formatting in std::format – Part 2

By Spencer Collyer

Overload, 32(181):4-8, June 2024


Last time, we saw how to provide formatting for a simple user-defined class. Spencer Collyer builds on this, showing how to write a formatter for more complicated types.

In the previous article in this series [Collyer24], I showed how to write a class to format user-defined classes using the std::format library. In this article I will describe how this can be extended to container classes or any other class that holds objects whose type is specified by the user of your class.

A note on the code listings: The code listings in this article have lines labelled with comments like // 1. Where these lines are referred to in the text of this article, it will be as ‘line 1’ for instance, rather than ‘the line labelled // 1’.

Nested formatter objects

The objects created from the formatter template structs are just ordinary C++ objects – there is nothing special about them [1]. In particular, there is nothing to stop you including an object of a formatter template type inside one of your user-defined formatter structs.

You might wonder why you would want to do that. One simple case is if you have a templated container class, and want to create a formatter that can output the container in one go, rather than having to write code to iterate over the container and output each value in turn. Having a nested formatter for the contained value type allows you to do this and allow the values to be formatted differently to the default, as the following examples will show. Other uses will no doubt come to mind for your own classes.

A formatter for std::vector

The first example we will look at is a simple formatter for std::vector. The code is given in Listing 1, and sample output is in Listing 2.

#include <format>
#include <iostream>
#include <vector>

using namespace std;

template<typename T>
struct std::formatter<vector<T>>
{
  constexpr auto 
    parse(format_parse_context& parse_ctx)  {
    auto iter = parse_ctx.begin();
    auto get_char = [&]() { return iter
      != parse_ctx.end() ? *iter : 0; };
    char c = get_char();
    if (c == 0 || c == '}')   // 1
    {
      m_val_fmt.parse(parse_ctx);   // 2
      return iter;
    }
    auto get_next_char = [&]() {   // 3
      ++iter;
      char vc = get_char();
      if (vc == 0)
      {
        throw format_error(
          "Invalid vector format specification");
      }
      return vc;
    };
    if (c == 'w')   // 4
    {
      m_lc = get_next_char();
      m_rc = get_next_char();
      ++iter;
    }
    if ((c = get_char()) == 's')   // 5
    {
       m_sep = get_next_char();
       ++iter;
    }
    if ((c = get_char()) == '/' || c == '}') // 6
    {
      if (c == '/')   // 7
      {
        ++iter;
      }
      parse_ctx.advance_to(iter);   // 8
      iter = m_val_fmt.parse(parse_ctx);   // 9
    }
    if ((c = get_char()) != 0 && c != '}')  // 10
    {
      throw format_error(
        "Invalid vector format specification");
    }
    return iter;
  }
  auto format(const vector<T>& vec, 
    format_context& format_ctx) const
  {
    auto pos = format_ctx.out();    // 11
    bool need_sep = false;
    for (const auto& val : vec)
    {
      if (need_sep)   //  12
      {
        *pos++ = m_sep;
        if (m_sep != ' ')
        {
          *pos++ = ' ';
        }
      }
      if (m_lc != '\0')   //  13
      {
        *pos++ = m_lc;
      }
      format_ctx.advance_to(pos); //  14
      pos = m_val_fmt.format(val,
        format_ctx);   // 15
      if (m_rc != '\0')   //  16
      {
        *pos++ = m_rc;
      }
      need_sep = true;
    }
    return pos;
  }
private:
  char m_lc = '\0';
  char m_rc = '\0';
  char m_sep = ' ';
  formatter<T> m_val_fmt; // 17
};
int main()
{
  vector<int> vec{1, 2, 4, 8, 16, 32};
  cout << format("{}\n", vec);            // a
  cout << format("{:w[]}\n", vec);        // b
  cout << format("{:s,}\n", vec);         // c
  cout << format("{:w[]s,}\n", vec);      // d
  cout << format("{:w[]/3}\n", vec);      // e
  cout << format("{:s;/+0{}}\n", vec, 5); // f
  vector<vector<int>> vec2{ {1, 2, 3}, 
    {40, 50, 60}, {700, 800, 900} };
  cout << format("{}\n", vec2);           // g
  cout << format("{:w[]}\n", vec2);       // h
  cout << format("{:s,}\n", vec2);        // i
  cout << format("{:w[]s,}\n", vec2);     // j
  cout << format("{:w[]/s,}\n", vec2);    // k
  cout << format("{:s;/s,/03}\n", vec2);  // l
}
Listing 1

a: 1 2 4 8 16 32
b: [1] [2] [4] [8] [16] [32]
c: 1, 2, 4, 8, 16, 32
d: [1], [2], [4], [8], [16], [32]
e: [  1] [  2] [  4] [  8] [ 16] [ 32]
f: +0001; +0002; +0004; +0008; +0016; +0032
g: 1 2 3 40 50 60 700 800 900
h: [1 2 3] [40 50 60] [700 800 900]
i: 1 2 3, 40 50 60, 700 800 900
j: [1 2 3], [40 50 60], [700 800 900]
k: [1, 2, 3] [40, 50, 60] [700, 800, 900]
l: 001, 002, 003; 040, 050, 060; 700, 800, 900
Listing 2

The format specification we will use has the following form:

  [ 'w' lc rc ] [ 's' sep ] [ '/' [ value-fmt-spec ] ]

The element starting with w allows the user to specify characters to wrap the vector values in the output. The w must be followed by exactly two characters. The first character, lc, is written before the value, and the second, rc, is written after the value. If not given, no wrapper characters are output.

The element starting with s allows the user to specify a single character to act as a separator between the individual vector element values. If given, the s must be followed by exactly one character, which will be used as the separator. If not given, it defaults to the space character. If a separator other than the space character is given, it will be followed by a space in the output.

The / delimits the start of the format-spec for the vector’s value type. This will be read by the member variable m_val_fmt, defined in line 17, to set up the formatting for the vector values. If not given, it will use the default formatting for the value type. It is allowable – although not really useful – to give a / character with no following format-spec.

The parse function

The first few lines of the parse function, up to line 1, are the same as the ones for the Point class described in my previous article.

The first notable change is line 2. This calls the parse function on the nested m_val_fmt object, which is the formatter for the vector’s value type. Doing this allows the m_val_fmt object to set up its formatting for the default case where no format-spec is given.

The get_next_char function defined starting at line 3 is used to read the next character from the format-spec. It throws an exception if there are no more characters to read, as indicated by getting 0 back from the get_char function. As with the get_char function, when this function is done it leaves the iter variable pointing at the character read.

The if-statement starting at line 4 processes any w element: it reads the two characters following the w into m_lc and m_rc, then increments iter to move past the second of them. Similarly, the code starting at line 5 processes any s element to read the separator character into m_sep.

The if-statement starting at line 6 holds the code to initialise the m_val_fmt object when we don’t have an empty format-spec. The if-statement condition has to check for both the / character that indicates the value type has a format-spec, and also for the } character that indicates the end of the format-spec, i.e. the case where there is no specific format-spec for the value type.

Line 7 checks for the / character and, if present, increments iter. This is because the / character is not part of the value type’s format-spec so seeing it would confuse the m_val_fmt.parse function.

Line 8 is important because, by calling the advance_to function on parse_ctx, it resets parse_ctx’s idea of where in the format-spec the start point is located. When line 9 then calls m_val_fmt.parse, it will start the processing at the correct position, i.e. the start of the value type’s embedded format-spec, not the vector’s format-spec.

When the m_val_fmt.parse function returns, it should have processed everything up to the } that terminates the format-spec. Note that in this case the } is doing double duty, as it terminates both the vector format-spec and the embedded value type format-spec. Line 10 carries out our normal check for correct termination of the format-spec.

The format function

Line 11 puts the current output iterator from format_ctx into the pos variable. This indicates where the next data is written to in the output.

The majority of the function is just a loop over the vector’s values. The interesting parts are described below.

Line 12 checks if we need to output a separator character. The first time through the loop this will be false, but on subsequent iterations it will be true. The body of the if-statement just outputs the separator character, then if it is not a space it outputs a space character as well. As we are just outputting single characters each time we can use the *pos++ = c form to write them to the output.

Lines 13 and 16 write the wrapper characters, if they are defined.

Line 14 sets up the format_ctx variable correctly for the output in the next line. By calling advance_to on format_ctx we set its output iterator to match the position we have reached up to this point in the function.

Line 15 outputs the current value by calling the format function on the m_val_fmt object. Because we have updated the output iterator on format_ctx in the line above, the value will be written to the correct position in the output. The format function returns the new value of the output iterator.

Test cases

The first set of test cases in the main function uses a simple vector-of-ints as the value to output.

Test case a checks that the default formatting works for the vector and its contained values.

Test cases b, c, and d just check that the various parts of the vector format-spec work, but with no value format-spec, so the values will just use the default output.

Test case e checks that using a format-spec for the value works correctly. Using wrapper characters lets us check that the output values are indeed all output in fields three characters wide.

Test case f shows that you can use nested format specifiers in the value format-spec, in this case picking up the width from the argument list.

The second set of test cases uses a vector-of-vectors-of-ints as the value to output.

Test case g checks that the default formatting works.

Note that in the output for case g, there is no way to tell where one nested vector ends and the next one starts. Test cases h, i, and j use the various parts of the vector format-spec to delimit the nested vectors in various ways.

Test case k checks that the nested vectors are output using the value format-spec, as can be seen from each value in them being separated by the comma specified by the format-spec.

Test case l checks that the nested vector’s format-spec can itself contain a format-spec for its values – in this case indicating a three character wide, zero-padded field.

A formatter for std::map

The next example we will look at is a formatter for std::map. This is more complicated because we want to allow format-specs for both the key type and value type of the map. The code is given in Listing 3, and sample output is in Listing 4.

#include <format>
#include <iostream>
#include <map>
#include <string>
using namespace std;
template<typename K, typename V>
struct std::formatter<map<K,V>>
{
  constexpr auto 
    parse(format_parse_context& parse_ctx)
  {
    auto iter = parse_ctx.begin();
    auto get_char = [&]() { return 
      iter != parse_ctx.end() ? *iter : 0; };
    char c = get_char();
    if (c == 0 || c == '}')
    {
      m_key_fmt.parse(parse_ctx); // 1
      m_val_fmt.parse(parse_ctx); // 2
      return iter;
    }
    auto get_next_char = [&]() {
      ++iter;
      char vc = get_char();
      if (vc == 0)
      {
        throw format_error(
          "Invalid map format specification");
      }
      return vc;
    };
    if (c == 'w')   // 3
    {
      m_lc = get_next_char();
      m_rc = get_next_char();
      ++iter;
    }
    if ((c = get_char()) == 'c')    // 4
    {
      m_con = get_next_char();
      ++iter;
    }
    if ((c = get_char()) == 's')    // 5
    {
      m_sep = get_next_char();
      ++iter;
    }
    if ((c = get_char()) == '/')    // 6
    {
      //  Next char must be '{' at start of key
      // format spec
      if ((c = get_next_char()) != '{')   // 7
      {
        throw format_error(
          "Invalid map format specification");
      }
      parse_ctx.advance_to(++iter);       // 8
      iter = m_key_fmt.parse(parse_ctx);  // 9
      // Iter should point to '}' at end of key
      // format spec
      if ((c = get_char()) != '}')        // 10
      {
        throw format_error(
          "Invalid map format specification");
      }
      // Next char must be '{' at start of value 
      // format spec
      if ((c = get_next_char()) != '{')   // 11
      {
        throw format_error(
          "Invalid map format specification");
      }
      parse_ctx.advance_to(++iter);
      iter = m_val_fmt.parse(parse_ctx);
      // Iter should point to '}' at end of 
      // value format spec
      if ((c = get_char()) != '}')
      {
        throw format_error(
          "Invalid map format specification");
      }
      // Advance past the '}' at end of value 
      // format spec
      ++iter;
    }
    else if (c == '}')  // 12
    {
      parse_ctx.advance_to(iter);
      m_key_fmt.parse(parse_ctx);
      m_val_fmt.parse(parse_ctx);
    }
    if ((c = get_char()) != 0 && c != '}')  // 13
    {
      throw format_error(
        "Invalid map format specification");
    }
    return iter;
  }
  auto format(const map<K,V>& vals, 
    format_context& format_ctx) const
  {
    auto pos = format_ctx.out();    // 14
    bool need_sep = false;
    for (const auto& val : vals)
    {
      if (need_sep)   // 15
      {
        *pos++ = m_sep;
        if (m_sep != ' ')
        {
          *pos++ = ' ';
        }
      }
      if (m_lc != '\0')   // 16
      {
        *pos++ = m_lc;
      }
      format_ctx.advance_to(pos);     // 17
      pos = m_key_fmt.format(val.first, 
        format_ctx);
      *pos++ = m_con;                 // 18
      format_ctx.advance_to(pos);     // 19
      pos = m_val_fmt.format(val.second, 
        format_ctx);
      if (m_rc != '\0')   // 20
      {
        *pos++ = m_rc;
      }
      need_sep = true;
    }
    return pos;
  }
private:
  char m_lc = '\0';
  char m_rc = '\0';
  char m_sep = ' ';
  char m_con = '=';
  formatter<K> m_key_fmt;
  formatter<V> m_val_fmt;
};
int main()
{
  map<int, string> map1{ {1, "a"}, {2, "bc"},
    {3, "def"} };
  cout << format("{}\n", map1);           // a
  cout << format("{:w[]}\n", map1);       // b
  cout << format("{:s,}\n", map1);        // c
  cout << format("{:c:}\n", map1);        // d
  cout << format("{:w[]c:s,}\n", map1);   // e
  cout << format("{:w[]/{}{5}}\n", map1); // f
  cout << format("{:s;/{3}{5}}\n", map1); // g
  cout << format("{:s;/{3}{}}\n", map1);  // h
}
Listing 3

a: 1=a 2=bc 3=def
b: [1=a] [2=bc] [3=def]
c: 1=a, 2=bc, 3=def
d: 1:a 2:bc 3:def
e: [1:a], [2:bc], [3:def]
f: [1=a    ] [2=bc   ] [3=def  ]
g:   1=a    ;   2=bc   ;   3=def
h:   1=a;   2=bc;   3=def
Listing 4

The format specification we will use has the following form:

  [ 'w' lc rc ] [ 'c' conn ] [ 's' sep ]
  [ '/' '{' key-fmt-spec '}' '{' value-fmt-spec '}' ]

The elements starting with w and s have the same purposes and defaults as the ones we used for std::vector.

The element starting with c allows you to specify the connecting character that is output between the key and the value. The c must be followed by exactly one character. If not specified, the default value is =.

The / character introduces the format-specs for the key and value types of the map. Unlike the case for std::vector, these format-specs are mandatory if you have a / character. Unsurprisingly, key-fmt-spec is the one for the key type, and the value-fmt-spec is the one for the value type. You can use a default {} for either of these if you don’t want to change that particular item’s format.

Note that these two nested format-specs are surrounded by { and } characters. This breaks one of the guidelines I gave in the previous article for format specification mini-languages (see the appendix ‘Simple Mini-Language Guidelines’ in that article). The reason for this is as follows. The parse functions in formatters need to see a } character terminating the format-spec they are processing. This means when processing the key-fmt-spec, we need a } character at the end of the key-fmt-spec, before the value-fmt-spec starts. This could be confusing as it might look like it is the } that terminates the std::map’s format-spec. Using a { at the start of the key-fmt-spec helps to make it clear it is a single unit. As for the value-fmt-spec, that could use the } at the end of the std::map format-spec as its terminator, just like we do for std::vector above, but for consistency between the two format-specs it made more sense to also surround it with { and } characters.

The parse function

Much of the parse function is similar to the one for std::vector shown previously. Lines 1 and 2 handle the case where we have a default format-spec, calling the respective parse functions on the nested formatters for the key and value types. Note that we assume here that the m_key_fmt.parse function doesn’t alter the parse_ctx value passed to it. A parse context cannot be copied, as its copy constructor is deleted, so if you are concerned that it might, one option is to save the result of parse_ctx.begin() before line 1 and pass it to parse_ctx.advance_to before calling m_val_fmt.parse.

The if-statements starting at lines 3 and 5 read the w and s elements, just as the corresponding lines do for std::vector. The if-statement starting at line 4 reads the c element, which must have a single character following it.

The if-statement starting at line 6 handles any nested format-specs defined. As mentioned previously, they are mandatory if the / character is present.

Line 7 checks for the { that indicates the start of the key-fmt-spec, and if not present throws a format_error. We just report a generic error text here, but a more specific text, for instance one naming the character that was expected, would help the user find the error more quickly.

Line 8 uses the advance_to function to set up the iterator in parse_ctx. Note that we increment the value passed in as we need to skip the { detected in the previous line, which is not part of the key-fmt-spec. Line 9 then calls m_key_fmt.parse so the formatter for the key type can parse the key-fmt-spec. Finally, line 10 checks that the key-fmt-spec is correctly terminated with a } character.

The code starting at line 11 then does the same work, but for the value type, using the m_val_fmt member variable.

If the condition in line 6 is false it means we don’t have format specifications for the key or value types. Line 12 checks if we have reached the end of the format-spec for the map, and if so the controlled lines call the parse functions on m_key_fmt and m_val_fmt to set them to their defaults.

Finally, line 13 does the usual check to make sure we have reached the end of the format-spec.

The format function

The format function for std::map is similar to the one for std::vector given previously.

Line 14 picks up the current output iterator from format_ctx. The function then enters a loop over all the values in the map.

Line 15 checks if we need to output a separator character, and if so the controlled block does that work. Line 16 then does the same for the left-hand wrapper character.

Line 17 then sets the output iterator in format_ctx to the now-current value, and the following line uses m_key_fmt.format to output the key, returning the new value of the output iterator. Line 18 then outputs the connector character.

Line 19 updates the format_ctx output iterator again so the following line can output the value using m_val_fmt.format.

Line 20 then outputs the right-hand wrapper character, if required.

Test cases

Test case a checks that the default formatting works for the map and its contained key-value pairs.

Test cases b, c, d, and e check that the various parts of the map’s format-spec work correctly, singly and in combination.

Test cases f, g, and h test that using format-specs for the key and value parts works, including that using default format-specs is allowed.

Summary

In this article we have shown how you can write a formatter for a container type, or any other class where the types of some elements are unknown to you when writing the formatter because they are specified by the user of the class.

In the next and final article of this series I will show you how to create format wrappers, special purpose classes that allow you to apply specific formatting to existing classes.

References

[Collyer24] Spencer Collyer ‘User-Defined Formatting in std::format: Part 1’, Overload 180, April 2024, available at https://accu.org/journals/overload/32/180/collyer/

Footnote

  1. Other than being called automatically by the various std::format functions that is, obviously.

Spencer Collyer

Spencer has been programming for more years than he cares to remember, mostly in the financial sector, although in his younger years he worked on projects as diverse as monitoring water treatment works on the one hand, and television programme scheduling on the other.





