
Modern C++ Features: User-Defined Literals

Overload Journal #136 - December 2016 + Programming Topics   Author: Arne Mertz
User-defined literals were introduced in C++11. Arne Mertz walks us through their use.

User-defined literals are a convenient feature added in C++11.

C++ has always had a number of built-in ways to write literals: pieces of source code that have a specific type and value. They are part of the basic building blocks of the language:

  32 043 0x34 // integer literals, type int
  4.27 5E1    // floating point literals, 
              // type double
  'f', '\n'   // character literals, type char
  "foo"       // string literal, type const char[4]
  true, false // boolean literals, type bool

These are only the most common ones. There are many more, including some newcomers in the newer standards. Other examples are nullptr and the various prefixes for character and string literals. There are also suffixes we can use to change the type of a built-in numeric literal:

  32u         // unsigned int
  043l        // long
  0x34ull     // unsigned long long
  4.27f       // float
  5E1l        // long double
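
If you want to convince yourself of the types listed above, a handful of compile-time checks will do. This is only a small sketch: it compiles if the types are as stated and fails otherwise.

```cpp
#include <type_traits>

// Each static_assert checks the type the compiler gives to a suffixed literal.
static_assert(std::is_same<decltype(32u), unsigned int>::value, "u gives unsigned int");
static_assert(std::is_same<decltype(043l), long>::value, "l gives long");
static_assert(std::is_same<decltype(0x34ull), unsigned long long>::value,
              "ull gives unsigned long long");
static_assert(std::is_same<decltype(4.27f), float>::value, "f gives float");
static_assert(std::is_same<decltype(5E1l), long double>::value,
              "l on a floating point literal gives long double");

// The suffix changes the type, not the value: 043 is octal, 0x34 is hexadecimal.
static_assert(043l == 35L, "octal 43 is decimal 35");
static_assert(0x34ull == 52ULL, "hexadecimal 34 is decimal 52");
```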

Suffixes for user-defined literals

With C++11, we got the option of defining our own suffixes. They can be applied to integer, floating point, character and string literals of any flavor. The suffixes must be valid identifiers and start with an underscore – those without an underscore are reserved for future standards.

Using the literals

User-defined literals are basically normal function calls with a fancy syntax. I’ll show you in a second how those functions are defined. First, let’s see some examples of how they are used:

  • user-defined integer literal with suffix _km

    45_km

  • user-defined floating point literal with suffix _mi

    17.8e2_mi

  • user-defined character literal with suffix _c

    'g'_c

  • user-defined character literal (char32_t) with suffix _c

    U'%'_c

  • user-defined string literal with suffix _score

    "under"_score

  • user-defined string literal (raw, UTF8) with suffix _stuff

    u8R"##("(weird)")##"_stuff

Defining literal operators

The functions are called literal operators. Given an appropriate class for lengths, the definition of literal operators that match the first two examples above could look like this:

  Length operator "" _km(unsigned long long n) {
    return Length{n, Length::KILOMETERS};
  }

  Length operator ""_mi(long double d) {
    return Length{d, Length::MILES};
  }

More generally, the syntax for the function header is <ReturnType> operator "" <Suffix> (<Parameters>). The return type can be anything, including void. As you see, there can be whitespace between the "" and the suffix – unless the suffix standing alone would be a reserved identifier or keyword. That means, if we want our suffix to start with a capital letter after the underscore, e.g. _KM, there may be no whitespace. (Identifiers with underscores followed by capitals are reserved for the standard implementation.)
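
The same header form applies to character and string literals, only the parameter lists differ: a character literal operator takes the character itself, and a string literal operator takes a pointer to the characters plus the length. As a rough sketch, operators for the earlier 'g'_c, U'%'_c and "under"_score examples might look like this. The return types and what the operators compute are made up purely for illustration.

```cpp
#include <cstddef>

// Hypothetical behavior: both _c operators return the numeric code
// of the character they are given.

// Matches 'g'_c: an ordinary character literal arrives as a char.
constexpr unsigned long operator""_c(char c) {
  return static_cast<unsigned long>(c);
}

// Matches U'%'_c: a char32_t literal selects a separate overload.
constexpr unsigned long operator""_c(char32_t c) {
  return static_cast<unsigned long>(c);
}

// Matches "under"_score: the operator receives a pointer to the characters
// and their count, not including the terminating null character.
// Hypothetical behavior: it just returns that count.
constexpr std::size_t operator""_score(char const* /*str*/, std::size_t len) {
  return len;
}

static_assert(U'%'_c == 37UL, "U+0025 is the percent sign");
static_assert("under"_score == 5, "five characters, terminator not counted");
```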

The allowed parameter lists are constrained. For a user-defined integral or floating point literal, the _km and _mi operators above show the first choice: the compiler first looks for an operator that takes an unsigned long long or long double, respectively. If such an operator cannot be found, there has to be either one taking a char const* or a template<char...> operator taking no parameters.

In the case of the so-called raw literal operator taking a char const*, the character sequence constituting the integral or floating point literal is passed as the parameter. In the case of the template, it is passed as the list of template arguments. E.g. for the _mi example above this would instantiate and call:

  operator ""_mi<'1', '7', '.', '8', 'e', '2'>()
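
To make the two fallback forms a bit more tangible, here is a minimal sketch of each. The suffixes _len and _count are hypothetical and exist only for this illustration; both operators simply report how many characters the literal was spelled with.

```cpp
#include <cstddef>

// Helper: counts the characters of a null-terminated string at compile time.
constexpr std::size_t count_chars(char const* s) {
  return *s == '\0' ? 0 : 1 + count_chars(s + 1);
}

// Raw literal operator: receives the spelling of the numeric literal
// as a null-terminated string.
constexpr std::size_t operator""_len(char const* spelling) {
  return count_chars(spelling);
}

// Literal operator template: receives the same characters as template
// arguments and takes no function parameters.
template <char... Chars>
constexpr std::size_t operator""_count() {
  return sizeof...(Chars);
}

// 17.8e2 is spelled with six characters: '1', '7', '.', '8', 'e', '2'.
static_assert(17.8e2_len == 6, "raw operator sees the spelling 17.8e2");
static_assert(17.8e2_count == 6, "template sees the same six characters");
// Prefixes such as 0x are part of the spelling, too.
static_assert(0x34_count == 4, "'0', 'x', '3', '4'");
```

Since the characters are available at compile time, the template form is what you would reach for to validate or parse the literal during compilation.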

Use cases

The example with the units above is a pretty common one. You will have noted that both operators return a Length. The class would have an internal conversion for the different units, so with these user-defined literals it would be easy to mix the units without crashing your spaceship [Wikipedia]:

  // Assumes _mi and _km are each overloaded for both
  // integer and floating point literals.
  auto length = 32_mi + 45.4_km;
  std::cout << "It's " << length.miles()
            << " miles\n";       // 60.21
  std::cout << "or " << length.kilometers()
            << " kilometers.\n"; // 96.899
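
The Length class itself is not shown here, so the following is only one way it could be written. Everything in it is an assumption made for this sketch: the internal storage in kilometers, the conversion constant, and the fact that each suffix gets one overload for integer and one for floating point literals, so that both 32_mi and 45.4_km work.

```cpp
// One international mile in kilometers.
constexpr long double km_per_mile = 1.609344L;

// Hypothetical Length class; its real definition is not part of the text.
class Length {
public:
  enum Unit { KILOMETERS, MILES };

  // Everything is stored in kilometers, so mixed units add up correctly.
  constexpr Length(long double value, Unit unit)
    : km_(unit == KILOMETERS ? value : value * km_per_mile) {}

  constexpr long double kilometers() const { return km_; }
  constexpr long double miles() const { return km_ / km_per_mile; }

private:
  long double km_;
};

constexpr Length operator+(Length a, Length b) {
  return Length{a.kilometers() + b.kilometers(), Length::KILOMETERS};
}

// Overloads for floating point literals such as 45.4_km.
constexpr Length operator""_km(long double d) {
  return Length{d, Length::KILOMETERS};
}
constexpr Length operator""_mi(long double d) {
  return Length{d, Length::MILES};
}

// Overloads for integer literals such as 32_mi.
constexpr Length operator""_km(unsigned long long n) {
  return Length{static_cast<long double>(n), Length::KILOMETERS};
}
constexpr Length operator""_mi(unsigned long long n) {
  return Length{static_cast<long double>(n), Length::MILES};
}

// The mixed-unit sum from the text, computed at compile time.
constexpr Length total = 32_mi + 45.4_km;
static_assert(total.kilometers() > 96.89L && total.kilometers() < 96.90L,
              "roughly 96.899 kilometers");
```

Storing a single unit internally keeps the addition trivial; the units only matter at the edges, where values are created or read.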

Since C++14, the standard library also contains a bunch of these (and yes, they still are called ‘user-defined’ in standard speak). They are not directly in namespace std but in subnamespaces of std::literals:

  • From std::literals::complex_literals, the suffixes i, if and il are for the imaginary part of std::complex numbers. So, 3.5if is the same as std::complex<float>{0, 3.5f}
  • From std::literals::chrono_literals, the suffixes h, min, s, ms, us and ns create durations in std::chrono for hours, minutes, seconds, milli-, micro- and nanoseconds, respectively.
  • In std::literals::string_literals, we have the suffix s to finally create a std::string right from a string literal instead of tossing around char const*.
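
A short sketch of the three groups of suffixes in action, assuming a C++14 compiler. The checks are written as static_asserts, so the snippet documents itself: it only compiles if the literals have the stated types and values. The function foobar is just a hypothetical example name.

```cpp
#include <chrono>
#include <complex>
#include <string>
#include <type_traits>

// Pulls in all of the standard literal suffixes at once.
using namespace std::literals;

// chrono_literals: durations with different units can be mixed directly.
static_assert(1h + 30min == 90min, "an hour and a half");
static_assert(2s == 2000ms, "two seconds");

// complex_literals: 3.5if is a std::complex<float> with imaginary part 3.5f.
static_assert(std::is_same<decltype(3.5if), std::complex<float>>::value,
              "if gives std::complex<float>");
static_assert((3.5if).imag() == 3.5f, "imaginary part");

// string_literals: "foo"s is a std::string, not a char const*.
static_assert(std::is_same<decltype("foo"s), std::string>::value,
              "s on a string literal gives std::string");

// Because the left operand is a std::string, + concatenates.
inline std::string foobar() { return "foo"s + "bar"; }
```

Note that the s suffix appears twice: applied to a number it yields seconds, applied to a string literal it yields a std::string. The compiler picks the operator by the kind of literal.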

A word of caution

While user-defined literals look very neat, they are not much more than syntactic sugar. There is not much difference between defining and calling a literal operator with "foo"_bar and doing the same with an ordinary function as bar("foo"). In theory, we could write literal operators that have side effects and do anything we want, like a normal function.

However, that is not what people would expect from something that does not look like ‘it does something’. Therefore it is best to use user-defined literals only as obvious shorthand for the construction of values.

Playing with other modern C++ features

A while ago I came across a case where I had to loop over a fixed list of std::strings defined at compile time. In the old days before C++11, the code would have looked like this:

  static std::string const strings[] =  
    {"foo", "bar", "baz"};
  for (std::string const* pstr = strings;
    pstr != strings+3; ++pstr) {
      process(*pstr);
  }

This is horrible. Dereferencing the pointer and the hard-coded 3 in the loop condition just don’t seem right. I could have used a std::vector<std::string> here, but that would mean a separate function to prefill and initialize the const vector since there were no lambdas.

Today we have range based for, initializer_list, auto and user-defined literals for strings:

  using namespace std::literals::string_literals;
  //...
  for (auto const& str : {"foo"s, "bar"s, "baz"s})
  {
    process(str);
  }

And the code looks just as simple as it should.
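
For completeness, here is the loop embedded in a small compilable sketch. The function process is not shown in this article, so the version below is a hypothetical stand-in that merely appends each string to a log; process_all and check_process_all are likewise names invented for the example.

```cpp
#include <cassert>
#include <string>

using namespace std::literals::string_literals;

// Hypothetical stand-in for process(): appends the string to a log.
void process(std::string const& str, std::string& log) {
  log += str;
  log += ' ';
}

// Runs the loop and returns what process() collected.
std::string process_all() {
  std::string log;
  // Every element is a std::string because of the s suffix, so the braced
  // list becomes a std::initializer_list<std::string>.
  for (auto const& str : {"foo"s, "bar"s, "baz"s}) {
    process(str, log);
  }
  return log;
}

// Small self-check of the sketch.
void check_process_all() {
  assert(process_all() == "foo bar baz ");
}
```

Without the s suffix the braced list would hold char const* elements instead, and each call to a process taking a std::string const& would create a temporary string.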

References

[Wikipedia] The Mars Climate Orbiter: Cause of Failure https://en.wikipedia.org/wiki/Mars_Climate_Orbiter#Cause_of_failure
