ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Google+ ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinTemplate Programming Compile Time String Functions

Overload Journal #128 - August 2015 + Programming Topics   Author: Nick Weatherhead
Practising old exercises in new ways can keep you sharp. Nick Weatherhead demonstrates some well-known code katas using C++ compile time tricks.

Many of us will recall some of the first programming exercises we had to solve. Possibly they involved converting strings into different representations and detecting some properties. When learning a new programming language or technique, it’s not unusual to revisit and apply some of these. I will look at three examples using C++ templates for compile time constant expressions; an integer to string conversion, then the reverse by generating an integer from a Roman numeral, and finally detecting whether a string is a palindrome.

As a string is a sequence of adjacent characters it is common to physically represent it as an array. However, we soon discover that templates do not permit the use of string literals as parameters. There are a couple of workarounds – they can be defined with external linkage indicating that they are a single entity across translation units or, alternatively, they can be defined as a class member and passed in as a type. However, their use in compile time expressions is limited because pointer arithmetic isn’t allowed, and further, whilst integral types can be evaluated at compile time with a propagated constant expression, there isn’t a similar mechanism that recursively expresses the sequence of elements required in an array initialisation. Let’s consider some alternatives.

#include <iostream>
using namespace std;
template< size_t I > struct itoa {
  friend ostream& operator<<(
    ostream& os, const itoa& ) {
    return os << itoa< I / 10 >( )
              << itoa< I % 10 >( );
    }
};
#define DIGITS X(0) X(1) X(2) X(3) X(4)        \
               X(5) X(6) X(7) X(8) X(9)
#define X( I )                                 \
template< > struct itoa< I > {                 \
  operator char( ) const { return 0x3##I; }    \
};                                             \
DIGITS
#undef X
int main( ) {
  cout << "\n3210 = " << itoa< 3210 >( )
       << "\n26   = " << itoa< 26   >( )
       << "\n9    = " << itoa< 9    >( );
}
			
Listing 1

A basic integer to ASCII conversion can be defined as in Listing 1. The arithmetic can be determined at compile time, but a sequence of calls is required to place each individual character into the output stream. It also shows how ‘X’ Macros are used in generating repeated code when multiple template specialisations are required. Instead of individually outputting each character, each class could be composed of a single character (1 byte). Typically, a compiler will align characters on a one byte boundary such that, when these are constructed together, they form a contiguous block of characters that can be cast as a string. Note that this technique isn’t guaranteed to work; whilst C and C++ compilers retain the order in which members are declared in a structure they can add padding between them. Adapting the above gives Listing 2.

#include <iostream>
using namespace std;
template< size_t I > struct itos
: itos< I / 10 > {
  const char _;
  constexpr itos( ) : _( 0x30 + I % 10 ) {
    static_assert(
        sizeof( itos )
      == sizeof( itos< I / 10 > ) + 1
    ,
      "unwanted padding"
    );
  }
  struct out {
    const itos< I > _; const char nil;
    constexpr out( ) : nil( '\0' ) {
      static_assert(
           sizeof( out )
        == sizeof( _ ) + 1
      ,
        "unwanted padding"
      );
    }
    constexpr operator const char* const( )
    const {
      return _;
    }
  };
};
#define DIGITS X(0) X(1) X(2) X(3) X(4)        \
               X(5) X(6) X(7) X(8) X(9)
#define X( I )                                 \
template< > struct itos< I > {                 \
  const char _;                                \
  constexpr itos( ) : _( 0x3##I ) { }          \
  constexpr static const char* const out( ) {  \
    return #I;                                 \
  }                                            \
  constexpr operator const char* const( )      \
  const {                                      \
    return &_;                                 \
  }                                            \
};                                             \

DIGITS

#undef X

int main( ) {
    cout << "\n3210 = " << itos< 3210 >::out( )
         << "\n26   = " << itos< 26   >::out( )
         << "\n9    = " << itos< 9    >::out( );
}
			
Listing 2

For the specialisation of one digit there is an out function and in the general case to concatenate digits there is a nested class named out too; both can be called with the same out( ) in order to return a null terminated string. The nested class can work on a completed definition of the outer template object, thus becoming a functor.

The constructors are specified with constexpr. This indicates that they can be evaluated and their resulting objects initialised at compile time. If constexpr isn’t present calls to each constructor will be generated instead. Whether the constant expression branch is taken can be determined using the noexcept operator, for example:

<CodeListing> noexcept( itos< 10 >::out( ) )</CodeListing>

will evaluate to true in the example in Listing 2, but if constexpr is removed from either of the constructors it will return false. Static assertions are included in the constructors to verify that the characters are contiguous.

The code in Listing 3 uses variadic templates to interpret Roman numerals with the combinations of I, V and X, from I up to and including XXXIX i.e. from 1 to 39. However, it does so without validating the numerals. Roman numerals are written from left to right in descending denominations with the exception of subtractive cases e.g. IV and IX; they should also be represented using the fewest characters necessary.

#include <iostream>
using namespace std;

template< char N, char... NS > struct ns {
  static const int d =
     ns< N     >::d
  +  ns< NS... >::d;
};

template< > struct ns< 'I' > {
  static const int d = 1;
};

template< > struct ns< 'V' > {
  static const int d = 5;
};

template< > struct ns< 'X' > {
  static const int d = 10;
};

template< > struct ns< 'I', 'V' > {
  static const int d = 4;
};

template< > struct ns< 'I', 'X' > {
  static const int d = 9;
};

int main( ) {
  static const int
    v       = ns<'V'>::d
  , xix     = ns<'X','I','X'>::d
  , xxiv    = ns<'X','X','I','V'>::d
  , xxxviii = ns<'X','X','X','V','I','I','I' >::d;
  cout << "\nV       = " << v
       << "\nXIX     = " << xix
       << "\nXXIV    = " << xxiv
       << "\nXXXVIII = " << xxxviii;
}
			
Listing 3

In Listing 4, an underlying structure n_ is defined to cater for a running total of the decimal value d, and counts for consecutive Is, Vs and Xs (i, x, v) which are then encapsulated in a class n. As each class is called a character is pushed onto the stack and these properties are transitively passed from one nested template to the next in the chain. Expressions are not evaluated until the last character is read and the stack unwinds; thus the numerals are evaluated as if read in reverse. Specialisations for n are defined for I, V and X; each adds its respective denomination of 1 (in the subtractive case of an I appearing before a V or an X this is negated), 5 or 10 to the running decimal and increases its character count.

#include <iostream>
using namespace std;

template< int D, int I, int V, int X >
struct n_ {
  static const int d = D, i = I, v = V, x = X;
};

template< int D,        int V, int X >
struct n_<    D,     4,     V,     X >;

template< int D, int I,        int X >
struct n_<    D,     I,     2,     X >;

template< int D,               int X >
struct n_<    D,     2,     1,     X >;

template< int D, int I               >
struct n_<    D,     I,     1,     1 >;

template< int D, int I, int V        >
struct n_<    D,     I,     V,     4 >;

template< int D,        int V        >
struct n_<    D,     2,     V,     1 >;

template< char C, typename N = n_< 0, 0, 0, 0 > >
struct n { };

template< typename N > struct n< 'I', N >
: n_<
    N::d + ( N::v || N::x ? -1 : 1 )
  , N::i + 1
  , N::v ? 1 : 0
  , N::x ? 1 : 0
  >

{ };

template< typename N > struct n< 'V', N >
: n_<
    N::d + 5
  , 0
  , N::v + 1
  , N::x ? 1 : 0
  >

{ };

template< typename N > struct n< 'X', N >
: n_<
    N::d + 10
  , 0
  , 0
  , N::x + 1
  >

{ };

template< char N, char... NS > struct ns
: n< N, ns< NS... > >

{ };

template< char N > struct ns< N >
: n< N >

{ };

int main( ) {
//  static const int
//    iiv  = ns<'I','I','V'>::d
//  , vv   = ns<'V','V'>::d
//  , vxx  = ns<'V','X','X'>::d
//  , xxxx = ns<'X','X','X','X'>::d;

  static const int
    v       = ns<'V'>::d
  , xix     = ns<'X','I','X'>::d
  , xxiv    = ns<'X','X','I','V'>::d
  , xxxviii = ns<'X','X','X','V','I','I','I'
                  >::d;
  cout << "\nV       = " << v
       << "\nXIX     = " << xix
       << "\nXXIV    = " << xxiv
       << "\nXXXVIII = " << xxxviii;
}
			
Listing 4

Further consideration is given to glyphs that are greater in value to the one preceding them. Where this occurs any active counts for higher denomination characters are floored to one; this allows the subtractive cases to be dealt with and the likes of VX and IIX to be eliminated. Counters for denominations smaller than the one being examined are already accounted for so are zeroed out. In order to ensure a numeral is represented using the fewest characters any occurrence of more than three consecutive Is or Xs or more than one V are identified. To prevent a combination a class definition is supplied which matches the count of the characters but has no body – this causes compilation to fail if any result. To aid legibility the nested templates are wrapped by a variadic template class (see Listing 5).

#include <iostream>
using namespace std;

struct nil;

template< char C, typename L = nil > struct l {
  static const size_t i = L::i + 1;
  template < size_t I = 0 > struct p {
    static const char c =
      I < i ? L::template p< I >::c : C;
    static const bool is =
    (
      I >= i
    ||
        c == C
      &&
        L::template p< I + 1 >::is
    );
  };
};
template< char C > struct l< C, nil > {
  static const size_t i = 0;
  template< size_t I = 0 > struct p {
    static const char c     =    C;
    static const bool is    =    true;
  };
};
template< char C, char... CS > struct ls
:   l< C, ls< CS... > >
{ };
template< char C > struct ls< C >
:   l< C >
{ };

int main( ) {
  static const bool
    arora  = ls<'a','r','o','r','a'>
                 ::p< >::is
  , hannah = ls<'h','a','n','n','a','h'>
                 ::p< >::is
  , ania   = ls<'a','n','i','a'>
                 ::p< >::is;
  cout << "\narora   = "
       << ( arora  ? "y" : "n" )
       << "\nhannah  = "
       << ( hannah ? "y" : "n" )
       << "\nania    = "
       << ( ania   ? "y" : "n" );
}  
			
Listing 5

In Listing 5, the outer class l defines a concatenation of characters and the nested template class p test to see if these represent a palindrome (reading the same backwards as forwards). The strategy to detect a palindrome is to step from the outermost to the innermost characters, comparing the beginning and end letters to see if they match until either they don’t, in which case it isn’t a palindrome, or the middle is met, in which case it is. For a single character, the specialisation l with a nil continuation, this is trivially true; however, for a concatenated list indexing and iteration of elements is required. This is achieved by indexing each element, as each is added to the list, with a propagated count i. When calling p the outer class will reference the last character C at the index i (being the length of the list minus one). Template p takes a parameter I which is the index of the corresponding character to make a comparison with; in the outermost case this is the first at index zero. The recursive function is checks to see if the index requested indicates that the middle has been met or, in the case of even length strings, crossed. If it hasn’t then the characters are compared with one another and, if they match, the link L is followed to the previous character and p is called again with the next index. In this way the elements are being evaluated in reverse order to which the chain is defined. When finding a corresponding character c the link to previous elements is recursively followed until the index I is found.

In summary, ‘integer to string’ looked at alternative ways in which a string can be emitted with templates. It also demonstrated how nested classes can be used to define functions on the template definition of the outer class, the use of X macros to generate code for repeating specialisations, and introduced the use of the C++11 constexpr to evaluate constructors at compile time. ‘Roman numeral’ highlighted the interpretation and pattern matching capabilities of templates and their ability to prevent as well as action state transitions. Finally ‘palindrome’ built on the use of nested class functors to iterate over a sequence of characters.

Acknowledgements

I’d like to thank the Overload review team for their advice, particularly for suggesting that I explore the use of variadic templates, commenting that static assertions can be used to verify whether character alignment within a structure is contiguous, and also for cautioning against the use of reinterpret_cast.

Further reading

constexpr specifier (since C++11), May 2015. http://en.cppreference.com/w/cpp/language/constexpr

C++11FAQ : constexpr – generalized and guaranteed constant expressions, September 2014http://www.stroustrup.com/C++11FAQ.html

C++ Template: The Complete Guide, David Vandevoorde and Nicolai M. Josuttis, Addison Wesley, 2002, pp.40-41, 209-210.

Data structure alignment, June 2015, https://en.wikipedia.org/wiki/Data_structure_alignment

Overload Journal #128 - August 2015 + Programming Topics