An Introduction to Fast Format (Part 1): The State of the Art

Overload Journal #89, February 2009. Author: Matthew Wilson
Writing a good library is hard. Matthew Wilson compares some existing formatting libraries, and promises to do better.

This article series describes FastFormat, a new open-source C++ formatting library that offers a maximal blend of robustness, efficiency and flexibility.

This first instalment will look at the state of the art in C++ formatting, including standard and leading open-source libraries. It will assess the alternatives in terms of software quality characteristics, and consider how they compare with FastFormat.

FastFormat's extensibility mechanisms and performance will be examined later in the series.

Introduction

FastFormat is one of a generation of libraries that I've been working on over the last few years whose overarching design principle is a refusal to make undesirable and unnecessary compromises between (what I deem to be) the essential characteristics of good software. These characteristics, their tradeoffs, the refusal to compromise, and the full technical details of the concepts, patterns, practices and principles that support this aim will be discussed in my next book, Breaking Up The Monolith: Advanced C++ Design without Compromise, which is in preparation and aimed for publication in 2009.

The characteristics I'm interested in include robustness, efficiency, expressiveness, flexibility, discoverability and transparency, portability, and modularity. (If you're unfamiliar with any of these, they are also documented in the Prologue of my second book, Extended STL, volume 1: Collections and Iterators [XSTLv1] , which is freely available from http://www.extendedstl.com/, along with the preface and several sample chapters.) In the case of formatting, there's an important additional characteristic: internationalisation (I18N) and localisation (L10N).

The basic philosophy of Monolith, which FastFormat and its sister libraries uphold, is:

C++ is hard; so only use it if you must. The (primary) reason you must use it in preference to easier languages is that it affords extremely high efficiency; the reason you use it in preference to C (which is usually equally fast) is that it is massively more expressive (though still less so than many other, easier languages). Consequently, general-purpose C++ libraries must be extremely efficient.

Nonetheless, you shouldn't have to sacrifice expressiveness and flexibility half as much as you might think in order to get that efficiency.

For a subject of such scope, the presentation is necessarily truncated. Some of the topics will be expounded on in a follow-on article; others rely on previously published work; some may not receive the full treatment until Monolith is published.

Although this article focuses on FastFormat, it's appropriate to mention its (older) sister library, Pantheios, a logging API library that offers a similar set of characteristics (and is up to two orders of magnitude faster than the competition), and which uses the same technology. In fact, FastFormat came about from a suggestion by Walter Bright (of Digital Mars C++ and the D Programming Language renown) to apply the Pantheios design to string formatting in general. Though FastFormat is less mature than Pantheios, which is already established in large-scale, high-performance commercial systems throughout the world, I hope that it will achieve similar significance. As well as providing an introduction to the library, these articles are also a call to any interested engineers who might like to get involved with the project. (I intend to cover the specifics of Pantheios in a later article.)

Design parameters

Formatting is a core aspect of many C++ programs. A formatting library exists at a relatively low layer of abstraction within an application and should not exhibit characteristics that cause it to intrude in a deleterious manner on application code, or application programmer consciousness. It must not compromise on robustness, efficiency or flexibility, because if it's flaky, slow, or has limited/no compatibility with your application types or with other libraries, you won't use it.

Without any compromise of these factors, it must be expressive and discoverable so it is easy to understand and use, and must facilitate the writing of code that is transparent (and beautiful!). If not, you won't enjoy using it and will be distracted from your real purpose: writing your application. It must also have high modularity and be portable, so you can use it in a wide range of contexts and with a wide range of compilers; if it only works with certain compilers on certain operating systems/architectures, you won't use it for any software that might need to be ported to them.

I believe that FastFormat meets these (sometimes conflicting) obligations better than any other formatting library, and I will illustrate why throughout these articles, in comparison with two standard libraries, C's Streams and C++'s IOStreams, and two open-source libraries, Loki.SafeFormat [LOKI], [LOKI2] (version 0.1.6) and Boost.Format [BF] (version 1.36.0). Each is a very impressive piece of software engineering, but each also has crucial flaws, as we will see.

Streams, Loki.SafeFormat and Boost.Format are replacement-based formatting APIs, where a format string is used to specify the number, type and location of parameters that will be replaced by arguments presented in the statement. By contrast, IOStreams is a concatenation-based formatting API, where each argument in turn is converted into string form and concatenated together to form the result.

FastFormat provides two different APIs:

  • The Format API (hereafter FastFormat.Format), is a replacement-based API
  • The Write API (hereafter FastFormat.Write), is a concatenation-based API.
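
To make the distinction concrete, here is a minimal sketch of the two styles in standard C++17. The functions toy_fmt() and toy_write() are hypothetical and are not how FastFormat is implemented; the toy parser understands only bare {N} parameters and assumes the arguments have already been converted to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical replacement-based formatter: the format decides which
// argument goes where, and an argument may be referenced more than once.
// Arguments are assumed to be already converted to strings.
inline std::string toy_fmt(std::string const& format,
                           std::vector<std::string> const& args)
{
    std::string result;
    for (std::size_t i = 0; i < format.size(); ++i)
    {
        if (format[i] != '{')
        {
            result += format[i];  // literal text is copied through
            continue;
        }
        std::size_t const close = format.find('}', i);
        if (close == std::string::npos)
        {
            throw std::invalid_argument("unterminated replacement parameter");
        }
        // The text between the braces is the zero-based argument index.
        std::size_t const index = std::stoul(format.substr(i + 1, close - i - 1));
        result += args.at(index);  // at() throws std::out_of_range if missing
        i = close;                 // resume after the closing brace
    }
    return result;
}

// Hypothetical concatenation-based writer: there is no format at all;
// each argument is appended in the order it is given.
template <typename... Pieces>
std::string toy_write(Pieces const&... pieces)
{
    std::string result;
    ((result += pieces), ...);
    return result;
}
```

Note that toy_fmt() can reuse argument 0 in the way the 'call me %0' example below does, whereas toy_write() can only repeat a value by passing it again.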
Terminology

Sink

A sink is an entity that will receive the results of the formatting. Sinks have traditionally been file streams (including console) and character buffers, but can in principle be any type that can make use of a string.

Format

A format is a string, or an instance of a type interpretable as a string, that specifies the literal text of the output and the positions of the replacement parameters. It is required only for replacement-based APIs.

Argument

An argument is a value inserted/concatenated to form the output. For some libraries it may be only built-in types, for others only strings. For most it can be of arbitrary type, requiring translation into a form understood by the library, usually via a user-defined function.

Replacement Parameter

A replacement parameter is a replacement specification within a format. It may specify an argument index, and may also specify width and/or alignment and/or special formatting.

Example

First, let's take a look at the libraries in use, with a simple example statement. Listing 1 shows it with a notional AcmeFormat() function.

    std::string  forename  = "Professor";
    char         surname[] = "Yaffle";
    int          age       = 134;
    std::string  result;

    AcmeFormat(result, "My name is %0 %1; I am %2 years old; call me %0",
               forename, surname, age);
Listing 1

To emulate this functionality with Streams we'd use sprintf(), as in Listing 2.

    #include <stlsoft/memory/auto_buffer.hpp>
    #include <string>
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    const size_t  total = 39 // the literal part(s)
                        + forename.size()
                        + ::strlen(surname)
                        + 21 // enough for any number
                        + 1; // for the nul-terminator
    stlsoft::auto_buffer<char>  buff(total);
    // allocate space, on stack if poss.

    int r = ::sprintf( &buff[0], "My name is %s %s; I am %d years old; call me %s",
                       forename.c_str(), surname, age, forename.c_str());
    // TODO: handle r < 0
    result.assign(buff.data(), size_t(r));

    assert("My name is Professor Yaffle; I am 134 years old; call me Professor" == result);
Listing 2

Note the mistake in the calculation: because forename appears twice in the output, forename.size() should be multiplied by 2. Tellingly, this was a genuine error made during the preparation of the example, nicely illustrating one of the dangers of the printf() family.
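
One way to avoid hand-counting altogether, sketched here under the assumption of a C++11 (or C99) snprintf(), is to ask snprintf() for the required length first, by passing a null buffer and a size of zero, and then allocate exactly that much. The helper format_name() is hypothetical, and std::vector<char> stands in for stlsoft::auto_buffer to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: sizes the buffer by asking snprintf() first.
inline std::string format_name(std::string const& forename, char const* surname, int age)
{
    char const* const fmt = "My name is %s %s; I am %d years old; call me %s";

    // With a null buffer and zero size, snprintf() writes nothing and
    // returns the number of characters that would have been written.
    int const n = std::snprintf(nullptr, 0, fmt,
                                forename.c_str(), surname, age, forename.c_str());
    if (n < 0)
    {
        return std::string();  // encoding error; real code should report it
    }

    std::vector<char> buff(static_cast<std::size_t>(n) + 1);  // +1 for the nul
    std::snprintf(buff.data(), buff.size(), fmt,
                  forename.c_str(), surname, age, forename.c_str());
    return std::string(buff.data(), static_cast<std::size_t>(n));
}
```

This removes the sizing arithmetic, but not the other printf() hazards: the format is processed twice, and nothing checks that the conversion specifiers match the argument types.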

The other five are as shown in Listing 3. All the requisite includes are elided for space. Furthermore, in the rest of the article, I will assume the inclusion of the optional header fastformat/ff.hpp, which does nothing more than include the main header fastformat/fastformat.hpp and then alias the namespace fastformat to the more succinct ff.

    // IOStreams:
    std::stringstream sstm;
    sstm << "My name is " << forename << " " << surname
         << "; I am " << age << " years old; call me " << forename;

    result = sstm.str();
    . . . // assert assumed in all other examples

    // Boost.Format:
    result = boost::str(boost::format("My name is %1% %2%; I am %3% years old; call me %1%")
                        % forename % surname % age);

    // Loki.SafeFormat:
    Loki::SPrintf(result, "My name is %s %s; I am %d years old; call me %s")
        (forename)(surname)(age)(forename);

    // FastFormat.Format:
    fastformat::fmt(result, "My name is {0} {1}; I am {2} years old; call me {0}",
                    forename, surname, age);

    // FastFormat.Write:
    fastformat::write(result, "My name is ", forename, " ", surname, "; I am ", age,
                      " years old; call me ", forename);
Listing 3

Software quality characteristics

We'll now consider each of the libraries in turn, against the quality characteristics.

Robustness

Discussion of the robustness of a piece of software is usually concerned with whether it operates correctly when used in accordance with its own design. In other words, whether it is defective. With libraries it is worthwhile to consider also whether the library engenders correct use: it should be easy to use correctly, and hard to use incorrectly.

For formatting libraries, we can identify three aspects of robustness:

  • Defective format specification
  • Defective argument types
  • Atomicity

Defective format specification

There are three kinds of defective format specification:

  1. (For libraries whose replacement parameters specify types) the required types may not match the argument types
  2. Too few arguments are specified for the format being used
  3. One or more specified arguments are unreferenced in the format.

The first is easy to illustrate, using the Streams library:

      char const* name = "The Thing";
      int         mass = 200;

      printf("name=%s, mass=%skg\n", name, mass);

This is defective, and will not produce the intended output: It may well fault in a way that will stop your process (and you must hope that it does!). Some compilers proffer warnings in such cases, but can only do so if the format string is a literal in the same statement, so the help is limited. We may claim that Streams is not robust because it so readily facilitates the writing of defective code. Furthermore, it is possible to use it in a manner that violates its design, leading (hopefully) to hard faults. In neither case is the compiler able to prevent you.
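
For comparison, the corrected statement uses %d for the int argument. The sketch below writes to a buffer with snprintf() so that the result can be checked; describe() is a hypothetical helper. As an aside, GCC's -Wformat (enabled by -Wall) and Clang can diagnose the original mismatch, but, as noted above, only when the format string is visible to the compiler at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: writes the corrected output into buf and returns
// the snprintf() result (the number of characters that would be written).
inline int describe(char* buf, std::size_t size, char const* name, int mass)
{
    // %s matches char const*, %d matches int: the types now agree.
    return std::snprintf(buf, size, "name=%s, mass=%dkg\n", name, mass);
}
```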

Loki fails a little more gracefully; it detects the mismatch, stops any further argument processing and output, and the statement evaluates to -1. Boost.Format and FastFormat.Format are not vulnerable to this issue.

All four are subject to the other kinds. First, too few arguments:

      printf("name=%s, mass=%skg\n", name);
      std::cout << (boost::format("name=%1%, mass=%2%kg\n") % name);
      Loki::Printf("name=%s, mass=%skg\n")(name);
      ff::fmtln(std::cout, "name={0}, mass={1}kg", name);

Boost.Format and FastFormat.Format both throw exceptions, to ensure that client code cannot fail to be informed of the defective format specification. Loki.SafeFormat output stops at the point in the formatting corresponding to the first missing argument, and the statement returns the value -1. printf() will fault in some way or another, hopefully fatally.

It is important to note a significant difference between the software contracts of the libraries, in terms of where the contract violation occurs. It is a precondition of printf() (and its relatives) that every replacement parameter in the format string has a corresponding argument of the same type, or of a type for which there is a known good conversion (e.g. short to int, float to double). Failure to provide such a correspondence violates the contract, and thereby makes the program defective. This is quite different from the case of Boost.Format and FastFormat.Format. They do not deem a mismatch between format and arguments to be a violation of the library's software contract. Rather, the libraries provide the means to detect and report such mismatches in a precisely defined way: it is part of their (well-functioning) behaviour. The onus of recognising this condition is on the client code, which, in all likelihood, is defective, and should be terminated accordingly. But that determination is outside the purview of the formatting library. (Note: an important side effect of this 'raising the defect level' is that such libraries are far more amenable to automated testing.)

Finally, let's consider the case of too many arguments:

      printf("name=%s", name, mass);
      std::cout << (boost::format("name=%1%") % name
         % mass);
      Loki::Printf("name=%s")(name)(mass);
      ff::fmtln(std::cout, "name={0}", name, mass);

Once again, such a circumstance is likely to be the result of a defective application. The Streams library does not deem this to be defective, and the function operates as if the extra arguments were not there. Loki.SafeFormat appends unreferenced arguments to the 'completed' formatted string, which I assume is accidental. With Boost.Format and FastFormat.Format (in default mode) an exception is raised and sent to the caller.

There are use cases where having unreferenced arguments is valid, and both Boost.Format and FastFormat.Format support these. With Boost.Format, you can change the exceptional conditions on a per-formatter basis. With FastFormat, you can either change it on a per-program basis at compile-time, or on a per-thread/per-process basis by changing the process/thread mismatch handler. We'll look in more detail at this subject in a subsequent article. (Both libraries also support the suppression of exception reporting when there are too-few arguments, but the use cases for this are pretty few and far between.)
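
The behaviour described in this section can be approximated with a toy formatter that treats both kinds of mismatch as reported conditions rather than undefined behaviour. Everything here (checked_fmt(), mismatch_policy and the choice of exception types) is hypothetical; it is not FastFormat's API, and a per-call policy argument is only a simplification of the compile-time and handler-based options described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical policy: are unreferenced arguments an error?
enum class mismatch_policy { strict, allow_unreferenced };

inline std::string checked_fmt(std::string const& format,
                               std::vector<std::string> const& args,
                               mismatch_policy policy = mismatch_policy::strict)
{
    std::vector<bool> referenced(args.size(), false);
    std::string result;

    for (std::size_t i = 0; i < format.size(); ++i)
    {
        if (format[i] != '{')
        {
            result += format[i];
            continue;
        }
        std::size_t const close = format.find('}', i);
        if (close == std::string::npos)
        {
            throw std::invalid_argument("unterminated replacement parameter");
        }
        std::size_t const index = std::stoul(format.substr(i + 1, close - i - 1));
        if (index >= args.size())
        {
            // Too few arguments: report it; never read past the end.
            throw std::out_of_range("too few arguments for format");
        }
        referenced[index] = true;
        result += args[index];
        i = close;
    }

    // Too many arguments: an error only under the strict policy.
    if (policy == mismatch_policy::strict)
    {
        for (bool r : referenced)
        {
            if (!r)
            {
                throw std::invalid_argument("unreferenced argument");
            }
        }
    }
    return result;
}
```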

Defective argument types

It may surprise you to learn that some libraries allow you to pass variables of the wrong type, leading to a fault in operation of the application. Consider the following code:


      wchar_t const* name = L"The Thing";
      int            mass = 200;

      std::cout << "name=" << name << ", mass=" << mass << "kg" << std::endl;

This does not print what the programmer wanted. In fact, it will print something along the lines of


      name=001237f0, mass=200kg

This is a side effect of the ability of the IOStreams to manipulate pointers. Good intentions; terrible consequences. In my opinion, this 'feature' fairly justifies the claim that the IOStreams are not type-safe, and are unfit for purpose.

What I was surprised to learn during the research of this article is that Boost.Format suffers from exactly the same design flaw, and produces similarly useless output. Loki.SafeFormat fares a little better, in at least being aware of the mismatch. However, because it reports defective format specifications by returning a result code rather than throwing an exception, it just prints nothing past the first literal fragment, and unless you're diligently checking the return value you won't know it has failed. In all three cases you find out about the programmer error at runtime: this is too late.

Neither FastFormat API suffers from this issue. Both of the following lines precipitate a compilation error because the generic components that interpret the arguments into a canonical representation are not defined for wide string types in a multibyte string build (and vice versa).

      ff::fmtln(std::cout, "name={0}, mass={1}kg",
         name, mass);
      ff::writeln(std::cout, "name=", name, ", mass=", mass, "kg");

The programmer finds out about the error before it becomes a defect in the code. This is a good thing.
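
A minimal way to obtain this kind of compile-time rejection, assuming a narrow-character build, is to provide string conversions only for narrow string types, so that a wide string has no matching overload at all. The as_slice() overloads, the has_slice trait and toy_writeln() below are hypothetical C++17 illustrations, not FastFormat's code; the trait lets the example demonstrate the rejection with static_assert instead of a failed build.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// Hypothetical conversions for a narrow-character build only.
inline std::string_view as_slice(char const* s) { return std::string_view(s); }
inline std::string_view as_slice(std::string const& s) { return std::string_view(s); }
// Deliberately no overloads for wchar_t const* or std::wstring.

// Detection trait: true if as_slice(T) is a valid expression.
template <typename T, typename = void>
struct has_slice : std::false_type {};

template <typename T>
struct has_slice<T, std::void_t<decltype(as_slice(std::declval<T>()))>>
    : std::true_type {};

// Refuses, at compile time, any argument without a narrow conversion.
template <typename... Args>
std::string toy_writeln(Args const&... args)
{
    static_assert((has_slice<Args const&>::value && ...),
                  "argument type cannot be converted to a narrow string");
    std::string result;
    ((result += as_slice(args)), ...);
    result += '\n';
    return result;
}
```

A call such as toy_writeln(L"The Thing") stops at the static_assert with a readable message, rather than compiling and printing an address.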

Alas, this issue is not limited to defects of mixed character string encodings. Some libraries provide extensibility mechanisms to allow user-defined types to be passed as arguments. Passing an instance of, say, a Person type by reference will result in a compiler error unless you've provided a suitable definition of the requisite extensibility mechanism. This is a good thing.

However, if you pass a pointer to a Person instance, the picture changes significantly. In the following example the statements using IOStreams and Boost.Format will compile whether or not you've provided a definition of how to print a Person*:

      Person* pw = new Person("Wilson", . . .

      std::cout << "person: " << pw << "\n";
      // Compiles!

      std::cout << (boost::format(
         "person: %1%\n") % pw); // Compiles!

If you have, then it will work according to the programmer's intent. If not, however, it will write out the pointer value of pw, which is unlikely to be of any use to your users. This is due to the insertion operator overload taking void const*, to which a Person* implicitly converts. Once again, the worst part of this problem is that you find out that your code is defective only after running the program. This is a bad thing.

With FastFormat, such defects are reported at the earliest possible moment, because they will fail to compile.


      ff::fmtln(std::cout, "person: {0}", pw);
      // Does not compile
      ff::writeln(std::cout, "person: ", pw);
      // Does not compile

It goes without saying that this is a very good thing. And it goes further: even if you introduce the extensions that allow FastFormat to understand void pointer arguments, doing so will still not allow the Person* arguments to be (incorrectly) understood. Hence:

    #include <fastformat/shims/conversion/void_pointers.hpp>

    Person* pw = . . .
    void*   pv = pw;

    ff::fmtln(std::cout, "person: {0}", pw);
    // Still does not compile
    ff::fmtln(std::cout, "pv: {0}", pv);
    // Now compiles
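
One technique that produces this behaviour is sketched below; it is an assumption for illustration, not necessarily the mechanism FastFormat uses. Each argument is passed to the conversion function together with a pointer to the argument itself. The opt-in overload for void pointers takes that tag as void const* const*, to which void* const* and void const* const* convert (a qualification conversion) but Person* const* does not, so a Person* finds no overload even though a Person* would itself convert to void const*. The names Person, to_text(), can_format and format_arg() are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <type_traits>
#include <utility>

struct Person
{
    std::string surname;
};

// Conversion for Person; the second parameter is a "tag" pointer to the
// exact argument type, which is what makes pointer arguments fail.
inline std::string to_text(Person const& p, Person const*)
{
    return "Person(" + p.surname + ")";
}

// Opt-in for void pointers (the analogue of including a void_pointers
// header). Its tag parameter accepts void* const* and void const* const*,
// but not Person* const*.
inline std::string to_text(void const* pv, void const* const*)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%p", const_cast<void*>(pv));
    return buf;
}

// True when to_text(arg, &arg) is well-formed for an argument of type T.
template <typename T, typename = void>
struct can_format : std::false_type {};

template <typename T>
struct can_format<T, std::void_t<decltype(to_text(std::declval<T const&>(),
                                                   std::declval<T const*>()))>>
    : std::true_type {};

// Generic layer: passes the address of the argument as the tag.
template <typename T>
std::string format_arg(T const& arg)
{
    static_assert(can_format<T>::value, "no conversion defined for this argument type");
    return to_text(arg, &arg);
}
```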

Atomicity

With IOStreams, Boost.Format and Loki.SafeFormat, each statement element is presented to the stream in turn, with the unfortunate consequence that when the stream is a file/console (either std::basic_ostream or, where supported, FILE*) the output from multiple threads/processes can interleave at the granularity of the statement element rather than of the statement.

Library             Sink                          Is atomic?
Streams                                           Yes
IOStreams                                         No
Boost.Format        stdout (via boost::str())     Yes
                    std::cout                     No
Loki.SafeFormat     stdout                        No
                    std::cout                     No
FastFormat.Format   stdout                        Yes
                    std::cout                     Yes
FastFormat.Write    stdout                        Yes
                    std::cout                     Yes

This characteristic means that IOStreams, Boost.Format and Loki.SafeFormat are unsuitable for use in multi-threaded environments, unless you first convert to a string and send that to the output stream. None of the other libraries considered here suffer from this critical flaw.

Underneath the covers, the other, seemingly atomic, libraries are also converting to a local buffer before presenting that to the low-level I/O layer en bloc. But the crucial point is that they do this for you implicitly, thereby engendering correct use.
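
The 'format locally, then write once' approach can be sketched as follows, with the hypothetical names make_line() and atomic_writeln(). It relies on the C11/POSIX rule that each stdio call on a FILE is performed under that stream's lock, so a single fwrite() is not interleaved with other threads' stdio calls on the same stream; writes from separate processes are not covered by that guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical: concatenates all pieces (plus a newline) into one local buffer.
template <typename... Pieces>
std::string make_line(Pieces const&... pieces)
{
    std::string line;
    ((line += pieces), ...);
    line += '\n';
    return line;
}

// Hypothetical: hands the whole statement to the stream in a single call,
// rather than one call per statement element.
template <typename... Pieces>
bool atomic_writeln(std::FILE* stm, Pieces const&... pieces)
{
    std::string const line = make_line(pieces...);
    return std::fwrite(line.data(), 1, line.size(), stm) == line.size();
}
```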

Flexibility

Flexibility is about how easily a library lets you do what you need to do, with the types with which you need to do it. For a formatting library, this comes in three areas:

  1. The sink types
  2. The argument types
  3. The format types (for replacement-based APIs only)

Sink types

In terms of sinks, high flexibility would mean facilitating output to different destinations. We're all familiar with writing to console, file and strings, but there's much more to it than that. We might want to write output to a speech synthesiser, a compression component, a GUI message box, or anything else you can think of. Even if you're writing to a 'string', there are many forms of string beyond std::string: it might be a character buffer, a string stream, an ACE ACE_CString, and so on.

Streams allows for only character buffer and FILE* stream sinks. It is not extensible. IOStreams allows for extension to any type of sink via the streambuf mechanism [L&K] . There are many examples of such in the canon, from spawned process I/O [PSTREAMS] to speech synthesis [SHAVIT] . It's quite involved, requiring implementing a whole class (with less than obvious semantics), although helper libraries are available [BSTMS] .

Boost.Format outputs to std::basic_ostream and std::basic_string, and is therefore indirectly extensible via IOStreams extension mechanisms. Loki.SafeFormat allows for stream (FILE*), IOStream (std::ostream), character buffer (char*) and string (std::string) sinks out of the box, and it also allows for general extension by requiring a single method to be implemented to match the custom sink type.

Similarly, sink flexibility is a first-class aspect of FastFormat's design. By default, the library understands only sink types that provide reserve(size_t) and append(char const*, size_t) methods, as std::string and other conformant types (e.g. stlsoft::simple_string) do. However, adding support for other sink types is easy, and several stock sinks are provided in the FastFormat distribution (see Table 1). To use them, you need only #include the requisite header in your compilation unit.

Sink type                                      Required #include
Fixed-capacity character buffers               fastformat/sinks/char_buffer.hpp
Fixed-capacity C-style strings                 fastformat/sinks/c_string.hpp
STLSoft's auto_buffer                          fastformat/sinks/auto_buffer.hpp
FILE*                                          fastformat/sinks/FILE.hpp
std::ostream (incl. std::cout/cerr)            fastformat/sinks/ostream.hpp
Speech (currently Windows only, using SAPI)    fastformat/sinks/speech.hpp
Vectored file (using UNIX's writev())          fastformat/sinks/vectored_file.hpp
std::stringstream                              fastformat/sinks/stringstream.hpp
ACE's ACE_CString                              fastformat/sinks/ACE_CString.hpp
ATL's CComBSTR                                 fastformat/sinks/CComBSTR.hpp
MFC's CString                                  fastformat/sinks/CString.hpp

Table 1
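
The default sink requirement can be pictured with a small generic function that accepts anything providing reserve(size_t) and append(char const*, size_t), plus an ordinary overload that adapts a type lacking them. This is a sketch of the idea only: emit(), counting_sink and the std::vector<char> adapter are hypothetical, and FastFormat's actual sink mechanism may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Generic path: any type with reserve(size_t) and append(char const*, size_t).
// reserve() is used only as a capacity hint for this statement's output.
template <typename Sink>
Sink& emit(Sink& sink, std::vector<std::string> const& slices)
{
    std::size_t total = 0;
    for (auto const& s : slices) total += s.size();
    sink.reserve(total);
    for (auto const& s : slices) sink.append(s.data(), s.size());
    return sink;
}

// A user-defined sink that meets the requirement: works with no extra code.
struct counting_sink
{
    std::size_t reserved = 0;
    std::size_t chars = 0;
    void reserve(std::size_t n) { reserved = n; }
    void append(char const*, std::size_t n) { chars += n; }
};

// A type without append(char const*, size_t), adapted by an ordinary overload.
inline std::vector<char>& emit(std::vector<char>& sink,
                               std::vector<std::string> const& slices)
{
    for (auto const& s : slices) sink.insert(sink.end(), s.begin(), s.end());
    return sink;
}
```

Because a non-template function is preferred when both match equally well, the adapter is chosen for std::vector<char> even though the generic template is also a candidate.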

Argument types

Argument flexibility is undoubtedly the most important. We're all familiar with the limited flexibility of the Streams library: arguments can only be integer, floating-point and character types, C-style strings and pointers (as addresses). Loki.SafeFormat adds to this the ability to pass std::string.

The IOStreams, Boost.Format and both FastFormat APIs expand on this by providing the ability to pass instances (either via reference or via pointer) of user-defined types, by defining suitable extension functions. I'm assuming for brevity that you know how to overload insertion operators for your type(s) for IOStreams and Boost.Format.

FastFormat goes much further. Its application layer function templates apply string access shims [IC+] , [STLv1] , which define a protocol for generalised representation of objects as strings. Consequently, all types for which string access shim overloads have been defined are understood implicitly. So a large number of types are already compatible with FastFormat out of the box, including std::basic_string, std::exception, ACE_CString, VARIANT, struct dirent, struct tm, struct in_addr, CString, FILETIME, SYSTEMTIME, and many more. Because shims are able to introduce compatibility without incurring coupling, you can define shim overloads for your own types and they will automatically work with FastFormat (and with STLSoft, and Pantheios, and any other libraries that use string access shims).
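
The essence of the shim approach can be sketched like this: a generic function asks two free functions for a pointer and a length, any type with overloads for them participates, and user-defined overloads are found by argument-dependent lookup, so neither the library nor the user type needs to include the other's headers. The names str_data(), str_len(), shims::concat() and app::Person are hypothetical stand-ins; STLSoft's real string access shims have their own names and a richer contract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <exception>
#include <stdexcept>
#include <string>

namespace shims
{
    // Stock overloads, declared before the generic code that uses them.
    inline char const* str_data(char const* s) { return s; }
    inline std::size_t str_len(char const* s) { return std::strlen(s); }

    inline char const* str_data(std::string const& s) { return s.data(); }
    inline std::size_t str_len(std::string const& s) { return s.size(); }

    // An exception is represented by its what() message.
    inline char const* str_data(std::exception const& x) { return x.what(); }
    inline std::size_t str_len(std::exception const& x) { return std::strlen(x.what()); }

    // Generic consumer: works for any type with str_data/str_len overloads.
    template <typename... Args>
    std::string concat(Args const&... args)
    {
        std::string result;
        (result.append(str_data(args), str_len(args)), ...);
        return result;
    }
}

namespace app
{
    struct Person
    {
        std::string name;
        int age;
    };

    // User-defined overloads, found by ADL from shims::concat().
    inline char const* str_data(Person const& p) { return p.name.data(); }
    inline std::size_t str_len(Person const& p) { return p.name.size(); }
}
```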

Furthermore, FastFormat provides a second, higher-level, filtering mechanism for extension: it understands any types for which the overloads of the conversion shim [IC+] , [STLv1] fastformat::filtering::filter_type have been defined, and considers this before resolving arguments based on string access shim overloads. This type-filter mechanism facilitates FastFormat-specific conversion of types for which string access shim overloads have not been defined, e.g. an application-specific user-defined type. The mechanism can also be used for types whose conversion form does not suit your purposes: if you don't like the way, say, struct tm, is represented then you can override it.
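
The precedence rule (type-filter first, string access second) can be sketched in C++17 with a detection trait and if constexpr. The functions filter() and str_view(), the trait has_filter and the Temperature type are hypothetical and only mimic the idea behind fastformat::filtering::filter_type: integers are handled only by a filter, while Temperature has both a string form and a filter, so the filter wins.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// String-access layer (hypothetical): types that are, or contain, strings.
inline std::string_view str_view(char const* s) { return s; }
inline std::string_view str_view(std::string const& s) { return s; }

struct Temperature
{
    std::string raw;  // e.g. "21.5", as read from a sensor
};
inline std::string_view str_view(Temperature const& t) { return t.raw; }

// Type-filter layer (hypothetical): consulted before the string-access layer.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
std::string filter(T value) { return std::to_string(value); }

inline std::string filter(Temperature const& t) { return t.raw + " C"; }

// Detects whether a filter exists for T.
template <typename T, typename = void>
struct has_filter : std::false_type {};

template <typename T>
struct has_filter<T, std::void_t<decltype(filter(std::declval<T const&>()))>>
    : std::true_type {};

// Resolves one argument: the filter if available, otherwise string access.
template <typename T>
std::string to_piece(T const& arg)
{
    if constexpr (has_filter<T>::value)
        return filter(arg);
    else
        return std::string(str_view(arg));
}

template <typename... Args>
std::string concat(Args const&... args)
{
    std::string result;
    ((result += to_piece(args)), ...);
    return result;
}
```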

We'll look at how these mechanisms work, and examples of how they facilitate extension to user-defined types, in a subsequent article.

Format types

The last area of flexibility, for replacement-based APIs, is the format string. With Streams, the format string must be a C-style string. Boost.Format and Loki.SafeFormat also support std::basic_string.

FastFormat.Format applies string access shims to its format parameter, so the format can be of any type for which string access shims are defined: a potentially unlimited set. In practice, this flexibility has been most helpful with string classes from other libraries (e.g. ACE, ATL), resource strings, and localised format bundles (again, a subject for a follow-on article).

Expressiveness

Expressiveness is 'how much of a given task can be achieved clearly in as few statements as possible' [XSTLv1] . Both succinctness and clarity are important, each without trespassing too much on the other.

With a formatting library, expressiveness can be judged in terms of:

  • Direct syntactic support for built-in and standard types
  • Direct syntactic support for user-defined types
  • Specification of width and alignment
  • Special formatting, e.g. hexadecimal for integral/pointer types

Direct syntactic support for built-in and standard types

Streams, IOStreams, Boost.Format and Loki.SafeFormat all provide good support for built-in types.

By nature, FastFormat does not understand built-in types, any more than it understands any types that are not, or cannot be represented (via string access shims) as, strings. As noted in the previous section, however, it can be easily extended to understanding any type via the type-filter mechanism.

The library comes with stock type-filters for:

  • All integral types (including int64 / long long)
  • float and double floating-point types
  • bool type
  • char and wchar_t types (except for compilers that define wchar_t as a typedef)
  • void pointer types (void* and its cv-variants)

They're each defined in their requisite header located in the fastformat/shims/conversion/filter_type include directory. As a convenience, the type-filter header for integral types is included into fastformat/fastformat.hpp by default. This can be switched off via the pre-processor. Automatic inclusion for the other types can be switched on in the same way, if you don't want to have to explicitly include them in your application code.

As for other standard types, all except Streams understand std::string (or std::wstring): our Streams example illustrates the annoying requirement to explicitly invoke the c_str() method.

As mentioned in the section on argument types, FastFormat also understands several other standard types. For example, if you want to pass an exception as an argument to a format, all the other libraries will require you to explicitly invoke the what() method.

Direct syntactic support for user-defined types

This one's simple. Streams and Loki.SafeFormat do not allow for arguments of user-defined type. All the others do. To format strings representing instances of user-defined types with Streams and Loki.SafeFormat you have two choices. One option is to perform explicit formatting in application code, which is obviously anything but expressive.

    printf("person: %s %s, %d\n", bob.forename.c_str(),
       bob.surname.c_str(), bob.age);
    Loki::Printf("person: %s %s, %d\n")(bob.forename)(bob.surname)(bob.age);

The other option is to use a conversion function, which requires more code, is inefficient and still somewhat lacking in expressiveness:


    std::string Person2String(Person const& person);

    printf("person: %s\n", Person2String(bob).c_str());
    Loki::Printf("person: %s\n")(Person2String(bob));

Specification of width and alignment

All of the libraries except FastFormat.Write offer some ability to specify width and/or alignment. The statements in Listing 4 all print a left-aligned integer in a width of 5, and a right-aligned string in a width of 12 ("[-3   ,    abcdefghi]").

    int         i = -3;
    std::string s = "abcdefghi";

    printf("[%-5d, %12s]\n", i, s.c_str());

    std::cout << (boost::format("[%|-5|, %|12|]\n") %  
    i % s);

    std::cout << "[" <<
    std::setiosflags(std::ios::left) << std::setw(5)
    << i << ", " << std::setiosflags(std::ios::right)
    << std::setw(12) << s << "]" <<  std::endl;

    Loki::Printf("[%-5d, %12s]\n")(i)(s);

    ff::fmtln(std::cout, "[{0,5,,<}, {1,12}]", i, s);
  
Listing 4

Four of the libraries acquit themselves well in this case, with Loki.SafeFormat probably taking the biscuit. However, the IOStreams statement is a pig. Personally, I've always loathed the IOStreams, and avoided using them wherever possible, and this is a perfect illustration of why.

It's worth noting that in terms of width and alignment, Boost.Format provides extended facilities for centred alignment and absolute tabulations over multiple fields. FastFormat.Format provides left/right/centred alignment, and can also do absolute tabulations, although it requires a certain indirection. We'll see how in a later article.

Without compromising robustness or efficiency, FastFormat.Format is able to support a good range of formatting/alignment instructions, by defining replacement parameter syntax as:


      index[, [minWidth][, [maxWidth][, [alignment]]]]

The index is required, but each of the other fields is optional. The index and widths must be non-negative decimal numbers. The alignment field, if present, is one of '<' (left-align), '>' (right-align) or '^' (centre-align). The minimum width can be anything up to 999, for implementation reasons; once again, we'll see why in a subsequent article.
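
To make the grammar concrete, here is a small parser and padding routine for a single parameter of that form. It is a sketch under stated assumptions, not FastFormat's parser: the default alignment is taken to be right (as the {1,12} parameter in Listing 4 suggests), an over-long argument is truncated to its leading maxWidth characters, and centring puts any odd padding character on the right. The names param_spec, parse_param() and apply_spec() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed form of index[, [minWidth][, [maxWidth][, [alignment]]]].
struct param_spec
{
    std::size_t index = 0;
    std::size_t min_width = 0;  // 0 means "not specified"
    std::size_t max_width = 0;  // 0 means "not specified"
    char alignment = '>';       // assumed default: right-align
};

// Parses the text between '{' and '}', e.g. "0,5,,<" or "1,12".
inline param_spec parse_param(std::string const& text)
{
    std::vector<std::string> fields(1);
    for (char ch : text)
    {
        if (ch == ',')
            fields.emplace_back();  // start the next (possibly empty) field
        else
            fields.back() += ch;
    }
    if (fields.size() > 4 || fields[0].empty())
        throw std::invalid_argument("malformed replacement parameter: " + text);

    param_spec spec;
    spec.index = std::stoul(fields[0]);
    if (fields.size() > 1 && !fields[1].empty())
        spec.min_width = std::stoul(fields[1]);
    if (fields.size() > 2 && !fields[2].empty())
        spec.max_width = std::stoul(fields[2]);
    if (fields.size() > 3 && !fields[3].empty())
    {
        char const a = fields[3][0];
        if (fields[3].size() != 1 || (a != '<' && a != '>' && a != '^'))
            throw std::invalid_argument("bad alignment: " + fields[3]);
        spec.alignment = a;
    }
    if (spec.min_width > 999)
        throw std::invalid_argument("minimum width exceeds 999");
    return spec;
}

// Applies the widths and alignment to an argument already converted to a string.
inline std::string apply_spec(param_spec const& spec, std::string arg)
{
    if (spec.max_width != 0 && arg.size() > spec.max_width)
        arg.resize(spec.max_width);  // assumption: keep the leading characters
    if (arg.size() >= spec.min_width)
        return arg;

    std::size_t const pad = spec.min_width - arg.size();
    switch (spec.alignment)
    {
    case '<':
        return arg + std::string(pad, ' ');
    case '^':
        return std::string(pad / 2, ' ') + arg + std::string(pad - pad / 2, ' ');
    default:
        return std::string(pad, ' ') + arg;
    }
}
```

Applied to the two parameters of Listing 4, "{0,5,,<}" and "{1,12}", with the arguments -3 and "abcdefghi", this reproduces the bracketed output quoted with that listing.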

Special formatting

Streams is able to format integers as decimal, octal and hex, to select precision for floating-point types, to use zero padding instead of spaces, and so on. Boost.Format and Loki.SafeFormat both provide the same functionality with equal expressiveness. IOStreams also provides these facilities, though you'll find yourself in the same kind of chevron-hell as with width and alignment.

As you can see from all the examples presented thus far, FastFormat's expressiveness is pretty good, on a par with the best performers of the other libraries. Here is where we reach its limit. I have made a strong case for FastFormat's superior robustness characteristics (and will do so regarding its performance characteristics), and the cost is in lower expressiveness in the area of special formatting.

Currently (as of 0.3.1), FastFormat supports no special formatting at all. The two I'm considering adding to FastFormat.Format both involve the case where the argument exceeds the parameter's maximum width, if specified. One option is to fill the whole field with a (per-thread/per-process) customisable character, which would probably default to the hash/pound character '#'. The other option is to insert an ellipsis "..." into the result. Both cases could be accommodated without compromising robustness or performance. The following example, using syntax that is speculative at this time, shows both options, giving the output "[-3, ########, ...efghi]":


      ff::fmtln(std::cout, "[{0}, {1,,8,>#}, {1,,8,>.}]",
         i, s);

Three features that have no hope of being accommodated within the current design are:

  • Leading zeros (or any other non-space padding)
  • Octal/hexadecimal encoding
  • Runtime width/alignment specification

To the FastFormat core, everything is just a string slice. It doesn't know anything about integers, floating-points, or user-defined types, so it cannot zero-pad. Well, actually, adding support for this would be trivially simple, since we already support space padding (for minimum width). Unfortunately, it would mean that you could do something like the following (again, the syntax is speculative):

      ff::fmtln(std::cout, "[{0,5,,>0}, {1,12,^}]",
         i, s);

This would produce the result "[000-3, abcdefghi ]", rather than the intended "[-0003, abcdefghi ]". Because an overriding principle of FastFormat is that it should not be easy to do the wrong thing, this will not be supported. The correct way to do this is to use an inserter class, which we'll discuss in detail in a subsequent article. For now, let's look at how to do it with the Pantheios integer inserter class. I do this to illustrate the implicit, uncoupled interoperability between FastFormat and other libraries that use string access shims. (I also do it because, at the time of writing, there are not yet any inserter classes written for FastFormat; I've been using the Pantheios ones, and getting on with trickier problems.)


    #include <pantheios/pan.hpp>
    // API; alias namespace pantheios -> pan

    #include <pantheios/inserters/integer.hpp>
    // pantheios::integer class

    ff::fmtln(std::cout, "[{0}, {1,12,^}]"
            , pan::integer(i, 5, pan::fmt::zeroPad), s);
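The reason naive zero-padding goes wrong is that, by the time the core sees the argument, -3 has already become the two-character string "-3". The sketch below uses two hypothetical helpers (not the Pantheios implementation) to contrast padding the converted text with padding that knows it is dealing with a signed integer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// All a string-only core could do: pad the already-converted text on the left.
std::string pad_string_left(const std::string& s, std::size_t width, char fill) {
    return s.size() >= width ? s : std::string(width - s.size(), fill) + s;
}

// What an integer-aware inserter can do: place the zeros after the sign.
std::string zero_pad_integer(long long value, std::size_t width) {
    std::string digits = std::to_string(value);
    if (digits.size() >= width) return digits;
    const std::size_t sign = (value < 0) ? 1 : 0;  // keep '-' in front
    digits.insert(sign, width - digits.size(), '0');
    return digits;
}
```

pad_string_left("-3", 5, '0') gives the wrong "000-3", while zero_pad_integer(-3, 5) gives "-0003". Only code that still knows the argument's type can make that distinction, which is the job the inserter class takes on.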

Octal/hexadecimal representation is not possible because the arguments have already been turned into string form before the format string is collated. If you want a number to be represented in this way, you need to use an inserter class. Again, until such time as FastFormat has its own, you can 'borrow' the integer class from Pantheios:


    ff::fmtln(std::cout, "10 in hex={0}"
            , pan::integer(10, 8, pan::fmt::fullHex));
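The essence of such an inserter is that the number is turned into its hexadecimal text before the formatter ever sees it. Here is a minimal sketch of a hypothetical hex_text class built on std::snprintf. Its output format ("0x" followed by zero-padded lowercase digits) is my own choice and is not claimed to match pan::fmt::fullHex. Making it directly usable as a FastFormat argument would additionally require string access shims for the type, which are not shown.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical inserter-like class: converts the value to hex text up front,
// so a string-only formatter can use it like any other string.
class hex_text {
public:
    hex_text(unsigned long value, int min_digits) {
        char buf[32];
        // '*' takes the minimum digit count from the argument list.
        std::snprintf(buf, sizeof buf, "0x%0*lx", min_digits, value);
        text_ = buf;
    }
    const std::string& str() const { return text_; }

private:
    std::string text_;
};
```

For example, hex_text(10, 8).str() yields "0x0000000a".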

Finally, specifying widths at runtime is also not possible, again because all arguments are treated as strings. If you need to do that, you must create the format string on the fly. The good news is that you can do this using FastFormat, and without significantly compromising performance. We'll see such examples of using FastFormat in 'recursive' mode in a subsequent article.
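As a rough sketch of building the format string on the fly, the hypothetical helper below splices a runtime width into a replacement parameter. It uses std::to_string rather than FastFormat's recursive mode, and assumes the field order index, minimum width, maximum width, alignment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds e.g. "{0,12,,>}" from values known only at
// runtime. The empty third field leaves the maximum width unspecified.
std::string make_param(unsigned index, unsigned min_width, char alignment) {
    return "{" + std::to_string(index) + "," + std::to_string(min_width) +
           ",," + alignment + "}";
}
```

The result, for instance "[" + make_param(0, width, '>') + "]", would then be passed as the format argument of an ff::fmtln() call, in whatever string form your FastFormat version accepts. The format string is still checked by the library, so a bad width produced at runtime is caught like any other format error.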

Discoverability and transparency

Discoverability and transparency are the two sides of the comprehensibility of a software component. Essentially, discoverability is how easy the component is to understand in order to use it (including customisations); transparency is how easy it is to understand in order to change it. With both characteristics, judgements are subjective, though not wholly so.

In terms of discoverability, I honestly believe that FastFormat is very good in the majority of its features, though I would have to concede that its more esoteric ones are likely just as undiscoverable as those of Boost.Format. Through force of habit, perhaps, Streams is very discoverable, and Loki.SafeFormat, being very similar to Streams, has that same characteristic. I have always found IOStreams to be the opposite of obvious, and am never able to do any non-trivial IOStreams programming without consulting the documentation. Score them last, in my opinion.

Any non-trivial C++ library, such as these, will suffer in the transparency stakes. Having spent a lot of time delving inside implementations of them all in recent weeks, I would have to say that none are scoring all that well. In my opinion, Loki.SafeFormat is slightly more transparent than the rest, and IOStreams and Boost.Format are considerably worse. Both are effectively opaque to anyone with less patience than a saint. (As, perhaps, is FastFormat too, to anyone other than its creator.)

It is no accident that Streams and Loki.SafeFormat seem superior to the rest in discoverability and transparency: they are the least flexible of the libraries, and comprehensibility and flexibility are usually in inverse proportion.

Portability

Being standard, Streams and IOStreams are available on just about every platform you're going to come across. (The only exceptions, pardon the pun, will be certain embedded platforms that don't support exceptions and/or templates.) Loki.SafeFormat is highly generic and contains no compiler dependencies; as long as your compiler is modern enough to support static array-size determination [IC++], it works just fine. Boost.Format also has extremely high coverage. In terms of compiler capabilities, FastFormat is very portable: it works with any modern compiler, and with several not-so-modern ones. It even works with Visual C++ 6!
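For readers unfamiliar with static array-size determination, one common form of the technique is sketched below. It is not necessarily the exact form used in [IC++] or by Loki, and the names array_size, array_size_helper and ARRAY_SIZE are hypothetical. In both versions, passing a pointer instead of an array fails to compile, which is what makes the technique safer than sizeof(a)/sizeof(a[0]).

```cpp
#include <cassert>
#include <cstddef>

// C++11 form: the template only deduces N from a genuine array.
template <typename T, std::size_t N>
constexpr std::size_t array_size(const T (&)[N]) noexcept {
    return N;
}

// Pre-C++11 form: a function (declared, never defined) returning a reference
// to char[N]; sizeof of its return type is N, usable in constant expressions.
template <typename T, std::size_t N>
char (&array_size_helper(T (&)[N]))[N];
#define ARRAY_SIZE(a) (sizeof(array_size_helper(a)))

static_assert(ARRAY_SIZE("abc") == 4, "a string literal includes its terminator");
```

The older form needs template argument deduction from array references, which is exactly the "modern enough" compiler capability referred to above.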

FastFormat does not rely on compiler/operating-system specific constructs (although it may use them where available), and has been used successfully on Linux, Mac OS-X, Solaris, and Windows, including 32-bit and 64-bit variants of most. Nonetheless, there are likely to be platforms and/or compilers that are not yet supported, but I'm highly confident that they can be accommodated readily. Part of the reason for writing this article is that I'm hoping to interest people in joining the project to help with such things (and to drive the design to new places, of course).

Modularity

Modularity is about dependencies, usually unwanted ones. These tend to take two forms:

  • What else do I need to do/have in order to work with the library
  • What else do I need to do/have in order to use the library to work with other things

In terms of the first, we can immediately stipulate that, being standard, the Streams and IOStreams libraries are perfectly modular by definition.

Boost.Format comes as part of Boost, and requires nothing else. Loki.SafeFormat comes as part of Loki, and requires nothing else. Both of these require only the usual download/unpack/build/install aspects of any open-source library.

FastFormat is less modular than the others, in that it requires the STLSoft libraries. However, since STLSoft is 100% header-only, this is a pretty small burden; the only impost is that you define the STLSOFT environment variable that the FastFormat makefiles expect.

When it comes to the other aspect, only FastFormat offers true modularity. Because its default argument interpretation is done via string access shims [XSTLv1] , it is automatically compatible with any other libraries/applications that use them. For example, you can report results of API functions from the Open-RJ library in FastFormat statements, as in:

      openrj::ORJRC rc =
         openrj::ReadDatabase(databasePath, . . .

      if(ORJ_RC_SUCCESS != rc)
      {
        ff::fmtln(std::cerr, "failed to open {0}: {1}",
           databasePath, rc);
      }

The resultant string will be formed from the format, the database path (C-style string) and the string form of the result code. If, say, databasePath is "myfile.rj" and rc is ORJ_RC_CANNOTOPENJARFILE, then the result will be "failed to open myfile.rj: the given file does not exist, or cannot be accessed". This all works without the FastFormat and Open-RJ libraries knowing anything about each other. In fact, it works without Open-RJ even having any dependency on STLSoft!
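The mechanism behind this decoupling can be illustrated with a deliberately simplified sketch. Real string access shims live in STLSoft and have different names and richer semantics; here the shims::data/shims::len overloads, the thirdparty namespace and its result_code enum are all hypothetical. The point is that concat() depends only on the shim overloads, never on any particular argument type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified "string access shims": overloaded free functions
// giving uniform (pointer, length) access to the string form of a value.
namespace shims {
inline const char* data(const char* s) { return s; }
inline std::size_t len(const char* s) { return std::strlen(s); }
inline const char* data(const std::string& s) { return s.data(); }
inline std::size_t len(const std::string& s) { return s.size(); }
}  // namespace shims

// A hypothetical third-party library that knows nothing about any formatter.
namespace thirdparty {
enum result_code { rc_success, rc_cannot_open };
inline const char* describe(result_code rc) {
    return rc == rc_success ? "success"
                            : "the given file does not exist, or cannot be accessed";
}
}  // namespace thirdparty

// Shim overloads for the third-party type; only the shim names are shared.
namespace shims {
inline const char* data(thirdparty::result_code rc) { return thirdparty::describe(rc); }
inline std::size_t len(thirdparty::result_code rc) { return std::strlen(thirdparty::describe(rc)); }
}  // namespace shims

// A toy "formatter" that talks only to the shims (C++17 fold expression).
template <typename... Args>
std::string concat(const Args&... args) {
    std::string result;
    (result.append(shims::data(args), shims::len(args)), ...);
    return result;
}
```

In this single-file sketch the overloads must be declared before concat() is defined, because a qualified call such as shims::data(...) finds its candidate functions where the template is defined.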

I18N/L10N

Depending on where you get your information, you may see slightly conflicting definitions of internationalisation (aka I18N) and localisation (aka L10N). The definitions I prefer are that I18N is the business of giving software the capability to support different locales, and L10N is the business of using that capability to provide support for one or more specific locales. We'll consider the libraries on that basis.

There are two major features required for I18N in a formatting library:

  • The ability to convert arguments into a form suitable for the locale
  • The ability to arrange arguments in an order suitable for the locale

When it comes to argument conversion, FastFormat is not yet fully internationalised: the converter classes and localised integer conversion functions are not yet written; the only integer conversions currently provided are not I18N, and just do vanilla integer to string conversion. The good news is that all these are addressable, and FastFormat is totally customisable to provide full I18N support: by string access shims, by the filter_type() mechanism and by inserter classes. All other libraries are, platform/compiler/standard-library permitting, already fully I18N compatible in how they convert arguments.
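As an illustration of the locale-driven conversion available to the other libraries, here is how IOStreams can be made to group digits by imbuing a stream with a custom std::numpunct facet. The facet and function names are hypothetical; the facet mechanism itself is standard C++. This says nothing about how FastFormat's own I18N support will eventually work.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: ',' as the thousands separator, in groups of three.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// IOStreams delegates numeric conversion to the stream's locale, so imbuing
// a different facet changes how every integer on that stream is rendered.
std::string localised(long value) {
    std::ostringstream os;
    os.imbue(std::locale(os.getloc(), new comma_grouping));  // locale owns the facet
    os << value;
    return os.str();
}
```

localised(1234567) yields "1,234,567", with no change to the insertion statement itself.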

Arranging arguments necessitates a replacement-based API, whose format string may contain positional identifiers, such that arguments may be utilised in arbitrary order - determined at runtime, if necessary - dependent on the locale. Boost.Format and FastFormat.Format are the only two libraries from our set that support this requirement. FastFormat also comes with several 'bundles' - user-defined types that associate format strings with keys - from which format strings can be elicited dependent on locale; this will be discussed in a later article.
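Why positional identifiers matter for I18N can be seen in a minimal sketch: a hypothetical substitute() function that replaces each {N} with the Nth argument, driven by a per-locale format string. It handles only bare {N} parameters and is not FastFormat.Format's implementation; date_format() is a made-up stand-in for a real lookup such as a bundle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replaces each "{N}" with args[N], so the format string decides the order.
// Only bare "{N}" is handled; a malformed index throws (std::stoul or
// std::vector::at) rather than being reported gracefully.
std::string substitute(const std::string& fmt, const std::vector<std::string>& args) {
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '{') {
            const std::size_t close = fmt.find('}', i);
            if (close != std::string::npos) {
                const std::size_t index = std::stoul(fmt.substr(i + 1, close - i - 1));
                out += args.at(index);
                i = close;  // skip past the parameter
                continue;
            }
        }
        out += fmt[i];
    }
    return out;
}

// Hypothetical per-locale format strings: same arguments, different order.
const char* date_format(const std::string& locale) {
    return locale == "en_US" ? "{1}/{0}/{2}" : "{0}/{1}/{2}";
}
```

With the arguments {"25", "12", "2008"}, the "en_US" format gives "12/25/2008" and any other locale here gives "25/12/2008"; the calling code is identical in both cases.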

Efficiency

By this point you may be wondering whether the mechanisms that enforce total robustness and allow infinite extensibility impose a performance cost. I am pleased to be able to tell you that they do not: far from it, in fact.

The next article will take a deeper look into issues of performance, but I want to give you a sneak peek at the performance results for the Professor Yaffle example. (This test is included in the performance tests in the FastFormat distribution.) Table 2 shows the times for 100,000 iterations of the string formatting operation, compiled with GCC and Visual C++ on 32-bit and 64-bit machines.

Time (ms) for 100,000 Yaffles

Library            VC++ 9 (x86)  VC++ 9 (x64)  GCC 4.2 (x86)  GCC 4.1 (x64)
Streams            257           175           209            83
IOStreams          734           378           233            186
Boost.Format       2,005         1,145         706            736
Loki.SafeFormat    356           235           342            235
FastFormat.Format  129           88            153            112
FastFormat.Write   112           84            63             66

Table 2
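These figures come from the performance tests in the FastFormat distribution. For readers who want to take their own measurements, the following is a minimal timing sketch, not that test: it wraps std::chrono::steady_clock around a loop of repeated formatting statements. The statement shown is an ordinary snprintf() call, not the Professor Yaffle example, and the names time_ms, g_sink and format_with_snprintf are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>

// Runs `op` `iterations` times and returns the elapsed time in milliseconds,
// measured with a monotonic clock.
template <typename Op>
double time_ms(Op op, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i != iterations; ++i) op();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Updated on every iteration so the optimiser cannot discard the work.
volatile std::size_t g_sink = 0;

// A sample statement to time: snprintf() into a buffer, then into a std::string.
void format_with_snprintf() {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s, %d", "abcdefghi", -3);
    const std::string result(buf);
    g_sink = g_sink + result.size();
}
```

Calling time_ms(format_with_snprintf, 100000) mirrors the iteration count used for Table 2, but absolute numbers from such a loop depend heavily on compiler, optimisation settings and machine, so only relative comparisons on the same setup are meaningful.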

Let's look at the memory allocations involved with the example statement. Table 3 shows the results for four compilers.

Number of allocations

Library            VC++ 7.1  VC++ 9  GCC 4.2 (x86)  CodeWarrior 8
Streams            2         1       2              3
IOStreams          8         8       2              11
Boost.Format       16        19      16             41
Loki.SafeFormat    3         3       4              6
FastFormat.Format  1         1       1              3
FastFormat.Write   1         1       1              3

Table 3
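Allocation counts like these can be gathered by replacing the global operator new. A minimal sketch follows; it is not necessarily how the figures above were obtained, and g_allocations and count_allocations() are hypothetical names. Because replacing operator new affects the whole program, this belongs in a dedicated test executable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocation counter (constant-initialised, so it is safe to
// use even during static initialisation).
static std::atomic<std::size_t> g_allocations{0};

// Replacement global allocation function; by default the array form
// operator new[] forwards here too, so it is also counted.
void* operator new(std::size_t size) {
    g_allocations.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized) for the replacement above.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many times operator new was called while `op` ran.
template <typename Op>
std::size_t count_allocations(Op op) {
    const std::size_t before = g_allocations.load();
    op();
    return g_allocations.load() - before;
}
```

To measure a formatting statement, wrap it in a lambda, for example count_allocations([&] { std::ostringstream os; os << i << ", " << s; }). The result depends on the standard library implementation, and optimisers are permitted to elide some allocations made through new-expressions, so counts can vary with build settings as well as compilers.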

It's clear that Boost.Format is the greedy sluggard of the group, as reflected both in the number of memory allocations it makes and in the time it takes to prepare statements. Streams is consistently quicker than IOStreams and Loki.SafeFormat, though the relative performance of those two depends on compiler/platform (IOStreams on UNIX is surprisingly quick). But the clear winners are the two FastFormat APIs, which (thankfully!) live up to their name; the Write API is somewhat quicker, as we'd expect.

A last word on Loki

Along with Streams, IOStreams and Boost.Format, Loki.SafeFormat has come in for a fair grilling. I would like to point out that it differs from the other three in being, knowingly, a research/alpha project. Its original author, Andrei Alexandrescu, has pointed out on more than one occasion that it is neither a polished idea nor of production status: its version number, 0.1.6, is a good indication of that. I've included it in these tests because (i) I did not want to be accused of singling out Boost for criticism, and (ii) it has an interesting interface layer. Once you've read Monolith (hint, hint), you'll see that some of the library's deficiencies are not necessarily inherent in its basic design, but merely a consequence of a weak tunnel mechanism, and could be remedied by adopting a different one. Consequently, most (though not all) of my criticisms of its robustness and flexibility should be read with the reservation that such things are improvable.

Summary

In this article we've looked at several formatting libraries and compared how well they fare against some important characteristics of software quality, with decidedly mixed results; see Table 4. We also introduced FastFormat, a new formatting library with big aims: to provide the highest possible quotient from the software quality equation in this crucial area of almost all programs.

And the impartial recommendation is ...

Well, if it's not obvious by now, I mustn't have belaboured the point quite enough. As I said at the start of the article, a formatting library 'must not compromise on robustness, efficiency or flexibility'. All four established libraries fail this test for the most important of these, robustness. In my opinion, that is the fatal blow. By contrast, FastFormat.Format is as robust as it is possible for a replacement-based format library to be, and FastFormat.Write is completely robust: it is impossible to compile defective code using it!

That FastFormat offers better flexibility (although only slightly in the case of Boost.Format and IOStreams) and substantially better performance is the cherry on the cake. That FastFormat is permanently a little less expressive than Boost.Format is a small price to pay for the robust+flexible+fast trifecta.

The key, then, is to finish off its I18N support, sort out its packaging and ensure its full portability. I am hoping that readers of these articles will be motivated to help me get it over the line. Then we can just enjoy flexible, reliable formatting that also happens to be exceedingly fast.

The next article(s) will look in detail at FastFormat's extensibility mechanisms and cover some of the ways in which it achieves its high performance.

References

[BF] The Boost.Format library; http://www.boost.org/doc/libs/1_36_0/libs/format/index.html

[BSTMS] The Boost.IOStreams library; http://www.boost.org/doc/libs/1_36_0/libs/iostreams/doc/index.html

[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley 2004; http://www.imperfectcplusplus.com/

[L&K] Standard C++ IOStreams and Locales, Langer & Kreft, Addison-Wesley, 2000

[LOKI1] The Loki library; http://www.sourceforge.net/projects/loki-lib

[LOKI2] 'Typesafe Formatting', Andrei Alexandrescu, C/C++ Users Journal, August 2005; http://www.ddj.com/cpp/184401987

[PSTREAMS] The PStreams library; http://pstreams.sourceforge.net/

[SHAVIT] 'audio_stream: A Text-to-Speech ostream', Adi Shavit, March 2007; http://www.codeproject.com/KB/audio-video/audio_ostream.aspx

[XSTLv1] Extended STL, volume 1, Matthew Wilson, Addison-Wesley 2007; http://www.extendedstl.com/

Characteristic                Streams                   IOStreams         Boost.Format          Loki.SafeFormat           FastFormat.Write        FastFormat.Format
Robustness (type-safety)      Very low                  Medium            Medium                Medium                    100%                    100%
Robustness (format)           Very low                  n/a               High                  Low                       High                    n/a
Robustness (atomicity)        Yes                       No                No                    No                        Yes                     Yes
Flexibility (sink)            Medium                    High              High                  High                      High                    High
Flexibility (format)          Low                       n/a               Medium                Medium                    n/a                     High
Flexibility (argument)        Low                       High              High                  Low                       Very high               Very high
Redefines operator semantics  No                        Yes               Yes                   No                        No                      No
Expressiveness (UDTs)         Built-ins high, UDTs n/a  High              High                  Built-ins high, UDTs n/a  High                    High
Expressiveness (Width/Align)  Medium                    Low               Very high             Medium                    n/a                     High
Expressiveness (Special Fmt)  Built-ins high, UDTs n/a  Low               Very high             Built-ins high, UDTs n/a  n/a                     Low
I18N/L10N (conversions)       Yes                       Yes               Yes                   Yes                       Currently incomplete    Currently incomplete
I18N/L10N (ordering)          No                        No                Yes                   No                        Yes                     Yes
Portability                   Total (Standard)          Total (Standard)  High                  High                      Medium (High possible)  Medium (High possible)
Modularity (required)         Total (Standard)          Total (Standard)  Relies only on Boost  Relies only on Loki       Relies on STLSoft       Relies on STLSoft
Efficiency                    High                      Medium            Low                   High                      Very high               Very high

Table 4
