ACCU Home page ACCU Conference Page ACCU 2017 Conference Registration Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Google+ ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinDynamic C++, Part 1

Overload Journal #115 - June 2013 + Programming Topics   Author: Alex Fabijanic
Static and dynamic languages have different trade-off. Alex Fabijanic attempts to get the best of both worlds.

As to which is more important, Dynamic or static,
both are absolutely essential,
even when they are in conflict.

~ Robert Pirsig, Metaphysics of Quality [Pirsig]

C++ is a statically-typed language. The static nature of the C++ type system provides a data integrity ‘safety net’. The compiler is an indispensable runtime-surprise-prevention tool and the static nature of C++ provides runtime performance gain. Before we go any further, let’s clarify the nomenclature – static typing here should not be confused with its close relative strong-typing, see the definitions on the right for details. It is precisely the ‘weaknesses’ of the type system (in combination with polymorphism and templates) that provides the functionality needed for dynamic-like behavior within a statically typed language such as C++. And there are circumstances calling for a ‘softened’ type system, where a degree of performance sacrifice is acceptable and runtime type system relaxation desirable (or even necessary). To provide generic functionality automatically adaptable to different data types at runtime within the confines of standard C++, the runtime type detection system does not suffice – one has to resort to library solutions based on various techniques described later.

Static vs. dynamic

This classification has to do with the timing of value-to-type attachment. Static means values are attached to types (‘compiled’) at compile time. Dynamic means they are attached (‘interpreted’) at runtime. Since C++ attaches values to types at compile, it follows that C++ is a statically typed language.

Strong vs. weak

This classification has to do with ‘loopholes’ the programming language type system leaves open for its type system to be ‘subverted’. Both C and C++ allow different types and pointers thereof to be cast to each other. While C++ is stricter than C, it is also backward compatible. But even without the C compatibility, C++ provides ways to subvert the type system and therefore can not be considered a strongly typed language. As a (non-exhaustive) example, void* and union disqualify C and C++ from strongly-typed qualification.

Data from external data sources arrives in a variety of types and brings along the need for efficient and transparent datatype conversion. The proliferation of web-based interfaces and databases with the addition of popular textual formats such as JSON and XML exacerbates the need for a relaxed type system with transparent and safe conversion facilities. This is a domain where dynamic languages have gained a significant footing. In order to clarify the premises for this writing, we must take a brief detour here (see ‘C$$++ Dynamism’).

C++ dynamism

To get a broader perspective, let’s look at the need for and benefits of ‘dynamic’ typing in C++. Even the dynamic language environments are ultimately built on a statically typed foundation. Yet, when the static/dynamic langugage interaction need arises, we embed those ‘foreign’ language environments and we must speak to them in a ‘foreign tongue’ (i.e. through a specialized translation layer). Wouldn’t it be nice to (a) smooth the rough edge between the two in a reusable way, while also (b) addressing the concern of dealing with external data of different types and (c) gain a generic-purpose standard C++ ‘dynamic’ environment natively and seamlessly, as a side-effect?

While standard C++ claims to be a general-purpose language, it stops abruptly at the point where (among other things) dynamic-language-like behavior is needed – as things stand at the time of this writing, even a well-known, half-way-there oldie like boost::any could only make it to 2014 Technical Specification (a pre-standardization mechanism for almost-there-but-not-yet-standard-ready libraries and language features), which means it will not be standardized will not be standardized until 2017 at least.

The original spark triggering this systematic overview (boost::any port to POCO some years ago) was decidedly not about C++ as a ‘dynamic’ language but rather about a way to work around the rigidly static C++ type system in a reasonably efficient and reusable way – through a library solution. But libraries are languages, so the C++ library solutions for type dynamics are C++’s native ‘dynamic’ languages of sorts. Although the underlying types are still statically defined, due to the weakness of C++ type system, they can be dynamically held; and one can coax them to be readily available at runtime, holding values of different types, that they very much ‘quack and walk’ as a dynamic language... Suddenly, the idea of a native C++ ‘dynamic language’ does not sound so outlandish ...

So, back on track – is it possible to provide dynamic-like behavior within the constraints of standard ANSI/ISO C++? How can a C++ programmer accurately and efficiently transfer data from a database to XML, JSON or HTML without stumbling over the rigid C++ static type-checking mechanism at compile time while ensuring accuracy at runtime? Can type-erasure and (checked) type-conversion techniques fit the bill? Given both historical (ANSI C union and void*, MS COM Variant, boost::[variant, any, lexical_cast], Qt QVariant, adobe::any_regular) and recent (Boost.TypeErasure, Facebook folly::dynamic) development trends (including the pending Boost.Any C++ standard proposal), the need for a way around the static nature of C++ language is obvious. Since the DynamicAny [Fabijanic08a, Fabijanic08b] article, some new solutions [Folly] have appeared, POCO [POCO] has seen several release cycles and Poco::DynamicAny is now known under a new name – Poco::Dynamic::Var. Additionally, the performance and type-safety of number/string conversion has been improved by replacing sscanf/sprintf-based conversion with double-conversion [DoubleConversion] (also used by folly::dynamic).

POCO

In this article series, both externals and internals of boost::[variant, any, type_erasure], folly::dynamic, Poco::Dynamic::Var, Qt QVariant and adobe::any_regular are explored and compared. Design, capabilities, ease of use as well as pros and cons of each solution will be examined. Performance benchmark comparisons results will be provided as well.

POCO
POrtable COmponents C++ Libraries are:
  • A collection of C++ class libraries, concpetually similar to the Java Class Library, the .NET Framework or Apple’s Cocoa.
  • Focused on solutions to frequently-encountered practical problems.
  • Focused on ‘internet-age’ network-centric applications.
  • Written in efficient, modern, 100% ANSI/ISO Standard C++.
  • Based on and complementing the C++ Standard Library/STL.
  • Highly portable and available on many different platforms.
  • Open Source, licensed under the Boost Software License.

In regards to Boost, in spite of some functional overlapping, POCO is best thought of as a Boost complement (rather than replacement). Side-by-side use of Boost and POCO is a very common occurence.

We will start our journey through Dynamic C++ world with a smooth sail – simple, minimalistic and well-known boost::any, a ‘bipolar’ class with deceptively soft, entirely type-agnostic conception and surprisingly rigid, ultra-strongly typed delivery interface (or, should we say, lack thereof). As we move on, the journey takes us into the rough waters of solutions that endeavor, each in its own way, to provide dynamic facilities within the confines of standard C++ and its static type system. The solutions gradually build on existing foundations, attacking the problem from various angles while trying to keep size, performance and datatype integrity under control.

But, first things first – let us start by looking at the concerns shaping the solutions and the ingredients they're made of.

Dynamic concerns

What are the concerns involved with dynamic behavior and how are they solved? Let’s enumerate, disect and analyze them ...

Storing value

This concern has to do with the location where the actual bits representing the value reside. Within the C++ memory model, there are two distinct choices – heap and stack – and a hybrid between the two; more on this later. There are various memory allocation optimization methods that look just like heap allocation from programmer’s standpoint but actually allocate from different places; such constructs are beyond the scope of this article. Let us just mention here that the term ‘stack’ above should be used cautiously; it is very common to refer to placement new techniques constructing objects in a dedicated storage inside a class as ‘stack-based’; that convention is used in this article as well. It is, however, important to remember that there is nothing preventing an object of such class to be allocated on the heap.

Performing operations

The most frequently encountered operations are type conversions, between string and numeric or other values. Furthermore, there are assignment, arithmetic and logical operators. There are other language operations such as bitwise but those are not of concern for this article’s theme. Finally, there are various conversions or transformations; as we will see later, some solutions even provide capability to add custom operations to types at compile time.

Retrieving value

Value retrieval ranges from a strict requirement to explicitly specify the held type, to transparent conversion between different types, sometimes with runtime exceptions thrown if conversion is impossible; with some solutions, it is very easy to venture into undefined behavior if the user is not careful. Sometimes, built-in value retrieval is readily available, while in some cases the user is required to use pre-existing or provide custom external ‘scaffolding’ in order to extract the held value.

Runtime performance

From the runtime performance standpoint, there will typically be two concerns: heap memory allocation and conversion/transformation costs. From this aspect, anything that could be done at compile time, should. Additionally, as mentioned above, small object optimization affects runtime performance in both ways – positively when heap allocation is avoided and negatively every time the value is retrieved.

Memory usage

Memory usage will vary, from the exact type size (plus platform-dependent alignment, if applicable) to a fixed size, large enough to hold the largest stack-based type supported.

Code size

The binary code size generated by various solutions will mostly be proportional to the functionality provided. For example, boost::any code will be small due to non-existent conversion logic. Poco::Dynamic::Var code will be the largest, due to exhaustive involvement in type conversions and accuracy checks. The rest of the solutions are somewhere in between.

Ease of use

Last but not least, this concerns the user experience when dealing with dynamic functionality. Some solutions have rigid compile-time constraints, while some others may exhibit surprising runtime behavior. If the user has to understand its implementation details in order to use a software component correctly, the total value of the abstraction is diminished regardless of the implementation quality. Or, as Scott Meyers succintly puts it, “ Make interfaces easy to use correctly and hard to use incorrectly”.

Data storage

We have several choices for where and how we store the data values. Again, each comes with its own implementation and concerns.

Storing the value on the heap

If the value resides on the heap, we will pay in runtime performance for memory allocation. However, the amount of memory will be variable, commensurate with the size of held type plus platform-dependent padding/alignment.

Implementation:

  • void* and operator new

    This technique provides dynamic-like behavior by virtue of void*, a C language construct allowing pointers to unknown types. The default operator new allocates memory on the heap. Due to the type-independent nature of void*, the newly created entity can be of any type, so the new operator can construct the type needed in the allocated memory; note the difference in word order – new operator first calls operator new and, after the memory is allocated, constructs the object. From that point on, it is up to the ‘dynamic’ solution and programmer to ensure the newly created type value is properly treated. There are some variations on this theme in later described solutions and we will examine them in due course.

Concerns:

  • Allocation overhead

    Memory allocation on the heap can be an expensive runtime operation; for optimization purposes the language allows overloading of the operator new. This allows for various schemes of memory (pre)allocation that alleviate the performance hit imposed by the default operator new.

  • Memory cleanup

    Memory that was allocated on the heap by new must be released with delete.

Storing the value on the stack

If the value resides on the stack, we will invariably pay the storage size of the largest value we wish to store. As mentioned earlier, although commonly referred to as ‘stack’, a more appropriate name would be ‘internal’ because there is nothing preventing the creation of object on the heap.

Implementation:

  • union + tag

    This technique utilizes the C++ union facility. Unlike struct, whose members are laid out in memory next to each other, union can only hold one value at a time because its data members overlap. This union feature provides the same storage location for different types – a feature that can be exploited for dynamic-type-like behavior at runtime without paying the full sum-of-storage price for all the types supported. Additional tag is needed to indicate the currently active union member. In C++03 standard, the limitation is that only POD and classes with trivial construction/destruction can be used. C++11 standard relaxes the only-trivial-construction/destruction limitation.

  • union + placement new

    This technique utilizes the C++ union in combination with placement new. Placement new does not allocate memory but only constructs object in pre-allocated storage. The reason for the use of union is twofold:

    • there is a need for a special-purpose union member ensuring proper alignment for the largest type held when it is not known at compile time
    • there are ‘hybrid’ solutions, mixing types known at compile time with ‘raw’ storage for the unknown types (placement-new-constructed at runtime); the tag indicating currently active type is necessary in this case

Concerns:

  • Size

    Since a union must accomodate the largest type supported, it has to occupy at least the largest type size.

  • Alignment

    In practice, the amount of space needed is often more than largest union member size due to platform-dependent alignment requirements. This means that, if only smaller types are used, sometimes there may be some space not effectively used but consumed nevertheless. Alignment requirements and details are beyond the scope of this writing, but let us just mention here that it is a fairly complex topic, especially in the C++03 context; for details, see [Sutter].

  • Destruction

    When the held object is placement-new constructed in pre-allocated storage, there is no need to explicitly call delete. This, however, means that the destructor has to be called explicitly by the programmer.

Using a hybrid solution

The hybrid solution (also known as small object optimization, configurable at compile time) compromises, to an extent, the stack size concern in order to avoid the heap allocation penalty for types under certain size; this solution, however, imposes runtime penalties of size inspection (a) before instantiation and (b) at every value retrieval.

Implementation:

  • Small Object Optimization

    This is a combined technique of heap- and stack-based storage strategies. The programmer decides and specifies at compile time the maximum object size that can be created on the stack. At runtime, based on the compile-time value, the decision is made whether the new object will be constructed on the stack or the storage for it to be constructed will be allocated on the heap.

Concerns:

  • Runtime detection performance

    Obviously, every creation and retrieval of the value will incur the penalty of the value location detection. There are additional difficulties with assignment and swap operations as well as with exception safety.

  • Stack use

    The fixed stack space is used indiscriminately, even when the value is allocated on the heap (in which case, the stack space usually serves as the pointer storage).

Operations

Finally, we get to choose what sort of operations can be applied to these types. These choices will strongly affect the usability of the types so care and a deep understanding must be used.

Type conversions

Type conversions are the most frequently encountered operations, the most frequent conversions being those between numbers and strings. Conversions between compatible types (e.g. short to int) can often be performed statically. If static conversion is not possible, then dynamic functionality must take its place; this typically involves parsing a string to generate a corresponding number or vice-versa – formatting a number into string. Not all solutions described here are equally cooperative in this area; they range from those not providing any (no pun intended) conversions, via those providing accompanying mechanisms for defining conversion facilities to those providing built-in conversions.

Standard language operations (+, -, ==, ...)

These operations are indispensible for built-in types. They can also be brittle due to many runtime cases where they may make no sense for the held types/values. Therefore the choice is to either not provide them at all, or provide them and throw exception at runtime if the attempted operation makes no sense for the current values. The latter behavior is consistent with the way a dynamic language would behave.

Custom operations

This is an area where things get really complicated – ‘dynamically’ attaching operations to types that do not ‘natively’support them. There are some solutions that provide this functionality. There are also some pitfalls. These will be analyzed and discussed later.

Ingredients

The ‘ingredients’ for the dynamic functionality within C++ ‘recipe’ are summarized in the following list:

  • new
  • placement new
  • void*
  • union
  • virtual functions
  • templates

From the entities listed above, we already discussed new and union; the ones that were not touched on so far are virtual functions and templates.

  • Virtual functions

    Virtual functions are, of course, an indispensable mechanism for runtime polymorphism, providing objects with identical interface that behave differently. They help tremendously in defining conversions and other operations, where it is very convenient to provide default behavior (often throwing an exception) in the parent class and appropriately override it in descendants. Virtual functions inflict both size and performance penalty.

  • Templates

    Templates are another powerful C++ mechanism providing compile-time genericity. When combined with other facilities described here, templates can produce very powerful (but often complicated) programming constructs.

Boost.Any

This well-known class has been around for a long time; at the time of this writing, it is an active proposal for standardization [Dawes12]. According to proposal authors, std::any is a container for “Discriminated types that contain values of different types but do not attempt conversion between them”. This classifies any as a generic (in the sense of ‘general’, not template-based) solution for the first half of the problem – how to accommodate any type in a single container. The ‘syntactic sugar’ is avoided template syntax – any itself is not a template class but it has a template constructor and assignment operator; this is conveniently used to avoid the aesthetically displeasing angle brackets:

  any a = "42";
  any b(42);

What happens ‘under the hood’ is:

  • at compile time, assignment (or construction) code for the appropriate type is generated
  • at run time, the value is assigned to a polymorphic holder instantiated on the heap.

See Listing 1 for an example.

template<typename ValueType>
any(const ValueType & value):content
  (new holder<ValueType>(value))
{
}
			
Listing 1

Runtime dynamism is achieved through polymorphism as shown in Listing 2.

class placeholder
{
public:
  virtual ~placeholder()
  {
  }
// ...
  virtual const std::type_info & type() 
    const = 0;
// ...
}

template<typename ValueType>
class holder : public placeholder
{
public:
  holder(const ValueType & value):held(value)
  {
  }
  // ...
  ValueType held;
};
			
Listing 2

Right away, it is obvious that assignment will incur a performance penalty due to heap allocation, and a size/performance penalty due to virtual inheritance of the internal placeholder. The convenience of any extends from the construction/assignment moment during its lifetime and stops the moment one wants to retrieve the value. Until then, any looks and acts like, well – any value. While it works in a wonderfully transparent manner on the assignment side, the data extraction side is out of any’s ‘scope of supply’ – the class does not offer value retrieval or type conversion functionality; the only way to retrieve the value is through any_cast – a set of free-standing functions that either return the value of the exact held type or throw if something else is requested. Poco::Dynamic::Var takes off where any stops, providing user-extensible conversion facilities for non pre-specialized types; the design, rationale, use and performance of this class hierarchy is described in a later installment of this series of articles.

Poco::Any [POCO.Any] is a port of Boost.Any to POCO.

Boost.Variant

According to the authors [Boost.Variant], Boost.Variant class template is “a safe, generic, stack-based discriminated union container, offering a simple solution for manipulating an object from a heterogeneous set of types in a uniform manner”. It determines the needed storage at compile time, uses boost::mpl and limits the runtime capabilities to types defined at compile time.

The performance penalty of Boost.Any creation and polymorphic nature, as well as its incapability to provide reliable compile-time type detection, were the motivating factors for boost::variant authors. For that reason, variant is stack-based and provides reliable compile-time type detection and value extraction. There is a caveat – to enforce the ‘never empty’ requirement, variant may temporarily allocate storage on the heap to keep the old value for the case when an exception being thrown during assignment. The authors claim to have plans for alleviating this shortcoming.

Faced with a boost::variant, hoping it comes with built-in (or at least accompanying) type conversion facilities, a naïve user may try something like this:

  variant<int, string> v = 1;
  string s = v; // compile error
  boost::get<std::string>(v); // throws

While Boost.Variant offers slightly more cooperation than Boost.Any on the extraction side, it is not seamless or without dangers – intuitive code won’t compile, while the next simplest way is brittle. Authors admit the shortcomings and brittleness of the above approach and provide a visitor mechanism as a vehicle to unleash the full strength of Boost.Variant. The visitor is created by inheriting from the boost::static_visitor<> class template (see Listing 3).

class my_visitor 
   : public boost::static_visitor<int>
{
  public:
  int operator()(int i) const
  { return i; }

  int operator()(const std::string & str) const
    return str.length(); }
};

int main()
{
  boost::variant< int, std::string > u
    ("hello world");
  std::cout << u; // output: hello world

  int result = 
     boost::apply_visitor( my_visitor(), u );
  std::cout << result;
  // output: 11 (i.e., length of "hello world")
}
			
Listing 3

In order to provide the type conversions however, user must define a visitor per destination type, e.g. to facilitate the most common conversion between numbers and strings, the following minimal set of classes is needed, as seen in Listing 4.

struct string_int_converter 
  : public boost::static_visitor<int>
{
  int operator()(int i) const;
  int operator()(const std::string & str) const;
  int operator()(double d) const;
};

struct string_dbl_converter 
  : public boost::static_visitor<double>
{
  double operator()(int i) const;
  double operator()
     (const std::string & str) const;
  double operator()(double d) const;
};

struct num_string_converter 
  : public boost::static_visitor<std::string>
{
  std::string operator()(int i) const;
  std::string operator()
     (const std::string& str) const;
  std::string operator()(double d) const;
};
			
Listing 4

Internally, the variant data is in-place constructed into the storage allocated at compile time and large enough to accommodate the largest datatype specified; storage is a union of char array plus alignment padding (see Listing 5).

variant<char>: (1) 8
variant<int>: (4) 8
variant<float>: (4) 8
variant<double>: (8) 16
variant<std::string>: (32) 40
			
Listing 5

Typically, depending on the size of the held type, there will be some extra space used (e.g. on 64-bit Win8/VS2012), variant<char> will occupy 8 bytes, while variant<std::string> will occupy 40 (see Listing 6).

template <std::size_t size_,
          std::size_t alignment_>
struct aligned_storage_imp
{
  union data_t
  {
    char buf[size_];

    typename mpl::eval_if_c<
        alignment_ == std::size_t(-1)
      , mpl::identity<detail::max_align>
      , type_with_alignment<alignment_>
      >::type align_;
  } 
  data_;
  void* address() const
  {
    return
       const_cast<aligned_storage_imp*>(this);
  }
};
			
Listing 6

The most significant constraint of Boost.Variant is that it can only accept a predefined set of types. If a type is not explicitly listed in the declaration of the variant variable via a template instantiation, that type can not be assigned to it. The never-empty guarantee imposes some significant limitations and there is a discussion going on as to whether it is a reasonable constraint to start with. Default constructing variant as empty would alleviate this problem but it would also introduce the problem of always having to deal with empty in the visitors.

Comparison between Boost.Variant and Boost.Any

For easier understanding of the concepts behind the two classes described so far, boost::any is often compared to ‘type-safe void*’ whereas boost:variant is compared to ‘type-safe union’. While there are certainly similarities, this comparison should be taken cautiously.

Boost.Variant advantages over Boost.Any:

  • guarantees the type of its content is one of a finite, user-specified set of types
  • provides compile-time checked generic visitation of its content (Boost.Any provides no visitation mechanism at all; even if it did, it would need to be checked at run-time)
  • offers an efficient, stack-based storage scheme (avoiding the overhead of dynamic allocation).

Boost.Any advantages over Boost.Variant:

  • allows any type for its content, providing great flexibility
  • provides the no-throw guarantee of exception safety for its swap operation
  • no template meta-programming techniques, which avoids potentially hard-to-read error messages and significant compile-time processor and memory demands.

Conclusion

The analysis of the available techniques, solutions ‘ingredients’ and trade-offs for dynamic-language-like functionality within standard C++ was provided. Two existing solutions, boost::variant and boost::any were described and compared. In the next installment, we will look into more existing solutions with designs/funactionality comparisons and performance benchmarks. Stay tuned ...

Credits

Help in assembling and systematizing the information in this article came from numerous sources. Kevlin Henney provided feedback and constructive discussions on boost::any and the topic in general. Steven Watanabe provided valuable help and guidance on boost::variant and boost::type_erasure, which will be presented in the Part II. Günter Obiltschnig and Andrei Alexandrescu provided valuable feedback and encouragement. The list is, of course, not exhaustive – many other people, discussions, libraries and code samples were an indispensable source of help in gathering and systematizing the information provided in this article.

References

Boost.Any] ‘Boost.Any’, Boost C++ libraries, Kevlin Henney, http://www.boost.org/doc/libs/1_53_0/doc/html/any.html

[Boost.Variant] ‘Boost.Variant’, Boost C++ libraries, Eric Friedman and Itay Maman, http://www.boost.org/doc/libs/1_52_0/doc/html/variant.html

[Dawes12] ‘Any Library Proposal’, Revision 1, Beman Dawes and Kevlin Henney (2012) http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3390.html

[DoubleConversion] ‘Double-conversion library’, https://code.google.com/p/double-conversion/

[Fabijanic08a] ‘DynamicAny’, Part I, Alex Fabijanic, Overload 86, August 2008, http://accu.org/index.php/journals/1502

[Fabijanic08b] ‘DynamicAny’, Part II, Alex Fabijanic, Overload 87, October 2008, http://accu.org/index.php/journals/1511

[Folly] Facebook folly library, dynamic class, https://github.com/facebook/folly/blob/master/folly/docs/Dynamic.md

[Pirsig] ‘A brief summary of the Metaphysics of Quality’, Robert Pirsig (2005) http://robertpirsig.org/MOQSummary.htm

[POCO] POCO C++ libraries: http://pocoproject.org/

[POCO.Any] Poco::Any, http://pocoproject.org/docs/Poco.Any.html

[Sutter] ‘Construction Unions: A C++ Challenge’, http://www.informit.com/articles/article.aspx?p=360435

More information

[ACCU13] ‘Dynamic C++’, ACCU 2013 Conference, http://www.slideshare.net/aleks-f/dynamic-caccu2013

[C#] ‘Using Type dynamic’, C# Programming Guide, http://msdn.microsoft.com/en-us/library/dd264736.aspx

Overload Journal #115 - June 2013 + Programming Topics