ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Google+ ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinDynamic C++ (Part 2)

Overload Journal #116 - August 2013 + Programming Topics   Author: Alex Fabijanic and Richard Saunders
Previously we saw how to use some simple dynamic features in C++. Alex Fabijanic and Richard Saunders explore more powerful dynamic tools.

[Y]ou can’t build a system that is completely statically typed.

~ Bjarne Stroustrup [Venners04]

In this installment of the ‘Dynamic C++’ series of articles, we continue to explore the dynamic solutions in C++ language. We start with Boost type_erasure [Boost.TypeErasure], a combination of Boost.Any [Boost.Any] and Boost.Function [Boost.Function], addressing the C++ runtime polymorphism shortcomings. Next, we look into Val, a class at the heart of the PicklingTools library [PicklingTools] aimed at interaction with Python environments. We conclude with Facebook’s folly library solution for interfacing the world of web and JSON from C++ – the dynamic class.

Boost.TypeErasure

According to the author, type_erasure is a generalization of Boost.Any and Boost.Function classes, allowing easy composition of arbitrary type erased operations; it addresses the shortcomings of C++ runtime polymorphism, in particular:

  • intrusiveness
  • dynamic memory management
  • inability to apply multiple independent concepts to a single object.

Library uses some advanced constructs such as concepts and template metaprogramming constructs from boost::mpl. In a similar fashion to boost::variant specifying a set of types at construction time that can be contained at runtime, the type_erasure library specifies at construction time a set of operations that can be performed on it at runtime. This is achieved through a vector of concepts provided at object declaration site as shown in Listing 1 for an incrementable and ostreamable object.

any<
    mpl::vector<
        copy_constructible<>,
        typeid_<>,
        incrementable<>,
        ostreamable<>
    >
> x(10);
++x; // incrementable
std::cout << x << std::endl; // ostreamable
			
Listing 1

In the example, copy_constructible allows copying and destruction of the object, whiletypeid_ provides run-time type information so that any_cast can be used; these effectively make type_erasure any equivalent to any [Boost.Any]. Additionally, incrementable and ostreamable concepts are specified, allowing incrementing and streaming of the value x. Operations can have arguments, so replacing incrementable concept with addable allows adding of two any’s. The functionality is brittle in a subtle way, though – while adding two values of different types will compile, unfortunately it results in undefined behaviour at runtime. This problem can be alleviated by specifying relaxed_match concept (according to author, recently renamed to a more appropriate relaxed name), which causes exception to be thrown. The proper way of dealing with this problem is using placeholders, as shown in Listing 2.

int array[5];

typedef mpl::vector<
    copy_constructible<_a>,
    copy_constructible<_b>,
    typeid_<_a>,
    addable<_a, _b, _a>
> requirements;

tuple<requirements, _a, _b> t(&array[0], 2);
any<requirements, _a> x(get< 0 > (t) +
  get< 1 >(t));
// x now holds array + 2
			
Listing 2

Placeholders are used extensively throughout the library. A placeholder is a substitute for a template parameter in a concept. The library automatically replaces all placeholders with the actual wrapped types.

Furthermore, type_erasure supports references (both const and non-const), as well as user-defined concepts. Listing 3 demonstrates adding stringable concept to type_erasure, allowing a to_string() member function call syntax directly on an integer value wrapped in any. Things can be simplified when implementation and interface are the same (i.e. a member of type_erasure::any called to_string calls a to_string member of the contained type), BOOST_TYPE_ERASURE_MEMBER ‘shortcut’ macro can be used.

template<class F, class T>
struct to_string
{
  // conversion function
  static T apply(const F& from, T& to)
  {
    return to = NumberFormatter::format(from);
  }
};
namespace boost {
namespace type_erasure {
  template<class F, class T, class Base> 
  // binding
  struct concept_interface
     <::to_string<F, T>, Base, F> : Base
  {
    typedef 
       typename rebind_any<Base, T>::type IntType;
    T to_string(IntType arg = IntType())
    {
      return call(::to_string<C, T>(),
                  *this, arg);
    }
}; }
typedef any<to_string<_self, std::string>,
   _self&> stringable;
int i = 123;
stringable s(i);
std::string str = s.to_string(); // s == "123"

			
Listing 3

Internally, a void* pointer points to the held heap-allocated value and a static equivalent of virtual table serves as a binding for attached operations as shown in Listing 4.

// storage
struct storage
{
    storage() {}
    template<class T>
    storage(const T& arg) : data(new T(arg)) {}
    void* data;
};
// binding of concept to actual type
typedef ::boost::type_erasure::binding<Concept> table_type;
// actual storage
::boost::type_erasure::detail::storage data;
                           table_type table;
			
Listing 4

Boost.Type erasure is an interesting and valuable ‘merger’ of any and function features, providing dynamic-language like features within the confines of standard C++. Implementation is rather complex and use has some non-intuitive weak spots that can quickly get an inexperienced user in serious trouble. A more robust interface (even if only with _typedef_s provided for most frequently used types) would greatly improve the usability and safety for less experienced users.

PicklingTools Val

The next reviewed solution to the dynamic typing problem is the PicklingTools [PicklingTools] Val [Saunders1], [Saunders2], [Saunders3]. The PicklingTools library is an open-source library made up of Python, C++ and Java code allowing cross language communication. The PicklingTools evolved from a need to allow Python and C++ to share Python dictionaries across language boundaries. Many modern applications are built using multiple languages: Python, C++, Java, JavaScript, Lua, Icon/Unicon. The front-end languages (JavaScript, Python, Lua) tend to be dynamic languages for handling scripting and basic data flow. The back-end languages (C++, C, Java, FORTRAN) handle the heavy-lifting of fast communications, data I/O and CPU intensive work. In these hybrid systems, front-end languages need to communicate with the back-end languages intensively. Dictionaries, supported in some form by most dynamic languages, became the currency of many systems. The PicklingTools library solution focuses on the Python dictionary and C++.

The Val Name Rationale
Why Val and not something like dynamic or any? Three reasons:
  • Everything is, in general, passed by value
  • PicklingTools encourage use of valgrind to help ensure quality
  • Val is only three letters, which is closer to a dynamic language with no letters for the type.

While making Python dictionaries easy to express in C++ was not a primary goal, given how much users enjoyed the ease of use, the C++ PicklingTools embraced them wholeheartedly. The goal, then, became to make Python dictionaries as easy to express in C++ as they are in Python.

Consider the ease of a dynamic dictionary manipulation in Python shown in Listing 5.

# Create a Python literal
>>> d = { 'a': 1, 'b':2.2, 'c': 
...{ 'X':1, 'Y':[1,2,3]  }}

>>> print d['a']   # lookup a single key: 'a' -> 1
>>> d['b'] = 3.3   # insert into dict

# Also easy to lookup/insert nested entities
>>> print d['c']['X']   # lookup nested key
>>> d['c']['Y'] = 0     # insert nested
}
			
Listing 5

Because C++ is a statically-typed language, it requires a compile-time type for all variables. PicklingTools use Val to indicate a dynamic value.

A side-by-side comparison of basic dynamic typing in Python and C++ using Val is shown in Listing 6.

# Python
>>> a = 1
>>> a = 2.2
>>> a = 'three'    # a takes three different types

// C++
Val a = 1;         // overload constructor
a = 2.2;           // overload op=
a = "three";
			
Listing 6

The PicklingTools Val is implemented as a union and a type-tag, where the value is constructed using placement new inside the union. The destructor has to manually notice which type to destruct (for non-POD types) and explicitly call the correct constructor. The Val is really just a dynamic container, with storage as shown in Listing 7. C++11 standard introduces alignof/alignas to address alignment concerns; in pre-C++11, although standard does not explicitly guarantee it and some experts discourage it [GotW28], a union with a double member is practically close enough guarantee of the alignment to the largest member of the union.

As can be concluded from Listing 7, by design Val can only contain certain types, shown in Listing 8.

struct Val
{
  // Flags: the ascii typetag, subtype for arrays,
  char tag;
  char subtype;
  char isproxy;
  char pad;

  Allocator *a;  // if using shared memory or
                 // special allocator

  union
  {
    int_1  s;      // type tag 's'
    int_u1 S;      // type tag 'S'
    int_2  i;      // type tag 'i'
    int_u2 I;      // type tag 'I'
    int_4  l;      // type tag 'l'
    int_u4 L;      // type tag 'L'
    int_8  x;      // type tag 'x'
    int_u8 X;      // type tag 'X'
    real_4 f;      // type tag 'f'
    real_8 d;      // type tag 'd'
    complex_8  F;  // type tag 'F'
    complex_16 D;  // type tag 'D'
    char t[sizeof(Tab)]; 
                   // type tag 't', usually 32
    char n[sizeof(Arr)];
                   // type tag 'n', usually 32
  } u;
};
			
Listing 7
// POD types
int_1, int_u1, int_2, int_u2,
int_4, int_u4, int_8, int_u8,
real_4, real_8,
complex_8, complex_16, size_t

// Non-POD types
string, None, Tab (like Python dictionary),
Arr (like Python list)
			
Listing 8

There are no-user defined types by design. This allows library writers to concentrate on making the interface for C++ Python dictionaries as close to Python as possible without worrying about the problems generality brings.

Listing 9 shows how easy it is to construct a Val from basic types and get values out by means of user-defined conversion.

// overload Val constructor on all supported types
Val a = 1;
Val b = 2.2;
Val c = "three";
// Get a value out via user-defined conversions
int_u4 i = a;
			
Listing 9

The user-defined conversions and overloading are syntactic sugar. Of course, overloaded cast operators direct calls are really just taking advantage of ‘syntactic sugar’ for function calls as shown below:

  Val v = 1;
  int_u4 i = v.operator int_u4();

The Val supports user-defined conversions for all basic types. If the conversion isn’t direct, then the user-defined conversions follow the Principle of Least Surprise: convert as C++ would (example below).

  Val v = 3.14159265;
  int_u4 i4 = v; // Which conversion?  As C++ would:
     // int_u4 i4 = static_cast<int_u4>(3.14159265);

If the conversion doesn’t make sense, behaviour is identical to that in Python – an exception is thrown, see Listing 10.

# Python
>>> a = 3.14159265
>>> i4 = int(a)      # converts to 3
>>> d = dict(a)      # Exception!  doesn't make
                     # sense to convert to dict

// C++
Val a = 3.14159265;
int_u4 i4 = a;       // converts to 3
Tab d = i4;          // Compile-time error, can't
                     // construct Tab from float
			
Listing 10

The Val provides the basic container to support dynamic dictionaries. The C++ Tab is equivalent to the Python dict (think Tab == Table). The keys of the C++ Tab, aswell as the values of the dictionary are Val’s, allowing us to construct dynamic dictionaries:

  # PythonId-1007573">
  d = {'a':1, 'b':2.2, 'c':'three'}

  // C++
  Tab d = "{'a':1, 'b':2.2, 'c':'three'}";

Note that the C++ literal is inside a string: the library overloads the constructor of the Tab to take a string which contains the literal. While C++11 has some great new literal constructs, it can’t quite mimic Python’s syntax.

Literal construction of a Python dictionary is supported using exactly the Python syntax: you can cut-and-paste the Python dictionary literal and paste it into the C++ quotes. There is a little Python dictionary parser embedded in the Tab class so that it recognizes the same syntax. Rather than invent a new syntax for literal construction, library leverages the well-known Python syntax.

Facebook folly::dynamic

The Facebook folly::dynamic [Folly.Dynamic] class is another one in the spectrum of dynamic-typing classes. The class aims to relax the static typing constraints, especially in the JSON format data manipulation scenarios; it provides a runtime dynamically typed value for C++, similar to the way languages with runtime type systems work (e.g. Python). It can hold types from a predetermined set (ints, bools, arrays of other dynamics, etc), similar to boost::variant and PicklingTools Val, but the syntax is intended to be more akin to using the native type directly.

Facebook String Flavour
Facebook folly::fbstring is a drop-in replacement for std::string, providing the benefit of significantly increased performance on virtually all important primitives. This is achieved by using a three-tiered storage strategy and cooperating with the memory allocator; fbstring is designed to detect use of jemalloc [jemalloc] and cooperate with it to significantly improve speed and memory usage.

Storage strategies

  • Small strings (<=23 chars) are stored in-situ without memory allocation.
  • Medium strings (24-255 chars) are stored in malloc-allocated memory and copied eagerly.
  • Large strings (>255 chars) are stored in malloc-ated memory and copied lazily.

Implementation highlights

  • Compatible with std::string.
  • Thread-safe, reference-counted copy-on-write for large (>255 chars) strings.
  • Uses malloc instead of allocators.
  • Jemalloc-friendly
  • The find() is implemented using Boyer-Moore [Wikipedia].
  • Offers conversions to and from std::string.

Supported architectures are x86 and x64; porting fbstring to big-endian architectures would require changes.

An example of creating a dynamic holding most common types is shown below. Strings are stored internally as fbstring, the Facebook drop-in replacement for std::string.

  dynamic twelve = 12;
  dynamic str = "string";   // fbstring
  dynamic nul = nullptr;
  dynamic boolean = false;

The library extensively uses C++11 features for both speed and syntactic advantages. For example, as shown in Listing 11, arrays can be initialized with initializer lists. This particular feature, however, also imposes a limitation – dynamic has no default constructor. The rationale for this design decision is due to the standard requirement for the expression dynamic d = {} to call default constructor. The conflict arises in the default construction either having to result in d.isArray() (a) being false for the expression dynamic d = {} or (b) being true for dynamic d. The solution the authors of folly::dynamic deemed most appropriate is to entirely disallow the default construction.

dynamic array = {
  "array ", "of ", 4, " elements" };
assert(array.size() == 4);
dynamic emptyArray = {};
assert(array.empty());
			
Listing 11

Maps from dynamics to dynamics are called objects. As shown in Listing 12, the dynamic::object constant is how an empty map fromdynamics to dynamics is created. The same listing also shows how dynamic objects can be created by using object::operator().

dynamic map = dynamic::object;
map["something"] = 12;
map["another_something"] = map["something"] * 2;
dynamic map2 = dynamic::object
  ("something", 12)("another_something", 24);
			
Listing 12

The internal dynamic storage is shown in Listing 13. Types that can be held are: null, Array, bool, double, integer (64-bit), Object and String.

enum Type
{
  NULLT,
  ARRAY,
  BOOL,
  DOUBLE,
  INT64,
  OBJECT,
  STRING,
};
// ...
Type type_;
union Data
{
  explicit Data() : nul(nullptr) {}
  void* nul;    // void* used instead of
                // std::nullptr_t due to gcc bug
  Array array;
  bool boolean;
  double doubl;
  int64_t integer;
  fbstring string;
  typename std::aligned_storage<
    sizeof(std::unordered_map<int,int>),
    alignof(std::unordered_map<int,int>)
  >::type objectBuffer;
} u_;
			
Listing 13

Examples of object and string construction are shown in Listing 14. Most notably, ObjectImpl is not a mere typedef but inherits from hash map; the reason for this to avoid undefined behavior of parameterizing std::unordered_map with an incomplete type.

struct dynamic::ObjectImpl : std::unordered_map<dynamic, dynamic> {};

// ...

inline dynamic::dynamic(ObjectMaker (*)())
  : type_(OBJECT)
{
  new (getAddress<ObjectImpl>()) ObjectImpl();
}

inline dynamic::dynamic(char const* s)
  : type_(STRING)
{
  new (&u_.string) fbstring(s);
}

			
Listing 14

The gist of the folly’s conversion facilities is shown in the Listing 15. The listing shows the code involved in conversion of dynamic to fbstring. The actual code of converting ‘anything to anything’, as the documentation states, is in a separate header and too large for inclusion here. For binary/decimal and vice-versa conversion of IEEE doubles, the class uses V8 double-conversion [Double.Conversion].

template<> struct dynamic::GetAddrImpl<bool> {
  static bool* get(Data& d) {
    return &d.boolean; }
};
template<> struct dynamic::GetAddrImpl<int64_t> {
  static int64_t* get(Data& d) {
    return &d.integer; }
};
template<class T>
T* dynamic::getAddress() {
  return GetAddrImpl<T>::get(u_);
}
template<class T>
T* dynamic::get_nothrow() {
  if (type_ != TypeInfo<T>::type) {
    return nullptr;
  }
  return getAddress<T>();
}
template<class T>
T dynamic::asImpl() const
{
  switch (type())
  {
    case INT64:
      return to<T>(*get_nothrow<int64_t>());
    case DOUBLE:
      return to<T>(*get_nothrow<double>());
    case BOOL:
      return to<T>(*get_nothrow<bool>());
    case STRING:
      return to<T>(*get_nothrow<fbstring>());
    default:
      throw TypeError("int/double/bool/string",
                      type());
  }
}
inline fbstring dynamic::asString() const
{
  return asImpl<fbstring>();
} 
			
Listing 15

As will be shown in one of the next installments, dynamic provides a very nice user interface, yet also provides a lot in terms of performance. It is a class designed with definite business goal in mind and it succeeds in that endeavor. The only downside for the whole folly library is a patchy build system which requires a significant effort to build the library. The library is also not portable, at least not in the out-of-the-box fashion.

Conclusion

In this installment, we reviewed three C++ dynamic typing solutions: Boost type_erasure, PicklingTools Val and Facebook folly::dynamic. While dynamic and Val provide dynamically-typed storage within the confines of the standard C++, type_erasure also ventures in a new direction by adding operations to C++ types. In the next installment, we’ll look into more similar solutions, so stay tuned …

Credits

Steven Watanabe provided valuable advice on boost::type_erasure. The list is, of course, not inclusive - many other people, discussions, libraries and code samples were an indispensable source of help in gathering and systematizing this writing.

References and further information

[Boost.Any] http://www.boost.org/doc/libs/1_53_0/doc/html/any.html

[Boost.Function] http://www.boost.org/doc/libs/1_53_0/doc/html/function.html

[Boost.TypeErasure] http://www.boost.org/doc/libs/1_54_0/doc/html/boost_typeerasure.html

[Double.Conversion] ‘Double-conversion library’ https://code.google.com/p/double-conversion/

[Folly.Dynamic] Facebook folly library, dynamic class – https://github.com/facebook/folly/blob/master/folly/docs/Dynamic.md

[GotW28] The Fast Pimpl Idiom http://www.gotw.ca/gotw/028.htm

[jemalloc] A general-purpose scalable concurrent malloc(3) implementation http://www.canonware.com/jemalloc/

[PicklingTools] The PicklingTools Library www.picklingtools.com

[Saunders1] ‘Dynamic, Recursive, Heterogeneous Types in Statically-Typed Languages’, Clinton Jeffery, Richard Saunders, C++ Now 2013 Presentation, http://cppnow.org/session/dynamic-recursive-heterogeneous-types-in-statically-typed-languages/

[Saunders2] ‘Dynamic, Recursive, Heterogeneous Types in Statically-Typed Languages’ Clinton Jeffery, Richard Saunders http://cppnow.org/files/2013/03/saunders-jeffery.pdf

[Saunders3] C++ Now 2013 Presentation, Richard Saunders http://www.youtube.com/watch?v=W3TsQtnMtqg

[Venners04] ‘Abstraction and Efficiency: A Conversation with Bjarne Stroustrup’ by Bill Venners, February 16, 2004 http://www.artima.com/intv/abstreffi.html

[Wikipedia] Boyer–Moore string search algorithm http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

Further information

‘Dynamic C++’, ACCU 2013 Conferencehttp://www.slideshare.net/aleks-f/dynamic-caccu2013

Facebook folly library, fbstring class – https://github.com/facebook/folly/blob/master/folly/docs/FBString.md

Overload Journal #116 - August 2013 + Programming Topics