Classdesc: A Reflection System for C++11

C++ lacks direct support for reflection. Russell Standish brings an automated reflection system for C++, Classdesc, up to date.

Classdesc is a system for providing pure C++ automated reflection support for C++ objects that has been in active development and use since the year 2000. Classdesc consists of a simplified C++ parser/code generator along with support libraries implementing a range of reflection tasks, such as serialisation to/from a variety of formats and exposure of the C++ objects to other language environments.

With the increasing adoption of the new C++ standard, C++11, amongst compilers and in C++ development, it is time for Classdesc to be adapted to support C++11. This means being able to parse the new language constructs, and also supporting new concepts, such as enum class. Finally, there is the opportunity to leverage features such as variadic templates to generalise and simplify support for function member processing.

Introduction

Classdesc [ Madina01 ] is a system for automated reflection in C++ that has been in continual development since the year 2000, and has been deployed in a number of open source and commercial projects. Classdesc was first published as part of EcoLab 4.D1 in April 2001, before being separated and released as Classdesc 1.D1 in February 2002.

To understand Classdesc, it is necessary to consider one of C++’s more powerful features, the compiler generated constructors and assignment operators. Consider, for example, the default copy constructor, which the C++ compiler generates as needed for any user defined class if the user has not explicitly provided a custom version. To create a new object identical to an existing object, the most common situation is that each component of the existing object is copied to the new object, along with calling the inherited copy constructor of any base class. The copy constructors of the components are called in a hierarchical fashion. At any point in the hierarchy, the programmer can provide a custom copy constructor, which suppresses the compiler generated version. A typical case needing a custom copy constructor is the implementation of a vector class, which will usually have a size member and a pointer to some storage on the heap. The compiler generated copy will copy these members, implementing what is known as a shallow copy. But semantically, one needs a deep copy, where a copy of the data is created, so C++ allows the programmer to provide a deep copy implementation that copies the data as well, possibly even as a ‘copy-on-write’ implementation that only pays the copy cost when needed.

Just like copy constructors, the default constructor, destructor and assignment operator are compiler generated to implement a hierarchical call of the respective default constructors, destructors and assignment operators of the component parts. In C++11, two new compiler generated hierarchical methods were added: the move constructor and the move assignment operator, where the object being moved from is left in a valid, but otherwise unspecified (typically empty), state. Move operations are typically used when the original object is no longer accessed. In the vector example above, the move operation may simply move the pointer to the data into the new structure, rather than copying the data, and set the original pointer to NULL, a far more efficient operation than a regular copy.
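To make the distinction between shallow copy, deep copy and move concrete, here is a minimal sketch of such a vector class. ToyVector and its members are hypothetical names invented for this illustration; they are not part of Classdesc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical toy vector of doubles, for illustration only.
class ToyVector
{
  std::size_t m_size;
  double* m_data;
public:
  explicit ToyVector(std::size_t n): m_size(n), m_data(new double[n]()) {}

  // Custom copy constructor: a deep copy, duplicating the heap storage.
  // The compiler generated version would only copy the pointer.
  ToyVector(const ToyVector& x):
    m_size(x.m_size), m_data(new double[x.m_size])
  { std::copy(x.m_data, x.m_data + x.m_size, m_data); }

  // Move constructor: steal the storage and leave the source empty.
  ToyVector(ToyVector&& x) noexcept: m_size(x.m_size), m_data(x.m_data)
  { x.m_size = 0; x.m_data = nullptr; }

  // Copy-and-swap assignment serves both copy and move assignment.
  ToyVector& operator=(ToyVector x) noexcept
  {
    std::swap(m_size, x.m_size);
    std::swap(m_data, x.m_data);
    return *this;
  }

  ~ToyVector() { delete[] m_data; }

  std::size_t size() const { return m_size; }
  double& operator[](std::size_t i) { return m_data[i]; }
  const double* data() const { return m_data; }
};
```

Copying a ToyVector yields independent storage, whereas constructing one from std::move(v) reuses v's storage and leaves v with size zero.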

Classdesc takes this notion of compiler generated methods to arbitrary user defined methods, called descriptors, extending the six compiler generated methods defined in C++11. It uses a code generator to create these, based on parsing the C++ class definitions. Because class definitions are closed to extension, descriptors are actually implemented as global functions, with the following signature (eg the pack descriptor):

  template <class T>
  void pack(classdesc::pack_t&, 
    const std::string& name, T& a);

The Classdesc code generator generates recursive calls of this function on each component of a. The first argument is a descriptor dependent structure that can be used for any purpose. In this serialisation example, it will hold the serialised data, or a reference to a stream. The second (string) argument is used to pass a dot-concatenated name of the member being passed as the third argument, eg "foo.bar.a". This is important for applications that need member names, such as the generation of XML [ Bray08 ] or JSON [ ECMA13 ] plain text representations.
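The recursion and the dot-concatenated names can be illustrated with a hand-written sketch of the kind of code a generator might emit. Everything here is hypothetical: NameRecorder stands in for a descriptor dependent structure such as classdesc::pack_t, and record is an invented descriptor that merely logs the names it visits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a descriptor dependent structure.
struct NameRecorder { std::vector<std::string> names; };

// Leaf case: any type without a more specific overload just logs its name.
template <class T>
void record(NameRecorder& r, const std::string& name, T&)
{ r.names.push_back(name); }

struct Bar { int a; double b; };
struct Foo { Bar bar; int n; };

// The overloads below are what a code generator could produce from the
// class definitions: one recursive call per member, extending the name.
void record(NameRecorder& r, const std::string& name, Bar& x)
{
  record(r, name + ".a", x.a);
  record(r, name + ".b", x.b);
}

void record(NameRecorder& r, const std::string& name, Foo& x)
{
  record(r, name + ".bar", x.bar);
  record(r, name + ".n", x.n);
}
```

Calling record(r, "foo", f) on a Foo object f visits the leaves in declaration order, so the recorder ends up holding "foo.bar.a", "foo.bar.b" and "foo.n".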

The user has the option of providing their own implementation of the descriptor for specific classes when needed, without having to provide code for the usual (tedious) case. Just as with compiler generated assignment operators, the Classdesc generated code is automatically updated as the class definition evolves during the lifetime of the code. The way this is arranged is to hook the classdesc processor into the build system being used. For example, with a Makefile, one defines an automated rule such as

  .SUFFIXES: .cd $(SUFFIXES)
  .h.cd:
    classdesc -nodef -typeName -i \
    $< json_pack json_unpack >$@

Furthermore, if automatic generation of Makefile dependency rules is enabled, then simply adding the line

  #include "foo.cd"

to a C++ source file is sufficient to cause make to invoke classdesc on the header file foo.h and create or update the necessary reflection code, and make it available to the compilation module.

The Classdesc distribution comes with descriptors for serialising to/from a binary stream (even in a machine independent way via the XDR library), to/from XML or JSON streams, and for exposure of C++ objects to the JVM via the JNI interface. In EcoLab [ Standish03 ], a C++ simulation environment oriented towards agent based modelling, C++ objects are exposed to a TCL programming environment, providing almost instant scriptability of a C++ written object.

Alternative C++ reflection systems

In other languages, such as Java, reflection is supported by the system providing objects representing the classes. One can use these class objects to navigate the members and base classes of the objects, querying attributes of those members as you go. Such a reflection system can be called a runtime reflection system, as the code navigates the reflected information at runtime.

Classdesc generated code is a static reflection system, in that a descriptor corresponds to a fixed traversal of the object's component tree. This can be used to generate a class object like above, and so formally is at least as powerful as a runtime reflection system. However, for the more usual special purpose applications, such as serialisation, the entire object's serialisation is compiled, leading to better performance than a pure runtime system would.

Since the C++ language only exposes limited type information to the programmer, some form of preprocessing system is required. Classic special purpose reflection systems include: the CORBA IDL [ Open12 ], which defines another language (the interface definition language) that is translated into C++ code to allow exposure of an object’s methods to another object running in a remote address space (i.e. remote procedure calls); Qt’s moc processor, which extends C++ with a slot and signal model suitable for wiring up GUI components; and Swig [ Beazley96 ], which allows for exposure of object methods and attributes to a range of other programming languages. All of these systems are preprocessors of some language that is sort of an extension of C++, generating standard C++ as output, so strictly speaking they are not reflection systems, but they are used for the same sorts of tasks as reflection is. Classdesc differs from each of these approaches by processing the same standard C++ header files that the compiler sees.

OpenC++ [ Chiba95 ] tries to generalise this by creating a metaobject processing language which controls a source-to-source code translator, adding in the reflection information before the code is seen by the compiler. Unfortunately, it is now rather dated, and no longer capable of supporting modern dialects of C++.

Chochlík [ Chochlík12 ] reviews a number of other reflection systems, such as SEAL [ Roiser04 ], under the umbrella of ‘runtime reflection systems’. SEAL is definitely like that, with a runtime class object being generated by a perl script that analyses the output of gcc_xml, a variant of gcc’s C++ front end that outputs an XML representation of the parse tree.

On the other hand, Mirror [ Chochlík12 ] is a fully static reflection system, with metaprogramming techniques allowing traversal of type information at compile time. A script Maureen uses the Doxygen parser to generate the templates supporting the metaprogramming. Unfortunately, the library did not compile on my system, nor did the Maureen script run, but this is perhaps only an indication of the experimental status of a quite ambitious system, rather than unsoundness of the general approach.

The Classdesc system described here was also described by Chochlík as a runtime reflection system, which is not quite accurate. Whilst it is true that Classdesc can be used to generate runtime class objects, and so is at least as general as runtime reflection systems, that is not how it is usually used. Rather, it should be considered as a special purpose static reflection system, admittedly not as general as the full blown static reflection system provided by Mirror.

Dynamic polymorphism

Dynamic polymorphism is the capability of handling a range of different types of object via a common interface, which in C++ is a common base class of the object types being represented, with special methods (called virtual methods) that reference the specific method implementations appropriate for each specific type.

For Classdesc to work properly, the correct descriptor overload needs to be called for the actual type being represented, which requires that the base class implement a virtual method for calling the actual descriptor. For example, consider an interface Foo implementing a JSON serialiser (Listing 1).

class Foo
{
  public:
  virtual string json() const=0;
};
class Bar: public Foo
{
  public:
  string json() const override 
    {return ::json(*this);}
  ...
};

Listing 1

This ensures that no matter what type a reference to a Foo object refers to, the correct automatically generated JSON serialiser for that type is called.

Whilst this technique is simple enough, and has the advantage that changes to Bar are automatically reflected in the json method, it is still tedious to have to provide even the one liner above (particularly if multiple descriptor methods are required). It is also error prone if the base class (Foo in this case) needs to be concrete for some reason, which eliminates the protection provided by the pure virtual specifier.

As an alternative, Classdesc descriptors provide ‘mixin’ interface classes, that can be used via the Curiously Recurring Template Pattern :

  class Bar: public classdesc::PolyJson<Bar>
  {
    ...
  };

This adds the PolyJsonBase interface class, which defines the following virtual methods, which covariantly call the appropriate descriptor for the concrete derived class Bar (Listing 2).

struct PolyJsonBase
{
  virtual void json_pack(json_pack_t&,
    const string&) const=0;
  virtual void json_unpack(json_unpack_t&,
    const string&)=0;
  virtual ~PolyJsonBase() {}
};

Listing 2
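The way a CRTP mixin forwards a virtual call to the descriptor of the concrete type can be sketched in isolation. The names ToyJsonBase, ToyJson and to_json are hypothetical and chosen for this illustration; Classdesc's PolyJson differs in detail.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface class, playing the role of PolyJsonBase.
struct ToyJsonBase
{
  virtual std::string json() const = 0;
  virtual ~ToyJsonBase() {}
};

// CRTP mixin: implements the virtual method once, by casting to the
// concrete type T and calling the to_json overload for T.
template <class T>
struct ToyJson: virtual ToyJsonBase
{
  std::string json() const override
  { return to_json(static_cast<const T&>(*this)); }
};

// Declare the per-type serialisers before the classes that need them.
struct Circle;
struct Square;
std::string to_json(const Circle&);
std::string to_json(const Square&);

struct Circle: ToyJson<Circle> { int radius = 1; };
struct Square: ToyJson<Square> { int side = 2; };

// Hand-written here; in Classdesc these would be generated descriptors.
std::string to_json(const Circle& c)
{ return "{\"radius\":" + std::to_string(c.radius) + "}"; }

std::string to_json(const Square& s)
{ return "{\"side\":" + std::to_string(s.side) + "}"; }
```

A reference of type ToyJsonBase& bound to a Circle or a Square then dispatches to the right serialiser without either class having to write the forwarding one liner itself.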

Speaking of json_unpack, which is the converse deserialisation operation, the desirable action would be for an object of the appropriate type to be created and then populated from the JSON data. To pull off this trick, we have to pack and unpack from/to a smart pointer class, such as std::shared_ptr or std::unique_ptr. On packing, an extra ‘type’ attribute is added to the JSON stream, which is used to call a factory create method in the json_unpack descriptor. The type of the ‘type’ attribute can be anything, but popular choices are enums (which translate to a symbolic representation within the JSON stream) or strings, using the Classdesc provided typeName<T>() method to return a human readable type string for T.

Much of the work can be eliminated by adding the following mixin (available in the polyBase.h header file) to your polymorphic type, where T is the type of your ‘type’ attribute (Listing 3).

template <class T>
struct PolyBase: public PolyBaseMarker
{
  typedef T Type;
  virtual Type type() const=0;
  virtual PolyBase* clone() const=0;
  /// cloneT is more user friendly way of getting
  /// clone to return the correct type. 
  /// Returns NULL if \a U is invalid
  template <class U>
  U* cloneT() const {
    std::auto_ptr<PolyBase> p(clone());
    U* t=dynamic_cast<U*>(p.get());
    if (t)
      p.release();
    return t;
  }
  virtual ~PolyBase() {}
};
template <class T, class Base>
struct Poly: virtual public Base
{
  /// clone has to return a Poly* to satisfy
  /// covariance
  Poly* clone() const 
  {return new T(*static_cast<const T*>(this));}
};

Listing 3

The only thing needed to be implemented in the derived class is the type() method. An enum type implementation might be something like Listing 4.

enum class MyTypes {foo, bar, foobar};
class FooBase: public PolyBase<MyTypes> {};
template <MyTypes T>
class Foo: public Poly<Foo<T>, FooBase>
{
  MyTypes type() const {return T;}
  ...
};

Listing 4

A string type implementation might look like Listing 5 and a full example, including the JSON descriptor methods would be as shown in Listing 6. Note the use of virtual inheritance to ensure that only a single version of PolyJsonBase is in the inheritance hierarchy.

class FooBase: public PolyBase<std::string> {};
template <class T>
class FooTypeBase: public Poly<T, FooBase>
{
  std::string type() const 
  {return classdesc::typeName<T>();}
};
class Foo: public FooTypeBase<Foo>
{
  ...
};

Listing 5

class FooBase:
  public classdesc::PolyBase<std::string>,
  public virtual classdesc::PolyJsonBase
{
  static FooBase* create(std::string);
};

template <class T>
class FooTypeBase:
  public classdesc::Poly<T, FooBase>,
  public classdesc::PolyJson<T>
{
  std::string type() const 
  {return classdesc::typeName<T>();}
};

class Foo: public FooTypeBase<Foo>
{
  ...
};

Listing 6

The static method FooBase::create needs to be supplied, but even here, Classdesc provides assistance in the form of a Factory class (Listing 7).

class FooBase:
  public classdesc::PolyBase<std::string>,
  public classdesc::Factory<FooBase, std::string>
{};

template <class T>
class FooTypeBase:
  public classdesc::Poly<T, FooBase>,
  public classdesc::PolyJson<T>
{
  std::string type() const 
  {return classdesc::typeName<T>();}
};

class Foo: public FooTypeBase<Foo>
{
  Foo() {registerType(type());}
};

Listing 7
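The idea behind such a factory, a registry mapping the ‘type’ attribute to a function that creates an object of that type, can be sketched as follows. This is a simplified illustration with hypothetical names (Shape, ShapeFactory); it does not reproduce the interface of classdesc::Factory.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical polymorphic base with a string 'type' attribute.
struct Shape
{
  virtual std::string type() const = 0;
  virtual ~Shape() {}
};

// Hypothetical factory: maps a type string to a creator function.
class ShapeFactory
{
  std::map<std::string, std::function<std::unique_ptr<Shape>()>> m_creators;
public:
  // Register a default-constructible concrete type under a given key.
  template <class T>
  void registerType(const std::string& key)
  { m_creators[key] = [] { return std::unique_ptr<Shape>(new T); }; }

  // Create an object of the type named by key, as a deserialiser would
  // do after reading the 'type' attribute from the stream.
  std::unique_ptr<Shape> create(const std::string& key) const
  {
    auto i = m_creators.find(key);
    if (i == m_creators.end())
      throw std::runtime_error("unregistered type: " + key);
    return i->second();
  }
};

struct Circle: Shape
{ std::string type() const override { return "Circle"; } };

struct Square: Shape
{ std::string type() const override { return "Square"; } };
```

After registering Circle and Square, create("Square") returns a newly allocated Square behind a base class pointer, ready to be populated from the rest of the serialised data.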

Parsing of C++11 code

The first task was to test the Classdesc parser/code generator on the new C++11 features. Stroustrup [ Stroustrup13, p 1268 ] provides a convenient 40-point list of the new language features, which provided the starting point for the work to update Classdesc.

Since the Classdesc parser only parses user defined types, namely classes, structs and enums, language features that only appear within the code bodies of functions or class methods can be ignored. A test header file was created with those features that might cause problems to Classdesc, namely:

enum class (item 7)
brace style initialisation of members both inline and within constructors (item 1)
inline member initialisation (item 30)
alignas type attribute (item 17)
constexpr (item 4)
default and delete declarations (item 31)
spaceless closing of nested templates (item 11)
attributes, in particular [[noreturn]] (item 24)
noexcept attribute (item 25)
deduced method return type (item 23)
variadic template arguments (item 13)

and then a set of unit test cases, linking this header file with all the Classdesc provided descriptors, was created to ensure completeness of the work.

Of these new features listed above, about half required changes to Classdesc, which will be described in more detail in the following section.

New C++11 features

Enum class

C++11 introduces a new user defined type called enum class. This largely fixes a name scoping problem with the original C enum type. Classdesc was modified to process enum classes in the same way that it processes enums, which includes the generation of a symbolic lookup table (a map of the enum tag names as strings to/from the numerical tag values). This is desirable for generation and parsing of formats such as XML or JSON, for which the numerical values are not meaningful, and may even differ from the C++ numerical values if processed by a different language.

Smart pointers

C++11 adds a couple of new smart pointers: the unique_ptr and shared_ptr , and deprecates an existing one ( auto_ptr ). These pointers, in particular the shared_ptr , have been available in external libraries for some time, notably in the boost library [ Boost ], and then later in a precursor to the C++11 standard library known as TR1 [ TR05 ]. Classdesc has required the use of shared_ptr s for some time to adequately support dynamic polymorphism, as well as containers of noncopiable objects. Since the design of Classdesc is to not rely on 3rd party libraries like boost, by preference it will use C++11 std::shared_ptr if available, otherwise the TR1 shared pointer, and only use boost shared pointers as a last resort.

The problem is that these three different smart pointer implementations are distinct types. The solution in Classdesc is to define a typedef alias of shared_ptr in the classdesc namespace, which refers to whichever implementation is being used. A similar consideration applies to a variety of metaprogramming support functions (eg is_class ) which have been introduced into the language via external libraries before being finally standardised in C++11.

If a pre-C++11 compiler is used, one can select the version of shared_ptr to be used by defining the TR1 or the BOOST_TR1 macro, for the TR1 or the Boost implementation respectively. If neither macro is defined, then the tr1 namespace is assumed to be defined in the standard <memory> header file, as is the case with Microsoft’s Visual C++.
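The aliasing technique can be sketched as below. This is a simplified illustration, not the Classdesc source: the namespace toy is hypothetical, only the standard __cplusplus macro is consulted, and the Boost fallback and the TR1/BOOST_TR1 macros are left out. The <tr1/memory> header name is the one used by GCC's library and is an assumption for other toolchains.

```cpp
#include <cassert>

#if __cplusplus >= 201103L
// C++11 or later: use the standard smart pointer.
#include <memory>
namespace toy { using std::shared_ptr; }
#else
// Pre-C++11: fall back to TR1 (header location assumed to be GCC's).
#include <tr1/memory>
namespace toy { using std::tr1::shared_ptr; }
#endif

// Client code names only toy::shared_ptr, so it compiles unchanged
// whichever implementation was selected above.
inline long share_count_demo()
{
  toy::shared_ptr<int> p(new int(42));
  toy::shared_ptr<int> q = p;   // shares ownership with p
  return q.use_count();
}
```

Because every descriptor refers to the alias rather than to a specific implementation, the choice of smart pointer is confined to one place.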

New STL containers

C++11 introduces four new containers starting with the name ‘unordered’, which provide hash map functionality in the standard library, along with a singly linked list (forward_list). Whilst one could have extended the standard Classdesc provided descriptors to cover these new containers, the exercise highlighted that it was high time to deal with containers in a more generic way. Consequently, Classdesc now introduces metaprogramming type attributes to represent the sequence and associative container concepts from the Standard Template Library (STL). Descriptors are now implemented in terms of these type attributes, instead of referring directly to the container types: vector, list, set etc. The STL container types have these attributes defined in the classdesc.h header. Users can avail themselves of this descriptor support for their own custom container types merely by defining an is_sequence or an is_associative_container type attribute as appropriate:

  namespace classdesc
  {
    template <>
    struct is_sequence<MySequenceContainer>
    { static const bool value=true; };
  }
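How a descriptor can be written against such a type attribute, rather than against named container types, is sketched below. The namespace toy, its is_sequence trait and the count_leaves descriptor are hypothetical; the sketch assumes C++11's std::enable_if for the dispatch, which may differ from what Classdesc does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <list>
#include <type_traits>
#include <vector>

namespace toy
{
  // Hypothetical type attribute: false unless specialised.
  template <class T> struct is_sequence: std::false_type {};
  template <class T, class A>
  struct is_sequence<std::vector<T, A>>: std::true_type {};
  template <class T, class A>
  struct is_sequence<std::list<T, A>>: std::true_type {};

  // Non-sequence case: a single leaf value.
  template <class T>
  typename std::enable_if<!is_sequence<T>::value, std::size_t>::type
  count_leaves(const T&) { return 1; }

  // Sequence case: written once for anything flagged as a sequence.
  template <class T>
  typename std::enable_if<is_sequence<T>::value, std::size_t>::type
  count_leaves(const T& seq)
  {
    std::size_t n = 0;
    for (const auto& x: seq) n += count_leaves(x);
    return n;
  }

  // Opting another container in requires only the attribute.
  template <class T, class A>
  struct is_sequence<std::deque<T, A>>: std::true_type {};
}
```

Here std::deque gains descriptor support through a three line specialisation, without count_leaves being touched.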

New fundamental types

C++11 adds a whole slew of new types, including explicitly 16 and 32 bit wide characters (char16_t and char32_t), a long long integer and a number of type-name aliases for explicitly referring to an integer’s size (eg int64_t). Typename aliases do not cause a problem for Classdesc, as their use will be covered by the type they are an alias for. However, traditional descriptor implementations required explicit implementations for all the fundamental types, so supporting these new types requires a lot of additional code. So the decision was made to rewrite as much descriptor code as possible using type traits such as is_fundamental, and to use metaprogramming techniques [ Veldhuizen95 ].

There is, however, one place where all the basic types need to be enumerated, and that is in the implementation of the typeName template, which returns a human readable string representation of the type. An explicit template specialisation needs to be provided for each fundamental type. One might ask why type_info::name() could not be used for this purpose. Unfortunately, the standard does not specify how compilers should map type names to strings, and compilers often choose quite mangled names, so the result cannot be relied upon in a compiler independent way for some reflection purposes, such as XML processing.
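The shape of such a template can be sketched as follows. The names toy::TypeName and toy::type_name are hypothetical and only a handful of types are covered; the real typeName implementation enumerates every fundamental type.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace toy
{
  // No generic definition: asking for the name of an unsupported type
  // is a compile time error rather than a mangled string.
  template <class T> struct TypeName;

  // One explicit specialisation per fundamental type.
  template <> struct TypeName<int>
  { static std::string get() { return "int"; } };
  template <> struct TypeName<double>
  { static std::string get() { return "double"; } };
  template <> struct TypeName<long long>
  { static std::string get() { return "long long"; } };
  template <> struct TypeName<char16_t>
  { static std::string get() { return "char16_t"; } };

  // Compound types build their name from those of their arguments.
  template <class T, class A> struct TypeName<std::vector<T, A>>
  {
    static std::string get()
    { return "std::vector<" + TypeName<T>::get() + ">"; }
  };

  template <class T>
  std::string type_name() { return TypeName<T>::get(); }
}
```

Unlike type_info::name(), the strings returned here are the same on every compiler, which is what makes them usable as tags in a file format.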

Opportunities

Move operators

Certain Classdesc types, such as pack buffers, are unable to be copied. In particular, lacking copiability restricts the use of these objects in containers, unless stored as smart pointers. C++11 provides the concept of move construction and assignment, where the source object is left in a valid, but typically empty, state, which can usually be implemented in an inexpensive and simple fashion. An object with a move operator can be used in a C++11 container. The update to Classdesc provides implementations of these move operators where appropriate to extend their use cases.
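As an illustration of why move operators lift the container restriction, here is a sketch of a noncopiable buffer that can nevertheless be stored directly in a std::vector. ToyBuffer is a hypothetical class written for this example and is not Classdesc's pack buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical buffer owning heap storage; copying is disabled.
class ToyBuffer
{
  char* m_data;
  std::size_t m_size;
public:
  explicit ToyBuffer(std::size_t n): m_data(new char[n]()), m_size(n) {}

  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;

  // Move construction: take over the storage, empty the source.
  ToyBuffer(ToyBuffer&& x) noexcept: m_data(x.m_data), m_size(x.m_size)
  { x.m_data = nullptr; x.m_size = 0; }

  // Move assignment: release our storage, then take over the source's.
  ToyBuffer& operator=(ToyBuffer&& x) noexcept
  {
    if (this != &x)
      {
        delete[] m_data;
        m_data = x.m_data;
        m_size = x.m_size;
        x.m_data = nullptr;
        x.m_size = 0;
      }
    return *this;
  }

  ~ToyBuffer() { delete[] m_data; }
  std::size_t size() const { return m_size; }
};
```

A std::vector<ToyBuffer> accepts both temporaries and objects passed through std::move, and can reallocate its storage by moving the elements, so no smart pointer wrapper is needed.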

Functional support

Classdesc provides a metaprogramming library to support the analysis of method signatures for supporting the exposure of methods to other programming environments, such as the JVM. Key metaprogramming requirements are arity (number of arguments), the individual types of each argument Arg<F, i>::type and the return type of the function or method type F . For pre-C++11 compilers, this was implemented for all cases up to a certain number of arguments (usually 10), with the code being generated by a Bourne shell script. This peculiar break from a pure C++ solution was deemed a necessary evil – in practice the provided code limited to 10 arguments suffices for most practical cases, and if not, then access to a Bourne shell to generate support for higher arities is usually not difficult to arrange.

Nevertheless, C++11’s variadic templates allow the possibility of handling an arbitrary number of function arguments without the need to generate specific templates from a script. At the time of writing, though, this has not been implemented in Classdesc.
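A sketch of what such a variadic implementation could look like is given below. It is not Classdesc code: toy::Signature is a hypothetical trait, argument indices are zero based by assumption, and only plain function pointers and const/non-const member function pointers are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

namespace toy
{
  template <class F> struct Signature;   // primary template, undefined

  // Function pointers: one partial specialisation covers every arity.
  template <class R, class... A>
  struct Signature<R(*)(A...)>
  {
    static const std::size_t arity = sizeof...(A);
    typedef R Return;
    // Type of the i-th argument, counting from zero.
    template <std::size_t i> struct Arg
    { typedef typename std::tuple_element<i, std::tuple<A...>>::type type; };
  };

  // Member function pointers reuse the function pointer case.
  template <class C, class R, class... A>
  struct Signature<R(C::*)(A...)>: Signature<R(*)(A...)> {};
  template <class C, class R, class... A>
  struct Signature<R(C::*)(A...) const>: Signature<R(*)(A...)> {};
}

// Example functions to analyse.
inline double scale(int n, const std::string& s) { return n * s.size(); }
struct Widget
{
  void set(int, char) {}
  int get() const { return 0; }
};
```

For example, Signature<decltype(&scale)>::arity is 2 and Signature<decltype(&scale)>::Arg<1>::type is const std::string&, with no script generated code for each argument count.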

A proposal for an extension to the C++ language

Since Classdesc-provided reflection naturally maps to the same recursive hierarchical concept as do the compiler generated constructors and assignment operators, and C++11 has introduced a new syntactic construct based on the default keyword that forces the compiler to generate these methods, this leads to a natural proposal. That is, allow any method signature suffixed by default to have a compiler generated body that recursively applies that method signature to the object’s components.

For example, consider a struct declared as:

  struct Example
  {
    A a;
    B b;
    void pack(pack_t& buf)=default;
  };

The compiler is instructed to generate the method:

  void Example::pack(pack_t& buf)
  {
    a.pack(buf);
    b.pack(buf);
  }

There are some subtleties that need to be worked out. For example, if the class is a mix of private and public attributes, we need to be able to specify whether private attributes should be processed (eg in serialisation applications) or not (eg in exposing object APIs to another language), a feature currently implemented as a flag on the classdesc code generator command line. One suggestion is to qualify the default keyword with public/protected/private to indicate which attributes are processed:

  void pack(pack_t& buf)=default private;

The question is what to do if the access qualifier is not specified. Should it default to private, which is effectively what the existing compiler generated methods do, or should it be public, which would be the more common usage?

The next issue is how to handle the situation where a type does not have a method of that name defined? The obvious solution is to borrow from operator overloading, and if (say) a.pack(buf) is an invalid expression, substitute pack(a,buf) , resolved according to the usual namespace resolution rules. This will allow writers of descriptors to add new descriptors to existing types, particularly the fundamental types.
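The proposed fallback rule can be emulated in today's C++11 with expression SFINAE, which gives a feel for how it would behave. This is only a library level sketch of the idea, not the proposed language feature; ToyBuf, apply_pack and detail::call_pack are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical buffer that logs which overload was called.
struct ToyBuf { std::string log; };

// Free function descriptor for a fundamental type.
inline void pack(int&, ToyBuf& buf) { buf.log += "pack(int);"; }

// A class type providing the descriptor as a member.
struct A { void pack(ToyBuf& buf) { buf.log += "A::pack;"; } };

namespace detail
{
  // Preferred: chosen only when a.pack(buf) is a valid expression.
  template <class T>
  auto call_pack(T& a, ToyBuf& buf, int) -> decltype(a.pack(buf), void())
  { a.pack(buf); }

  // Fallback: substitute the free function call pack(a, buf).
  template <class T>
  void call_pack(T& a, ToyBuf& buf, long)
  { pack(a, buf); }
}

// The literal 0 is an int, so the member version wins when available.
template <class T>
void apply_pack(T& a, ToyBuf& buf) { detail::call_pack(a, buf, 0); }

// What the compiler generated body would amount to for this struct.
struct Example
{
  A a;
  int n = 0;
  void pack(ToyBuf& buf)
  {
    apply_pack(a, buf);
    apply_pack(n, buf);
  }
};
```

Packing an Example visits a through its member function and n through the free function, which is the behaviour the proposal asks the compiler to provide directly.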

The final issue to address is how to implement an equivalent of the covariant member name feature of Classdesc, which is available as a string passed as the second argument of the descriptor. The most obvious suggestion is a magic type declared in namespace std (in the same way that std::initializer_list is magically populated by brace initialisers), that will be populated by the appropriate hierarchical list of names of the member.

  void xml_pack(pack_t& buf,
    std::refl_name)=default;

This will be populated in the compiler generated code as follows:

  void Example::xml_pack(pack_t& buf,
    std::refl_name nm)
  {
    a.xml_pack(buf,nm+"a");
    b.xml_pack(buf,nm+"b");
  }

refl_name will be a sequence, and can be iterated over by the usual means, with perhaps a concatenation method to return the dot separated list currently used by Classdesc.

Proposals for reflection in C++17

Support for reflection in standard C++ has been discussed a number of times, but has generally been dismissed as requiring metaobjects to be present in the resulting executable, even though the classes themselves may not end up being used by the programmer, and hence can be optimised away. Clearly, this is only an objection for traditional runtime reflection systems, not static systems. For example, even when using Classdesc to generate metaobjects, the metaobject will only exist if explicitly created by the programmer calling the appropriate descriptor, which is just a standard function that can be eliminated by the linker if not used.

Nevertheless, Carruth and Snyder [ Carruth13 ] issued a general call for reflection proposals for consideration of inclusion in the next (C++17) standard. To date, two proposals have been put forward: Chochlík’s [ Chochlík15 ], which is largely based on the Mirror library, and Silva and Auresco’s [ Auresco15 ], who propose extending the keywords typename and typedef to return variadic templates that can be used in a metaprogramming context, but do not specify any extensions to the standard type traits library. In Chochlík’s proposal, a new operator (tentatively mirrored) returns a compile time static metaobject that can be iterated over, or otherwise queried. In a way, the two proposals mesh together quite well. The Mirror library is a quite well thought out reflection library extending std::type_traits, but the actual structure of the MetaObjectSequence is not well specified in Chochlík’s proposal. On the other hand, Silva and Auresco’s idea of using variadic templates to encode the MetaObjectSequence concept at least fits in with how ‘loops’ are currently implemented in C++11 metaprogramming, and has the further advantage of not requiring new keywords.

In either proposal, Classdesc-like functionality could be achieved by arranging for the generic descriptor template to be a metaprogrammed loop of the class members.

Conclusion

Classdesc has been under development for 15 years, and has a reputation for being a solid, no-fuss, portable reflection system for C++. With the increasing use of C++11 code, it was time to bring Classdesc up to date with the new C++ standard, which has now been achieved.

References

[Auresco15] Daniel Auresco, Cleiton Santoia Silva. ‘From a type T, gather members name and type information, via variadic template expansion’. Technical Report N4447, ISO/IEC JTC, 2015. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4447.pdf

[Beazley96] David M. Beazley. ‘SWIG: An easy to use tool for integrating scripting languages with C and C++’. In Proceedings of 4th Annual USENIX Tcl/Tk Workshop. USENIX, 1996. http://www.usenix.org/publications/library/proceedings/tcl96/beazley.html

[Boost] Boost C++ Libraries. http://www.boost.org/

[Bray08] Tim Bray, Jean Paoli, C.M. Sperberg-McQueen, Eve Maler, and François Yergeau. ‘Extensible markup language (XML) 1.0 (fifth edition)’. Technical report, W3C, 2008. http://www.w3.org/TR/2008/REC-xml-20081126/ .

[Carruth13] C. Carruth, J. Snyder. ‘Call for compile-time reflection proposals’. Technical Report N3814, ISO/IEC JTC, 2013. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3814.html

[Chiba95] Shigeru Chiba. ‘A metaobject protocol for C++’. In Proceedings of the ACM Conference on Object-Oriented Programming Systems, Languages, and Applications , pages 285–299, 1995.

[Chochlík12] Matúš Chochlík. ‘Portable reflection for C++ with mirror’. Journal of Information and Organizational Sciences, 36(1):13–26, 2012.

[Chochlík15] Matúš Chochlík. ‘Static reflection’. Technical Report N4451, ISO/IEC JTC, 2015. http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2015/n4451.pdf .

[ECMA13] ECMA. The JSON data interchange format. Technical Report ECMA-404, ECMA International, 2013. http://www.ecma-international.org/publications/standards/Ecma-404.htm

[Madina01] Duraid Madina and Russell K. Standish. ‘A system for reflection in C++’. In Proceedings of AUUG2001: Always on and Everywhere , page 207. Australian Unix Users Group, 2001.

[Open12] Object Management Group. ‘C++ language mapping’. Technical report, Object Management Group, 2012. Version 1.3. http://www.omg.org/spec/CPP/1.3

[Roiser04] S. Roiser and P. Mato. ‘The SEAL C++ reflection system’. In Proceedings of Computing in High Energy Physics , CHEP ’04, Interlaken, Switzerland, 2004. http://chep2004.web.cern.ch/chep2004/

[Standish03] Russell K. Standish and Richard Leow. ‘EcoLab: Agent based modeling for C++ programmers’. In Proceedings SwarmFest 2003 , 2003. arXiv:cs.MA/0401026.

[Stroustrup13] Bjarne Stroustrup. The C++ Programming Language . Addison-Wesley, 4th edition, 2013.

[TR05] Draft technical report on C++ library extensions. Technical Report DTR 19768, International Standards Organization, 2005.

[Veldhuizen95] Todd Veldhuizen. ‘Using C++ template metaprograms’. C++ Report, 7:36–43, 1995.