C++ Reflection for Python Binding

Overload Journal #152 - August 2019   Author: Russell Standish
There are various approaches to generating Python bindings in C++. Russell Standish shows how Classdesc can be used to achieve this.

Since 2000, Classdesc has provided reflection capability for C++ objects, with no dependencies other than the standard C++ library. Typical applications include serialisation of objects to a variety of different stream formats, and language bindings to non-C++ languages. With the increased popularity of Python, this work looks at automatically providing Python bindings to C++ objects using the Classdesc system.

Introduction

Classdesc is a mature C++ reflection system with nearly two decades of continuous development [Madina01], [Standish16]. It has often been used to implement automatic serialisation of objects, but also for the automatic creation of language bindings, in particular for Objective C [Leow03] (largely obsoleted by the Objective C++ language), TCL [Standish03] and Java, leveraging the JNI API [Liang99].

At its core, Classdesc is a C++ preprocessor that reads C++ header files and generates overloaded functions that recursively call themselves on members of the class. The collection of overloaded functions is called a descriptor. More details can be found in [Standish16].

With the recent popularity of Python as a scripting language, and also this author’s adoption of Python as a general purpose scripting language, this work seeks to apply Classdesc to the problem of automatically generating Python language bindings for C++ objects.

The CPython implementation supplies a C language API [Rossum02] which is quite low level. The Boost library provides a much higher level API over the top of the C API [Abrahams03], more closely suited to reflecting C++ objects into Python objects, providing a much appreciated leg up for creating a Python binding descriptor.

Background

Boost Python

The ever popular Boost library contains a C++ abstraction layer over the top of Python’s C API [Abrahams03]. This provides a mapping between C++ objects and equivalent Python objects. Since C++ does not provide native reflection capability, a programmer using Boost.Python must explicitly code bindings for methods and attributes to be exposed to Python. This is made as simple as possible, using C++ templates, but still involves quite a bit of boilerplate code on behalf of the user, with the concomitant maintenance overhead as class definitions evolve during software development. This work seeks to extend the Classdesc library to automatically provide these bindings from the Classdesc processor. In this section I summarise the features of Boost.Python used for the Classdesc Python descriptor.

  • Exposing C++ classes to Python

    To start defining a Python class, the programmer needs to instantiate an object of the type class_<T>, where T is the C++ class type being exposed, and the single string argument is used to give the Python type name. By default, T is assumed to be default constructible and copyable. If either one of these assumptions is not true, then the boost::python::no_init object or the boost::noncopyable type must be passed to the class_ constructor as appropriate. No particular reason is given why it is an object in one case, and a type in the other. In this Classdesc library, we use standard C++ type traits to pass these additional arguments to Boost.Python, according to the type T being processed. This will be discussed in more detail in ‘Default and Copy constructibility’.

  • Exposing methods

    With the previously defined class object, you can expose methods with the def method, for example:

          class_<Foo>("Foo").
            def("bar", &Foo::bar).
            def("foobar", &Foo::foobar);

    The def method (as well as the equivalent attribute methods) returns a reference to this, so that the calls can be chained as shown above.

    It should be noted that by default, the exposed methods return a copy of the returned object. This is not always what you want – particularly if the method returns a reference to an internally contained object, such as the indexing method of a vector or map. In such a case, you would want to be able to call mutating methods or set attributes of the returned object, not of a copy.

    Boost.Python does provide a way of specifying return by reference, but this must be done manually by passing an additional argument of type boost::python::return_internal_reference<> to def. In this Classdesc library, we use C++11 metaprogramming techniques to distinguish methods returning a reference from those returning a value, and supply the extra Boost.Python specification automatically. This will be described in more detail in ‘Reference returning methods’.

  • Overloaded methods

    Boost.Python does support overloaded methods. In order for this to work, you need to specify the functional signature for the method pointer, eg:

          void (Foo::*bar0)()=&Foo::bar;
          void (Foo::*bar1)(int)=&Foo::bar;
          class_<Foo>("Foo").
            def("bar",bar0).
            def("bar",bar1);

    will expose the overloaded bar method to Python. In this work, the Classdesc processor, which hitherto ignored overloaded methods, is extended to provide these functional signatures in order to support overloaded methods. This is described in more detail in ‘Handling overloaded methods’.

  • Exposing attributes

    Similar to exposing methods, attributes are exposed using the specific boost::python::class_<T> methods def_readwrite and def_readonly. Similar to the case with return reference values, we use metaprogramming techniques to automatically distinguish between const attributes and mutable ones. Details can be found in ‘Const and mutable attributes’.

  • Global functions

    Global or namespace scope functions can be exposed to the python module namespace via a global def function.

  • Global variables

    Surprisingly, Boost.Python does not provide any way of exposing global variables to the module namespace. In the Classdesc Python descriptor, a method python_t::addObject is provided that explicitly adds a reference to the global object to the __dict__ of the namespace C++ objects are being added to. More details to come in ‘Global objects’.

  • Python object wrappers

    Boost.Python provides the boost::python::object class, which wraps the PyObject type from the C API. There is a boost::python::extract<T>() template which attempts to downcast the object to a C++ object of type T, rather analogous to C++’s dynamic_cast operator. An exception is thrown (which is automatically converted to a Python exception if propagated into Python) if the object doesn’t refer to the named type T. Various C++ wrapper types are provided to represent native Python types such as tuple, list and dict, each of which supports the usual Python operations such as a [] operation or len function, using C++ operator overloading where necessary.

Classdesc

Classdesc is a C++ processor that parses user defined types (classes, structs, unions and enums), and emits definitions of descriptors, which are overloaded function templates that recursively call the descriptors on the members of the structured type. For enums, a table of symbolic name strings to enum label values is constructed. A hand written descriptor needs to be provided for dependent library types on which you don’t want to run Classdesc. Classdesc provides implementations of these for most of the base language and standard library types.

In earlier versions of Classdesc, descriptors were quite literally overloaded global functions. However because functions do not support partial specialisation, and the difficulty in generating templated friend statements for functions, in more recent versions of Classdesc, descriptors are templated functor objects. A global template function is provided to instantiate and call into the functor objects. In the explanation that follows, the function form of the descriptor will be used, as conceptually that is easier to understand.
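
To make the function form concrete, here is a minimal hand-written sketch of a descriptor. It is not Classdesc output: the descriptor name dump, the state type dump_t, the helper dumpToString and the classes Point and Line are all hypothetical, chosen only to show one overload per class calling the same descriptor on each member.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical descriptor state: collects "name=value" lines.
struct dump_t { std::ostringstream out; };

// Leaf overloads for basic types (a library would normally supply these).
inline void dump(dump_t& t, const std::string& d, int& x)
{ t.out << d << "=" << x << "\n"; }
inline void dump(dump_t& t, const std::string& d, double& x)
{ t.out << d << "=" << x << "\n"; }

struct Point { int x; int y; };
struct Line { Point start; Point end; double width; };

// What a generator would emit: one overload per class, calling
// the same descriptor on each member in turn.
inline void dump(dump_t& t, const std::string& d, Point& a)
{
  dump(t, d + ".x", a.x);
  dump(t, d + ".y", a.y);
}
inline void dump(dump_t& t, const std::string& d, Line& a)
{
  dump(t, d + ".start", a.start);
  dump(t, d + ".end", a.end);
  dump(t, d + ".width", a.width);
}

// Convenience wrapper returning the collected text.
template <class T>
std::string dumpToString(const std::string& name, T& obj)
{
  dump_t t;
  dump(t, name, obj);
  return t.out.str();
}
```

Calling dumpToString("l", l) on a Line visits l.start.x, l.start.y, l.end.x, l.end.y and l.width in declaration order, with the dotted names built up by the recursion.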

Classdesc has been in active development since 2000 [Madina01], [Standish16].

Extending the Classdesc processor

Modification of Classdesc to emit descriptors for unbound member pointers

Traditionally, Classdesc’s descriptors work by being passed an instance of an object, and the automatically generated descriptors recursively call that descriptor on references to the members of that instance object. Boost.Python, however is oriented around defining class objects, registering C++ member pointers for the attributes and methods of the class. This allows Python to potentially control the lifetime of C++ objects, for example to create temporary copies for value returning methods.

So fairly early on, it became clear that Classdesc needed to be extended to define descriptors for classes, rather than instance objects. So the traditional Classdesc descriptor

  template <class T> void python(python_t&,
    const string&,T& obj);

needs to be augmented with an additional form:

  template <class T> void python(python_t&,
    const string&);

The former descriptor is used like:

  Foo f;
  python_t p;
  python(p,"f",f);

The latter form requires the explicit specification of the class:

  python_t p;
  python<Foo>(p,"");

In the traditional mode, the Classdesc processor emits definitions of the form:

  void python(python_t& p, const string& d, Foo& a)
  {
    python(p,d+".bar",a.bar);
    python(p,d+".method1",a,&Foo::method1);
  }

we need definitions that do not pass an instance object:

  template <class C=Foo>
  void python<C,Foo>(python_t& p, const string& d)
  {
    python_type<C,Foo>(p,d+".bar",&Foo::bar);
    python_type<C,Foo>(p,d+".method1",
      &Foo::method1);
  }

where python_type is a type descriptor, as opposed to the traditional instance object descriptor. The reason why two type arguments are required in the template arguments is to handle inheritance. If Bar is derived from Foo, then we would see

  template <> void python<Bar,Bar>(python_t& p,
    const string& d)
  {
    python<Bar,Foo>(p,d); // process base class
    ...
  }

As part of the process of adding support for type descriptors, it was decided for consistency to change the form of instance descriptors for object attributes to pass both object and member pointer, just like how method pointers are handled. This mode is enabled by the Classdesc processor switch -use_mbr_ptrs, and this will become the default way things are done in the next major version (4.x) of Classdesc. Support for the old way of doing things is enabled with a macro CLASSDESC_USE_OLDSTYLE_MEMBER_OBJECTS defined in the use_mbr_pointers.h header file. So

  CLASSDESC_USE_OLDSTYLE_MEMBER_OBJECTS(pack)

creates a descriptor overload that binds the member pointer to the object, and calls the traditional overload:

  template <class C, class M>
  void pack(pack_t& p, const string& d, C& o, M y)
  {pack(p,d,o.*y);}
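
The effect of that adaptor can be sketched outside Classdesc. In the sketch below, the descriptor names and its state type names_t are hypothetical, as are Foo and describeFoo; the four-argument overload takes the member pointer as M C::* (slightly stricter than the plain M used above) so that it only matches data member pointers of the object's class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical descriptor state: records the names it visits.
struct names_t { std::vector<std::string> seen; };

// Traditional leaf overload, taking a reference to the member.
template <class T>
void names(names_t& n, const std::string& d, T&)
{ n.seen.push_back(d); }

// Adaptor in the style of the macro-generated overload: bind the
// member pointer y to the object o, then call the overload above.
template <class C, class M>
void names(names_t& n, const std::string& d, C& o, M C::*y)
{ names(n, d, o.*y); }

struct Foo { int bar; double baz; };

// What the processor would emit in member-pointer mode.
inline void describeFoo(names_t& n, const std::string& d, Foo& a)
{
  names(n, d + ".bar", a, &Foo::bar);
  names(n, d + ".baz", a, &Foo::baz);
}
```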

Handling overloaded methods

Given:

  void (Foo::*bar0)()=&Foo::bar;
  void (Foo::*bar1)(int x)=&Foo::bar;
  python(py,"bar",bar0);
  python(py,"bar",bar1);

the first descriptor is called on the argumentless method and the second on the one taking a single integer argument.

Method overloading support was added to Classdesc by modifying the processor to emit these function signature qualified method pointers, instead of the inline ‘address of’ traditionally used. This required parsing the method declaration lines to extract the return type, the method name, the method arguments and the method type. For example:

  virtual const Bar& foo(const FooBar& x={}) const;

In this example, the return type is const Bar&, the method name foo, the argument list const FooBar& x and the type const. Some support for extracting this information had been added to Classdesc for the Objective C work [Leow03]; however, that was deficient in a number of ways. In particular keywords like virtual, inline, static and override need to be filtered out. So the member name is immediately recognisable as the token prior to the first parenthesis, however we must take all tokens preceding the member name (minus those keywords) as the return type. Similarly for the argument list, we must strip out all initialisers present. We can leave the argument name in place (if present), which is fortunate, as we cannot assume the last token of an argument is a name, and not part of the type. Finally, the type is important, as the emitted declaration must vary accordingly:

  type         declaration
  none         const Bar& (Foo::*m)(const FooBar& x)=&Foo::foo;
  const        const Bar& (Foo::*m)(const FooBar& x)const=&Foo::foo;
  static       const Bar& (*m)(const FooBar& x)=&Foo::foo;
  constructor  void (*m)(const FooBar& x)=0;

The constructor case will be discussed in ‘Constructors’.
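
The way a declared pointer type selects one overload can be checked in plain C++. The following sketch uses a hypothetical class Foo and helper callAll; it covers the none, const and static rows of the table (the constructor row has no direct C++ equivalent, since a constructor's address cannot be taken).

```cpp
#include <cassert>

struct Foo
{
  int bar() { return 0; }               // no arguments
  int bar(int x) { return x; }          // one argument, non-const
  int bar(int x) const { return -x; }   // one argument, const
  static int make(int x) { return x + 1; }
};

// A bare &Foo::bar is ambiguous; the declared pointer type on the
// left-hand side selects exactly one overload.
inline int callAll(int v)
{
  int (Foo::*m0)() = &Foo::bar;
  int (Foo::*m1)(int) = &Foo::bar;
  int (Foo::*m2)(int) const = &Foo::bar;
  int (*s)(int) = &Foo::make;           // static: plain function pointer
  Foo f;
  const Foo cf{};
  return (f.*m0)() + (f.*m1)(v) + (cf.*m2)(v) + s(v);
}
```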

Overloading support is enabled by the -overload command line switch. It is not enabled by default (nor will it be in Classdesc 4.x) because Classdesc is not aware of the full context of the class it is processing. Type Bar may be declared in the current namespace but in a different header file from the one Classdesc is processing, or it may come into scope by inheritance. In either case, the type name will not be in scope at the site of the descriptor declaration. Classdesc tries to import symbols it knows about – so publicly defined types of the current class are explicitly imported, as are types defined in the enclosing namespace the class is defined in. However, there will always be situations where it cannot work out where a type is defined, and so a compilation error will ensue. The answer is that you need to explicitly qualify these types such that they can be found in the classdesc_access namespace – for example you may need to change the above declaration to

  virtual const FooBase::Bar&
    foo(const FooBar& x={}) const;

in the case where Bar is defined in the base class FooBase.

Of course, if the descriptor ignores methods (eg any of the serialisation descriptors), then it is not necessary to enable overloading, eliminating this problem.

At the time of writing no effort is made to parse default arguments. If you wish to emulate default arguments in the Python interface, then you will need to provide explicit method overloads. This may change in the future.

Constructors

Traditionally, all constructors were ignored by Classdesc, as it is impossible to obtain a method pointer to a constructor. But it is such a powerful feature to be able to construct a C++ object by whatever constructors it provides (and to construct objects that have no default constructor), that this work added the ability to expose constructors. We use the code for parsing method signatures, and declare a temporary function pointer with the same arguments as the constructor, initialised to NULL.

When passed to the python descriptor, we need to extract the types of each argument and construct a boost::python::init<> specialisation with those argument types. Instantiating an object of this type and passing it to def() is all that is required to expose the constructor to Python.

Metaprogramming is used for this purpose, leveraging the classdesc::functional metaprogramming library and modern C++ variadic templates. The code implementing this is in Listing 1.

template <class M,
  int N=functional::Arity<M>::value> struct Init;
template <class... A> struct InitArgs;
template <class A, class... B>
struct InitArgs<InitArgs<B...>, A>:
  public InitArgs<B...,A> {};

template <class M, int N, class... A>
struct InitArgs<Init<M,N>, A...>:
  public InitArgs<Init<M,N-1>,
     typename functional::Arg<M,N>::T,A...>
{};

template <class M, class... A>
struct InitArgs<Init<M,0>, A...>
{typedef boost::python::init<A...> T;};

template <class M, int N>
struct Init: public InitArgs<Init<M, N>>
  {};
			
Listing 1

The idea is that M is the type of the function pointer passed to the descriptor, and the second template argument, N, is the number of arguments still to be processed; its default value is the function’s arity. This value is decremented as the arguments are unpacked into the variadic type argument pack A..., and when it finally reaches 0, the output type is defined as boost::python::init<A...>.

This technique does require a helper template class to carry the argument pack – in this instance called InitArgs. An initial attempt at repurposing boost::python::init failed because that class carried too much baggage.
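
The unpacking idea can be tried without Boost or Classdesc. The sketch below is not Listing 1: Arity and Arg are simplified stand-ins for the classdesc::functional templates (their shapes are assumed, and they only handle plain function pointers), Collect is a hypothetical carrier playing the role of InitArgs, and std::tuple<A...> stands in for boost::python::init<A...>.

```cpp
#include <tuple>
#include <type_traits>

// Stand-ins for functional::Arity and functional::Arg (assumed
// shapes), written here for plain function pointers only.
template <class F> struct Arity;
template <class R, class... A>
struct Arity<R (*)(A...)>
{ static const int value = sizeof...(A); };

template <class F, int N> struct Arg;   // N counts from 1
template <int N, class R, class... A>
struct Arg<R (*)(A...), N>
{
  typedef typename
    std::tuple_element<N - 1, std::tuple<A...>>::type T;
};

// Carrier for the pack being built: peel argument N off the
// signature and prepend it, until N reaches 0.
template <class M, int N, class... A>
struct Collect: Collect<M, N - 1, typename Arg<M, N>::T, A...> {};

template <class M, class... A>
struct Collect<M, 0, A...>
{ typedef std::tuple<A...> T; };  // init<A...> in the real descriptor

template <class M>
struct Init: Collect<M, Arity<M>::value> {};
```

Because the arguments are taken from the last to the first and each is prepended to the pack, the final pack has the same order as the original signature.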

The python descriptor

Reference returning methods

As mentioned in ‘Boost Python’, Boost.Python wrapped methods return a copy of the returned object, even if the method returns a reference to an object intended for mutating the object state. Boost.Python provides an alternate version of def that handles this case using the following syntax:

  class_<Foo>("Foo").def("bar",&Foo::bar,
    return_internal_reference<>());

In order to emit this alternative syntax, we use the std type_traits library, the enable_if metaprogramming trick (Classdesc provides its own implementation of this, modelled on the version supplied as part of the Boost metaprogramming library) and the Classdesc functional library, which provides metaprogramming templates returning a function object’s arity, its return value and the types of all its arguments. So we see code like that in Listing 2.

// value returns
template <class C, class M>
typename enable_if<
  And<
    Not<is_reference<typename
      functional::Return<M>::T>>,
    Not<is_pointer<typename
      functional::Return<M>::T>> >,
      void>::T
addMemberFunction(const string& d, M m)
{
  auto& c=getClass<C>();
  if (!c.completed)
    c.def(tail(d).c_str(),m);
  DefineArgClasses<M, functional::Arity<M>
    ::value>::define(*this);
}
// reference returns
template <class C, class M>
typename enable_if<is_reference
  <typename functional::Return<M>::T>,void>::T
addMemberFunction(const string& d, M m)
{
  auto& c=getClass<C>();
  if (!c.completed)
    c.def(tail(d).c_str(),m,
    boost::python::return_internal_reference<>());
  DefineArgClasses<M,functional::Arity<M>::
    value>::define(*this);
}
// ignore pointer returns
template <class C, class M>
typename enable_if<is_pointer
  <typename functional::Return<M>::T>,void>::T
addMemberFunction(const string&, M) {}
			
Listing 2

Here we use the Classdesc-provided logical metaprogramming operations for combining different type traits, as you can see in the default def case, where we want to exclude both reference and pointer returning methods from being exposed via a copied return object.

The third case corresponds to pointer returning methods. Because ownership of pointees may or may not be passed with the pointer being returned, calling these functions from python potentially creates a memory leak, or worse. So if you want to expose a pointer returning function to python, do it the old-fashioned way, ie explicitly, not automatically. Better is to recode the method to return a reference, for which it is clear by language semantics that ownership remains with the method’s C++ object, if that is your intention.
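
The three-way dispatch of Listing 2 can be exercised in isolation. In this sketch the standard library's enable_if and type traits replace the Classdesc ones, Return is a simplified stand-in for functional::Return (assumed shape, non-const member functions only), and the Policy enum, policyFor and Foo are hypothetical names that merely report which overload was selected.

```cpp
#include <type_traits>

// Stand-in for functional::Return (assumed shape), covering
// member function pointers without cv-qualifiers.
template <class M> struct Return;
template <class R, class C, class... A>
struct Return<R (C::*)(A...)> { typedef R T; };

enum class Policy { copy, internalReference, ignored };

// value returns: neither reference nor pointer
template <class M>
constexpr typename std::enable_if<
  !std::is_reference<typename Return<M>::T>::value &&
  !std::is_pointer<typename Return<M>::T>::value, Policy>::type
policyFor(M) { return Policy::copy; }

// reference returns
template <class M>
constexpr typename std::enable_if<
  std::is_reference<typename Return<M>::T>::value, Policy>::type
policyFor(M) { return Policy::internalReference; }

// pointer returns
template <class M>
constexpr typename std::enable_if<
  std::is_pointer<typename Return<M>::T>::value, Policy>::type
policyFor(M) { return Policy::ignored; }

struct Foo
{
  int value() { return v; }
  int& ref() { return v; }
  int* ptr() { return &v; }
  int v = 0;
};
```

The three conditions are mutually exclusive, so exactly one overload survives substitution for any given method pointer.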

Another point of interest is that the python descriptor is called recursively on the return type, and on the types of each argument which ensures that a python definition for those types exists. To do this, we again leverage the Classdesc functional library, and use template recursion to call the python descriptor on each argument. Care must be taken to avoid an infinite loop caused when the method takes an argument of the same type as the class being processed. For this purpose, we use a single shot singleton pattern to return false the first time it is called and true thereafter for a given type:

  template <class T>
  inline bool classDefStarted()
  {
    static bool value=false;
    if (value) return true;
    value=true;
    return false;
  }

This is adequate when each class needs to be exposed just once per execution (such as on loading a dynamic library).
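
A small sketch shows the guard breaking a cycle between two classes whose methods mention each other. Only classDefStarted is taken from the text above; the Mentions trait, defineClass, expose and the classes A and B are hypothetical, and a log vector stands in for the actual Boost.Python registration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Single-shot flag per type, as described in the text.
template <class T>
inline bool classDefStarted()
{
  static bool value = false;
  if (value) return true;
  value = true;
  return false;
}

struct A; struct B;

// Hypothetical trait naming one type that T's methods mention.
template <class T> struct Mentions;
template <> struct Mentions<A>
{ typedef B T; static const char* name() { return "A"; } };
template <> struct Mentions<B>
{ typedef A T; static const char* name() { return "B"; } };

// Hypothetical registration: records T, then visits the type it
// mentions unless that type has already been started.
template <class T>
void defineClass(std::vector<std::string>& log)
{
  log.push_back(Mentions<T>::name());
  typedef typename Mentions<T>::T U;
  if (!classDefStarted<U>())
    defineClass<U>(log);
}

// Guarded entry point, mirroring the check used in Listing 3.
template <class T>
void expose(std::vector<std::string>& log)
{
  if (!classDefStarted<T>())
    defineClass<T>(log);
}
```

Exposing A registers A and then B; the attempt to revisit A from B is stopped by the flag, and a later request to expose B is a no-op.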

The code for recursively exposing the types of each argument is in Listing 3

// recursively define classes of arguments
template <class F, int N>
struct DefineArgClasses {
  static void define(python_t& p) {
    typedef typename remove_const<
      typename remove_reference<
        typename functional::Arg<F,N>::T>
      ::type>
    ::type T;
    if (!pythonDetail::classDefStarted<T>())
      p.defineClass<T>();
    DefineArgClasses<F,N-1>::define(p);
  }
};
template <class F>
struct DefineArgClasses<F,0> {
  static void define(python_t& p) {
    typedef typename remove_const<
      typename remove_reference<
        typename functional::Return<F>::T>
      ::type>
    ::type T;
    if (!pythonDetail::classDefStarted<T>())
      // define return type
      p.defineClass<T>();
  }
};
			
Listing 3

and is called from within a python_t method by

  DefineArgClasses<F,functional::Arity<F>::value>
    ::define(*this);

Const and mutable attributes

We can similarly deal with the different syntactic requirements of const or noncopyable versus non const attributes, using standard metaprogramming techniques (see Listing 4).

template <class X>
typename enable_if<
  And<
  std::is_copy_assignable <typename
     pythonDetail::MemberType<X>::T>,
    Not<is_const<typename
     pythonDetail::MemberType<X>::T>>
    >,void>::T
addProperty(const string& d, X x) 
  {this->def_readwrite(d.c_str(),x);}

template <class X>
typename enable_if<
  Or<
    Not<std::is_copy_assignable<typename
       pythonDetail::MemberType<X>::T>>,
      is_const<typename
       pythonDetail::MemberType<X>::T>
    >,void>::T
addProperty(const string& d, X x) 
{this->def_readonly(d.c_str(),x);}
			
Listing 4

In this case, not only do we check whether an attribute is declared const, but we also need to check whether the attribute is even copy assignable.
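
The condition used to choose between def_readwrite and def_readonly can be checked at compile time. In this sketch MemberType is a simplified stand-in for pythonDetail::MemberType (assumed shape), and IsWritable, NoAssign and Foo are hypothetical names.

```cpp
#include <type_traits>

// Stand-in for pythonDetail::MemberType (assumed shape): the type
// of the member that a data member pointer refers to.
template <class X> struct MemberType;
template <class C, class M>
struct MemberType<M C::*> { typedef M T; };

// True when the attribute could be exposed read-write.
template <class X>
struct IsWritable: std::integral_constant<bool,
  std::is_copy_assignable<typename MemberType<X>::T>::value &&
  !std::is_const<typename MemberType<X>::T>::value> {};

// A type that cannot be assigned to.
struct NoAssign
{
  NoAssign& operator=(const NoAssign&) = delete;
};

struct Foo
{
  int counter;       // read-write
  const int limit;   // read-only: declared const
  NoAssign handle;   // read-only: not copy assignable
};
```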

Default and Copy constructibility

By default, Boost.Python assumes that an exposed C++ object is default constructible and copy constructible. As already mentioned in ‘Boost Python’, non-default constructible classes can be handled by passing the boost::python::no_init object to the class_ constructor. Noncopyable objects can be exposed if the class_ template takes an extra template parameter of boost::noncopyable. To make the code more symmetric, and shareable in these cases, in Classdesc the class_ template is subclassed as shown in Listing 5.

template <class T, bool copiable> struct PyClass;
template <class T> struct PyClass<T,true>:
  public boost::python::class_<T>
{
  PyClass(const char* n):
    boost::python::class_<T>(n,
    boost::python::no_init()){}
};
template <class T> struct PyClass<T,false>:
  public
    boost::python::class_<T,boost::noncopyable>
{
  PyClass(const char* n):
    boost::python::class_<T,boost::noncopyable>(n,
    boost::python::no_init()){}
};
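template <class T, bool copiable> struct Class:
  public ClassBase, 
  public ClassBase::PyClass<T,copiable>
{
  // constructor signature is not shown in the original listing;
  // the form used here is an assumption
  Class(const char* n):
    ClassBase::PyClass<T,copiable>(n)
  {
    addDefaultConstructor(*this);
    this->def("__eq__",
      pythonDetail::defaultEquality<T>);
  }
};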
			
Listing 5

The ClassBase base class is a non-templated virtual base class, allowing the use of this type in containers, and the use of a boolean template parameter allows us to instantiate the Class object via a factory function:

  template <class T>
  Class<T,is_copy_constructible<T>::value>&
    getClass();

Use of the default constructor is deliberately suppressed by the no_init argument to the class_ constructor, but then added back in by the call to addDefaultConstructor if T is default constructible.

At the same time, an equality operator is defined that at minimum returns true if the same C++ object is referenced by two different python objects, but also calls into the C++ equality operation if that is defined.
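
One way such an equality function might be written is sketched below. This is not the Classdesc implementation of pythonDetail::defaultEquality, only an illustration of the behaviour just described: HasEquality is a hypothetical detection trait, and Opaque is a hypothetical class without operator==.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detect whether a == b compiles for const T operands.
template <class T, class = void>
struct HasEquality: std::false_type {};
template <class T>
struct HasEquality<T, decltype(void(
  std::declval<const T&>() == std::declval<const T&>()))>:
  std::true_type {};

// Same object, or equal according to the C++ operator==.
template <class T>
typename std::enable_if<HasEquality<T>::value, bool>::type
defaultEquality(const T& a, const T& b)
{ return &a == &b || a == b; }

// No operator== available: fall back to object identity.
template <class T>
typename std::enable_if<!HasEquality<T>::value, bool>::type
defaultEquality(const T& a, const T& b)
{ return &a == &b; }

struct Opaque { int id; };  // deliberately has no operator==
```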

Global objects

Surprisingly, Boost.Python doesn’t provide any means of exposing a global object. Unlike the global def() function which exposes functions to python, there is no equivalent global def_readwrite() or def_readonly(), nor is the module available as a class object that we could run those as methods.

Instead, we can use more primitive objects – the module is available as a Boost.Python object via the default constructor of scope. We can extract the __dict__ attribute of this object, and insert the global object into the module dictionary via a pointer proxy:

  extract<dict>(scope().attr("__dict__"))
  ()[tail(d).c_str()]=ptr(&o);

Thus exposing a global object to Python is a matter of calling the python_t::addObject(string name, T& obj) method.

Containers

Boost.Python does not explicitly support standard containers, such as std::vector or std::list. In Classdesc, the philosophy is to support standard containers, or better still the concepts behind standard containers. In Classdesc, two concepts are defined: sequence and associative_container. Sequences include std::vector, std::list and std::deque. Associative containers include std::set, std::map, std::multimap, and the unordered versions of these, such as std::unordered_map.

Users can exploit these concepts for their own containers by defining the appropriate type trait: is_sequence or is_associative_array (see Listing 6).

namespace classdesc
{
  template<>
  struct is_sequence<MySequence>
  {
    static const bool value=true;
  };
}
			
Listing 6
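
How a descriptor might act on such a trait can be sketched in standalone form. Here is_sequence is redefined locally as a stand-in for classdesc::is_sequence (its shape is assumed), MySequence is given a hypothetical body, and sequenceLength is a hypothetical helper of the kind that could back __len__.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <type_traits>
#include <vector>

// Stand-in for classdesc::is_sequence (assumed shape).
template <class T> struct is_sequence: std::false_type {};
template <class T, class A>
struct is_sequence<std::vector<T, A>>: std::true_type {};
template <class T, class A>
struct is_sequence<std::list<T, A>>: std::true_type {};

// A user container opting in to the concept.
struct MySequence
{
  std::vector<int> data;
  std::size_t size() const { return data.size(); }
};
template <> struct is_sequence<MySequence>: std::true_type {};

// Hypothetical helper enabled only for sequences; a descriptor
// could register something like this as __len__.
template <class S>
typename std::enable_if<is_sequence<S>::value, std::size_t>::type
sequenceLength(const S& s) { return s.size(); }
```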

In the case of the python descriptor, we want containers to support the Python sequence protocol. It suffices to define the methods __len__, __getitem__ and __setitem__. This is sufficient to support Python operations such as len(), [] and to iterate over a container like:

  for i in someVector:
    print(i)

Additionally, it is useful to be able to assign lists of objects to sequence containers. For this, we create an assign method, which takes a boost::python::object as an argument, and attempts to assign each component of the boost object (if it supports the sequence protocol).

Finally, it is desirable to construct a new C++ container from a Python list or tuple. Doing this is not well documented in Boost.Python, but involves def’ing the __init__ method with a very special function:

  template <class T>
  boost::shared_ptr<T> 
    constructFromList(const boost::python::list& y)
  {
    boost::shared_ptr<T> x(new T);
    assignList(*x,y);
    return x;
  }

  getClass<std::vector<T> >().
    def("__init__",boost::python::make_constructor
      (constructFromList<std::vector<T>>));

The crucial key is that the actual constructor implementation must return a boost::shared_ptr<T>.

Finally, standard containers are archetypal template types, and after Classdesc processing, the class names are not syntactically valid Python identifiers. For example, a std::vector<int> cannot be directly instantiated by name; however, it is possible to reference that type from within Python. Assuming the C++ classes are exposed within namespace example, then you can rename the constructor functions within Python like:

  IntVector=example.std.__dict__['vector<int>']
  x=IntVector([1,2,3,4])

One final nice to have feature not currently implemented is to directly pass a list or tuple to a C++ sequence parameter of a method. For now, you have to explicitly instantiate a C++ object of the appropriate type to pass to the argument, as above, or alternatively code an overloaded method in C++ that takes a boost::python::object in place of the sequence parameter, and then use the python sequence protocol (len(), operator[]) to construct an appropriate C++ container. The difficulty in arranging this to happen is that it is an area poorly documented in Boost.Python, so is still a subject of future research.

Smart Pointers

The standard library smart pointers implement a concept smart_ptr, which has the following methods and attributes:

  • target is the object being referenced by the smart pointer. You can access or assign that smart pointer’s target via this attribute. So
             x.target.foo()

    is equivalent to the C++ code

             x->foo();

    and

             x.target=y

    is equivalent to the C++ code

             *x=y;

    If the object is null for either of these operations, a null dereference exception is thrown.

  • reset() sets the smart pointer to null, deleting the target object if the reference count goes to zero.
  • new(args) creates a new target object by its default constructor, or with args if an init method exists for that object.
  • = assigning a smart pointer will cause the reference to be shared to the new variable (in the shared_ptr case) or transferred (in the unique_ptr case).
  • refCnt() returns the reference count pointing to the current target. For unique_ptr this will be 1. This can be of use for debugging why a destructor is not being called when expected.

Conversion of an existing codebase

In order to test these ideas out and to harden the implementation, it is necessary to use them in a real world application. The SciDAVis plotting application [Benkert14] was chosen for this purpose, as the author is the project manager, and it already sports a python interface via the SIP reflection system [Riverbank] by Riverbank Computing for exposing C++ objects to Python. SIP was exploited to expose Qt [Blanchette06] and Qwt [Rathmann] classes, in the form of the PyQt library.

This work also hopefully addresses a problem with using SIP in that the MXE (http://mxe.cc) cross-compiler build environment does not readily build PyQt (the build process requires a working python interpreter for the target system, which is not so useful for cross compilers). This has led to the lack of python scriptability on the Windows build of SciDAVis.

A second problem, hopefully addressed with this work, is that the API change from Qwt5 to Qwt6 does not interact well with SIP. It could be argued that the API change is a backward step – the new API is harder to use and more error prone, but suffice it to say it becomes important to wrap the new Qwt6 classes with ones that have a more C++ style, supporting RAII semantics for example. This wrapping will insulate the Python layer from the Qwt layer.

Neither of these last two advantages has been realised yet – that is the scope of future work. However, the full Python interface, as documented in the SciDAVis manual, has been implemented, in a feedback process that led to many additional features in the Classdesc python descriptor, such as support for method overloading and constructors.

qmake project file changes

SciDAVis uses the qmake [Blanchette06] build system. Unlike GNU make, which can exploit the C++ compiler to automatically generate dependencies of object files on the included header files, qmake requires all header files to be explicitly listed. Whilst qmake does understand the dependency relations between object files and headers, it doesn’t appear to have any way of specifying a dependency between a Classdesc descriptor implementation file (.cd) and its header (.h). Instead, a separate list of header files to be processed by Classdesc is maintained, and if any of those header files change, then all classdesc’d headers are processed again by Classdesc. This does cause more compilation work than is necessary, but in practice was not a major problem for SciDAVis, which has fairly modest build times. It should be noted that a direct make solution, such as that used by the Minsky project, does not have this problem.

Qt meta object compiler

SciDAVis is a Qt project, which has certain implications. The first is that Qt has a form of reflection called moc, short for meta object compiler. Qt header files are written in a superset of C++, the most significant change being keywords supporting Qt’s signals and slots mechanism. The keywords signals and slots appear in class definitions in the same place that the class access specifiers public, protected and private are used. Signals are always protected, but slots may be declared public, private or protected. Additional code was added to the Classdesc processor to parse these declarations, and set the is_private flag appropriately.

The other aspect of Qt code is specific macros used to indicate things to the moc preprocessor. These are Q_OBJECT, Q_PROPERTY() and Q_ENUMS(), which are filtered out by the Classdesc processor.

This additional processing is enabled with the -qt flag on the Classdesc processor.

Organising the use of the python descriptor in SciDAVis

Most of the Python support code lives in a single file, PythonScripting.cpp, so exposing class definitions involved adding the SciDAVis header file and the Classdesc descriptor definition file for each exposed class to the beginning of that file. We started by exposing the ApplicationWindow class, which is the main omnibus class implementing the SciDAVis application. The code to expose this class becomes:

  BOOST_PYTHON_MODULE(scidavis)
  {
    classdesc::python_t p;
    p.defineClass<ApplicationWindow>();
  }

As mentioned in ‘Reference returning methods’, this will automatically expose classes referenced by each exposed method, provided the appropriate header files have been included. The compiler will let you know if the header file is not present.

As full API support was developed, additional classes needed to be added, mainly for things like the various fitting algorithms, and filter algorithms such as integration and interpolation. These classes can be instantiated from Python, and the resulting objects passed to methods taking a fit or filter base class reference. In all, 21 classes needed to be added to the BOOST_PYTHON_MODULE block.

I decided not to process the Qt library headers, as these tended to use a lot of conditional macros that Classdesc doesn’t have the context to deal with. The alternative strategy of preprocessing the Qt headers to remove macros was rejected, as this typically leads to an uncontrollable explosion of classes that Classdesc must process. Instead, for each Qt class exposed on the SciDAVis Python interface, a wrapper class was created, with delegated methods. This takes a bit of work, but by starting from a copy of the class taken from the relevant Qt header file, creating the delegated methods inline in the class is a fairly mechanical process.
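The shape of such a wrapper can be sketched without Qt. In this illustration LibPoint is a hypothetical stand-in for a library class whose header is not processed by Classdesc, and PointWrapper is the kind of class that would live in an application header and be processed instead. Holding the wrapped object by value is an assumption of the sketch, not something stated above.

```cpp
#include <cassert>

// Stand-in for a library class whose header is not run through Classdesc
// (hypothetical; in SciDAVis this role is played by a Qt class).
class LibPoint
{
  int m_x, m_y;
public:
  LibPoint(int x, int y): m_x(x), m_y(y) {}
  int x() const { return m_x; }
  int y() const { return m_y; }
  void setX(int x) { m_x = x; }
  void setY(int y) { m_y = y; }
};

// Wrapper declared in the application's own header, so that it can be
// processed by Classdesc. Each public method delegates to the wrapped object.
class PointWrapper
{
  LibPoint impl;  // assumption: the library object is held by value
public:
  PointWrapper(int x, int y): impl(x, y) {}
  int x() const { return impl.x(); }
  int y() const { return impl.y(); }
  void setX(int x) { impl.setX(x); }
  void setY(int y) { impl.setY(y); }
};

// Small usage check
void checkPointWrapper()
{
  PointWrapper p(1, 2);
  p.setX(5);
  assert(p.x() == 5 && p.y() == 2);
}
```

Because the wrapper's method list is copied from the library class declaration, each body is a one-line forward, which is what makes the process mechanical.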

Static objects in the Qt namespace, such as Qt’s global colours, could be reimplemented in local code. I grouped these into a single class (called QtNamespace, as the identifier Qt clashes with the global Qt namespace). A single line was added to the BOOST_PYTHON_MODULE block creating an alias of this object to Qt in python's global namespace:

  modDict("__main__")["Qt"] =
    modDict("scidavis")["QtNamespace"];

This pretty much implements the needed functionality from the PyQt library, eliminating the latter from SciDAVis’s software dependency list.

The final piece was supporting the typeName functionality for Qt types. For any type derived from QObject, this was easily implemented as a call to the className() method of the moc-generated staticMetaObject. However, there were numerous Qt classes not derived from QObject, such as QString. These were implemented individually, although the common cases were easily handled with a macro to reduce the amount of boilerplate code.
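A macro of this kind might look like the following sketch. The names TypeName and DEFINE_TYPENAME are hypothetical and are not Classdesc's own interface, and FakeString and FakeColor stand in for Qt value classes such as QString so that the example compiles without Qt.

```cpp
#include <cassert>
#include <string>

// Hypothetical trait; the primary template is deliberately left undefined,
// so asking for the name of an unregistered type is a compile-time error.
template <class T> struct TypeName;

// Registers a type name by stringising the macro argument.
#define DEFINE_TYPENAME(Type)                        \
  template <> struct TypeName<Type>                  \
  {                                                  \
    static std::string name() { return #Type; }      \
  };

// Stand-ins for library value types that have no meta object
struct FakeString {};
struct FakeColor {};

DEFINE_TYPENAME(FakeString)
DEFINE_TYPENAME(FakeColor)

// Small usage check
void checkTypeNames()
{
  assert(TypeName<FakeString>::name() == "FakeString");
  assert(TypeName<FakeColor>::name() == "FakeColor");
}
```

Each additional type then costs a single line, which is the boilerplate saving referred to above.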

Code changes to SciDAVis

The biggest code changes involved methods that return pointers to objects. For the reasons outlined in ‘Reference returning methods’, pointer returning methods are never exposed by Classdesc, so they must instead be converted to methods that return a reference. However, these methods typically return null when an error condition is encountered, so they were refactored to throw an exception (a handy NoSuchObject exception type was created for this purpose). The Boost.Python library converts all C++ exceptions propagated through the C++/Python interface into Python exceptions, so this was clearly the right thing to do. One could take the lazy way out, and simply provide a wrapper method that converts a pointer returning method into one returning a reference, or throwing on a null pointer return, but I took the opportunity to refactor caller code to use the reference interface too, in line with conventional C++ practices.
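The conversion from a null-returning pointer method to a throwing reference method can be sketched as follows. Only the NoSuchObject name comes from the text; its definition here, and the Workspace and Table classes, are hypothetical. The sketch shows the simple wrapper form, with the reference method built on top of the pointer method.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Assumed definition; the text only names the exception type.
struct NoSuchObject: std::runtime_error
{
  explicit NoSuchObject(const std::string& name):
    std::runtime_error("no such object: " + name) {}
};

struct Table { std::string name; };

// Hypothetical container class used for illustration
class Workspace
{
  std::map<std::string, Table> tables;
public:
  void addTable(const std::string& name) { tables[name] = Table{name}; }

  // Original style: returns a null pointer when the lookup fails
  Table* findTable(const std::string& name)
  {
    auto it = tables.find(name);
    return it == tables.end()? nullptr: &it->second;
  }

  // Reference returning style: throws instead of returning null
  Table& table(const std::string& name)
  {
    if (Table* t = findTable(name)) return *t;
    throw NoSuchObject(name);
  }
};

// Small usage check
void checkWorkspace()
{
  Workspace w;
  w.addTable("t1");
  assert(w.table("t1").name == "t1");
  bool thrown = false;
  try { w.table("missing"); }
  catch (const NoSuchObject&) { thrown = true; }
  assert(thrown);
}
```

On the Python side, a failed lookup then surfaces as a Python exception rather than as a null object.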

The second set of changes revolved around making the Python API consistent with the C++ API, as Classdesc will faithfully expose the C++ interface as the equivalent Python one. In the original SciDAVis code, the API is specified in two places: it is documented in the manual, and specified in the scidavis.sip specification file. As might be expected, these two definitions were sometimes contradictory, and also were not consistent with C++. When resolving these inconsistencies, I chose to follow what was documented in the manual, even though it potentially introduces breaking changes for scripts that rely on how the API was actually defined. The most significant changes were in methods that take arguments satisfying Python’s sequence semantics, such as lists or tuples. Such a method call should look like:

  foo.bar((1,2,3))

or

  foo.bar([1,2,3])

but instead the SIP implementation did it in a variadic way:

  foo.bar(1,2,3)

Whilst it is possible to supply a variadic definition from within Boost.Python, it needs to be coded explicitly, as Classdesc ignores variadic methods.

Ideally, in C++, one should be able to initialise a C++ sequence with one of these Python sequence objects, but currently that is not possible. So for now, supporting this call from Python involves adding an additional overloaded method taking a pyobject reference. The pyobject type is defined as boost::python::object, which implements operator[] and len(), which suffices for constructing a C++ sequence object. In the future, I hope to be able to automatically generate this code. In the case where Python support is disabled, pyobject is declared as a class, but otherwise not defined. In the C++ implementation file, the body of the method is simply #ifdef’d out.
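The extra overload can be sketched with a stand-in for the Python object, so that the example compiles without Boost or Python. FakeSequence, toVector and Curve are hypothetical names. With a real boost::python::object, the length would come from boost::python::len and each element would need boost::python::extract to convert it to the C++ element type, rather than the static_cast used here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a Python sequence object: it offers element access via
// operator[] and a len() function, which is all the conversion needs.
struct FakeSequence
{
  std::vector<double> items;
  double operator[](std::size_t i) const
  {
    assert(i < items.size());
    return items[i];
  }
};

std::size_t len(const FakeSequence& s) { return s.items.size(); }

// Build a C++ vector from any object supporting len(seq) and seq[i]
template <class T, class Seq>
std::vector<T> toVector(const Seq& seq)
{
  std::vector<T> result;
  const std::size_t n = len(seq);
  result.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    result.push_back(static_cast<T>(seq[i]));
  return result;
}

// Hypothetical class with the original method plus the extra overload
struct Curve
{
  std::vector<double> data;
  void setData(const std::vector<double>& v) { data = v; }
  // additional overload accepting the sequence-like object
  void setData(const FakeSequence& s) { setData(toVector<double>(s)); }
};

// Small usage check
void checkCurve()
{
  Curve c;
  c.setData(FakeSequence{{1.0, 2.0, 3.0}});
  assert(c.data.size() == 3 && c.data[1] == 2.0);
}
```

The overload simply converts and forwards, so the existing C++ method remains the single place where the real work is done.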

Most of the code changes were then to make the C++ API consistent with the published Python API.

Originally, in order to get a runnable executable as quickly as possible, all unrecognised types were given a null python descriptor. However, that proved to be a mistake – it was better just to define dependent library classes (Qt, Qwt) as having null descriptors, and ensure Classdesc was run on all necessary SciDAVis defined classes.

Results

The core SciDAVis code (libscidavis directory) consisted of 100,466 lines of code, and after the conversion to Classdesc weighed in at 100,838 lines of code. The saving from eliminating the 2K loc scidavis.sip file was mostly eaten up by having to implement shim classes to expose Qt and Qwt classes. There is room for improvement by eliminating dead code that has been made redundant in the classdesc-boost.python way of doing things.

The result on compile times though is rather disappointing. On a quad core Intel i5-8265U CPU, the original SciDAVis code takes 1'25'' to compile and link the application. The refactored classdesc-enabled code takes 6'3'' to do the same thing, much of which is spent compiling the one module PythonScripting.cpp. This could be improved by splitting the classdesc descriptor calls for the different classes into different compilation units. In further work, the ApplicationWindow class python support was compiled into a separate object file from other classes, and the build time was reduced to 5'7''. Further build time optimisations will be needed too.

The resulting binary is larger too, at 15.6 MiB versus the original 6.0 MiB, probably because Classdesc exposes a fatter interface than the manually crafted SIP interface. Indeed, the approach is to expose a maximally fat interface – all SciDAVis public classes are exposed to Python, as well as select Qt and Qwt classes historically exposed to Python, via wrapper classes implemented in SciDAVis. The compiler and the Python regression test together determined what these needed to be. Also, the equivalent PyQt functionality is inlined into the executable, rather than residing in a dynamically loaded library.

Execution times when running the test scripts are much of a muchness between Classdesc and SIP, with the runtime differences between the two versions within experimental noise.

References

[Abrahams03] David Abrahams and Ralf W Grosse-Kunstleve (2003) ‘Building hybrid systems with Boost.Python’ in C/C++ Users Journal, 21(7).

[Benkert14] T Benkert, K Franke, D Pozitron, and R Standish (2014) Scidavis 1. D005 (Free Software Foundation, Inc: 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA), 2014.

[Blanchette06] Jasmin Blanchette and Mark Summerfield (2006) C++ GUI programming with Qt 4 Prentice Hall Professional.

[Leow03] Richard Leow and Russell K. Standish (2003) ‘Running C++ models under the Swarm environment’ in Proceedings SwarmFest 2003. arXiv:cs.MA/0401025.

[Liang99] Sheng Liang (1999) The Java Native Interface: Programmer’s Guide and Specification Addison-Wesley Professional.

[Madina01] Duraid Madina and Russell K. Standish (2001) ‘A system for reflection in C++’ in Proceedings of AUUG2001: Always on and Everywhere, page 207. Australian Unix Users Group.

[Rathmann] Uwe Rathmann and Josef Wilgen ‘Qwt - Qt widgets for technical applications’ on https://qwt.sourceforge.io/ (retrieved 13 June 2019).

[Riverbank] Riverbank Computing ‘What is SIP?’ https://www.riverbankcomputing.com/software/sip/intro (retrieved 13 June 2019).

[Rossum02] Guido Van Rossum and Fred L Drake Jr. (2002) Python/C API reference manual Python Software Foundation.

[Standish03] Russell K. Standish and Richard Leow (2003) ‘EcoLab: Agent based modeling for C++ programmers’ in Proceedings SwarmFest 2003 arXiv:cs.MA/0401026.

[Standish16] Russell K. Standish (2016) ‘Classdesc: A reflection system for C++11’ in Overload 131 pages 18–23, published February 2016 https://accu.org/index.php/journals/c358/

Russell Standish gained a PhD in Theoretical Physics, and has had a long career in computational science and high performance computing. Currently, he operates a consultancy specialising in computational science and HPC, with a range of clients from academia and the private sector.
