
Self Registering Classes - Taking polymorphism to the limit

Overload Journal #27 - Aug 1998 + Design of applications and programs   Author: Alan Bellingham

In this article, I propose a method that makes it easy to add classes to, and remove them from, an application. It uses registration of class-factory functions to emulate virtual constructors.

Introduction

One of the main aims of an Object-Oriented programming language is to reduce coupling between the parts of a program. It does this by encapsulating the functionality and state of data structures within class instances, and by having those classes expose as little as possible to the outside world. Taken to an extreme, this becomes component-based software development, in which an application may comprise components written in a variety of languages and possibly running on disparate machines and architectures. For now, though, we'll consider a single monolithic application.

Coupling

Firstly, what is the coupling problem?

Simply stated, it's the tendency for a subsystem A to know how subsystem B works, and vice versa. Any change to A requires a change to B, and any change to B requires a change to A. Extend this to subsystems C, D and E, and a combinatorial explosion of dependencies occurs. Since larger systems tend to have more subsystems, one of the primary tasks of the software engineer on such projects is to avoid such reciprocal knowledge.

Ideally, then, a subsystem should have no knowledge of any subsystem that knows about it, and the grand design then tends toward the composition of more complex subsystems from simpler ones, somewhat like this, where an arrow means 'knows about':

In this case, whoever is implementing B doesn't need to know about A, and the implementor of C needs to know only about C.

In general, designing in this way reduces maintenance problems and produces cleaner code. It shouldn't be hard to see that each subsystem corresponds roughly either to a single class, or to a class plus helper classes that the client need not know about.

Back to reality

In real life, it's rarely this easy. Subsystems may need to notify their parents of changes, and proxy classes may be returned that several subsystems need to understand, so the result becomes more of a cobweb. However, with suitable use of callback functions, a subsystem can send notifications without knowing anything about its owner. Common classes should be treated almost as built-in types, and changed about as frequently ☺.
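Here is a minimal sketch of the callback idea. It assumes C++11's std::function, which post-dates this article. The subsystem holds a callback supplied by its owner and invokes it on every change, without knowing the owner's type. The names Counter, Owner and OnChange are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A subsystem that notifies "someone" of changes. It knows only the
// signature of the callback it was given, never the type of its owner.
class Counter
{
public:
  typedef std::function<void(int newValue)> ChangeCallback;

  explicit Counter(ChangeCallback onChange) : onChange_(onChange), value_(0) {}

  void Increment()
  {
    ++value_;
    if (onChange_)
      onChange_(value_);   // notify without naming the owner's class
  }

private:
  ChangeCallback onChange_;
  int value_;
};

// The owner knows about Counter, but Counter never knows about Owner.
class Owner
{
public:
  Owner() : counter_([this](int v) { OnChange(v); }) {}

  void Poke() { counter_.Increment(); }
  const std::vector<int>& Log() const { return log_; }

private:
  void OnChange(int v) { log_.push_back(v); }

  std::vector<int> log_;   // declared first, so it exists before counter_
  Counter counter_;
};
```

Counter can be compiled and tested without ever seeing Owner's declaration. Only Owner depends on Counter.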

However, there is another potential problem, hinted at by the "Law of 7 plus or minus 2". It is well known that human beings have trouble really understanding what's going on when a large number of entities is under consideration, unless all the entities are the same, as in an array or list. Consider the following:

Here, subsystem J has to know how each of the subsystems from A to I works. Much of the time, however, many of these subsystems do similar work, although they differ in detail. This is where a language such as C++ can simplify things: polymorphism allows the designer to present all of them as effectively the same class.

By providing an abstract base class that exposes a common interface for all of these classes, we should be able to treat the 9 subsystems A to I as 9 copies of a single subsystem that just happen to differ internally.
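Here is a minimal sketch of that idea, assuming C++11 for std::unique_ptr and override. The owner holds every subsystem through a pointer to one hypothetical abstract base, Subsystem. It then handles them all the same way, as it would the elements of an array. Doubler, Negator and RunAll are illustrative names only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// The common interface that every concrete subsystem implements.
class Subsystem
{
public:
  virtual ~Subsystem() {}
  virtual std::string Name() const = 0;
  virtual int Process(int input) = 0;
};

// Two concrete subsystems: different inside, identical outside.
class Doubler : public Subsystem
{
public:
  std::string Name() const override { return "Doubler"; }
  int Process(int input) override { return input * 2; }
};

class Negator : public Subsystem
{
public:
  std::string Name() const override { return "Negator"; }
  int Process(int input) override { return -input; }
};

// The owner (J in the text) talks only to the base class, so adding a
// new kind of subsystem does not change this function at all.
int RunAll(const std::vector<std::unique_ptr<Subsystem>>& subsystems, int input)
{
  int total = 0;
  for (const auto& s : subsystems)
    total += s->Process(input);
  return total;
}
```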

The problem of creation

Indeed, careful use of C++ virtual functions does let us use polymorphism to reduce dramatically the number of places where an owner has to know which concrete class it is using. However, one major function cannot be made virtual: the constructor. As a result, there is often a switch statement that looks something like this:

void Figure1Func(int objectType, int param)
{
  GraphicItem * AC = NULL ;
  switch(objectType)
  {
  case 0:
    AC = new TextItem(param) ; break ;
  case 1:
    AC = new Box(param) ; break ;
  //...
  case 99:
    AC = new FilledEllipse(param); break;
  }

  if (AC)
  {
    AC->DoWhatever();
    delete AC ;
  }
}

Figure 1 - calling constructors from a switch statement

It is also frequently necessary to serialise such items into or out of memory. Serialising out is easy - it just requires a suitable virtual function call, and the object writes itself out. Serialising in is harder, because there is not yet any object of the right type on which to make a virtual call. So a switch statement appears there as well:

#include <istream>

GraphicItem * Figure2Func(std::istream& inputstream)
{
  int objectType ;
  GraphicItem * AC = NULL ;

  if (!(inputstream >> objectType))
    return NULL ;                 // nothing valid to read

  switch(objectType)
  {
    case 0:
      AC = new TextItem(inputstream) ;
      break ;
    case 1:
      AC = new Box(inputstream) ;
      break ;
    // ...
    case 99:
      AC = new FilledEllipse(inputstream) ;
      break ;
  }

  return AC ;   // caller takes ownership; NULL for an unknown type
}

Figure 2 - serialising from a switch statement
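Here is a minimal round-trip sketch that shows the asymmetry. It assumes C++11 and a simple whitespace-separated text format. The virtual Write member and the data carried by Box and TextItem are hypothetical. Each object writes its own type ID on the way out. The reading side has no object to call yet, so it must switch on that ID, as in Figure 2.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class GraphicItem
{
public:
  virtual ~GraphicItem() {}
  // Serialising out needs no switch: each object knows its own type.
  virtual void Write(std::ostream& os) const = 0;
};

class Box : public GraphicItem
{
public:
  enum { ID = 1 };
  explicit Box(int size) : size_(size) {}
  explicit Box(std::istream& is) : size_(0) { is >> size_; }
  void Write(std::ostream& os) const override { os << ID << ' ' << size_ << '\n'; }
  int Size() const { return size_; }
private:
  int size_;
};

class TextItem : public GraphicItem
{
public:
  enum { ID = 0 };
  explicit TextItem(const std::string& text) : text_(text) {}
  explicit TextItem(std::istream& is) { is >> text_; }   // one word only, for brevity
  void Write(std::ostream& os) const override { os << ID << ' ' << text_ << '\n'; }
  const std::string& Text() const { return text_; }
private:
  std::string text_;
};

// Serialising in: no object exists yet to make a virtual call on,
// so the type ID read from the stream has to drive a switch.
GraphicItem* ReadItem(std::istream& is)
{
  int objectType;
  if (!(is >> objectType))
    return nullptr;
  switch (objectType)
  {
  case TextItem::ID: return new TextItem(is);
  case Box::ID:      return new Box(is);
  }
  return nullptr;   // unknown type
}
```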

If the application were only ever to have a fixed set of such classes, this wouldn't be much of a problem. Unfortunately for software developers, there is rarely such a creature as a finished program. New classes get added. Special versions get written with classes deliberately left out. Menus listing the options exist, and these need to be changed too. Sooner or later, someone is going to fail to update one of the switch statements correctly, and all hell will break loose.

Banishing the constructor

The whole problem is that the owner has to know exactly what concrete classes are available. It would be so much simpler if a list could be built automatically. And who knows better than the classes themselves?

Consider a class:

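class GraphicItem
{
protected:
  GraphicItem(int param) { ; }

public:
  virtual ~GraphicItem () = 0 ;
  virtual void DoWhatever () = 0 ;
} ;

// A pure virtual destructor must still be defined,
// because every derived destructor calls it
inline GraphicItem::~GraphicItem () { }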

Figure 3a: GraphicItem.h

We may then derive the concrete types from it, like this:

#include "GraphicItem.h"

class FilledEllipse : public GraphicItem
{
private:
  FilledEllipse(int param) ;

public:
  virtual ~FilledEllipse () ;
  virtual void DoWhatever () ;

  static GraphicItem *
                 Construct (int param) ;
  enum { ID = 99 } ;
  //  Different for each class
} ;

Figure 3b: FilledEllipse.h

This class has a private constructor, and a public class factory function - i.e., a function that returns a constructed instance of the class. The class factory function actually uses the private constructor.
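Here is a minimal, self-contained sketch of how such a factory might be defined, assuming C++11 for override. The liveCount member is hypothetical; it exists only so the usage below has something to observe. Because Construct is a member of FilledEllipse, it may call the private constructor. Outside code can obtain an instance only through Construct, and only as a GraphicItem*.

```cpp
#include <cassert>

class GraphicItem
{
protected:
  explicit GraphicItem(int /*param*/) {}

public:
  virtual ~GraphicItem() = 0;
  virtual void DoWhatever() = 0;
};

// A pure virtual destructor still needs a body: derived destructors call it.
inline GraphicItem::~GraphicItem() {}

class FilledEllipse : public GraphicItem
{
private:
  explicit FilledEllipse(int param) : GraphicItem(param) { ++liveCount; }

public:
  ~FilledEllipse() override { --liveCount; }
  void DoWhatever() override {}

  // The class factory: the only way for other code to create an instance.
  static GraphicItem* Construct(int param) { return new FilledEllipse(param); }

  enum { ID = 99 };
  static int liveCount;   // hypothetical instance counter, for illustration only
};

int FilledEllipse::liveCount = 0;
```

A direct declaration such as `FilledEllipse e(5);` would not compile, because the constructor is private.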

We could have a table (or better yet, a map) of these class factory functions, keyed by class ID. Given only an ID, the client code could then look up the right function to call in order to construct, say, a new FilledEllipse:

#include <map>
#include "GraphicItem.h"

// typedefs to reduce typing later
//
typedef GraphicItem *
         (*ClassFactoryFn)( int param) ;
typedef std::map<int, ClassFactoryFn> FactoryMapType ;
typedef FactoryMapType::const_iterator FactoryMapIter ;

FactoryMapType FactoryMap ;

//  Somehow FactoryMap is initialised ...

void Figure4Func(int objectType, int param)
{
  FactoryMapIter it =
            FactoryMap.find(objectType) ;
  if ( it != FactoryMap.end())
  {
    GraphicItem * AC =
                    (*it).second(param) ;
    AC->DoWhatever();
    delete AC ;
  }
}

Figure 4: using a factory map

If FactoryMap is built to contain object IDs and pointers to the corresponding class factories, the client has no idea at all which concrete classes it is actually constructing. This is polymorphism taken to the limit. Note especially that the client doesn't have to include the header files for the individual concrete types - everything it needs to know is in the declaration of the abstract base class.
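Here is a compilable sketch of the factory-map approach, assuming C++11. For now the map is filled in by hand: the hypothetical FillFactoryMapByHand stands in for "Somehow FactoryMap is initialised". For brevity, DoWhatever returns a string here so that the result is observable. That return type is an assumption, not part of the original interface. Note that Use, the client function, names only GraphicItem and the map.

```cpp
#include <cassert>
#include <map>
#include <string>

class GraphicItem
{
public:
  virtual ~GraphicItem() {}
  virtual std::string DoWhatever() = 0;
};

typedef GraphicItem* (*ClassFactoryFn)(int param);
typedef std::map<int, ClassFactoryFn> FactoryMapType;

FactoryMapType FactoryMap;

// --- concrete classes: the client code below never names these ---
class Box : public GraphicItem
{
public:
  std::string DoWhatever() override { return "Box"; }
  static GraphicItem* Construct(int) { return new Box; }
  enum { ID = 1 };
};

class FilledEllipse : public GraphicItem
{
public:
  std::string DoWhatever() override { return "FilledEllipse"; }
  static GraphicItem* Construct(int) { return new FilledEllipse; }
  enum { ID = 99 };
};

void FillFactoryMapByHand()
{
  FactoryMap[Box::ID] = &Box::Construct;
  FactoryMap[FilledEllipse::ID] = &FilledEllipse::Construct;
}

// --- client code: knows only GraphicItem and the map ---
std::string Use(int objectType, int param)
{
  auto it = FactoryMap.find(objectType);
  if (it == FactoryMap.end())
    return "unknown";
  GraphicItem* item = it->second(param);
  std::string result = item->DoWhatever();
  delete item;
  return result;
}
```

After FillFactoryMapByHand() has run, Use(99, 0) returns "FilledEllipse". An unregistered ID such as 42 simply yields "unknown" rather than falling through a switch.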

Since there should only be a single instance of the Factory and it should exist for the whole program run, it's probably best implemented using the pattern:

FactoryMapType& FactoryMap()
{
  static FactoryMapType FMT ;
  return FMT ;
}

Figure 5: a singleton factory map

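Because FMT is a local static, it is constructed the first time FactoryMap() is called, so nothing can see it before it has been constructed. This holds even for code that runs while static objects in other translation units are being initialized, although the language does not specify the order in which those units are initialized.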
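Here is a minimal sketch of why this matters, assuming C++11. The hypothetical EarlyRegistration object stands in for any static whose constructor runs during dynamic initialization, before main. It reaches the map only through FactoryMap(), so that call constructs the map first. MakeNothing is a placeholder factory introduced only for this example.

```cpp
#include <cassert>
#include <map>

class GraphicItem;
typedef GraphicItem* (*ClassFactoryFn)(int param);
typedef std::map<int, ClassFactoryFn> FactoryMapType;

// Construct on first use: the map is built the first time anyone calls
// this function, however early that happens, and lives until exit.
FactoryMapType& FactoryMap()
{
  static FactoryMapType FMT;
  return FMT;
}

// Hypothetical placeholder factory, just to have something to register.
GraphicItem* MakeNothing(int) { return nullptr; }

// A namespace-scope object whose constructor runs before main. It can
// safely use the map, because calling FactoryMap() constructs it.
struct EarlyRegistration
{
  EarlyRegistration() { FactoryMap()[7] = &MakeNothing; }
};

static EarlyRegistration early;
```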

Building the class factory map

"Aha," I hear you say, "this has only moved the problem elsewhere. Something has to build the Factory map, and that something has to know about the functions." Well, not quite.

What if the classes themselves, or at least helper classes, cooperate in building the map? All the client has to supply is a function with which the classes can register themselves:

void RegisterFactory(int ID, ClassFactoryFn fn)
{
  FactoryMap()[ID] = fn ;
}

Figure 6a: registering with the factory

Now all that is required is to ensure that this function is called for each of the classes. That can be done by a helper class:

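#include "GraphicItem.h"

typedef GraphicItem * (*ClassFactoryFn)( int param) ;

// Supplied by the client (Figure 6a)
void RegisterFactory(int ID, ClassFactoryFn fn) ;

template<class T> class FactoryRegistrar
{
public:
  FactoryRegistrar()
  {
    RegisterFactory(T::ID, T::Construct);
  }
} ;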

Figure 6b: FactoryRegistrar.h

#include "FactoryRegistrar.h"
#include "FilledEllipse.h"

static FactoryRegistrar<FilledEllipse> FRFE ;

//  Implementation of FilledEllipse

Figure 6c: FilledEllipse.cpp

Constructing the static helper object performs the class registration. Assuming one module per concrete class, all that needs to be done is to link the required modules with the main client code. On program startup, the FactoryRegistrars get constructed, the class factory functions get registered, and the client suddenly "knows" about the available classes.
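Here is a single-file sketch that puts Figures 6a to 6c together. It assumes C++11, and a hypothetical Name member stands in for DoWhatever so there is something to check. Each "module" consists of a class plus one static FactoryRegistrar, and the client code never names FilledEllipse or Box. Everything sits in one translation unit, where the registrars must be constructed before any function defined in that unit is used. The trouble described next arises only when the classes live in separate units or in a library.

```cpp
#include <cassert>
#include <map>
#include <string>

class GraphicItem
{
public:
  virtual ~GraphicItem() {}
  virtual std::string Name() const = 0;
};

typedef GraphicItem* (*ClassFactoryFn)(int param);
typedef std::map<int, ClassFactoryFn> FactoryMapType;

FactoryMapType& FactoryMap()
{
  static FactoryMapType FMT;
  return FMT;
}

// Supplied by the client (Figure 6a).
void RegisterFactory(int ID, ClassFactoryFn fn) { FactoryMap()[ID] = fn; }

// Figure 6b: constructing one of these registers T's factory.
template<class T> class FactoryRegistrar
{
public:
  FactoryRegistrar() { RegisterFactory(T::ID, &T::Construct); }
};

// --- what would normally be FilledEllipse.cpp ---
class FilledEllipse : public GraphicItem
{
  explicit FilledEllipse(int) {}   // private
public:
  std::string Name() const override { return "FilledEllipse"; }
  static GraphicItem* Construct(int param) { return new FilledEllipse(param); }
  enum { ID = 99 };
};
static FactoryRegistrar<FilledEllipse> FRFE;

// --- what would normally be Box.cpp ---
class Box : public GraphicItem
{
  explicit Box(int) {}             // private
public:
  std::string Name() const override { return "Box"; }
  static GraphicItem* Construct(int param) { return new Box(param); }
  enum { ID = 1 };
};
static FactoryRegistrar<Box> FRBox;
```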

The snake in the grass

But there is a problem with this approach. In fact, there are two, closely related.

According to the ISO C++ Standard, §3.6.2 (Initialization of non-local objects [basic.start.init]):

"It is implementation-defined whether the dynamic initialization (_dcl.init_, _class.static_, _class.ctor_, _class.expl.init_) of an object of namespace scope with static storage duration is done before the first statement of main or deferred to any point in time after the first statement of main but before the first use of a function or object defined in the same translation unit."

This means that the implementation may defer constructing our FactoryRegistrar until just before something else defined in its translation unit is first used. But nothing in that unit is used until the registration has happened, so the registrar may never be constructed at all.

Secondly, it might be useful to build a library of these classes. However, a modern linker using such a library will include only those units that it can see are used, that is, units that define something the rest of the program refers to. Again, because nothing calls into these units, the linker will ignore them entirely. This becomes even more obvious when you consider a set of ten classes, of which you want five - only pure telepathy on the part of the linker would help it.

So, we need an answer.

The huge source unit option

The first method is crude, but it should work, compiler limits aside. Simply create a source file that will be linked in, and #include within it the source files for all the classes you want. It will also need a function of its own, which must be called before the Factory map is used for the first time:

void InitGraphics ()
{
}

// Change these lines to change
// which classes are available
//
#include "FilledEllipse.cpp"
#include "Box.cpp"

Figure 7: AllGraphics.cpp

You'll need to ensure that the headers can be included more than once (with include guards, for instance), and it would be an extremely good idea to put the contents of each source file within its own namespace. With this solution, the statics will be constructed as long as some function in this unit gets called. However, it is no longer possible to put the classes into a library. Also, every configuration change requires a full recompilation of this unit, which may be quite time-consuming.

The one call option

An alternative method is somewhat cleaner. Again, we define a function that the client code should call. But this time, it calls a function in each of the class units to be used in this configuration:

extern void InitialiseFilledEllipse() ;
extern void InitialiseBox();

void InitGraphics ()
{
  // Change these lines to change
  // which classes are available
  //
  InitialiseFilledEllipse() ;
  InitialiseBox() ;
}

Figure 8a: AllGraphics.cpp

#include "FactoryRegistrar.h"
#include "FilledEllipse.h"

void InitialiseFilledEllipse()
{
  static FactoryRegistrar<FilledEllipse> FRFE;
}

// Implementation of FilledEllipse

Figure 8b: FilledEllipse.cpp

Now we can place the class units into a library. InitGraphics() calls a function in each unit we want, so the linker has to include those units, and each call constructs that unit's registrar. We therefore know that the class factories will be registered. Also, when the configuration changes, only a much smaller unit needs to be recompiled.
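Here is a single-file sketch of the one-call option, assuming C++11. The comments mark where the separate source files would begin, and Name is again a hypothetical stand-in for DoWhatever. Only InitGraphics() lists the configuration, and here Box is deliberately left out of it. Each registrar is a function-local static, so calling InitGraphics() twice still registers each class only once.

```cpp
#include <cassert>
#include <map>
#include <string>

class GraphicItem
{
public:
  virtual ~GraphicItem() {}
  virtual std::string Name() const = 0;
};

typedef GraphicItem* (*ClassFactoryFn)(int param);
typedef std::map<int, ClassFactoryFn> FactoryMapType;

FactoryMapType& FactoryMap()
{
  static FactoryMapType FMT;
  return FMT;
}

template<class T> class FactoryRegistrar
{
public:
  FactoryRegistrar() { FactoryMap()[T::ID] = &T::Construct; }
};

// --- "FilledEllipse.cpp" ---
class FilledEllipse : public GraphicItem
{
public:
  std::string Name() const override { return "FilledEllipse"; }
  static GraphicItem* Construct(int) { return new FilledEllipse; }
  enum { ID = 99 };
};

void InitialiseFilledEllipse()
{
  // Constructed on the first call only.
  static FactoryRegistrar<FilledEllipse> FRFE;
}

// --- "Box.cpp" ---
class Box : public GraphicItem
{
public:
  std::string Name() const override { return "Box"; }
  static GraphicItem* Construct(int) { return new Box; }
  enum { ID = 1 };
};

void InitialiseBox()
{
  static FactoryRegistrar<Box> FRBox;
}

// --- "AllGraphics.cpp": the only place that lists this configuration ---
void InitGraphics()
{
  InitialiseFilledEllipse();
  // InitialiseBox();   // left out of this configuration
}
```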

Cleaning up

By now, we have two global functions that deal with the singleton FactoryMap, either directly or indirectly: RegisterFactory() and InitGraphics(). It makes sense to make them member functions of the FactoryMap itself, and to have the functionality of InitGraphics() called by its constructor. So let's see what our final result looks like:

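#ifndef GRAPHICITEM_H
#define GRAPHICITEM_H

class GraphicItem
{
protected:
  GraphicItem(int param) { ; }

public:
  virtual ~GraphicItem () = 0 ;
  virtual void DoWhatever () = 0 ;
} ;

// A pure virtual destructor must still be defined
inline GraphicItem::~GraphicItem () { }

#endif

GraphicItem.h
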
#include "GraphicItem.h"
#include <map>

typedef GraphicItem * (*ClassFactoryFn)( int param) ;

class GraphicsFactoryMapImpl : public std::map<int, ClassFactoryFn>
{
public:
  GraphicsFactoryMapImpl() ;
  void Register(int ID, ClassFactoryFn fn) ;
} ;

typedef GraphicsFactoryMapImpl::const_iterator GraphicsFactoryIter ;

GraphicsFactoryMapImpl& GraphicsFactoryMap() ;

template<class T> class GraphicsFactoryRegistrar
{
public:
  GraphicsFactoryRegistrar()
  {
    GraphicsFactoryMap().
                Register(T::ID, T::Construct);
  }
} ;

GraphicsFactoryMap.h

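#include "GraphicsFactoryMap.h"

// Points at the map while its constructor is running. Each
// INCLUDE_UNIT call below ends up calling GraphicsFactoryMap()
// again before FMT has finished initializing, and re-entering
// the initialization of a local static is undefined behaviour,
// so such calls are answered from this pointer instead.
static GraphicsFactoryMapImpl * MapUnderConstruction = 0 ;

GraphicsFactoryMapImpl & GraphicsFactoryMap()
{
  if (MapUnderConstruction)
    return *MapUnderConstruction ;

  static GraphicsFactoryMapImpl FMT ;
  return FMT ;
}

#define INCLUDE_UNIT(a) extern void Initialise##a();Initialise##a() ;

GraphicsFactoryMapImpl::GraphicsFactoryMapImpl()
{
  MapUnderConstruction = this ;

  // Change these lines to change
  // which classes are available
  //
  INCLUDE_UNIT(FilledEllipse)
  INCLUDE_UNIT(Box)

  MapUnderConstruction = 0 ;
}

void GraphicsFactoryMapImpl::Register(int ID, ClassFactoryFn fn)
{
  (*this)[ID] = fn ;
}

GraphicsFactoryMap.cpp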

//  No need for a separate header
// since nothing else includes it
//
#include "GraphicsFactoryMap.h"

namespace {
class FilledEllipse : public GraphicItem
{
private:
  FilledEllipse(int param) ;

public:
  virtual ~FilledEllipse () ;
  virtual void DoWhatever () ;

  static GraphicItem *
                  Construct(int param) ;
  enum { ID = 99 } ;
} ;

//  Actual implementation here ...

} /* namespace anonymous */

extern void InitialiseFilledEllipse () ;
void InitialiseFilledEllipse ()
{
  static GraphicsFactoryRegistrar<FilledEllipse> GFR ;
}

FilledEllipse.cpp

#include "GraphicsFactoryMap.h"

void SomeFunc(int objectType, int param)
{
  GraphicsFactoryIter it = GraphicsFactoryMap().find(objectType) ;
  if ( it != GraphicsFactoryMap().end())
  {
    GraphicItem * AC = (*it).second(param) ;
    AC->DoWhatever();
    delete AC ;
  }
}

Actual usage

Conclusion

In reality, there are likely to be more functions than just a simple class factory that need registering, and the registration might well insert string descriptions into menus as well. This example should be enough to demonstrate a methodology that can be extended to such cases safely and easily.
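Here is one possible way to extend the scheme, assuming C++11. Each class is assumed to provide a hypothetical static Description() function. The registrar then stores a small record rather than a bare function pointer, so that menu text is built from whichever classes actually registered. FactoryEntry and MenuItems are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class GraphicItem
{
public:
  virtual ~GraphicItem() {}
};

typedef GraphicItem* (*ClassFactoryFn)(int param);

// Everything the application wants to know about a registered class.
struct FactoryEntry
{
  ClassFactoryFn construct;
  std::string description;   // e.g. the text of a menu item
};

typedef std::map<int, FactoryEntry> FactoryMapType;

FactoryMapType& FactoryMap()
{
  static FactoryMapType FMT;
  return FMT;
}

template<class T> class FactoryRegistrar
{
public:
  FactoryRegistrar()
  {
    FactoryEntry entry = { &T::Construct, T::Description() };
    FactoryMap()[T::ID] = entry;
  }
};

class Box : public GraphicItem
{
public:
  static GraphicItem* Construct(int) { return new Box; }
  static std::string Description() { return "Box"; }
  enum { ID = 1 };
};

class FilledEllipse : public GraphicItem
{
public:
  static GraphicItem* Construct(int) { return new FilledEllipse; }
  static std::string Description() { return "Filled ellipse"; }
  enum { ID = 99 };
};

static FactoryRegistrar<Box> FRBox;
static FactoryRegistrar<FilledEllipse> FRFE;

// Menu text for whatever classes registered themselves, in ID order.
std::vector<std::string> MenuItems()
{
  std::vector<std::string> items;
  for (const auto& kv : FactoryMap())
    items.push_back(kv.second.description);
  return items;
}
```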
