pinIterators and Memberspaces

Overload Journal #88 - December 2008 + Programming Topics + Design of applications and programs   Author: Roel Vanhout
Exposing a compound object's collections can be messy. Roel Vanhout introduces a powerful idiom.

This article describes a C++ idiom to expose an object's collection member variables in a way that is easy to use and read for users of the object. It allows users to iterate over filtered and/or modified elements of a collection without the user having to be aware of the implementation or type of the collection. It describes the problem, several possible solutions and their drawbacks and proposes a technique that uses the 'memberspaces' idiom and the Boost.Iterator library to provide object designers with a mechanism to allow users of their objects flexible access to the data of those objects.

It contains introductions to the memberspace idiom and Boost.Iterator's filter_iterator<> and transform_iterator<>.

Introduction

A common scenario in object-oriented development is that an object has a collection (for example a std::vector<>) as a member that the object's author wants to expose in the public interface of the object. Consider an object like this one:

      class A {
        protected:
          std::vector<int> values;
      };

This object has a member with a collection in it (values) in which ints can be stored. As far as the internal implementation goes, this works fine; however when the contents of values need to be exposed, dilemmas arise. There are several solutions to most of those dilemmas, all with their advantages and drawbacks.

Potential solutions

The first solution is to expose the collection itself directly:

      std::vector<int>& getValues();

or (depending on whether read/write or read-only access is appropriate):

      const std::vector<int>& getValues();

The disadvantages to this are clear: when the type of the collection or of its contents changes internally, users will have to change their code as well. Secondly, there is no granularity in what users of the object can do to the collection. When exposing the non-const version above, users can change the contents of the vector but also its size. There is no way to limit them to do just one of those two, nor does it have a mechanism to do validation of the vector's contents after the user performed an operation on it.

A second solution is to expose functions to query and modify the collection:

      size_t getNValues() const {
         return values.size(); }
      int getValue(int index) const {
         return values[index]; }
      void setValue(int index, int new_value) {
         values[index] = new_value;
      }

This solves the encapsulation issue of the first solution, and provides a way for the object author to put in validation of the input. On the other hand, it's inconvenient to use for both the author and the user of the object because there are so many ways to name and implement this solution. Although it is possible to enforce naming standards within the same class library (but even within a team this gets harder as the team grows and the codebase ages), invariably at one point differences in naming will emerge (getNumValues(), GetValuesCount(), ...) and users of the object will have to start refering to the documentation or header file of an object every time they want to use it; not to mention the inevitable differences in conventions between different libraries. Worse than this issue, users cannot use stl-style generic algorithms on collections that are exposed this way.

A third solution is to expose the stl interface of the collection directly:

      typedef std::vector<int>::iterator iterator;
      typedef std::vector<int>::const_iterator
         const_iterator;
      iterator begin() { return values.begin(); }
      iterator end() { return values.end(); }

This solution is starting to look better but is still not without its flaws. First, it makes the object directly 'iterable', which may not be intuitive (depending on the relationship between the object and its collection). Furthermore it restricts the number of exposed collections to one. Thirdly, it re-introduces the problem of the first solution of having no granularity in the access to the collection: users can for example change values and assign any value to the elements of the collection, without regards of whether or not those values make sense in the context of the object. As long as those values remain compatible with the collection's data type, they can do anything they want with them. Lastly, it relies on the collection that is being exposed to already have an stl-compatible interface with functions that provide iterators.

The goal

The ideal solution to the described problem has the following properties:

  • It exposes iterators so collections can be used by generic, stl-compliant algorithms.
  • It can be used to expose all sorts of collections, not only stl collections.
  • It allows multiple collections to be exposed by an object.
  • It's intuitive and idiomatic: it allows client code to express clearly what is happening.
  • It provides object authors with finely grained control over the way the collection is accessed or modified.

Ideally, client code could look like this:

      Classroom c;
      BOOST_FOREACH (Desk d, c.Desks) {
        // do something with Desk d here
      }
      BOOST_FOREACH (Student s,
         c.StudentsInClassroomRightNow) {
        // do something with Student s here
      }

This code uses the boost Foreach construct that wraps iterators to allow direct iteration over stl-compliant containers; the corresponding way to do this directly with iterators is:

      typedef Classroom::Desks::iterator TDeskIter;
      for (TDeskIter it = c.Desks.begin() ;
           it != c.Desks.end() ;
           it++) {
          Desk d = *it;
          // do something with Desk d here
      }

Fortunately there is a way to make this happen. By combining the freely available Iterator library from Boost and using an idiom called 'memberspaces', the code above can be made into compilable and working C++ code.

Memberspaces and two techniques for implementing them

Memberspaces (a name I first found mentioned in [LópezMuñoz02]) are essentially a technique to subdivide the 'space' in which the member variables of an object live into separately named 'subspaces', like namespaces do for freestanding functions and classes. They are implemented as member structs, hence the name 'memberspace'. When used with only variables (i.e., not member functions), they look like this1:

      class PirateShip {
        struct Deck {
          int nCannons;
        }
        struct UpperDeck : public Deck {
        } UpperDeck;
        struct LowerDeck : public Deck {
        } LowerDeck;
      }

This allows users of the pirateShip class to write code like this:

      PirateShip adventure_galley;
      ...
      std::cout << "There are "
                <<   adventure_galley.UpperDeck.nCannons
                << " cannons on the upper deck and "
                << adventure_galley.LowerDeck.nCannons
                << " cannons on the lower deck."
                << std::endl;

Note the naming convention. In this article I use a style where class names are in CamelCase and variables are in all lowercase with underscores. In the PirateShip class above though, UpperDeck and LowerDeck are variables; why are they not in lowercase with underscores? The reason is in their use. They are not used as 'regular' member variables, but as pseudo-namespaces; therefore they use the naming convention of namespaces, which is CamelCase.

Also note that the class name and the memberspace (variable) name are the same (UpperDeck and LowerDeck). This is legal and although it doesn't serve an explicit purpose, I find it an easy way to recognize memberspaces in the code. It also prevents you from having to make up another name for the class: it could be called UpperDeckT, UpperDeckClass, UpperDeck_, ... but by making it be the same as the memberspace variable itself, it is easy to be consistent across your project.

After a little getting used to, this is a very readable and natural way of working with the properties of an object.

A problem arises when you want to access the variables of the class in which a member space is contained. If you wanted to implement member functions in the Deck class and for that wanted access to variables of the PirateShip class, you'd need a way to get to those variables. There are two easy ways to implement this, one of which I've coined 'non-intrusive' because it doesn't require changing the containing class, the other one 'intrusive' because it does.

'Non-intrusive'

To write a memberspace that can handle access to the containing class on its own, without any help from the containing class, you can use some pointer arithmetic to calculate the start address of the containing class and cast that to the correct type. To calculate this start address, you take the address of the memberspace struct and deduct from it the offset of the memberspace struct within the containing class - which you can get with the offsetof() macro.

The drawback of this technique is that it relies on potentially unportable assumptions, like offsetof() being implemented in a way that it accounts for things like virtual functions. More specifically, the standard requires offsetof() to work only on POD types. This (roughly) means that for classes you are restricted to those that are 'struct-like': no virtual functions, no constructor or destructor and only public functions. For more information.

Recent offsetof() implementations of the Visual Studio compiler and gcc work also in (some) cases not covered by the standard, but if you require guaranteed portability this technique is not the right one to use.

This technique may give others reviewing your code the jitters because it uses reinterpret_cast<>, a construct that is often indicative of code of highly questionable quality. In my opinion the case presented here is one of the few cases in which it is ok to use reinterpret_cast<>, but you'll have to check yourself whether this technique falls within the 'acceptable' bounds of the coding standards of your environment.

An example is shown in Listing 1.

    class A {
      public:
        void doSomething() {}
        struct MyMemberSpace {
        public:
            void someFunction()
            {
                owner().doSomething();
            }
        private:
            A& owner() {
                return *reinterpret_cast<A*>(
                    reinterpret_cast<char*>(this) -
                    offsetof(A, MyMemberSpace));
            }
        } MyMemberSpace;
    }

    A a;
    a.MyMemberSpace.someFunction();
Listing 1

'Intrusive'

A more portable way is to keep a reference to the containing object in a member variable and let the containing object initialize it in its constructor. This is most easily explained with a code example (Listing 2).

    class A {
        A() : MyMemberSpace(*this) {}
        void doSomething() {}
        struct MyMemberSpace {
            void someFunction() {
                m_A.doSomething();
            }
        private:
            friend A;
            MyMemberSpace(A& a) : my_A(a) {}
            A& my_A;
        } MyMemberSpace;
    }
    A a;
    a.MyMemberSpace.someFunction();
Listing 2

The main disadvantage of this technique is that it requires a variable in the memberspace - adding to the size of the object. When the first technique is used, no such member is needed and no price in space overhead needs to be paid. There will be few circumstances in which this overhead is prohibitive, though.

Exposing a collection through a memberspace

We've already fulfilled one of the original design goals we started with: we have a way to expose multiple collections, through multiple memberspaces that all have their own name. The next thing we want to do is make the memberspaces iterable, that is, make them compliant with stl-like containers. Luckily, it is easy to make them work for the most basic cases. All that is required is that we provide begin() and end() functions and an iterator typedef that indicates the type of iterator that is returned by those functions. If the collection we are exposing is already an stl-compatible container, we can use the iterator type and begin() and end() iterator accessors to implement our own function. A simple example is shown in Listing 3.

    class A {
      public:
        A() : MyMemberSpace(*this) {}
        typedef std::vector<int> myCollection;
        myCollection collection;
        struct MyMemberSpace {
            friend A;
            typedef myCollection::iterator iterator;
            iterator begin() { return
               m_A.collection.begin(); }
            iterator end() { return
               m_A.collection.end(); }

        private:
            MyMemberSpace(A& a)
                : m_A(a)
            {}
            A& m_A;
        } MyMemberSpace;
    };
Listing 3

This can now be used like this:

      A a;
      BOOST_FOREACH (int i, a.MyMemberSpace) {
        // do something with int i here
      }

Exposing a filtered collection through a memberspace

The technique explained above works fine as long as the only thing you want to do is to directly allow all elements of the container to be visible from the outside. But there are scenarios in which you want to provide an easy way for clients to go through a subset of the elements in the container - either as an addition to or as a replacement of the access to all elements. Consider a class like the following:

      struct Employee {
        enum ESeniority { E_SENIOR, E_JUNIOR };
        std::string name;
        ESeniority seniority;
      };
      class Staff {

      protected:
        std::vector<Employee> employees;
      };

If we could find a way to expose an iterator over employees that could leave out (or 'filter') certain elements that don't fulfill a certain criterion, we could use that iterator to go over all those that do fulfill it. Such an iterator is called a 'filtering iterator'. The design of one is described in detail in [Higgins08].

Here however a filtering iterator is realized through the freely available Boost.Iterator [Abrahams03] library. This library provides, roughly speaking, templates to easily implement new iterators and to modify existing ones. By using the filter_iterator<> template, we can modify the behaviour of the iterators that we can get from std::vector<> so that for every element in it, a function is called: if that function returns true, the element is included when advancing the iterator; otherwise it is skipped. This template provides us with a tool to easily implement iterators for the memberspaces in the example above (see Listing 4).

    #include <boost/iterator/filter_iterator.hpp>
    struct isSeniorEmployee {
      bool operator()(Employee& e) {
        return e.seniority == Employee::SENIOR;
      }
    };
    struct SeniorEmployees {
      typedef
         boost::filter_iterator<isSeniorEmployee,
         std::vector<Employee>::iterator> iterator;
      typedef
         boost::filter_iterator<isSeniorEmployee,
         std::vector<Employee>::const_iterator>
         const_iterator;
      iterator begin() {
        return boost::make_filter_iterator<
           isSeniorEmployee>(
           owner().employees.begin(),
           owner().employees.end()
        );
      }
      iterator end() {
        return boost::make_filter_iterator<
           isSeniorEmployee>(
           owner().employees.end(),
           owner().employees.end()
        );
      }
    private:
      Staff& owner() {
        return *reinterpret_cast<Staff*>(
            reinterpret_cast<char*>(this)
            - offsetof(Staff, SeniorEmployees));
      }
    } SeniorEmployees;
Listing 4

Here is what is happening. First, we define an object that will serve as the 'decision-making point' for whether or not a certain Employee will be considered part of the SeniorEmployees. We can determine this by testing the seniority member variable of the object. The object needs to be callable (have the () operator defined) and should take the type of the elements of the collection we want to iterate over - in this example, an Employee.

For the implementation of the memberspace proper, we start with providing a typedef for the iterator type that will be exposed. We can use the type of the boost::filter_iterator template for this. It takes two template arguments: the object that will provide the 'decision-making' functionality and the iterator that is being 'wrapped' or modified. Since we're exposing an stl collection, we can re-use the iterator type from the std::vector<>. Although not strictly needed, we also provide a const_iterator - it is needed by the BOOST_FOREACH construct that I'm rather fond of. Many algorithms use the const_iterator so it's good practice to always define it.

Next, we come to the implementation of the begin() and end() functions which should return the actual iterators. Boost.Iterator provides a utility function to make iterators of the filter_iterator<> type. It is (unsurprisingly) called make_filter_iterator and takes three arguments, of which one takes the form of a template parameter. This template parameter is the type of the class that will decide whether or not to include a certain element in the collection to be included when the iterator is used. The two 'normal' arguments that are passed to the constructor are a range of elements, embodied by two iterators. The end() iterator is needed because the filter iterator needs to know when to stop going over the elements of the source collection.

The result is a construct where we can iterate over only the senior employees by writing:

      Staff s;
      BOOST_FOREACH(Employee e, s.SeniorEmployees) {
        std::cout << e.name << std::endl;
      }

Exposing a modified collection through a memberspace

The previous section covered modifying the appearance of a collection by filtering out certain elements. We can also use iterators to modify not the collection, but rather the individual elements of a collection. The crucial element to do so is another iterator adapter: boost's transform_iterator<>.

Let's add another potentially useful memberspace to the Staff class introduced above:

      struct EmployeeNames {
      } EmployeeNames;

To get access to only the names of the employees (and to be able to iterate over them), we have to get access to the 'name' member variable of every Employee object. The transform_iterator<> mentioned earlier will apply a user-defined operation on an object to transform it into a different type. In the EmployeeNames example, what we want to do is 'transform' an Employee struct to its name, in std::string form. The 'transformer' struct to do this looks very similar to the struct that determined whether or not to include an object in an exposed collection. The actual conversion is done in the () operator, which should return the type to be transformed to and takes as a reference the object to be transformed. The code is shown in Listing 5.

    #include <boost/iterator/transform_iterator.hpp>
    struct getEmployeeName {
      std::string operator()(Employee& e) const {
        return e.name;
      }
    };

    struct EmployeeNames {
      typedef
         boost::transform_iterator<getEmployeeName,
         std::vector<Employee>::iterator> iterator;
      typedef iterator const_iterator;
      iterator begin() {
        return boost::make_transform_iterator(
           owner().employees.begin(),
           getEmployeeName());
      }
      iterator end() {
        return boost::make_transform_iterator(
           owner().employees.end(),
           getEmployeeName());
      }
      private:
        Staff& owner() {
            return *reinterpret_cast<Staff*>(
            reinterpret_cast<char*>(this)
            - offsetof(Staff, EmployeeNames)); }
    } EmployeeNames;
Listing 5

The begin() and end() iterators work in the same way as those for the filter_iterator. You use the make_transform_iterator utility function provided by the library, which takes the beginning of the iterator range you want to transform as an input. You don't need to specify an end this time as the transform_iterator iterates over all the elements. The second argument is an instance of the class you wrote to transform your input object.

Suggestions for further experimentation

It is easy to see that the previous two constructs can be combined: for example providing a way to iterate over only the names of the junior employees. For those trying to implement this, such an iterator declaration can quickly get unwieldy - it is advisable to use lots of typedefs for the iterator types to keep on top of things.

The transform_iterator can be used not only to query properties of objects. A useful 'transformer' class is the following:

      template<typename TObj>
      struct getObjectNr<boost::shared_ptr<TObj> > {
        typedef TObj* result_type;
        TObj* operator()(
           boost::shared_ptr<TObj> sh_ptr) const {
          return sh_ptr.get();
        }
      };

This allows for making an iterator over a collection of boost::shared_ptr's where the raw pointers themselves are exposed, rather than the shared pointers. This comes in handy in a codebase where smart pointers were added later but many parts of the code still only work with raw pointers.

We've only explicitly covered a part of the initial qualities that we sought in a solution to the original problem. Specifically, we haven't yet discussed using Boost.Iterator to make new iterators that work on non-stl types and we haven't done any input validation or value restrictions. Writing custom iterators is a topic of its own; a task that is simplified by the use of other parts of the Boost.Iterator libraries yet complex enough to warrant an eventual separate article.

Conclusion

Memberspaces are a powerful idiom to provide interfaces to collections. They make for expressive, easily readable code. With the help of the Boost.Iterator library, we can extend their functionality to more than readability: they can provide users of objects with an easy way to access that object's properties in a myriad of different ways - at little to no cost in terms of memory usage or speed. It provides class designers with ways to give users access to an object's properties while maintaining control over the way this is done. n

Note

During review, a third technique to deal with this problem was mentioned that may be of interest. See [Bass05].

References

[Abrahams03] 'The Boost.Iterator Library', David Abrahams, Jeremy Siek, Thomas Witt, 2003: http://www.boost.org/doc/libs/1_36_0/libs/iterator/doc/index.html

[Bass05] Phil Bass, 'The Curate's Wobbly Desk', Overload 70 (December 2005)

[Higgins08] 'Custom Iterators in C++', Jez Higgins, CVu Volume 20 Issue 3 (June 2008), ACCU.

[LópezMuñoz02] 'An STL-like bidirectional map', Joaquín M López Muñoz, 2002: http://www.codeproject.com/KB/stl/bimap.aspx

1. History buffs will shake their heads at this example, as pirates (the sea-faring type) generally preferred the highly manoeuvrable schooners or sloops, which typically only had one level of artillery. In this article, it is assumed that the PirateShip class is part of a highly extensible pirate-modelling framework that allows for flexible configuration of pirate paraphernalia and puts responsibility for historical accuracy in the hands of the modeller.

Overload Journal #88 - December 2008 + Programming Topics + Design of applications and programs