
From Mechanism to Method - Function Follows Form

Overload Journal #48 - Apr 2002 + Programming Topics   Author: Kevlin Henney

Is programming the manufacture of code? I would suggest that of all the metaphors applied to the development of software, manufacturing rates as perhaps one of the least useful and most harmful. Where in the manufacturing metaphor is the idea that programming is an act of communication? And not just with the compiler. Code is more often read than written, and writing code is just that: writing. You are an author with an audience: Today it may be just you and the compiler, but tomorrow it will include others… Which includes you: the "What was I thinking when I wrote this?" or "Which idiot wrote this? … Oh" syndrome.

This perspective lends a quite different weight to the use of language features in a program. In C++ we have a formal notation for working with concepts as close to or as far from the metal as we choose. The compiler cares little for how clearly we write the code, how fit for purpose it is, or how we work in teams to develop systems. It cares only for the well-formedness of the code as C++ (or at least the compiler's closest approximation). All those other, non-functional considerations are about code as a means of communication with others.

C++ offers an extensive shopping list of mechanisms. It is left to the programmer to make sense - and sensible use - of these, bringing method and clarity to bear on the expression of code, coding to communicate intent idiomatically to others. But too often we find that code looks like, well, code: a cipher whose key is known privately and exclusively to its author - and sometimes, alas, even this much is not true.

Overloading - especially operator overloading - is one of those mechanisms that, when first encountered, can raise eyebrows and open mouths. This response comes in two opposing flavours: "Great! I can see that we could use this all over the system" or "Oh no, I don't think that's for us. Sounds different to what we normally do… too radical". The former can often lead to the most cunning of ciphers with the clarity of hieroglyphics (pre Rosetta Stone); the latter to verbose code that misses the effectiveness of established idioms and the benefits of template-based generic programming. There is, however, a centre ground of practice between these two.

Principles

Express coordinate ideas in similar form.

This principle, that of parallel construction, requires that expressions similar in context and function be outwardly similar. The likeness of form enables the reader to recognize more readily the likeness of content and function.

Strunk and White [Strunk-1979]

This advice works as well with the written keyword as it does with the written word. It expresses the idea that similar constructs should have similar meanings, a goodness of fit between intent and realization, interface and implementation, reader and writer. This principle of substitutability [Liskov1987] is often expressed with respect to inheritance and runtime polymorphism [Coplien1992], but applies equally well to the compile-time polymorphism you have with conversions, overloads, and templates [Henney2000a].

Common Name Implies Common Purpose

The principle that overloaded functions work to similar ends is the one that makes the most sense of this feature. As a practice it frees programmers from mangling names to distinguish otherwise similar functions with differing argument lists (this is the job of the compiler).

Following well-established conventions, where possible, clearly makes sense. For instance, the standard C++ library establishes a common set of names and semantics, conventions clarifying that empty means "is empty?" not "to empty", clear means "to clear" and not "is clear?", etc. Note that judgement and resourcefulness are still needed:

  • The standard defines a relatively small set of names, clearly not enough to cover your whole domain of application.

  • The standard is not always consistent in its use of names, e.g. the unhelpfully named get member function in auto_ptr is a query without a side effect, whereas get on a basic_istream is a query with a significant side effect.

  • There are other well-established sources of terminology that provide names you can draw upon. There may be times these clash with the standard. For example, depending on context, begin yields an iterator or initiates a transaction. It is for you to determine whether or not such overloading of meaning, as well as name, is clear.

Operator Underloading

Whatever care is applied to the use of named function overloading applies doubly so to operator overloading. It can be a fertile ground for fertile imaginations. An opportunity to communicate clearly or to resurrect a Tower of Babel.

The built-in types both set expectations in the reader and offer a spec for the writer: "When in doubt, do as the ints do" [Meyers1996]. As with any style principle, this one is elasticated: operator+ for the standard basic_string is not commutative, but its meaning is clear nonetheless. Using the bitshift operators, operator<< and operator>>, for I/O stream insertion and extraction stretches the elastic taut by an appeal to scripting notations. However, the long history and established presence in the standard library qualifies this idiom as effectively built-in. Do not assume the same distinguished fate awaits any other 'creative' operator deployment! So, as a corollary, it may be worth considering that when in serious doubt, do not do it.

In deciding the suitability or otherwise of operator overloading, keep in mind that it only really makes sense for value-based [Henney2000b] rather than indirection-based objects. Value-based objects represent fine-grained information concepts, typically live on the stack or embedded within other objects, and are passed around by copy or const reference. Syntactically this emphasizes their value and allows easy use of operators. Indirection-based objects, by contrast, represent more significant chunks of system information or behaviour, typically live on the heap, and are passed around by pointer. Syntactically this emphasizes their identity but makes use of operators awkward: having to dereference the pointer explicitly before being able to use an operator somewhat defeats the intended transparency of operator overloading.
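As a minimal sketch of this syntactic difference, assume a hypothetical value type, money, which is not part of the original article. The same addition is written once against values and once against pointers:

```cpp
#include <cassert>

// Hypothetical value type, used only for illustration.
class money {
public:
  explicit money(long pence = 0) : pence_(pence) {}
  long pence() const { return pence_; }
  money &operator+=(const money &rhs) {
    pence_ += rhs.pence_;
    return *this;
  }
private:
  long pence_;
};

// Non-member operators built on the public interface.
inline money operator+(money lhs, const money &rhs) {
  return lhs += rhs;
}
inline bool operator==(const money &lhs, const money &rhs) {
  return lhs.pence() == rhs.pence();
}

// Value-based use: the expression reads as it would for int.
inline money total_by_value(const money &a, const money &b) {
  return a + b;
}

// Indirection-based use: each operand must be dereferenced first.
inline money total_by_pointer(const money *a, const money *b) {
  return *a + *b;
}
```

Both functions compute the same result; the second simply shows how explicit dereferencing intrudes on the notation once objects are handled by pointer.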

Smart Pointers

One of the most common C++ idioms involving overloading is the SMART POINTER, ranging from reference-counted pointers to the essentially simple but surprisingly intricate standard auto_ptr. However, it is a common myth that all smart pointers are concerned with memory management, and that all smart pointers support operator-> and operator* as their pointer-like operations.

The Three Rs

Use determines definition, and clearly not all smart pointers are intended for the same use. We can consider operators for pointers in three categories, the three Rs: (de)referencing, relational, and arithmetic. According to purpose, we can select if and how we provide these:

  • Dereferencing comes in the familiar forms of operator* and operator->, as well as the less familiar and often overlooked operator->* [Meyers1999] and operator().

  • Relational operators make sense for pointers or smart pointers that have a natural ordering, such as raw pointers in the same array or random access iterators. Having only equality (and hence inequality) comparison makes sense for many other pointers, such as reference-counted pointers. They typically test for identity rather than value, which is why auto_ptr does not support such comparison: Exclusive ownership means that in a well-formed system auto_ptr equality comparison will always return false.

  • Pointer arithmetic, such as operator++ and operator+, makes sense for smart pointers that encapsulate some concept of interval or progression, such as iterators.
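To illustrate selecting from these categories, the following sketch assumes a hypothetical non-owning smart pointer, handle, that is not taken from the article. It offers dereferencing and identity comparison but deliberately no ordering and no arithmetic:

```cpp
#include <cassert>

// Hypothetical non-owning smart pointer: two of the three Rs only.
template<typename type>
class handle {
public:
  explicit handle(type *target = 0) : ptr(target) {}
  // Dereferencing.
  type &operator*() const { return *ptr; }
  type *operator->() const { return ptr; }
  // Relational: identity comparison only, no ordering.
  friend bool operator==(const handle &lhs, const handle &rhs) {
    return lhs.ptr == rhs.ptr;
  }
  friend bool operator!=(const handle &lhs, const handle &rhs) {
    return lhs.ptr != rhs.ptr;
  }
  // Arithmetic: intentionally absent, as a handle has no
  // notion of interval or progression.
private:
  type *ptr;
};
```

Two handles compare equal only when they refer to the same object, regardless of whether the objects they refer to hold equal values.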

Function Objects

A common piece of advice offered to developers making a transition from procedural to object-oriented code is that a class should not model a function. Such classes are often named as actions, and typically sport a principal or single member function named "do such and such". While this advice does guard against a common pitfall, it is not always poor practice. Those who have taken this rule of thumb to heart as a legalistic rule need to unlearn a little to appreciate how objects can encapsulate tasks and, in particular, mimic functions. The COMMAND pattern [Gamma-1995] demonstrates the power of task-based objects. The FUNCTOR idiom [Coplien1992] focuses on functional objects that overload operator() to achieve the appearance and transparency of use of conventional functions.

The standard library provides for the use of function objects with generic functions and templated containers, categorizing them as unary or binary functions. It also defines specific function object classes - e.g. less for ordered comparison - and function object adaptors - e.g. pointer_to_unary_function to wrap up naked function pointers. The Boost library [Boost] extends this with other function object classes, adaptors, and the nullary function category, for function objects taking no arguments.
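A short sketch may help here. It assumes a hypothetical unary predicate class, longer_than, and two hypothetical helper functions; only count_if, sort, and greater come from the standard library:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical unary function object: a predicate that carries state.
class longer_than {
public:
  explicit longer_than(std::size_t limit) : limit_(limit) {}
  bool operator()(const std::string &text) const {
    return text.size() > limit_;
  }
private:
  std::size_t limit_;
};

// A generic function accepts the object as if it were a function.
inline std::ptrdiff_t count_longer(
    const std::vector<std::string> &words, std::size_t limit) {
  return std::count_if(words.begin(), words.end(),
                       longer_than(limit));
}

// A standard binary function object used for ordered comparison.
inline void sort_descending(std::vector<std::string> &words) {
  std::sort(words.begin(), words.end(),
            std::greater<std::string>());
}
```

Unlike a plain function, longer_than carries its limit with it, which is what allows a single class to serve for any threshold.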

Re-member

As an example of a function object class, focusing on the nullary form for void returning functions, Listing 1 shows code for a member function adaptor. You may have already come across the mem_fun_t family of adaptors in the standard library. However, there are key differences:

  • A remember_function bundles a target object together with a member function pointer for later callback through nullary operator(), whereas a mem_fun_t object simply holds a member function pointer and uses the argument to operator() as its target.

  • Although it is of little practical consequence for a nullary, void-returning function, a variant for const member functions is not required because the member pointer's type is parameterized as a whole.

  • The target pointer type need not be a raw pointer: smart pointers supporting operator->* will also work.

  • The member pointer type need not be a member function pointer: a member data pointer that points to a nullary function object will also work.

The remember template function is a helper that simplifies composition of remember_function objects, automatically deducing the parameter types in the manner of make_pair for pair, bind2nd for binder2nd, or ptr_fun for pointer_to_unary_function and pointer_to_binary_function.
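The differences listed above can be exercised with a small sketch. The remember_function class and remember helper are repeated from Listing 1 so that the code stands alone; the tally and lamp types are hypothetical and exist only to provide a non-const member function, a const member function, and a data member that is itself a nullary function object:

```cpp
#include <cassert>

// Repeated from Listing 1 so that the sketch is self-contained.
template<typename target_ptr_type, typename member_ptr_type>
class remember_function {
public:
  remember_function(target_ptr_type on, member_ptr_type call)
    : ptr(on), member(call) {}
  void operator()() const {
    (ptr->*member)();
  }
private:
  target_ptr_type ptr;
  member_ptr_type member;
};

template<typename target_ptr_type, typename member_ptr_type>
remember_function<target_ptr_type, member_ptr_type>
remember(target_ptr_type on, member_ptr_type call) {
  return remember_function<target_ptr_type,
                           member_ptr_type>(on, call);
}

// Hypothetical nullary function object that counts its calls.
struct tally {
  tally() : count(0) {}
  void operator()() { ++count; }
  int count;
};

// Hypothetical target type.
class lamp {
public:
  lamp() : lit(false), reports(0) {}
  void toggle() { lit = !lit; }        // non-const member function
  void report() const { ++reports; }   // const member function
  bool lit;
  mutable int reports;
  tally on_change;                     // data member function object
};
```

With these definitions, remember(&l, &lamp::toggle), remember(&view, &lamp::report) for a const view of the same lamp, and remember(&l, &lamp::on_change) all yield objects that can be called with no arguments.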

Generalized Function Pointer

The need for event-driven callbacks, such as timer-triggered actions, is often met with pointers to functions or an implementation of OBSERVER [Gamma-1995]. The former approach is fine for simple event handlers:

class timer {
public:
  void set(const time &delay,
           void (*callback)());
  ...
};

But it is inflexible, handling only functions and not context objects. The OBSERVER-based solution introduces a base class that a concrete handler class must implement:

class handler {
public:
  virtual void run() = 0;
  ...
};
class timer {
public:
  void set(const time &delay,
           handler *callback);
  ...
};

However, this introduces a level of indirection that leads to additional memory management responsibilities, and imposes an intrusive base class participation on users for what is a relatively simple scenario. Using arbitrary objects or functions for callback would be preferred. Overloading multiple set member functions in timer is a kitchen-sink solution, leading to a wide interface that attempts to please all people and an awkward timer implementation.

template<typename target_ptr_type,
         typename member_ptr_type>
class remember_function {
public:
  remember_function(target_ptr_type on,
                    member_ptr_type call)
    : ptr(on), member(call) {}
  void operator()() const {
    (ptr->*member)();
  }
private:
  target_ptr_type ptr;
  member_ptr_type member;
};

template<typename target_ptr_type,
         typename member_ptr_type>
remember_function<target_ptr_type,
                  member_ptr_type>
remember(target_ptr_type on,
         member_ptr_type call) {
  return remember_function<target_ptr_type,
              member_ptr_type>(on, call);
}

Listing 1. Function object class and helper for binding target object and member function pointer.

Function objects at first appear to offer a route out: A nullary function pointer or object could be passed in, including a remember_function binding of member to target, and later called back. A member template function would accommodate the substitutability of all the variations:

class timer {
public:
  template<typename nullary_function>
  void set(const time &delay,
           nullary_function callback);
  ...
};

However, this raises a fundamental problem: How does a timer object later execute the callback passed in? Unlike many examples of member template functions in the standard library, this one does not execute the function or function object immediately - it would not be much of a timer if it did! The timer needs to store the callback for later use. Without parameterizing the whole timer class on the nullary_function type, rather than just the set member, this does not appear to be possible. Templating the whole class is undesirable because it means that for each different type of callback, a different timer class instantiation is needed.

A further problem with the member template approach is that a member template function cannot be declared virtual. This would be significant if the timer class were an abstract rather than a concrete class, i.e. an interface to timer features rather than a single implementation. The attempt to decouple both the mechanism of the timer and the target type like this would lead to the following illegal code:

class timer {
public:
  template<typename nullary_function> // illegal
  virtual void set(const time &delay,
                   nullary_function callback) = 0;
  ...
};

On the Outside

It is possible to resolve the tension in the design by approaching it from a different angle. We can take a step back and ask what simple interface to timer would also simplify its implementation. What is needed is some kind of abstraction of a function pointer that is both generic and generic: generic in the sense of supporting the generic programming style of the STL, and generic in the sense that it is general purpose and easily used in any context:

class timer {
public:
  void set(const time &delay,
           const function_ptr &callback);
  ...
};

To satisfy the requirements for simplicity in timer and our expectations of a function pointer, function_ptr needs to support syntax for initialization, assignment, and execution. Listing 2 shows such an interface.

class function_ptr {
public:
  function_ptr();
  function_ptr(const function_ptr &other);
  template<typename nullary_function>
    function_ptr(nullary_function function);
  ~function_ptr();
  function_ptr &operator=(
                const function_ptr &rhs);
  void operator()() const;
  ...
};

Listing 2. Smart function pointer interface.

On the Inside

This is all very well, but it has yet to solve the problem fully: It looks nice, but how is it implemented? How can a function_ptr object hold arbitrary representation, constrained only by the requirement that it must support an operator() with no arguments? The technique used is based on the EXTERNAL POLYMORPHISM pattern [Cleeland-1998], in particular the use of inheritance and runtime polymorphism to adapt template-based genericity for value-based objects through a level of indirection [Henney2000b]. Listing 3 opens up function_ptr to show this collaboration in practice, including the conversion (i.e. initialization) from any arbitrary nullary function object or pointer.

class function_ptr {
public:
  function_ptr()
    : body(0) {}
  template<typename nullary_function>
  function_ptr(nullary_function function)
    : body(new adaptor<nullary_function>
                             (function)) {}
  ~function_ptr() {
    delete body;
  }
  ...
private:
  class callable {
  public:
    virtual ~callable() {}
    virtual callable *clone() const = 0;
    virtual void call() = 0;
  };
  template<typename nullary_function>
  class adaptor : public callable {
  public:
    adaptor(nullary_function function)
      : adaptee(function) {}
    virtual callable *clone() const {
      return new adaptor(adaptee);
    }
    virtual void call() {
      adaptee();
    }
    nullary_function adaptee;
  };
  callable *body;
};

Listing 3. Smart function pointer representation and basic construction.

Clone Me

function_ptr is a value type, so it stands to reason that it should support copying through construction and assignment - an identity form of inward conversion [Henney2000b]. The body of a function_ptr cannot be copied directly because of the decoupling of interface from implementation, which leads to the polymorphic copying, or cloning, technique shown in Listing 4.

class function_ptr {
public:
  ...
  function_ptr(const function_ptr &other)
    : body(other.body
             ? other.body->clone()
             : 0) {}
  function_ptr &operator=(
              const function_ptr &rhs) {
    callable *old_body = body;
    body = rhs.body
             ? rhs.body->clone()
             : 0;
    delete old_body;
    return *this;
  }
  ...
};

Listing 4. Smart function pointer copying.

The assignment operator uses the COPY BEFORE RELEASE idiom [Henney1998] for exception- and self-assignment-safety. A nonthrowing swap could also be used for this [Sutter2000], but for this article the interface to function_ptr is being kept small and based only on operators.
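For comparison, the swap-based alternative might look like the sketch below. The swap member is an assumed addition that the article deliberately leaves out of its interface, and the tick function and ticks counter are hypothetical helpers for observing calls; the rest is condensed from Listings 3 to 5:

```cpp
#include <cassert>

class function_ptr {
public:
  function_ptr() : body(0) {}
  template<typename nullary_function>
  function_ptr(nullary_function function)
    : body(new adaptor<nullary_function>(function)) {}
  function_ptr(const function_ptr &other)
    : body(other.body ? other.body->clone() : 0) {}
  ~function_ptr() { delete body; }
  // Assumed addition: exchanging two raw pointers cannot throw.
  void swap(function_ptr &other) {
    callable *temp = body;
    body = other.body;
    other.body = temp;
  }
  // Copy first (may throw), then swap (cannot throw). The
  // temporary releases the old body when it goes out of scope.
  function_ptr &operator=(const function_ptr &rhs) {
    function_ptr copy(rhs);
    swap(copy);
    return *this;
  }
  void operator()() const {
    if(body)
      body->call();
  }
private:
  class callable {
  public:
    virtual ~callable() {}
    virtual callable *clone() const = 0;
    virtual void call() = 0;
  };
  template<typename nullary_function>
  class adaptor : public callable {
  public:
    adaptor(nullary_function function)
      : adaptee(function) {}
    virtual callable *clone() const {
      return new adaptor(adaptee);
    }
    virtual void call() {
      adaptee();
    }
    nullary_function adaptee;
  };
  callable *body;
};

// Hypothetical callback used to observe calls.
int ticks = 0;
void tick() { ++ticks; }
```

As with COPY BEFORE RELEASE, the existing body is untouched if cloning throws, and self-assignment costs one redundant clone rather than causing harm.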

Call Me

The final piece of the jigsaw is to dereference a function_ptr - fetch and execute. A raw function pointer supports dereferencing through operator*, which is the identity operation on a function pointer, and through operator(), which can be called directly on a function pointer without using operator*. It does not support operator->. This is the model that function_ptr should follow, and does so in Listing 5. For a null pointer, the execution assumes that for no function there is no function: the call does nothing, as opposed to the undefined behaviour of built-in pointers.

class function_ptr {
public:
  ...
  void operator()() const {
    if(body)
      body->call();
  }
  function_ptr &operator*() {
    return *this;
  }
  const function_ptr &operator*() const {
    return *this;
  }
  ...
};

Listing 5. Smart function pointer dereferencing and calling.
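The raw function pointer model that Listing 5 follows can be checked with a few lines. The names bump, calls, and call_three_ways are hypothetical; the sketch only shows that calling directly and calling through one or more applications of operator* are equivalent for a built-in function pointer:

```cpp
#include <cassert>

// Hypothetical callback used to observe calls.
int calls = 0;
void bump() { ++calls; }

// Each statement invokes the same function once.
inline void call_three_ways(void (*callback)()) {
  callback();      // operator() applied directly to the pointer
  (*callback)();   // explicit dereference first
  (**callback)();  // dereferencing again changes nothing
}
```

This is why function_ptr's operator* can simply return a reference to the object itself.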

Remember Me?

A focus on forms of substitutability - in this case derivation, overloading, and templates, each a way of establishing an interface - can decouple a system, allowing greater suppleness and clearer code. Putting it all together, we can build a simple scenario based around the proposed timer interface. Consider the interface to a device that can be turned on or off at particular times, e.g. a heating or an air conditioning system:

class device {
public:
  virtual void turn_on() = 0;
  virtual void turn_off() = 0;
  ...
};

The following example combines the concepts and features presented so far:

void set_up(device *target, timer *scheduler,
            const time &on, const time &off) {
  scheduler->set(on,
        remember(target, &device::turn_on));
  scheduler->set(off,
        remember(target, &device::turn_off));
}
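To see the pieces working end to end, here is one possible assembly. Several parts are assumptions made only so that the sketch compiles and can be driven by hand: time_point stands in for the article's time type, the timer implementation and its tick member are invented, heater is a hypothetical concrete device, and device gains a virtual destructor. The remember_function and function_ptr code is condensed from Listings 1 and 3 to 5:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Condensed from Listing 1.
template<typename target_ptr_type, typename member_ptr_type>
class remember_function {
public:
  remember_function(target_ptr_type on, member_ptr_type call)
    : ptr(on), member(call) {}
  void operator()() const { (ptr->*member)(); }
private:
  target_ptr_type ptr;
  member_ptr_type member;
};
template<typename target_ptr_type, typename member_ptr_type>
remember_function<target_ptr_type, member_ptr_type>
remember(target_ptr_type on, member_ptr_type call) {
  return remember_function<target_ptr_type, member_ptr_type>(on, call);
}

// Condensed from Listings 3 to 5.
class function_ptr {
public:
  function_ptr() : body(0) {}
  template<typename nullary_function>
  function_ptr(nullary_function function)
    : body(new adaptor<nullary_function>(function)) {}
  function_ptr(const function_ptr &other)
    : body(other.body ? other.body->clone() : 0) {}
  ~function_ptr() { delete body; }
  function_ptr &operator=(const function_ptr &rhs) {
    callable *old_body = body;
    body = rhs.body ? rhs.body->clone() : 0;
    delete old_body;
    return *this;
  }
  void operator()() const { if(body) body->call(); }
private:
  class callable {
  public:
    virtual ~callable() {}
    virtual callable *clone() const = 0;
    virtual void call() = 0;
  };
  template<typename nullary_function>
  class adaptor : public callable {
  public:
    adaptor(nullary_function function) : adaptee(function) {}
    virtual callable *clone() const { return new adaptor(adaptee); }
    virtual void call() { adaptee(); }
    nullary_function adaptee;
  };
  callable *body;
};

// Assumption: times are plain integers.
typedef int time_point;

// Assumed implementation: stores callbacks and is driven by hand.
class timer {
public:
  void set(const time_point &delay, const function_ptr &callback) {
    delays.push_back(delay);
    callbacks.push_back(callback);
  }
  // Hypothetical driver: runs each callback registered for 'now'.
  void tick(const time_point &now) {
    for(std::size_t at = 0; at != delays.size(); ++at)
      if(delays[at] == now)
        callbacks[at]();
  }
private:
  std::vector<time_point> delays;
  std::vector<function_ptr> callbacks;
};

class device {
public:
  virtual ~device() {}  // assumed; elided in the article
  virtual void turn_on() = 0;
  virtual void turn_off() = 0;
};

// Hypothetical concrete device.
struct heater : device {
  heater() : on(false) {}
  virtual void turn_on() { on = true; }
  virtual void turn_off() { on = false; }
  bool on;
};

void set_up(device *target, timer *scheduler,
            const time_point &on, const time_point &off) {
  scheduler->set(on, remember(target, &device::turn_on));
  scheduler->set(off, remember(target, &device::turn_off));
}
```

Note that set takes a const function_ptr reference, so each remember_function is converted implicitly through function_ptr's templated constructor; neither timer nor device needs to know about the other.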

Conclusion

There is little that excites programmers' passions more than discussions of style, but there is little that helps them more than common understanding. Overloading is a powerful feature whose reasoned use underpins many idioms at the heart of the modern C++ programmer's vocabulary: smart pointers, function objects, iterators, etc.

Within each of these idioms there is variation for expression rather than any simplistic, one-size-fits-all, cookie-cutter rule. A function object will support some form of operator(), and a smart pointer must support some form of dereferencing, but this does not by necessity include operator->, as demonstrated by function_ptr, a smart function pointer.

References

[Boost] Boost library website, http://www.boost.org.

[Cleeland-1998] Chris Cleeland, Douglas C Schmidt, and Tim Harrison, "External Polymorphism", Pattern Languages of Program Design 3, edited by Robert Martin, Dirk Riehle, and Frank Buschmann, Addison-Wesley, 1998.

[Coplien1992] James O Coplien, Advanced C++: Programming Styles and Idioms, Addison-Wesley, 1992.

[Gamma-1995] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.

[Henney1998] Kevlin Henney, "Creating Stable Assignments", C++ Report 10(6), June 1998, also available from http://www.curbralan.com.

[Henney2000a] Kevlin Henney, "From Mechanism to Method: Substitutability", C++ Report 12(5), May 2000, also available from http://www.curbralan.com.

[Henney2000b] Kevlin Henney, "From Mechanism to Method: Valued Conversions", C++ Report 12(7), May 2000, also available from http://www.curbralan.com.

[Liskov1987] Barbara Liskov, "Data Abstraction and Hierarchy", OOPSLA '87 Addendum to the Proceedings, October 1987.

[Meyers1996] Scott Meyers, More Effective C++: 35 New Ways to Improve Your Programs and Designs, Addison-Wesley, 1996.

[Meyers1999] Scott Meyers, "Implementing operator->* for Smart Pointers", Dr. Dobb's Journal, October 1999.

[Strunk-1979] William Strunk Jr and E B White, The Elements of Style, 3rd edition, Macmillan, 1979.

[Sutter2000] Herb Sutter, Exceptional C++, Addison-Wesley, 2000.
