
From Mechanism to Method: Generic Decoupling

Overload Journal #60 - Apr 2004 - Programming Topics. Author: Kevlin Henney

Simplification of code is often equated with the elimination of options. At best, this turns out to be a false correlation; at worst, it hampers the long-term code quality and development. The side effects of premature generalization and over-abstraction [Gabriel] are as much a problem in software as the consequences of premature optimization: complexity, unmaintainability, brittleness, bloatware, strengthened coupling, weakened cohesion, loss of flexibility - in short, a lot of criticisms that we would prefer not to have leveled at our own code.

It is true that in many cases of simplification options will be eliminated, but more often than not the eliminated options are the ones that tended to complicate the code or were of little practical use in the first place - dead code waiting for a garbage collector.

For example, in the C++ Standard library, the only noticeable role that traits and allocator parameters of the std::basic_string serve is to complicate the usage and error reporting on, typically, std::string. Their role is so constrained as to make them almost completely useless. The few people that take advantage of them are often attempting to solve the wrong problem or are employing the wrong solution. There is in fact a great deal of scope for increased simplicity and useful parameterization in string types [Alexandrescu2001, Henney2001]; it's just a shame that std::basic_string and its moribund parameters are already parked in that space.

It is possible to simplify the structure of software without losing effective options. It is even possible to do so and increase your options. Now, that sounds worthwhile: simpler and more flexible.

Decoupling in General

Although we cannot predict the future with any certainty, it is still possible to write code that is graceful and accommodating - rather than troublesome and resistant - in the face of change. Software development is concerned with the development of structure - partitioning and connection, separation and composition - so any conscious and conscientious approach to software development should have, as one of its prominent manifesto promises, a clear focus on structure management.

The aggressive pursuit of LCHC (low coupling and high cohesion) can ensure that the effect of change is simplified and isolated, rather than traumatic and global. LCHC also simplifies testing, building, versioning, experimentation, optimization, team organization, and pretty much any other development activity you can think of that absorbs more time, effort, or grief than you had originally anticipated. Sadly, few approaches can genuinely boast LCHC as one of their main pledges, preferring instead the active pursuit of more obviously crowd-pleasing headlines such as reuse.

The trick to achieving generality is, somewhat counterintuitively, to make the code specific enough to be fit for purpose. A fit to the task in hand must be targeted with one eye; the other should be seeking opportunities to keep options open, but without attempting to pursue all of the choices. It is tempting to try to enumerate all the possible ways in which something could change and be adapted and then incorporate all the necessary hooks and extra parameterization into your design. Unfortunately this style tends to make your code more complex to understand. In fact, your code can become so full of conveniences that it's almost impossible to use either simply or correctly. Over-guessing may narrow rather than widen your options. You - and your users - may end up with a lot of unused code and many workarounds.

By contrast, a concerted focus on dependency management will deliver you some tangible benefits in the short term - development times, build times, lunch times - and reduce the cost of change in the long term. The loose coupling keeps the code supple and more stable as, over time, the genuine sources of variation, and therefore parameterization, become apparent and needed.

So what are the sources of coupling in C++ code? We can classify two basic forms of coupling:

  • Physical coupling requires that for the compilation of one piece of code the compiler must see another piece of code. In practical terms this means that the code depended upon appears in the same source file or is pulled in by #include.

  • Conceptual coupling [Note1] implies that for a piece of code to work there is a dependency on another concept, which may exist either tacitly outside the code or explicitly within it. For instance, a template parameter can be described by a set of requirements outside the code, whereas a class definition is known to the compiler.

One does not necessarily imply the other:

  • An inheritance relationship represents both a conceptual and a physical relationship. A derived class is conceptually dependent on its base class(es) because it may use or override features. The compiler must also be able to see the definition of any base classes to compile a derived class.

  • A class or function template conceptually depends on its template parameters, but the use of the parameter does not require any #include support. A dependency on an actual parameter type occurs at the point of instantiation, but not at the point of definition.

  • The use of inline functions or template code written in headers may introduce a physical dependency, but not necessarily a conceptual dependency. Use of an inline function or a class template also pulls in any of the dependencies that are used in implementation, but are not relevant to the usage interface.

There are four complementary approaches for decoupling a C++ system.

Dynamic Typing

Why does that compile-link cycle take so long? Static type checking is the root cause of the delay; the design of the preprocessor merely exacerbates the issue. Want to know that your code makes at least basic sense? Let the compiler check your types and how you're using them and then let the linker tie all the loose ends in your program together. Efficiency and confidence in execution is your reward; extended surfing breaks and water cooler conversations are your punishment. Hmm, OK, perhaps we need a different spin on this: long build times put the irritation and detrimental into iterative and incremental development, frustration and time wasting that are only temporarily relieved by a machine upgrade.

If you weaken the type system you reduce the physical dependencies. This may conjure up images of void * in your mind, but banish those thoughts immediately: I want to loosen coupling, but the kind of unsafe promiscuity that void * often encourages is not quite what I had in mind. A more dynamically checked type system lies at the heart of many interpreted languages, from LISP to Smalltalk, Awk to Ruby. Good support for reflection allows you to get at the soft underbelly of other statically typed systems, such as Java or the meta-information available in many component middleware architectures. It is a matter of balance: you loosen the checking at compile time to increase flexibility, but you increase scope for failure at run time. You pays your money; you makes your choices; you takes your chances. That's the essence of design.

C++ does not currently have good standard support for reflection: the existing RTTI mechanism is a foot in the door, but no more. In spite of the half-open door, C++ programs often make effective use of dynamic typing:

  • Variant types, such as boost::any [Henney2000, Boost] or CORBA's any type, can hold values of arbitrary type. Depending on your application you can choose to leave the type fully uninterpreted, as in the case of any, or you can impose constraints on the contained types that are reflected in the interface of the variant (e.g., comparison or arithmetic operations).

  • Work in terms of strings, interpreting them as necessary and with respect to the context. In the Age of the Internet, strings are the new integers: everyone's using them for everything. Whether we are talking about internal command languages or data exchange, strings are remarkably versatile - given the right functions and classes, they support the Three Rs. You can take some of the guesswork out of how to structure your data and work with it by adopting a data definition language or meta-language, of which XML is certainly the most fashionable.

However, remember that these techniques reduce only the physical dependencies not the conceptual ones. Those are as strong as ever and will be lining up to bite you at run time should you disrespect them. You still need to know how to use them. Their correct usage is now implicit rather than explicit, and semantic drift between versions or developers is all too easy.

Flexibility has a price... and a number. This was recently brought home to me when I was entering a particularly long order number into a spreadsheet cell: the spreadsheet abbreviated the many significant digits of the reference number using scientific notation. Aha, yes, it is a number, just not that kind of number.

Interface Classes

Inheritance in its most common employment seems to be used more for subclassing (with a focus on inheritance of code) than for subtyping (with a focus on classification and substitutability). Hierarchies that accumulate implementation, often with concrete classes inheriting from concrete classes, lead to classes that are hard to understand.

But common is not the same as recommended: such usage is in direct contrast to much of the advice on practice that is available and held in some regard. For instance, only the leaves of a hierarchy should be concrete; its roots should be fully abstract. virtual functions should be introduced into a hierarchy as pure virtuals rather than with default implementations that must be guessed, and delegation and non-public derivation should be used to acquire implementation when there is no intent to hold a reference or pointer to a base class.

Is this just so much theory? No, it's better than either just theory or just practice: it's both. In practice it can be demonstrated that the failure to use inheritance in a controlled manner can be much worse than not using it at all [Hatton]. The use of deep hierarchies, with implementation scattered, defaulted, accumulated, and overridden over a derived trail of concrete classes, actively ambushes our ability as humans to grasp all the features of a concept within a single embrace. This kind of inheritance often sabotages the localization benefits of encapsulation.

All this may sound harsh and idealistic, but it is typically less harsh and far less idealistic than believing in the timely development, and appropriate quality, of a project that takes the common but unrecommended path. Of course there is wriggle room for pragmatism, for compromise. But remember that to compromise has two different meanings - make sure you choose the one that means to settle or resolve by making concessions rather than the one that means to expose to suspicion, disrepute, or mischief.

Inheritance is the strongest form of logical coupling you can have. The need for physical coupling follows in its wake: base classes must be directly visible or included in the source above their derived classes. But derivation is a blade with two edges: you can also use it to reduce coupling in a system.

An interface class [Carroll-, Henney2001_2] (also known as a protocol class [Lakos]) refocuses a class hierarchy's clients on the conceptual interface, away from the physical baggage and variability of its descendants. The absence of code in an interface class contributes to its stability [Martin] and comprehensibility - although a code-free class sometimes clashes with a programmer's instincts for producing executable code. The Observer pattern [Gamma-] is an example of a larger pattern that includes this smaller interface-decoupling pattern:

class subject;

class observer 
{
public:
  virtual ~observer();
  virtual void update(subject *) = 0;
protected:
  observer() {}
private:
  observer(const observer &);
  observer &operator=(const observer &);
};

The use of virtual functions in interface classes is distinctly public. Such a recommendation is clearly in tension with the alternative recommendation that class hierarchies should always have non-virtual public interfaces [Sutter2001]. A number of practices, such as the Template Method pattern [Gamma-] and the corresponding Form Template Method refactoring [Fowler], tend to give rise to nonvirtual public interfaces in C++. Such interfaces have some useful properties, but they typically arise as a consequence of specific practices rather than being a necessary and general virtue in their own right. They are by no means the only tool in the box. Design should be considered a dialogue with a situation rather than a monologue; there is often more than one reasonable route that such a conversation may take.

Hidden Delegation

Wherever there is a recommendation concerning inheritance, you can be sure that not far behind it is a contrasting recommendation framed in terms of delegation. The root of delegation-based decoupling is the forward declaration. It can be used both to resolve the problem of tail-chasing cyclic dependencies and to reduce the exposed physical dependency of using an #include, reducing the essential surface area between class definitions:

class observer;

class subject 
{
public:
  virtual ~subject();
  virtual void attach(observer *) = 0;
  virtual void detach(observer *) = 0;
  ...
protected:
  subject() {}
private:
  subject(const subject &);
  subject &operator=(const subject &);
};

For classes that are, by nature, concrete and not part of a class hierarchy, interface decoupling through interface classes has relatively little to offer. Value objects [Henney2000], for instance, are best manipulated directly in terms of their concrete type. Interface classes are primarily a means for decoupling class hierarchies. Another practice is required for specifically concrete classes.

The common idiom goes by various names, of which the most evocative is also the name originally coined for it in the late 1980s: the Cheshire Cat idiom [Murray]. The name, taken from Lewis Carroll's surreal cat whose ability to disappear except for its grin quite bemused Alice, is apt:

class cat 
{
public:
  ...
private:
  class body;
  body *self;
};

Here the representation disappears entirely from the class definition in the header, leaving behind only the discreet smile of a pointer. The details of the body are elaborated in the corresponding source file:

class cat::body 
{
public:
  body();
  ~body();
  ... // representation details
};

This technique also goes by the name of the Pimpl idiom [Sutter2000] or, very descriptively, as the Fully Insulating Concrete Class [Lakos]. Naturally, all idioms have consequences that must be considered: the additional level of indirection, extra memory management, and restriction on inlined functions are the price of the afforded creature comforts in this case. The introduction of this separation also allows representation sharing, although this is not a path one should tread either necessarily or lightly [Henney2001].

Cheshire Cats can be introduced to complement the use of interface classes, ensuring that class hierarchy users are as insulated from representation details as possible. However, they are less effective with class templates. Compiler portability constraints mean that it is common to require the definition of class template members in header files. In such situations, having to include the full definition of the nested body in the header rather takes the smile off the technique.

Template Parameters

Templates are not normally associated with loosening physical coupling. Quite the opposite. The inclusion of source code in headers imposes a significant burden on the size of headers and the patience of the programmer. However, the conceptual loosening that arises from defining function and class templates independently of their actual template parameter types has a knock-on physical decoupling effect. The point at which the physical dependency on the actual parameter type is needed is deferred to the point of use in the code.

Generic decoupling forms the basis of generic programming and the STL: templated iterator ranges for algorithm-based functions and container constructors, and templated value types to allow any appropriate convertible value to be used in a function, member or non-member. The following function (inlined for brevity) shows how the implementation of an Observer's subject class might use existing STL features to automate observer updates:

class subjected : public subject 
{
  ...
  void notify() 
  {
    std::for_each(observers.begin(), observers.end(),
                  std::bind2nd(
                    std::mem_fun(&observer::update),this));
  }
  ...
  std::list<observer *> observers;
};

An alternative approach perhaps demonstrates a number of generic-decoupling techniques a little more explicitly:

template<typename argument_type>
class update 
{
public:
  explicit update(argument_type argument)
     : argument(argument) 
  {}

  template<typename updateable>
  void operator()(updateable *target) const 
  {
    target->update(argument);
  }
private:
  argument_type argument;
};

template<typename argument_type>
update<argument_type>
updater(argument_type argument) 
{
  return update<argument_type>(argument);
}

This generalized code leads to the following crisp usage:

class subjected : public subject 
{
  ...
  void notify() 
  {
    std::for_each(observers.begin(), observers.end(), updater(this));
  }
  ...
  std::list<observer *> observers;
};

The obvious trade-off with using templates to decouple is that implementation detail typically migrates to header files. This is particularly noticeable when introducing member function templates in place of ordinary member functions. Another consequence of the decision to template member functions is that they cannot be declared virtual. A more dynamically typed, variant-based approach can counterbalance this [Henney2000, Henney2000_2].

What is also apparent with generic decoupling is that the code tends to become more flexible and more precise as an immediate consequence. For instance, a different take on the needs of an observer dispenses with the need for any forward declarations:

template<typename subject>
class observer 
{
public:
  virtual ~observer();
  virtual void update(subject *) = 0;
protected:
  observer() {}
private:
  observer(const observer &);
  observer &operator=(const observer &);
};

And consequently allows more flexible and varied observing:

class data;
class events;
class watcher : public observer<data>, public observer<events> 
{
public:
  virtual void update(data *);
  virtual void update(events *);
  ...
};

Noosely Coupled Exceptions

As another worked example of generic decoupling, it is possible to loosen the noose of cyclic dependencies. Consider the standard exception classes defined in <stdexcept>. Each exception takes a std::string for construction. Note that std::string is mentioned only in the single-argument constructor: There is no requirement that it is used for implementation, and the only query function offered by the standard exceptions, what, returns a const char *. Given this asymmetry in construction versus query types, and the role of exceptions in a program, it is certainly open to question whether std::string should be used at all in the interface.

However, the issue is not so much with the choice of type dependencies in the library in general, but with the nature of the dependencies: The <string> header defines std::basic_string, some of whose functions throw std::out_of_range. There is therefore a cyclic dependency between the types defined in <stdexcept> and those in <string>; this logical dependency is made more physical when inlined implementations are used - the norm for template implementations. The absence of a standard <stringfwd> header or a more general concept of strings means that each vendor is invited to break the cycle in their own way, some of which meet users' expectations and some of which do not (e.g., char * may or may not convert implicitly for the exception constructor argument).

As an aside, it can be considered surprising that exceptions are granted the privilege to use string given that I/O and file handling, which are more obviously and intimately connected with string handling, have no such honor. Although file streams depend on char_traits, as found in <string>, const char * is used as the type for naming files and the type for predefined string insertion and extraction. The <string> header itself depends on I/O streams, representing another dependency noose.

Loosening the Noose

Returning to the <stdexcept> and <string> cycle, a decoupling can be arrived at by considering sufficiency and substitutability: the exception classes in <stdexcept> are conceptually more primitive than std::string and should not have the imposition and dependency on such a specific string type. The dependency should be narrower and more accommodating. The diversity of string-user needs means that such users cannot be characterized collectively as a community. Likewise, their needs cannot be met by a single type such as std::basic_string - a class template that attempts to be all things to all people, but manages only a few in each case.

So what if we don't depend on a specific string type at all? The following is an alternative version of std::logic_error, which uses a dynamically allocated char * internal representation and has no dependency on <string>:

class logic_error : public exception 
{
public:
  // assumes <exception>, <algorithm>, and <cstring> have been included
  explicit logic_error(const char *detail)
     : detail(duplicate(detail, std::strchr(detail, '\0'))) 
  {}

  template<typename string>
  explicit logic_error(const string &detail)
    : detail(duplicate(detail.begin(), detail.end())) 
  {}

  logic_error(const logic_error &other)
    : detail(duplicate(other.detail, std::strchr(other.detail, '\0'))) 
  {}

  logic_error &operator=(const logic_error &rhs) 
  {
    char *new_detail = duplicate(rhs.detail, std::strchr(rhs.detail, '\0'));
    delete[] detail;
    detail = new_detail;
    return *this;
  }

  virtual ~logic_error() throw() 
  {
    delete[] detail;
  }

  virtual const char *what() const throw() 
  {
    return detail;
  }

private:
  template<typename iterator>
  static char *duplicate(iterator begin, iterator end) 
  {
    char *result = new char[end - begin + 1];
    std::copy(begin, end, result);
    result[end - begin] = '\0';
    return result;
  }
  char *detail;
};

Lightly Strung

The most commonly used string initializer for exceptions is a vanilla null-terminated character sequence. In the revised logic_error shown, this maps directly to a constructor without requiring conversions and the creation of temporary string objects:

throw std::logic_error("illogical");

The templated constructor caters to the standard string type, and indeed any other character container that satisfies the minimal requirements for begin and end members that return random-access iterators - SGI's rope [STL], std::vector<char>, or a suitable string type of your own devising. So, with only a few obvious drawbacks, not only has the cyclic dependency been removed, but the generality of the code has been increased:

std::vector<char> message;
...
throw std::logic_error(message);

I said few drawbacks. That is not to say that there are none. However, the most obvious and significant limitation may not be considered that great a disadvantage: a string type that has a user-defined conversion to char *, but does not sport begin and end functions, can no longer be used to directly initialize a logic_error. The success of such a conversion is not guaranteed in the existing Standard, but the arrangement of types in the headers often supports it. The suggested redesign is forward rather than backward looking: string classes that support such user-defined conversions are unsafe and the absence of support for container operations is nonstandard. So if you were to rework your own existing classes to support this style of string decoupling, existing code that worked in terms of legacy string classes would need to be modified - either with explicit casts or, taking the hint, with more standard-conforming types.

Conclusion

Code should be supple, not subtle. For code there is such a thing as being too well connected and too eager to please. Generality and reuse are often better served by paying attention to necessity and to the core activities of software development - comprehension, change, and confirmation - than to whimsy and speculation.

Refactoring code to reduce its coupling often has the effect of increasing its cohesion. In the exception example, physical and conceptual decoupling improved the precision of the requirement on the string type: only specific features were required, not the whole interface. This LCHC strategy suggests a design path that is as applicable to domain-specific libraries as it is to the liberalization of string types.

References and Notes

[Gabriel] Richard P. Gabriel. Patterns of Software: Tales from the Software Community (Oxford, 1996).

[Alexandrescu2001] Andrei Alexandrescu. "Generic<Programming>: A Policy-Based basic_string Implementation," C/C++ Users Journal C++ Experts Forum, June 2001, www.cuj.com/experts/1906/alexandr.htm

[Henney2001] Kevlin Henney. "From Mechanism to Method: Distinctly Qualified," C/C++ Users Journal C++ Experts Forum, May 2001, www.cuj.com/experts/1905/henney.htm

[Note1] Conceptual dependencies are sometimes referred to as logical dependencies. The distinction between - and separation of - logical from physical has been handed down to us from structured analysis and design. However, the bias inherent in the use of the word logical tends to cast all physical concerns into the shade as impure and irrational. Such Puritanism is of little practical use. The natural complement of physical is conceptual rather than logical, whose antonym is illogical. C++'s reliance on the preprocessor may not be elegant, but, given its rules, it is entirely logical that a piece of code requiring a declaration in a header file should also have a physical dependency on it.

[Henney2000] Kevlin Henney. "From Mechanism to Method: Valued Conversions," C++ Report, July-August 2000, www.curbralan.com

[Boost] Boost C++ Libraries, www.boost.org

[Hatton] Les Hatton. "Does OO Sync with the Way We Think?", IEEE Software, 1998, www.oakcomp.co.uk

[Carroll-] Martin D. Carroll and Margaret A. Ellis. Designing and Coding Reusable C++ (Addison-Wesley, 1995).

[Henney2001_2] Kevlin Henney. "From Mechanism to Method: Total Ellipse," C/C++ Users Journal C++ Experts Forum, March 2001, www.cuj.com/experts/1903/henney.htm

[Lakos] John Lakos. Large-Scale C++ Software Design (Addison-Wesley, 1996).

[Martin] Robert C. Martin. "Object-Oriented Design Quality Metrics: An Analysis of Dependencies," ROAD, September-October 1995, www.objectmentor.com

[Gamma-] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).

[Sutter2001] Herb Sutter. "Sutter's Mill: Virtuality," C/C++ Users Journal, September 2001.

[Fowler] Martin Fowler. Refactoring: Improving the Design of Existing Code (Addison-Wesley, 1999).

[Murray] Robert B. Murray. C++ Strategies and Tactics (Addison-Wesley, 1993).

[Sutter2000] Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).

[Henney2000_2] Kevlin Henney. "From Mechanism to Method: Function Follows Form," C/C++ Users Journal C++ Experts Forum, November 2000, www.cuj.com/experts/1811/henney.htm

[STL] SGI Standard Template Library Programmer's Guide, www.sgi.com/tech/stl/
