Refactoring Towards Seams in C++

Breaking dependencies in existing code is hard. Michael Rüegg explains how seams can help and provides new automated refactorings for C++ to achieve them.

Unwanted dependencies are a critical problem in software development. We often have to break existing dependencies before we can change some piece of code. Breaking existing dependencies is also an important preliminary to introducing unit tests for legacy code, which by Feathers’ definition is code without unit tests [Feathers04].

Feathers’ seams help in reasoning about the opportunities that exist when we have to break dependencies. The goal is to have a place where we can alter the behaviour of a program without modifying it in that place. This is important because editing the source code is often not an option (e.g., when a function the code depends on is provided by a system library).

What is a seam?

Feathers characterises a seam as a place in our code base where we can alter behaviour without being forced to edit it in that place. This has the advantage that we can inject the dependencies from outside, which leads to both an improved design and better testability. Every seam has one important property: an enabling point. This is the place where we can choose between one behaviour and another. There are different types of seam. We focus on object, compile, preprocessor and link seams in this article.

C++ offers a wide variety of language mechanisms to create seams. Besides the classic way of using subtype polymorphism, which relies on inheritance, C++ also provides static polymorphism through template parameters. With the help of the preprocessor or the linker we have additional ways of creating seams.

Once we have broken dependencies in our legacy code base by introducing seams, our code is not relying on fixed dependencies anymore, but instead asks for collaborators through dependency injection.

Not only has our design improved much, but we are now also able to write unit tests for our code.

The seam types discussed in this article are often hard and time-consuming to achieve by hand. This is why automated refactorings and IDE support would be beneficial. The problem is that current C++ IDEs do not offer this to the extent we describe here. Therefore, we will explain how to create these seam types by applying refactorings which we have implemented for our IDE of choice, Eclipse C/C++ Development Tooling (CDT).

The classic way: object seam

Object seams are probably the most common seam type. To start with an example, consider Listing 1 where the class GameFourWins has a hard-coded dependency on Die. According to Feathers’ definition, the call to play is not a seam because it is missing an enabling point. We cannot alter the behaviour of the member function play without changing its function body because the member variable die it uses is of the concrete class Die. Furthermore, we cannot subclass GameFourWins and override play because play is monomorphic (not virtual).

// Die.h
struct Die {
  int roll() const;
};
// Die.cpp
#include <cstdlib>
#include "Die.h"
int Die::roll() const {
  return std::rand() % 6 + 1;
}
// GameFourWins.h
#include <iostream>
#include "Die.h"
struct GameFourWins {
  // the default argument belongs on the declaration so that callers can see it
  void play(std::ostream& os = std::cout);
private:
  Die die;
};
// GameFourWins.cpp
#include "GameFourWins.h"
void GameFourWins::play(std::ostream& os) {
  if (die.roll() == 4) {
    os << "You won !" << std::endl;
  } else {
    os << "You lost !" << std::endl;
  }
}

Listing 1

This fixed dependency also makes it hard to test GameFourWins in isolation because Die uses the pseudo-random number generator function rand from C’s standard library. Although rand is deterministic, in that calls to it return the same sequence of numbers for any given seed, it is hard and cumbersome to set up a specific seed for our purposes.
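The following sketch illustrates the point about rand. The helper rollSequence is a hypothetical name introduced only for this illustration. Seeding twice with the same value reproduces the same rolls, but nothing tells us in advance which seed makes the first roll a 4.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: seeds rand and records `count` die rolls.
std::vector<int> rollSequence(unsigned seed, int count) {
  std::srand(seed);
  std::vector<int> rolls;
  for (int i = 0; i < count; ++i) {
    rolls.push_back(std::rand() % 6 + 1);
  }
  return rolls;
}

// The same seed yields the same sequence, so rand is deterministic.
// Which seed produces a particular roll still has to be found by trial.
void checkDeterminism() {
  assert(rollSequence(42, 10) == rollSequence(42, 10));
}
```

A test that relied on this would have to search for a suitable seed and would break as soon as the standard library implementation changed.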

The classic way to alter the behaviour of GameFourWins is to inject the dependency from outside. The injected class inherits from a base class, thus enabling subtype polymorphism. To achieve that, we first apply the refactoring Extract Interface [ Fowler99 ]. Then we provide a constructor to pass the dependency from outside (we could also have passed an instance of Die to play directly). The resulting code is shown in Listing 2.

struct IDie {
  virtual ~IDie() {}
  virtual int roll() const = 0;
};
struct Die : IDie {
  int roll() const {
    return rand() % 6 + 1;
  }
};
struct GameFourWins {
  GameFourWins(IDie& die) : die(die) {}
  void play(std::ostream& os = std::cout) {
    // as before
  }
private:
  IDie& die;
};

Listing 2

This way we can now inject a different kind of die depending on the context we need. This is a seam because we now have an enabling point: the IDie instance that is passed to the constructor of GameFourWins.
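To make the enabling point concrete, here is a minimal sketch of a test that exploits the object seam. LoadedDie, playWith and testGameFourWins are hypothetical names that do not appear in the listings. IDie and GameFourWins are repeated from Listing 2 so that the example compiles on its own.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

struct IDie {
  virtual ~IDie() {}
  virtual int roll() const = 0;
};

struct GameFourWins {
  explicit GameFourWins(IDie& die) : die(die) {}
  void play(std::ostream& os = std::cout) {
    if (die.roll() == 4) {
      os << "You won !" << std::endl;
    } else {
      os << "You lost !" << std::endl;
    }
  }
private:
  IDie& die;
};

// Hypothetical test double: always rolls the value it was built with.
struct LoadedDie : IDie {
  explicit LoadedDie(int value) : value(value) {}
  int roll() const { return value; }
private:
  int value;
};

// Plays one game with a loaded die and returns what was printed.
std::string playWith(int rolled) {
  LoadedDie die(rolled);
  GameFourWins game(die);
  std::ostringstream out;
  game.play(out);
  return out.str();
}

void testGameFourWins() {
  assert(playWith(4) == "You won !\n");
  assert(playWith(3) == "You lost !\n");
}
```

Both outcomes of play can now be checked deterministically, without touching rand or the body of play.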

Leverage your compiler: compile seam

Although object seams are the classic way of injecting dependencies, we think there is often a better solution to achieve the same goals. C++ has a tool for this job that provides static polymorphism: template parameters. With template parameters, we can inject dependencies at compile time. We therefore call this seam a compile seam.

The essential step for this seam type is the application of a new refactoring we named Extract Template Parameter [ Thrier10 ]. The result of this refactoring can be seen in Listing 3. The enabling point of this seam is the place where the template class GameFourWinsT is instantiated.

template <typename Dice = Die>
struct GameFourWinsT {
  void play(std::ostream& os = std::cout) {
    if (die.roll() == 4) {
      os << "You won !" << std::endl;
    } else {
      os << "You lost !" << std::endl;
    }
  }
private:
  Dice die;
};
typedef GameFourWinsT<Die> GameFourWins;

Listing 3

One might argue that we ignore the intrusion of testability into our production code: we have to template a class in order to inject a dependency, where this might only be an issue during testing. The approach taken by our refactoring is to create a typedef which instantiates the template with the concrete type that has been used before applying the refactoring. This has the advantage that we do not break existing code.
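A sketch of how a test might use this enabling point follows. AlwaysFour, AlwaysThree, playWith and testCompileSeam are hypothetical names. Die and GameFourWinsT are repeated from the listings so that the example is self-contained.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>

struct Die {
  int roll() const { return std::rand() % 6 + 1; }
};

template <typename Dice = Die>
struct GameFourWinsT {
  void play(std::ostream& os = std::cout) {
    if (die.roll() == 4) {
      os << "You won !" << std::endl;
    } else {
      os << "You lost !" << std::endl;
    }
  }
private:
  Dice die;
};
typedef GameFourWinsT<Die> GameFourWins;  // existing clients keep compiling

// Hypothetical test doubles: no common base class is required.
struct AlwaysFour { int roll() const { return 4; } };
struct AlwaysThree { int roll() const { return 3; } };

// The enabling point: the template is instantiated with the double.
template <typename Dice>
std::string playWith() {
  GameFourWinsT<Dice> game;
  std::ostringstream out;
  game.play(out);
  return out.str();
}

void testCompileSeam() {
  assert(playWith<AlwaysFour>() == "You won !\n");
  assert(playWith<AlwaysThree>() == "You lost !\n");
}
```

Production code continues to use the name GameFourWins, while the test selects the dependency simply by naming a different template argument.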

The use of static polymorphism with template parameters has several advantages over object seams with subtype polymorphism. It does not incur the run-time overhead of calling virtual member functions, which can be unacceptable for certain systems. This overhead results from pointer indirection, the necessary initialisation of the vtable pointer (the vtable being the table of pointers to the virtual member functions of a class) and the fact that virtual functions usually cannot be inlined by compilers. Besides performance considerations, there is also an increased space cost due to the additional vtable pointer that has to be stored in each object [Driesen96].
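The space cost can be observed with a small sketch. PlainDie and VirtualDie are hypothetical types. The exact sizes are implementation-specific, so the checks only assume what holds on common ABIs such as the Itanium ABI used by GCC: an object of a class with virtual functions carries at least one hidden pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical die without virtual functions: no hidden pointer needed.
struct PlainDie {
  int roll() const { return 4; }
};

// Hypothetical die with virtual functions: each object stores a vtable pointer.
struct VirtualDie {
  virtual ~VirtualDie() {}
  virtual int roll() const { return 4; }
};

// Assumption: holds on mainstream implementations, not guaranteed by the standard.
void checkSpaceCost() {
  assert(sizeof(VirtualDie) >= sizeof(void*));
  assert(sizeof(VirtualDie) > sizeof(PlainDie));
}
```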

Besides performance and space considerations, inheritance brings with it all the well-known software engineering problems like tight coupling, increased complexity and fragility [Sutter04, Meyers05]. Most of these disadvantages can be avoided with the use of templates. Probably the most important advantage of using templates is that a template argument only needs to define the members that are actually used by the instantiation of the template (providing compile-time duck typing). This can ease the burden of an otherwise wide interface that one might need to implement in case of an object seam.

What is compile-time duck typing?

Duck typing is the name of a concept that can be briefly described as follows:

If something quacks like a duck and walks like a duck, then we treat it as a duck, without verifying that it is of type duck.

Although duck typing is mainly used in the context of dynamically typed programming languages, C++ offers duck typing at compile-time with templates. Instead of explicitly specifying an interface our type has to inherit from, our duck (the template argument) just has to provide the features that are used in the template definition.
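The following sketch shows this in action. DiceCup and CoinAsDie are hypothetical names. Because member functions of a class template are only instantiated when they are used, CoinAsDie gets away without a shake member as long as nobody calls prepare.

```cpp
#include <cassert>

// Hypothetical class template with two demands on its argument.
template <typename Dice>
struct DiceCup {
  int throwOnce() { return die.roll(); }
  // Only instantiated when called, so Dice needs shake() only in that case.
  void prepare() { die.shake(); }
private:
  Dice die;
};

// Hypothetical duck: provides roll() but neither shake() nor a base class.
struct CoinAsDie {
  int roll() const { return 2; }
};

int throwCoin() {
  DiceCup<CoinAsDie> cup;  // compiles although CoinAsDie has no shake()
  return cup.throwOnce();
}

void checkDuckTyping() {
  assert(throwCoin() == 2);
}
```

With an object seam, CoinAsDie would have had to implement every pure virtual function of the interface, whether the code under test used it or not.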

Of course, there are also drawbacks to using templates in C++. They can lead to increased compile times, code bloat when used naively and (sometimes) reduced clarity. The latter is because concepts are still not supported, even in C++11, so we have to rely solely on the naming and documentation of template parameters. A further disadvantage is that we have to expose our implementation (the template definition) to the clients of our code, which is sometimes problematic.

Apart from these objections, we are convinced that this seam type should be preferred wherever possible over object seams.

In case of emergency only: preprocessing seam

C and C++ offer another possibility to alter the behaviour of code without touching it in that place: the preprocessor. Although we are able to change the behaviour of existing code as shown with object and compile seams before, we think preprocessor seams are especially useful for debugging purposes like tracing function calls. An example of this is shown in Listing 4, which demonstrates how calls to C’s malloc function can be traced for statistical purposes.

// malloc.h
#ifndef MALLOC_H_
#define MALLOC_H_
#include <stdlib.h>  // declares size_t and the real malloc before the macro hides it
void* my_malloc(size_t size, const char* fileName, int lineNumber);
#define malloc(size) my_malloc((size), __FILE__, __LINE__)
#endif
// malloc.cpp
#include "malloc.h"
#undef malloc
void* my_malloc(size_t size, const char* fileName, int lineNumber) {
  // remember allocation in statistics
  return malloc(size);
}

Listing 4

The enabling point for this seam is the set of compiler options that lets us choose between the real and our tracing implementation. We use the option -include of the GNU compiler here to include the header file malloc.h into every translation unit. With #undef we are still able to call the original implementation of malloc.

We strongly advise against using the preprocessor excessively in C++. The preprocessor is just a limited text-replacement tool lacking type safety, which causes hard-to-track bugs. Nevertheless, it comes in handy for this task. Note that a disadvantage of this seam type is that we cannot trace member functions. Furthermore, if a member function has the same name as the macro, the substitution takes place inadvertently.
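The tracing idea can be tried out in a single translation unit, as in the sketch below. The counter tracedAllocations and the function legacyAllocate are hypothetical names. In a real project the macro would live in a header injected with -include rather than sit in the middle of a source file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

static int tracedAllocations = 0;  // hypothetical statistics counter

// Defined before the macro exists, so std::malloc is the real function here.
void* my_malloc(std::size_t size, const char* /*fileName*/, int /*lineNumber*/) {
  ++tracedAllocations;
  return std::malloc(size);
}

// From here on, every unqualified call to malloc is redirected.
#define malloc(size) my_malloc((size), __FILE__, __LINE__)

// Stands in for existing code whose body we do not want to edit.
void* legacyAllocate(std::size_t bytes) {
  return malloc(bytes);
}

void checkTracing() {
  const int before = tracedAllocations;
  void* block = legacyAllocate(16);
  assert(block != nullptr);
  assert(tracedAllocations == before + 1);
  std::free(block);
}
```

Note that a qualified call such as std::malloc(n) placed after the macro definition would be rewritten as well and would then fail to compile, which is one of the fragilities of this seam type.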

Tweak your build scripts: link seam

Besides the separate preprocessing step that occurs before compilation, C and C++ also have a post-compilation step called linking, which combines the results the compiler has emitted. The linker gives us another kind of seam called a link seam [Feathers04].

Myers and Bazinet discussed how to intercept functions with the linker for the programming language C [Myers04] . Our contribution is to show how link seams can be accomplished in C++ where name mangling comes into play. We present three possibilities of using the linker to intercept or shadow function calls.

Although all of them are specific to the tool chain and platform used, they have one property in common: their enabling point lies outside of the code, i.e., in our build scripts. For link seams, we create separate libraries containing the replacement code. This allows us to adapt our build scripts to link against those libraries for testing instead of the production ones [Feathers04].

Shadow functions through linking order

In this type of link seam we make use of the linking order. Although from the language standpoint the order in which the linker processes the files given is undefined, it has to be specified by the tool chain [ Gough04 ]: ‘ The traditional behaviour of linkers is to search for external functions from left to right in the libraries specified on the command line. This means that a library containing the definition of a function should appear after any source files or object files which use it. ’

The linker incorporates any undefined symbols from libraries which have not been defined in the given object files. If we pass the object files first before the libraries with the functions we want to replace, the GNU linker prefers them over those provided by the libraries. Note that this would not work if we placed the library before the object files. In this case, the linker would take the symbol from the library and yield a duplicate definition error when considering the object file.

As an example, consider these commands for the code shown in Listing 5.

// GameFourWins.cpp and Die.cpp as in Listing 1
// shadow_roll.cpp
#include "Die.h"
int Die::roll() const {
  return 4;
}
// test.cpp
void testGameFourWins() {
  // ...
}

Listing 5

  $ ar -r libGame.a Die.o GameFourWins.o
  $ g++ -Ldir/to/GameLib -o Test test.o shadow_roll.o -lGame

The order given to the linker is exactly as we need it to prefer the symbol in the object file since the library comes at the end of the list. This list is the enabling point of this kind of link seam. If we leave shadow_roll.o out, the original version of roll is called as defined in the static library libGame.a .

We have noticed that the GNU linker for Mac OS X (tested with GCC 4.6.3) needs the shadowed function to be defined as a weak symbol; otherwise the linker always takes the symbol from the library. Weak symbols are one of the many function attributes the GNU tool chain offers. If the linker comes across a strong symbol (the default) with the same name as the weak one, the latter will be overwritten. In general, the linker uses the following rules [ Bryant10 ]:

Multiple strong symbols are not allowed.
Choose the strong symbol if given a strong and multiple weak symbols.
Choose any of the weak symbols if given multiple weak symbols.

With the function to be shadowed defined as a weak symbol, the GNU linker for Mac OS X prefers the strong symbol with our replacement code. The following shows the function declaration with the weak attribute.

  struct Die {
    __attribute__((weak)) int roll() const;
  };

This type of link seam has one big disadvantage: it is not possible to call the original function anymore. This would be valuable if we just want to wrap the call for logging or analysis purposes or do something additional with the result of the function call.

Wrapping functions with GNU’s linker

The GNU linker ld provides a lesser-known feature which helps us to call the original function. This feature is available as a command line option called wrap. The man page of ld describes its functionality as follows: ‘Use a wrapper function for symbol. Any undefined reference to symbol will be resolved to __wrap_symbol. Any undefined reference to __real_symbol will be resolved to symbol.’

As an example, we compile GameFourWins.cpp from Listing 1. If we study the symbols of the object file, we see that the call to Die::roll – mangled as _ZNK3Die4rollEv according to Itanium’s Application Binary Interface (ABI) that is used by GCC v4.x – is undefined ( nm yields U for undefined symbols).

  $ gcc -c GameFourWins.cpp -o GameFourWins.o
  $ nm GameFourWins.o | grep roll
  U _ZNK3Die4rollEv

This satisfies the condition of an undefined reference to a symbol. Thus we can apply a wrapper function here. Note that this would not be true if the definition of the function Die::roll were in the same translation unit as its caller. If we now define a function according to the specified naming scheme __wrap_symbol and use the linker flag -wrap, our function gets called instead of the original one. Listing 6 presents the definition of the wrapper function. To prevent the compiler from mangling the mangled name again, we need to define it in an extern "C" block.

extern "C" {
  extern int __real__ZNK3Die4rollEv();
  int __wrap__ZNK3Die4rollEv() {
    // your intercepting functionality here
    // ...
    return __real__ZNK3Die4rollEv();
  }
}

Listing 6

Note that we also have to declare the function __real_symbol which we delegate to in order to satisfy the compiler. The linker will resolve this symbol to the original implementation of Die::roll . The following demonstrates the command line options necessary for this kind of link seam.


  $ g++ -Xlinker -wrap=_ZNK3Die4rollEv -o Test test.o GameFourWins.o Die.o

Alas, this feature is only available with the GNU tool chain on Linux. GCC for Mac OS X does not offer the linker flag -wrap . A further constraint is that it does not work with inline functions but this is the case with all link seams presented in this article. Additionally, when the function to be wrapped is part of a shared library, we cannot use this option. Finally, because of the mangled names, this type of link seam is much harder to achieve by hand compared to shadowing functions.
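When writing such wrappers by hand, a mangled name can be double-checked by demangling it again. The sketch below assumes a GCC-compatible tool chain that ships cxxabi.h with abi::__cxa_demangle. The helper demangle is a hypothetical name and not part of our plug-in.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the readable form of an Itanium ABI mangled
// name, or an empty string if the name cannot be demangled.
std::string demangle(const char* mangledName) {
  int status = 0;
  char* readable = abi::__cxa_demangle(mangledName, nullptr, nullptr, &status);
  if (status != 0 || readable == nullptr) {
    return std::string();
  }
  std::string result(readable);
  std::free(readable);  // the buffer was allocated with malloc
  return result;
}

void checkMangledNames() {
  assert(demangle("_ZNK3Die4rollEv") == "Die::roll() const");
  assert(demangle("_Z3fooi") == "foo(int)");
}
```

On the command line, the tool c++filt from GNU binutils serves the same purpose.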

Run-time function interception

If we have to intercept functions from shared libraries, we can use this kind of link seam. It is based on the fact that it is possible to alter the run-time linking behaviour of the loader ld.so in a way that it considers libraries that would otherwise not be loaded. This can be accomplished by the environment variable LD_PRELOAD that the loader ld.so interprets. Its functionality is described in the man page of ld.so as follows: ‘ A white space-separated list of additional, user-specified, ELF shared libraries to be loaded before all others. This can be used to selectively override functions in other shared libraries. ’ ¹ With this we can instruct the loader to prefer our function instead of the ones provided by libraries normally resolved through the environment variable LD_LIBRARY_PATH or the system library directories.

Suppose we want to intercept a function foo which is defined in a shared library. We have to put the code for our intercepting function into its own shared library (e.g., libFoo.so). If we run our program with this library listed in LD_PRELOAD as shown below, our definition of foo is called instead of the original one.

  $ LD_PRELOAD=path/to/libFoo.so executable

Of course this solution is not perfect yet because it would not allow us to call the original function. This can be achieved with the function dlsym that the dynamic linking loader provides. dlsym takes a handle to a dynamic library, which we normally get by calling dlopen. Because we aim for a generic solution and do not want to name a specific library here, we can use a pseudo-handle offered by the loader called RTLD_NEXT. With this, the loader will find the next occurrence of a symbol in the search order after the library in which the call resides.

What is dlsym?

Symbols linked from shared libraries are normally available automatically. The dynamic linking loader ld.so additionally offers four library functions, dlopen, dlclose, dlsym and dlerror, to manually load and access symbols from a shared library. We only use dlsym in this article. This function looks up a symbol by a given name and yields a void pointer to the symbol as its result. dlsym has the following function prototype:

    void* dlsym(void* handle, const char* symbol_name);

Note that dlfcn.h has to be included and the flag -ldl has to be passed when linking to make this work.

As an example, consider Listing 7 which shows the definition of the intercepting function foo and the code necessary to call the original function. Note that we cache the result of the symbol resolution to avoid repeating the lookup with every function call. Because we call a C++ function, we have to use the mangled name _Z3fooi for the symbol name. Furthermore, as it is not possible in C++ to implicitly convert the void pointer returned by dlsym to a function pointer, we have to use an explicit cast.

#include <dlfcn.h>
int foo(int i) {
  typedef int (*funPtr)(int);
  static funPtr orig = nullptr;
  if (!orig) {
    void* tmp = dlsym(RTLD_NEXT, "_Z3fooi");
    orig = reinterpret_cast<funPtr>(tmp);
  }
  // your intercepting functionality here
  return orig(i);
}

Listing 7

The advantage of this solution compared to the first two link seams is that it does not require re-linking. It is solely based on altering the behaviour of ld.so. A disadvantage is that this mechanism is unreliable with member functions, because a member function pointer is not expected to have the same size as a void pointer. There is no reliable, portable and standards-compliant way to handle this issue. Even the conversion of a void pointer to a function pointer was not defined in C++03 ².
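The lookup and the cast can be kept in one place with a small helper, sketched below under the assumption of a GNU/Linux system where RTLD_NEXT is available (on older glibc versions -ldl is needed when linking). The names nextSymbol and callOriginalAtoi are hypothetical. To keep the sketch runnable without a preloaded library, it resolves the C function atoi, whose symbol name is not mangled, starting from the executable itself.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // RTLD_NEXT is a GNU extension in glibc's dlfcn.h
#endif
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: finds the next definition of `symbolName` in the
// search order and converts the void pointer to the requested function type.
template <typename Fun>
Fun* nextSymbol(const char* symbolName) {
  void* tmp = dlsym(RTLD_NEXT, symbolName);
  return reinterpret_cast<Fun*>(tmp);
}

typedef int AtoiFun(const char*);

// Resolves the C library's atoi once, caches it and delegates to it.
int callOriginalAtoi(const char* text) {
  static AtoiFun* orig = nullptr;
  if (!orig) {
    orig = nextSymbol<AtoiFun>("atoi");
  }
  assert(orig != nullptr);  // fails if no later library defines the symbol
  return orig(text);
}
```

An intercepting function in a preloaded library would use the same helper with the mangled name of the function it replaces.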

Note that environment variables have different names in Mac OS X. The counterpart of LD_PRELOAD is called DYLD_INSERT_LIBRARIES . This needs the environment variable DYLD_FORCE_FLAT_NAMESPACE to be set.

Our IDE support for Eclipse CDT

Based on the seam types shown in this article, we have implemented Mockator. Mockator is a plug-in for the Eclipse CDT platform including a C++ based mock object library. The plug-in contains our implementation of the two refactorings Extract Interface and Extract Template Parameter to refactor towards the seam types object and compile seam.

For the preprocessor seam, our plug-in creates the necessary code as shown in Listing 4 and registers the header file with the macro in the project settings by using the -include option of the GNU compiler. Tracing of a function can be activated and deactivated by clicking on a marker that resides beside the function definition.

The plug-in supports all link seam types presented in ‘Tweak your build scripts: link seam’. The user selects a function to be shadowed or wrapped and the plug-in creates the necessary infrastructure. This includes the creation of a library project with the code for the wrapped function, the adjustment of the project settings (library paths, definition of preprocessor macros, linker options) based on the chosen tool chain and underlying platform and the creation of a run-time configuration necessary for the run-time function interception. To implement the presented link seams, we had to mangle C++ functions. Because we did not want to call the compiler and analyse the result with a tool like nm which would lead to both performance problems and unnecessary tool dependencies, we decided to implement name mangling according to the Itanium ABI in our plug-in.

For an upcoming publicly available release of Mockator, we plan to support further tool chains in Eclipse CDT besides GCC, such as Microsoft’s C++ compiler. Our plug-in will be bundled with our unit testing framework of choice, CUTE [Sommerlad11]. Besides its sophisticated support for unit testing, CUTE will then also assist the developer in refactoring towards seams and in creating mock objects.

Conclusion and outlook

Although we are convinced that the practices described in this article are valuable, especially in testing and debugging, they are not used as much as they should be. We think this is because of the large amount of manual work that needs to be done by the programmer, which is both tedious and error-prone. With our Eclipse plug-in, we automate these tasks as far as possible, which has the benefit that the programmer can focus on what really counts: refactoring the code to enable unit testing and finding bugs in the legacy code base.

Refactoring towards seams enables us to unit test our code. For our unit tests we sometimes want to use test doubles like fake or mock objects instead of real objects to control dependencies. In the upcoming article we will discuss how we think unit testing with mock objects should be done, leveraging the new language features C++11 gives us.

References and further reading

[Bryant10] Randal E. Bryant and David R. O’Hallaron. Computer Systems: A Programmer’s Perspective. Addison-Wesley, 2nd edition, 2010.

[Driesen96] Karel Driesen and Urs Hoelzle. ‘The Direct Cost of Virtual Function Calls in C++’. SIGPLAN Not., 31:306–323, October 1996.

[Feathers04] Michael C. Feathers. Working Effectively With Legacy Code. Prentice Hall PTR, 2004.

[Fowler99] Martin Fowler. Refactoring. Addison-Wesley, 1999.

[Gough04] Brian J. Gough and Richard M. Stallman. An Introduction to GCC. Network Theory Ltd., 2004.

[Myers04] Daniel S. Myers and Adam L. Bazinet. ‘Intercepting Arbitrary Functions on Windows, Unix, and Macintosh OS X Platforms’. Technical report, Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, University of Maryland, 2004.

[Meyers05] Scott Meyers. Effective C++: 55 Specific Ways to Improve Your Programs and Designs. Addison-Wesley, May 2005.

[Sutter04] Herb Sutter and Andrei Alexandrescu. C++ Coding Standards. Addison-Wesley, November 2004.

[Sommerlad11] Peter Sommerlad. ‘CUTE – C++ Unit Testing Easier’. http://www.cute-test.com, 2011.

[Thrier10] Yves Thrier. Clonewar – Refactoring Transformation in CDT: Extract Template Parameter. Master’s thesis, University of Applied Sciences Rapperswil, 2010.

¹ This indeed sounds like a security hole, but the man page says that LD_PRELOAD is ignored if the executable is a setuid or setgid binary.
² This has changed with C++11, where the conversion is conditionally supported with implementation-defined semantics.