Supporting Threads in Standard C++ (Part 2)


By Phil Bass

Overload, 8(36), March 2000


The first part of this series of articles on threads addressed what I called the function signature mismatch problem. The thread library needs to allow an arbitrary user-defined function to be run asynchronously with respect to other threads. That function could have any number of parameters of any type, but the thread library can only accept a function (or function object) with a simple, fixed interface.

My solution to this first design challenge was called Thread-Runs-Polymorphic-Object. The key feature of this design is that the thread management facilities provided by a suitable library are separated from the application-specific code. This is achieved by having both a concrete thread class and an abstract thread function class. The thread class defines the thread management facilities provided by the library and the thread function class defines a function object interface for the library's users.

Figure 1 - Thread-Runs-Polymorphic-Object

// thread management facilities
class thread
{
public:
  struct function;
  thread (function&);
  . . .
};

// interface for application-specific code
struct thread::function
{
  virtual ~function() {}
  virtual int run() = 0;
};

In this instalment I would like to present a fairly complete implementation of a thread library based on the ideas presented in these articles. First, though, there is another design issue to address.

The Second Design Challenge

Figure 2 shows the sample code in the previous article fleshed out to an almost complete program. The main() function creates a new thread, uses the child thread to print a file, performs some other processing in the main thread, waits for the child thread to terminate and exits.

Figure 2a - Background printing example.

#include "thread.h"
#include "printer.h"

// concrete thread function class
class print_function : public thread::function
{
public:
  print_function (Printer& p, std::string n)
    : printer(p), file_name(n) {}

Figure 2b - Background printing example (cont).

private:
  virtual int run()
  {
    printer.print(file_name);
    return 0;
  }
  Printer&    printer;
  std::string file_name;
};

int main()
{
// Start printing in the background.
  Epson_Stylus printer;
  std::string  file_name("printer_test.txt");
  print_function print_file(printer, file_name);
  thread print_thread(print_file);

// Execute some foreground actions...

// Wait for the background thread to finish.

  print_thread.wait();

  return 0;
}

Note that the main thread creates, stores and destroys the thread object representing the child thread. But what if the main() function were to return before the child thread has finished? If the thread object truly represents a thread of execution, destroying the thread object should terminate the child thread. This makes it difficult to support designs in which the child outlives its parent (Posix detached threads, for example).

Of course, the parent thread could create the thread object on the heap and throw away the pointer. But some thread must still destroy the orphaned child - otherwise we have a memory leak. The child cannot destroy its own thread object because, in general, it will not know whether some other thread is about to use that object, perhaps to change its priority or read its termination status. In fact, no thread can take sole responsibility for destroying the child's thread object for exactly the same reason - some other thread could be about to use the thread object.

Reference-Counting

What we need is shared responsibility. The last thread that needs to access the orphaned child's thread object should destroy it. Regular readers of Overload will, no doubt, recognise this problem. The tried and tested solution is the Handle-and-Reference-Counted-Body design pattern. The handle part of this pattern is an example of a smart pointer; the overall pattern is a particular case of the Proxy pattern [GoF].

James Coplien has catalogued several variations on the Handle/Body pattern [EuroPLoP] and Scott Meyers describes smart pointers and reference counting in More Effective C++ [Meyers].

The skeleton C++ code in Figure 3 through Figure 5 shows one way of implementing this reference-counting technique. For another approach see the Boost website [boost].

Figure 3 - Reference-Counted Body

// A thread body with shared ownership
class body
{
public:
  body(function&);
  friend class handle;
// ... thread functions ...
private:
  unsigned handle_count;
};

body::body (function& fn)
    : handle_count(0)
{
//... create a thread ...
}

In this case, the body class represents a thread that executes the function object passed to its constructor. I have provided both public and private members to show separate interface and implementation sections; in practice, though, the body class is entirely hidden from client code.

The handle class creates, destroys and performs all operations on body objects. Client code performs thread operations indirectly via a suitable handle. And, since I have made handle a friend of body, the body class need not have any public methods at all.

Figure 4 - Handle Interface

// A thread handle that shares ownership
// of a 'body'
class handle {
public:
  handle(function&);
  ~handle() throw();

  handle (const handle&);
  handle& operator= (const handle&);

// ... thread functions ...

private:
  body* shared_body;
};

A handle is a small object with 'value' semantics. That is, it can be created, copied and destroyed using the same syntax as the built-in types. It is intended to be passed to functions by value.

I have chosen to make the handle class a true Proxy in that all operations on threads are provided through forwarding member functions of handle. These are not shown here; they will be described later.

Each handle points to a body and the body holds a count of the number of handles pointing to it. When a new handle is created the body's handle count is incremented; when a handle is destroyed the body's handle count is decremented. When the last handle is destroyed the body is destroyed, too.

Figure 5 - Handle Implementation

handle::handle(function& fn)
            : shared_body(new body(fn))
{
  ++shared_body->handle_count;
}
handle::~handle() throw()
{
  if (--shared_body->handle_count == 0)
    delete shared_body;
}
handle::handle (const handle& old_handle)
  : shared_body(old_handle.shared_body)
{
  ++shared_body->handle_count;
}
handle& handle::operator= 
            (const handle& other_handle)
{
  this->~handle();
  new(this) handle(other_handle);
  return *this;
}

The handle's primary constructor creates a body on the heap (passing a function object) and increments the handle count. Thus, a single operation creates both a body and its first handle. The handle's destructor decrements the body's handle count and, if the count becomes zero, destroys the body.

The handle's copy constructor increments the body's handle count. The assignment operator literally destroys the existing handle and creates a new one that is a copy of the handle on the right-hand side of the assignment expression; the test for self-assignment is essential here, because without it the destructor could delete the very body the placement new is about to copy. If the use of the destructor and placement new offends you, these can be replaced by auxiliary functions (destroy() and create(), perhaps).
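For illustration, the same effect can be achieved without either trick. The following is a minimal sketch written directly against the body declaration above (it is not the auxiliary-function version the text mentions); acquiring the right-hand side's body before releasing our own also makes it safe for self-assignment without an explicit test.

// Sketch only: assignment without the destructor / placement-new trick.
handle& handle::operator= (const handle& other_handle)
{
  body* new_body = other_handle.shared_body;
  ++new_body->handle_count;             // acquire the new body first
  if (--shared_body->handle_count == 0) // then release the old one
    delete shared_body;
  shared_body = new_body;
  return *this;
}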

Thread Priorities

When designing a library component it is often difficult to reconcile the need for a simple generic interface with the desire to provide the full power of the underlying operating system. Thread priority schemes provide a simple example. All the thread libraries I've looked at use integers to represent thread priorities, but the int type is platform-dependent and there is no guarantee that all integer values are valid thread priorities. Under Win32, for example, there are only seven valid values for the priority parameter of the SetThreadPriority() API call. So, how can we make it possible for our thread library to use the much wider range of priorities available under Posix and real-time operating systems, while preventing the user from specifying invalid values under Win32?

I know of no general solution to these kinds of problem, but there is often a particular solution in individual cases. I usually try to generalise the problem first and look for a solution to the general problem. If that works we are home and dry. In this case, the right question to ask seems to be, "why do we have priorities?". And the answer has to be, "to control the scheduling algorithm". So, if we could provide our own scheduling algorithm it would be possible to achieve complete control of our threads. Unfortunately, operating systems don't usually provide that sort of hook, so I quickly abandoned the idea of overriding the operating system's built-in scheduler.

On the other hand, with suspend and resume functions we can take control of our threads. The operating system may still interfere to some extent (Win32 applies a dynamic boost at times, for example), but in principle these simple functions allow us to control the length of a time-slice, the proportion of CPU time allocated to individual threads, and so on. We can write our own scheduler, although such a scheduler is likely to be rather crude and inefficient compared with the operating system's native scheduler.

So it seems to me that a thread library needs to provide suspend and resume functions for special scheduling needs and priority control functions for the normal case, in which the operating system uses a thread's priority as a parameter to its internal scheduling algorithm. This brings us back to priority values. If the valid values are defined by the operating system's scheduler, how can we support priorities in a platform-independent thread library?

The best answer I can come up with is to define platform-dependent constants for the default, minimum and maximum priority values. It is possible for the library to provide platform-independent access to these values. Applications with stringent requirements for thread priorities would need to have a flexible strategy in order to maintain platform-independence. Less fussy applications can use a small set of priority values or simply accept being restricted to particular platforms.
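To make this concrete, here is a minimal sketch of what the platform-dependent definitions might look like in a Win32 build of the library. The file name and the choice of mapping onto the seven Win32 levels are my own assumptions, not part of the library's specification; the constants themselves are declared in the thread namespace shown in Figure 6 below.

// win32_thread_constants.cpp - sketch only.
#include <windows.h>
#include "thread.h"

namespace thread {
  // Win32 accepts seven discrete values for SetThreadPriority();
  // here we expose the two extremes and the default.
  const priority_type minimum_priority = THREAD_PRIORITY_IDLE;
  const priority_type default_priority = THREAD_PRIORITY_NORMAL;
  const priority_type maximum_priority = THREAD_PRIORITY_TIME_CRITICAL;
}

A Posix build would define the same three constants in terms of sched_get_priority_min() and sched_get_priority_max() instead, so client code that sticks to these names remains portable.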

Thread Library Interface

Putting it all together, here is my initial vision of a thread library suitable for use in standard-conforming programs. Figure 6 shows the thread namespace , Figure 7 shows the thread function class and Figure 8 shows the thread handle class. The definition of the thread body class is part of the implementation and is of no concern to the user of the library.

Figure 6 - The Thread Namespace

namespace thread {
  class handle;
  class body;
  struct function;
  typedef std::string name_type;
  typedef int priority_type;
  extern const priority_type minimum_priority;
  extern const priority_type default_priority;
  extern const priority_type maximum_priority;
  void sleep (unsigned long mS);
}

The thread namespace declares some types, some constants and one global function. These are all natural candidates for a namespace. Note, however, that there is no thread class, which seems a bit odd. I did consider renaming handle as 'thread', but I felt that 'handle' was better because it strongly suggests a small, value-like object and there may be many handles to a single thread. A better case could be made for renaming body, but I still prefer thread::body to thread::thread.

The global function is topical. The current issue of the C/C++ Users Journal [Journal] carries an article by Scott Meyers in which he argues that encapsulation is enhanced by using global functions rather than static member functions when there is a choice. This surprised some readers and triggered a debate on the accu-general e-mail list. My reason for choosing a global function here is more mundane - I believe users would find thread::sleep(1000) more intuitive than thread::handle::sleep(1000). However, I might as well claim that it also provides greater encapsulation - it might earn me extra Brownie points :)
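As an aside, the definition of this function is close to a one-liner on both major platforms. Here is a sketch; the conditional-compilation scheme is my own assumption rather than part of the library.

// Sketch only: one possible definition of thread::sleep().
#include "thread.h"

#ifdef _WIN32
#include <windows.h>
void thread::sleep (unsigned long mS) { ::Sleep(mS); }
#else
#include <time.h>
void thread::sleep (unsigned long mS)
{
  // nanosleep() wants whole seconds plus a nanosecond remainder.
  timespec interval;
  interval.tv_sec  = mS / 1000;
  interval.tv_nsec = (mS % 1000) * 1000000L;
  ::nanosleep(&interval, 0);
}
#endif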

Figure 7 - The Thread Function Interface Class

struct thread::function {
  virtual ~function() {}
  virtual int operator() () = 0;
};

As discussed in the previous article, the thread function class defines a function object interface. Application code is required to define a derived class that implements the function call operator, create an object of the concrete type and pass it to the thread library. It is up to the library to create a new thread which will execute the code in the user-defined operator() function.
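To see what this means in practice, the background-printing class of Figure 2 could be rewritten against this final interface as follows. This is a sketch only, re-using the hypothetical Printer class from that example; the only change is that the work is now done in operator() rather than run().

#include <string>
#include "thread.h"
#include "printer.h"

// Concrete thread function for the final interface.
class print_function : public thread::function
{
public:
  print_function (Printer& p, std::string n)
    : printer(p), file_name(n) {}

private:
  virtual int operator() ()
  {
    printer.print(file_name);
    return 0;
  }
  Printer&    printer;
  std::string file_name;
};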

Figure 8 - The Thread Handle Class

class thread::handle {
public:
  handle (function&);
  handle ();
   ~handle () throw();
  handle (const handle&);
  handle& operator= (const handle&);
  void start();
  void wait();
  int status() const;
  name_type name() const;
  void set_name (name_type);
  priority_type priority() const;
  void set_priority (priority_type);
  void suspend();
  void resume();
private:
  body* shared_body;
};

The thread::handle class defines most of the interface supported by the thread library. Note that there are no platform-specific items in the class declaration, so no platform-specific header is needed.

The default constructor is used to create a handle to the current thread. This handle can be used to get and set the attributes of the current thread and can be used by other threads just like any other handle. The other constructors, destructor and assignment operator were discussed earlier. Together this group of functions manages the reference counting.

Threads are created in the suspended state so that their attributes can be set before the thread function starts. The start() function starts the thread function in the same way that resume() re-starts the thread function. In fact, in my Win32 implementation, start() and resume() are identical. The wait() function suspends the current thread until the thread associated with the handle has terminated. The status() function returns the termination status of the target thread.
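Putting these pieces together, typical client code might look like this. It is a sketch only, assuming the print_function class shown above and the Epson_Stylus printer class from Figure 2.

#include <string>
#include "thread.h"
#include "printer.h"

int main()
{
  // A default-constructed handle refers to the current (main) thread.
  thread::handle self;
  self.set_name("main");

  // Create a child thread; it starts life suspended.
  Epson_Stylus printer;
  print_function print_file(printer, "printer_test.txt");
  thread::handle print_thread(print_file);

  print_thread.set_name("printer");
  print_thread.set_priority(thread::default_priority);
  print_thread.start();           // now the thread function runs

  // ... foreground work ...

  print_thread.wait();            // block until the child terminates
  return print_thread.status();   // report its termination status
}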

Named threads are unusual, but Java threads have names and we created something similar at work. I have included names because they seem to be useful for diagnostics, particularly when an exception has been thrown and the application code has failed to catch it. (This is a bug, of course, but we realise that we are not perfect.) The thread library has get and set functions for the thread name. By default the name is the empty string.

The other attribute provided by the thread library is a priority. There are get and set functions for the priority. As mentioned earlier, the thread namespace defines minimum, maximum and default priority values.

Finally, there are the suspend() and resume() functions that are intended for application-defined scheduling policies. I have tentatively chosen the simplest possible interface for these functions, but there may be better options. Under Win32 the suspend and resume functions respectively increment and decrement a suspend count. The thread is suspended if the suspend count is greater than zero. Both functions return the previous suspend count (if they succeed). As yet I have been unable to find a compelling reason for providing such behaviour, but I'm sure the Win32 designers knew what they were doing.
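To give a flavour of the application-defined scheduling these two functions make possible, here is a deliberately crude round-robin time-slicer. It is a sketch only; it assumes every handle in the vector refers to a thread that is currently suspended, and a real scheduler would need a termination condition and some error handling.

#include <vector>
#include "thread.h"

// Sketch only: resume each thread in turn for one time-slice.
void round_robin (std::vector<thread::handle>& threads,
                  unsigned long slice_in_ms)
{
  for (;;)
  {
    for (std::vector<thread::handle>::iterator t = threads.begin();
         t != threads.end(); ++t)
    {
      t->resume();                 // let this thread run...
      thread::sleep(slice_in_ms);  // ...for one time-slice...
      t->suspend();                // ...then stop it again
    }
  }
}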

The library reports errors by throwing standard runtime_error exceptions. In the current implementation the message string contains the name of the thread, the name of the function that failed and a message obtained from the operating system describing the reason for the failure.
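Internally, something along these lines is enough to compose such a message. This is a sketch only; the helper's name and the get_last_os_error() function are my own inventions, standing for whatever the platform provides.

#include <stdexcept>
#include <string>

// Sketch only: get_last_os_error() stands for a platform wrapper
// (GetLastError()/FormatMessage() on Win32, strerror() under Posix).
std::string get_last_os_error();

void throw_thread_error (const std::string& thread_name,
                         const std::string& function_name)
{
  throw std::runtime_error("thread '" + thread_name + "': " +
                           function_name + " failed: " +
                           get_last_os_error());
}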

Taking Stock

The thread library presented here provides most of the 'core' facilities listed in the previous article. Although it would be possible to provide a means for terminating the thread function, I see this as a job for the application rather than the library. A 'suspend until ...' function would be a useful addition, but this takes us beyond threads into the realms of other kinds of "waitable object", as Win32 calls them.

Although not included in my original list of core facilities, the library should also provide thread synchronisation mechanisms such as mutexes, critical sections, and scoped lock objects. Apart from these, the most important remaining omission is, perhaps, access control facilities. This is a topic in itself and it's not something I know much about, so I will not address it here.
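As a taste of the synchronisation facilities that would sit alongside the thread classes, a scoped lock is little more than a constructor/destructor pair. The sketch below assumes a hypothetical thread::mutex class with lock() and unlock() member functions; no such class is part of the library presented here.

// Sketch only: a scoped lock over an assumed thread::mutex class.
class scoped_lock
{
public:
  explicit scoped_lock (thread::mutex& m) : mutex(m) { mutex.lock(); }
  ~scoped_lock () { mutex.unlock(); }

private:
  scoped_lock (const scoped_lock&);             // not copyable
  scoped_lock& operator= (const scoped_lock&);  // not assignable
  thread::mutex& mutex;
};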

So far, I have described a Thread-Runs-Polymorphic-Object model in which a thread object executes a function object just once. Our experience at work suggests that this model is not quite as easy to use as we had expected. In the next and final article in this series I would like to look at ease-of-use issues and propose a more application-oriented threading model in which a thread executes a sequence of function objects. In the meantime, I would welcome your comments.

References

[GoF] " Design Patterns - Elements of Reusable Object-Oriented Software ", by Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, ISBN 0-201-63361-2.

[EuroPLoP] http://www.bell-labs.com/user/cope/ Patterns/C++Idioms/EuroPLoP98.html

[Meyers] More Effective C++ , by Scott Meyers, ISBN 0-201-63371-X.

[boost] http://www.boost.org .

[Journal] C/C++ Users Journal , February 2000.





