Multi-threading in C++0x

Overload Journal #93 - October 2009 - Programming Topics   Author: Anthony Williams
Threading support is being added to C++. Anthony Williams introduces us to the new facilities.

Concurrency and multithreading is all about running multiple pieces of code in parallel. If you have the hardware for it in the form of a nice shiny multi-core CPU or a multi-processor system then this code can run truly in parallel, otherwise it is interleaved by the operating system - a bit of one task, then a bit of another. This is all very well, but somehow you have to specify what code to run on all these threads.

High level constructs such as the parallel algorithms in Intel's Threading Building Blocks [Intel] manage the division of code between threads for you, but we don't have any of these in C++0x. Instead, we have to manage the threads ourselves. The tool for this is std::thread. (For full documentation of this and the rest of the library, see my implementation at [JustThread]).

Running a simple function on another thread

Let's start by running a simple function on another thread, which we do by constructing a new std::thread object, and passing in the function to the constructor. std::thread lives in the <thread> header, so we'd better include that first (Listing 1).

    #include <thread>  
    void my_thread_func()  
    {}  
    int main()  
    {  
      std::thread t(my_thread_func);  
    }  
Listing 1

If you compile and run this little app, it won't do a lot: though it starts a new thread, the thread function is empty. What it will do is terminate with an unclean shutdown because we started a thread and then destroyed the std::thread object without waiting. Leaving that aside for a moment, let's make it do something, such as print "hello" (Listing 2).

    #include <thread>  
    #include <iostream>  
    void my_thread_func()  
    {  
        std::cout<<"hello"<<std::endl;  
    }  
    int main()  
    {  
        std::thread t(my_thread_func);  
    }  
Listing 2

If you compile and run this little application, what happens? Does it print hello like we wanted? Well, actually there's no telling. It might or it might not. I ran this simple application several times on my machine, and the output was unreliable: sometimes it output "hello", with a newline; sometimes it output "hello" without a newline, and sometimes it didn't output anything. What's up with that? Surely a simple app like this ought to behave predictably?

Waiting for threads to finish

Well, actually, no, this app does not have predictable behaviour. The problem is we're not waiting for our thread to finish. When the execution reaches the end of main() the program is terminated, whatever the other threads are doing. Since thread scheduling is unpredictable, we cannot know how far the other thread has got. It might have finished, it might have output the "hello", but not processed the std::endl yet, or it might not have even started. In any case it will be abruptly stopped as the application exits.

If we want to reliably print our message, we have to ensure that our thread has finished. We do that by joining with the thread by calling the join() member function of our thread object (Listing 3).

    #include <thread>  
    #include <iostream>  
 
    void my_thread_func()  
    {  
      std::cout<<"hello"<<std::endl;  
    }  
 
    int main()  
    {  
      std::thread t(my_thread_func);  
      t.join();  
    }  
Listing 3

Now, main() will wait for the thread to finish before exiting, and the code will output "hello" followed by a newline every time. This highlights a general point: if you want a thread to have finished by a certain point in your code you have to wait for it. As well as ensuring that threads have finished by the time the program exits, this is also important if a thread has access to local variables: we want the thread to have finished before the local variables go out of scope. It's also necessary to avoid the unclean shutdown - if you haven't called join() or explicitly declared that you're not going to wait for the thread by calling detach(), then the std::thread destructor calls std::terminate().

Running a function object on another thread

It would be quite limiting if new threads were constrained to run plain functions without any arguments - all the information needed would have to be passed via global variables, which would be incredibly messy. Thankfully, this is not the case.

In keeping with the rest of the C++ standard library, you're not limited to plain functions when starting threads - the std::thread constructor can also be called with instances of classes that implement the function-call operator. Let's say "hello" from our new thread using a function object (Listing 4).

    #include <thread>  
    #include <iostream>  
 
    class SayHello  
    {  
    public:  
      void operator()() const  
      {  
        std::cout<<"hello"<<std::endl;  
      }  
    };  
 
    int main()  
    {  
      std::thread t((SayHello()));  
      t.join();  
    }  
Listing 4

If you're wondering about the extra parentheses around the SayHello constructor call, this is to avoid what's known as C++'s most vexing parse: without the parentheses, the declaration is taken to be a declaration of a function called t which takes a pointer-to-a-function-with-no-parameters-returning-an-instance-of-SayHello, and which returns a std::thread object, rather than an object called t of type std::thread. There are a few other ways to avoid the problem. Firstly, you could create a named variable of type SayHello and pass that to the std::thread constructor:

      int main()  
      {  
        SayHello hello;  
        std::thread t(hello);  
        t.join();  
      }  

Alternatively, you could use copy initialization:

      int main()  
      {  
        std::thread t=std::thread(SayHello());  
        t.join();  
      }  

And finally, if you're using a full C++0x compiler then you can use the new initialization syntax with braces instead of parentheses:

      int main()  
      {  
        std::thread t{SayHello()};  
        t.join();  
      }  

This is exactly equivalent to our first example with the double parentheses.

Anyway, enough about initialization. Whichever option you use, the idea is the same: your function object is copied into internal storage accessible to the new thread, and the new thread invokes your operator(). Your class can of course have data members and other member functions too, and this is one way of passing data to the thread function: pass it in as a constructor argument and store it as a data member (Listing 5).

    #include <thread>  
    #include <iostream>  
    #include <string>  
 
    class Greeting  
    {  
      std::string message;  
    public:  
      explicit Greeting(std::string const& message_):  
        message(message_)  
        {}  
        void operator()() const  
        {  
          std::cout<<message<<std::endl;  
        }  
    };  
 
    int main()  
    {  
      std::thread t(Greeting("goodbye"));  
      t.join();  
    }  
Listing 5

In this example, our message is stored as a data member in the class, so when the Greeting instance is copied into the thread the message is copied too, and this example will print "goodbye" rather than "hello".

This example also demonstrates one way of passing information in to the new thread aside from the function to call - include it as data members of the function object. If this makes sense in terms of the function object then it's ideal, but otherwise we need an alternative technique.

Passing arguments to a thread function

As we've just seen, one way to pass arguments in to the thread function is to package them in a class with a function call operator. Well, there's no need to write a special class every time; the standard library provides an easy way to do this in the form of std::bind. The std::bind function template takes a variable number of parameters. The first is always the function or callable object which needs the parameters, and the remainder are the parameters to pass when calling the function. The result is a function object that stores copies of the supplied arguments, with a function call operator that invokes the bound function. We could therefore use this to pass the message to write to our new thread (Listing 6).

    #include <thread>  
    #include <iostream>  
    #include <string>  
    #include <functional>  
    void greeting(std::string const& message)  
    {  
      std::cout<<message<<std::endl;  
    }  
    int main()  
    {  
      std::thread t(std::bind(greeting,"hi!"));  
      t.join();  
    }  
Listing 6

This works well, but we can actually do better than that - we can pass the arguments directly to the std::thread constructor and they will be copied into the internal storage for the new thread and supplied to the thread function. We can thus write the preceding example more simply as in Listing 7.

    #include <thread>  
    #include <iostream>  
    #include <string>  
    void greeting(std::string const& message)  
    {  
      std::cout<<message<<std::endl;  
    }  
    int main()  
    {  
      std::thread t(greeting,"hi!");  
      t.join();  
    }  
Listing 7

Not only is this code simpler, it's also likely to be more efficient as the supplied arguments can be copied directly into the internal storage for the thread rather than first into the object generated by std::bind, which is then in turn copied into the internal storage for the thread.

Multiple arguments can be supplied just by passing further arguments to the std::thread constructor (Listing 8).

    #include <thread>  
    #include <iostream>  
    void write_sum(int x,int y)  
    {  
      std::cout<<x<<" + "<<y<<" = "  
               <<(x+y)<<std::endl;  
    }  
 
    int main()  
    {  
      std::thread t(write_sum,123,456);  
      t.join();  
    }  
Listing 8

The std::thread constructor is a variadic template, so it can take any number of arguments up to the compiler's internal limit, but if you need to pass more than a couple of parameters to your thread function then you might like to rethink your design.

Invoking a member function on a new thread

What if you wish to run a member function other than the function call operator?

To start a new thread which runs a member function of an existing object, you just pass a pointer to the member function and a value to use as the this pointer for the object in to the std::thread constructor. (Listing 9)

    #include <thread>  
    #include <iostream>  
    class SayHello  
    {  
    public:  
      void greeting() const  
      {  
        std::cout<<"hello"<<std::endl;  
      }  
    };  
    int main()  
    {  
      SayHello x;  
      std::thread t(&SayHello::greeting,&x);  
      t.join();  
    }  
Listing 9

You can of course pass additional arguments to the member function too (Listing 10).

    #include <thread>  
    #include <iostream>  
    #include <string>  
    class SayHello  
    {  
    public:  
      void greeting(std::string const& message) const  
      {  
        std::cout<<message<<std::endl;  
      }  
    };  
    int main()  
    {  
      SayHello x;  
      std::thread t(  
         &SayHello::greeting,&x,"goodbye");  
      t.join();  
    }  
Listing 10

Now, the preceding examples both use a plain pointer to a local object for the this argument; if you're going to do that, you need to ensure that the object outlives the thread, otherwise there will be trouble. An alternative is to use a heap-allocated object and a reference-counted pointer such as std::shared_ptr<SayHello> to ensure that the object stays around as long as the thread does:

      #include <thread>  
      #include <memory> // for std::shared_ptr  
      int main()  
      {  
        std::shared_ptr<SayHello> p(new SayHello);  
        std::thread t(&SayHello::greeting,p,"goodbye");  
        t.join();  
      }  

So far, everything we've looked at has involved copying the arguments and thread functions into the internal storage of a thread even if those arguments are pointers, as in the this pointers for the member functions. What if you want to pass in a reference to an existing object, and a pointer just won't do? That is the task of std::ref.

Passing function objects and arguments to a thread by reference

Suppose you have an object that implements the function call operator, and you wish to invoke it on a new thread. The thing is, you want to invoke the function call operator on this particular object rather than on a copy of it. You could use the member function support to call operator() explicitly, but that seems a bit of a mess given that it is callable already. This is the first instance in which std::ref can help - if x is a callable object, then std::ref(x) is too, so we can pass std::ref(x) as our function when we start the thread, as in Listing 11.

    #include <thread>  
    #include <iostream>  
    #include <functional> // for std::ref
 
    class PrintThis  
    {  
    public:  
      void operator()() const  
      {  
        std::cout<<"this="<<this<<std::endl;  
      }  
    };  
    int main()  
    {  
      PrintThis x;  
      x();  
      std::thread t(std::ref(x));  
      t.join();  
      std::thread t2(x);  
      t2.join();  
    }  
Listing 11

In this case, the function call operator just prints the address of the object. The exact form and values of the output will vary, but the principle is the same: this little program should output three lines. The first two should be the same, whilst the third is different, as it invokes the function call operator on a copy of x. For one run on my system it printed the following:

      this=0x7fffb08bf7ef  
      this=0x7fffb08bf7ef  
      this=0x42674098  

Of course, std::ref can be used for other arguments too - the code in Listing 12 will print "x=43".

    #include <thread>  
    #include <iostream>  
    #include <functional>  
 
    void increment(int& i)  
    {  
      ++i;  
    }  
 
    int main()  
    {  
      int x=42;  
      std::thread t(increment,std::ref(x));  
      t.join();  
      std::cout<<"x="<<x<<std::endl;  
    }  
Listing 12

When passing in references like this (or pointers for that matter), you need to be careful not only that the referenced object outlives the thread, but also that appropriate synchronization is used. In this case it is fine, because we only access x before we start the thread and after it is done, but concurrent access would need protection with a mutex.

Protecting shared data with std::mutex

We have seen how to start threads to perform tasks 'in the background', and wait for them to finish. You can accomplish a lot of useful work like this, passing in the data to be accessed as parameters to the thread function, and then retrieving the result when the thread has completed. However, this won't do if you need to communicate between the threads whilst they are running - accessing shared memory concurrently from multiple threads causes undefined behaviour if either thread modifies the data. What you need here is some way of ensuring that the accesses are mutually exclusive, so only one thread can access the shared data at a time.

Mutexes are conceptually simple. A mutex is either 'locked' or 'unlocked', and threads try and lock the mutex when they wish to access some protected data. If the mutex is already locked then any other threads that try and lock the mutex will have to wait. Once the thread is done with the protected data it unlocks the mutex, and another thread can lock the mutex. If you make sure that threads always lock a particular mutex before accessing a particular piece of shared data then other threads are excluded from accessing the data for as long as that thread holds the lock. This prevents concurrent access from multiple threads, and avoids the undefined behaviour of data races. The simplest mutex provided by C++0x is std::mutex, which lives in the <mutex> header along with the other mutex types and the lock classes.

Now, whilst std::mutex has member functions for explicitly locking and unlocking, by far the most common use case in C++ is where the mutex needs to be locked for a specific region of code. This is where the std::lock_guard<> template comes in handy by providing for exactly this scenario. The constructor locks the mutex, and the destructor unlocks the mutex, so to lock a mutex for the duration of a block of code, just construct a std::lock_guard<> object as a local variable at the start of the block. For example, to protect a shared counter you can use std::lock_guard<> to ensure that the mutex is locked for either an increment or a query operation, as in Listing 13.

    #include <mutex>  
    std::mutex m;  
    unsigned counter=0;  
    unsigned increment()  
    {  
      std::lock_guard<std::mutex> lk(m);  
      return ++counter;  
    }  
    unsigned query()  
    {  
      std::lock_guard<std::mutex> lk(m);  
      return counter;  
    }  
Listing 13

This ensures that access to counter is serialized - if more than one thread calls query() concurrently then all but one will block until the first has exited the function, and the remaining threads will then have to take turns. Likewise, if more than one thread calls increment() concurrently then all but one will block. Since both functions lock the same mutex, if one thread calls query() and another calls increment() at the same time then one or other will have to block. This mutual exclusion is the whole point of a mutex.

Exception safety and mutexes

Using std::lock_guard<> to lock the mutex has additional benefits over manually locking and unlocking when it comes to exception safety. With manual locking, you have to ensure that the mutex is unlocked correctly on every exit path from the region where you need the mutex locked, including when the region exits due to an exception. Suppose for a moment that instead of protecting access to a simple integer counter we were protecting access to a std::string, and appending parts on the end. Appending to a string might have to allocate memory, and thus might throw an exception if the memory cannot be allocated. With std::lock_guard<> this still isn't a problem - if an exception is thrown, the mutex is still unlocked. To get the same behaviour with manual locking we have to use a catch block, as shown in Listing 14.

    #include <mutex>  
    #include <string>  
    std::mutex m;  
    std::string s;  
    void append_with_lock_guard(  
       std::string const& extra)  
    {  
      std::lock_guard<std::mutex> lk(m);  
      s+=extra;  
    }  
    void append_with_manual_lock(  
       std::string const& extra)  
    {  
      m.lock();  
      try  
      {  
        s+=extra;  
        m.unlock();  
      }  
      catch(...)  
      {  
        m.unlock();  
        throw;  
      }  
    }  
Listing 14

If you had to do this for every function which might throw an exception it would quickly get unwieldy. Of course, you still need to ensure that the code is exception-safe in general - it's no use automatically unlocking the mutex if the protected data is left in a state of disarray.

Flexible locking with std::unique_lock<>

Whilst std::lock_guard<> is basic and rigid in its usage, its companion class template - std::unique_lock<> - is more flexible. At the most basic level you use it like std::lock_guard<> - pass a mutex to the constructor to acquire a lock, and the mutex is unlocked in the destructor - but if that's all you're doing then you really ought to use std::lock_guard<> instead. There are two primary benefits to using std::unique_lock<> over std::lock_guard<>:

  1. you can transfer ownership of the lock between instances, and
  2. the std::unique_lock<> object does not have to own the lock on the mutex it is associated with.

Let's take a look at each of these in turn, starting with transferring ownership.

Transferring ownership of a mutex lock between std::unique_lock<> instances

There are several consequences to being able to transfer ownership of a mutex lock between std::unique_lock<> instances: you can return a lock from a function, you can store locks in standard containers, and so forth.

For example, you can write a simple function that acquires a lock on an internal mutex:

      std::unique_lock<std::mutex> acquire_lock()  
      {  
        static std::mutex m;  
        return std::unique_lock<std::mutex>(m);  
      }  

The ability to transfer lock ownership between instances also provides an easy way to write classes that are themselves movable, but hold a lock internally, such as Listing 15:

    #include <mutex>  
    #include <utility>  
    class data_to_protect  
    {  
    public:  
      void some_operation();  
      void other_operation();  
    };  
    class data_handle  
    {  
    private:  
      data_to_protect* ptr;  
      std::unique_lock<std::mutex> lk;  
      friend data_handle lock_data();  
      data_handle(data_to_protect* ptr_,  
         std::unique_lock<std::mutex> lk_):  
         ptr(ptr_),lk(std::move(lk_))  
      {}  
 
    public:  
      // default constructor: a handle that owns nothing,
      // needed for the declaration of dh2 in main() below
      data_handle():  
         ptr(0)  
      {}  
      data_handle(data_handle && other):  
         ptr(other.ptr),lk(std::move(other.lk))  
      {  
        other.ptr=0;  
      }  
      data_handle& operator=(data_handle && other)  
      {  
        if(&other != this)  
        {  
          ptr=other.ptr;  
          lk=std::move(other.lk);  
          other.ptr=0;  
        }  
        return *this;  
      }  
      void do_op()  
      {  
        ptr->some_operation();  
      }  
      void do_other_op()  
      {  
        ptr->other_operation();  
      }  
    };  
 
    data_handle lock_data()  
    {  
      static std::mutex m;  
      static data_to_protect the_data;  
      std::unique_lock<std::mutex> lk(m);  
      return data_handle(&the_data,std::move(lk));  
    }  
 
    int main()  
    {  
      data_handle dh=lock_data(); // lock acquired
      dh.do_op();                 // lock still held
      dh.do_other_op();           // lock still held
      data_handle dh2;  
      dh2=std::move(dh);          // transfer lock to
                                  // other handle
      dh2.do_op();                // lock still held
    }                             // lock released        
Listing 15

In this case, the function lock_data() acquires a lock on the mutex used to protect the data, and then transfers that along with a pointer to the data into the data_handle. This lock is then held by the data_handle until the handle is destroyed, allowing multiple operations to be done on the data without the lock being released. Because the std::unique_lock<> is movable, it is easy to make data_handle movable too, which is necessary to return it from lock_data.

Though the ability to transfer ownership between instances is useful, it is by no means as useful as the ability to manage the ownership of the lock separately from the lifetime of the std::unique_lock<> instance.

Explicit locking and unlocking a mutex with a std::unique_lock<>

As we saw earlier, std::lock_guard<> is very strict on lock ownership - it owns the lock from construction to destruction, with no room for manoeuvre. std::unique_lock<> is rather lax in comparison. As well as acquiring a lock in the constructor as for std::lock_guard<>, you can:

  • construct an instance without an associated mutex at all (with the default constructor);
  • construct an instance with an associated mutex, but leave the mutex unlocked (with the deferred-locking constructor);
  • construct an instance that tries to lock a mutex, but leaves it unlocked if the lock failed (with the try-lock constructor);
  • if you have a mutex that supports locking with a timeout (such as std::timed_mutex) then you can construct an instance that tries to acquire a lock for either a specified time period or until a specified point in time, and leaves the mutex unlocked if the timeout is reached;
  • lock the associated mutex if the std::unique_lock<> instance doesn't currently own the lock (with the lock() member function);
  • try to acquire a lock on the associated mutex if the std::unique_lock<> instance doesn't currently own the lock (possibly with a timeout, if the mutex supports it) (with the try_lock(), try_lock_for() and try_lock_until() member functions);
  • unlock the associated mutex if the std::unique_lock<> does currently own the lock (with the unlock() member function);
  • check whether the instance owns the lock (by calling the owns_lock() member function);
  • release the association of the instance with the mutex, leaving the mutex in whatever state it is currently (locked or unlocked) (with the release() member function); and
  • transfer ownership between instances, as described above.

As you can see, std::unique_lock<> is quite flexible: it gives you complete control over the underlying mutex, and actually meets all the requirements for a Lockable object itself. You can thus have a std::unique_lock<std::unique_lock<std::mutex>> if you really want to! However, even with all this flexibility it still gives you exception safety: if the lock is held when the object is destroyed, it is released in the destructor.

std::unique_lock<> and condition variables

One place where the flexibility of std::unique_lock<> is used is with std::condition_variable. std::condition_variable provides an implementation of a condition variable, which allows a thread to wait until it has been notified that a certain condition is true. When waiting you must pass in a std::unique_lock<> instance that owns a lock on the mutex protecting the data related to the condition. The condition variable uses the flexibility of std::unique_lock<> to unlock the mutex whilst it is waiting, and then lock it again before returning to the caller. This enables other threads to access the protected data whilst the thread is blocked. A full discussion of condition variables is a complete article in itself, so we'll leave it for now.

Other uses for flexible locking

The key benefit of the flexible locking is that the lifetime of the lock object is independent from the time over which the lock is held. This means that you can unlock the mutex before the end of a function is reached if certain conditions are met, or unlock it whilst a time-consuming operation is performed (such as waiting on a condition variable as described above) and then lock the mutex again once the time-consuming operation is complete. Both these choices are embodiments of the common advice to hold a lock for the minimum length of time possible without sacrificing exception safety when the lock is held, and without having to write convoluted code to get the lifetime of the lock object to match the time for which the lock is required.

For example, in the following code snippet the mutex is unlocked across the time-consuming load_strings() operation, even though it must be held either side to access the strings_to_process variable (Listing 16).

    #include <mutex>  
    #include <string>  
    #include <vector>  
    // load_strings() fetches new strings from elsewhere;  
    // its definition is not shown here  
    std::vector<std::string> load_strings();  
    std::mutex m;  
    std::vector<std::string> strings_to_process;  
    void update_strings()  
    {  
      std::unique_lock<std::mutex> lk(m);  
      if(strings_to_process.empty())  
      {  
        lk.unlock();  
        std::vector<std::string>  
           local_strings=load_strings();  
        lk.lock();  
        strings_to_process.insert(  
           strings_to_process.end(),  
           local_strings.begin(),  
           local_strings.end());  
      }  
    }  
Listing 16

Note that here we are relying on update_strings() being the only function that can add strings to the list, and that it is only run on one thread - if it may be called from multiple threads concurrently then we need to ensure that load_strings() is itself thread-safe, and that the behaviour is as desired. For example, if you only want one thread to call load_strings() then additional checks may be required. In general, if you unlock a mutex then you need to assume that the protected data has changed when you acquire the lock again.

Summary

In C++0x, you manage threads with the std::thread class. There are a variety of ways of starting a thread, but only one way to wait for it to finish - the join() member function. If you forget to join your threads then the runtime library will remind you by forcibly terminating your application.

Once you've got your threads up and running, you need to ensure that any shared data is correctly synchronized, and the most basic way to do that is with a mutex such as std::mutex. The safest way to lock a mutex is with an instance of std::lock_guard<>, though occasionally the flexibility of std::unique_lock<> might be needed.

Mutexes aren't the only way to synchronize data in C++0x, and there are other ways of acquiring locks than just std::lock_guard<> and std::unique_lock<>, but those will have to wait for another time.

References

[Intel] http://www.threadingbuildingblocks.org/

[JustThread] http://www.stdthread.co.uk/doc/
