All together now.

Can you do several things at once? Ric Parkin tries multi-tasking.

Whats going on here?

Time is time, like what prevents an ever-rolling everything stream happening bears at all its once sons away.

Clear? Perhaps we should try a different example.

He draweth words form out the thread, the thread of his on which verbosity we string finer than the stable experiences of his argument.

Hmm, not any better. Perhaps if we try enough times, it'll eventually make some sense. Prepare to wait a long time though - there are so many possible sentences that will turn up, and the above examples are ones where I tried to make them still make some vague semblance of sense.

So where did they come from? Perhaps you have guessed: both are made from two quotations interleaved with each other:

Time, like an ever-rolling stream, bears all its sons away - Issac Watts

Time is what prevents everything happening at once - John Wheeler

and

Words form the thread on which we string our experiences - Aldous Huxley

He draweth out the thread of his verbosity finer than the staple of his argument - Shakespeare (Loves Labours Lost)

Those of a mathematical bent could calculate how many possible combinations of the sentences there are. Extrapolating further, how many possible ways can an arbitrary number of extremely long strings be interleaved? It gets to be a remarkably large number very quickly.

Now check them all.

Sounds ridiculous to even think of doing so, doesn't it?

If you haven't guessed yet from my choice of quotes, I've recent been thinking about concurrency and multithreading. Some of the reasons should be obvious - despite C++0x coming a little later than originally hoped, one of the areas which is comparatively mature is the memory model, the threading primitives and the higher level library built upon it. We're quite lucky in the ACCU as many members have contributed directly and indirectly to the design of this area, and in this issue Anthony Williams introduces us to some of the new facilities.

But mainly because I've realised that I've had to deal with concurrency quite a lot in the last few years, and this is a change. In much of my career, I've not actually had to deal with multithreading at all - for most of that time, the computers were single-cored, and the operating systems would often use a cooperative form of multitasking, so your application itself would just carry on as if it was the only thing around, occasionally letting other programs have a chance to run. Even when preemptive tasking appeared, most programs would be written as if they were single threaded, and let the operating system sort out which process would be running.

But now true mutithreaded machines are common, and the only way of speeding up our programs now that processor speed has plateaued is to find suitable ways of processing in parallel. Unfortunately this can be really difficult to do in practice, for several reasons.

Sharing mutable data is difficult to get right, and can be expensive in many ways - the locking adds expensive calls as an overhead; over-zealous locking can stall other threads, reducing any benefits and potentially re-serialising execution so that one one thread can run at a time! Also if the data is shared across processors, the versions in their caches have to be synchronised which can even lead to the odd situation where adding processors slows things down!

This last is an example of the problems of predicting which areas can be split into separate threads. It is difficult, and prone to counter-intuitive results. Part of it comes from the overheads associated with threads and locking tend to be quite expensive, and so too-fine-grained threading, locks and cross-thread communication can slow things down.

Herb Sutter has pointed out that locks and other synchronisation techniques are not composeable [ DDJ ]. That means that combining two parts that work on their own may no longer work, and you have to consider their interactions, and often the interactions between their internals. This means that you potentially get a combinatorial explosion, and building a large system becomes very difficult.

Testing is also much harder. I have already pointed out the huge number of possible execution paths, depending on the whims of the scheduler. This means that software is no longer easily deterministic - the same program on the same input can have a slightly different interleaving, and obtain locks in a different order. One will appear to work the other can deadlock or, more subtly change some timings and things start to time out. Even worse, future hardware can completely change the execution environment. Consider the difference between a single core machine interleaving threads verses a true multi-core machine where the threads are truly running at the same time. In the former the threads sees a single version of memory; in the latter they see two slightly different versions depending on whether a write by one thread has propagated to the other processor's memory cache. The can be very hard to reason about - we're used to thinking though code as if the changes happen instantaneously.

And the number of possible execution orders is extremely high, which was the point of my interleaved sentences example. So checking a threaded program by running it empirically is a pretty poor way of testing as you'll only cover a vanishingly small set of the possible combinations, and yet sometimes were are reduced to doing so. It can be useful so long as you understand the limitations: if you find a problem, great you know there's a bug, and yet it might be hard to reproduce, and even if you have logs it can be extremely hard to analyse what exactly happened. And if you don't find a problem, you haven't proved one doesn't exist as it might be that you haven't found it yet.

Unit testing can become much harder too. Trying to coordinate threads to get to known places so you can query their state is non trivial. I've seen naive programmers start a thread and sleep for a few milliseconds before checking some variable has been set, which worked fine on their machine, but on the heavily loaded build machine the thread would run too slowly and the test would fail. A solution was to add synchronising locks and semaphores on either side, and let the two sides let each only other run once they had done their work, but this was an awful lot of infrastructure to add that confused the code considerably (I've a gut feeling that there are ways of encapsulating such a technique - I'll leave it as an exercise for the reader, but would be interested in seeing any ideas).

There are ways to deal with these problems though. Making a lock to only allow one operation at a time works trivially, but would kill performance, so the real challenge is to loosen such constraints such that independent actions can occur simultaneously, but not to loosen any further. A good approach is to separate the threads of execution as much as possible, perhaps even into separate processes, and only communicate via asynchronous message passing. This helps avoid the temptation to share too much data, and gets the programmer into the habit of not expecting the results of a cross-thread request to be ready immediately, and instead go off and do something more useful in the meantime, waiting for notification that it has completed. This also reduces the number of locks that may be being used, which will reduce the locking overheads, and also help to avoid deadlocks.

Having shared state be immutable also avoids the need for many locks (which is one reason why Java and .Net have immutable string types), and which has led to an upsurge of interest in functional programming techniques. A good tip is to try and write code that has no side effects - in other words functions that rely only on their input values, and avoid aliases to mutable data. This means that there will be no possible interactions with other threads changing things under your feet. This means that you don't need any locks, and that function is simpler to reason about. It also has an effect that it's much easier to test.

Other interesting ideas include things like the MapReduce technique, used by many people such as Google to implement distributed processing [ MapReduce ].

But concurrency is not just about multithreading - in a more general sense it is about multiple 'actors' running independently yet interacting with each other and shared resources. A simple example: two processes using the file system will be having their file system calls being serialised via some sort of locking mechanism. If they are reading and writing to the same files you also now have a consistency problem, unless the processes have some way of making their changes atomically with respect to the other process. Also there can be deadlocks if multiple files with read/write locking is involved, although a well designed API should time out for you and return a failure.

Databases also provide a rich source of shared resources that can be accessed by multiple machines, and so they provide lots of locking mechanisms, such as the whole DB, individual tables or down to single rows. Designing a database that provides proper consistency and yet scales to a high number of users is non-trivial.

And the most obviously distributed system, is a distributed system built out of separate machines passing messages to each other to trigger manipulation of some resources, which need careful coordination to keep consistency without sacrificing performance. What we need to come to terms with are that these sorts of systems are already to be found in a single box, and even within a single chip. We're moving from a simplistic single-threaded world, and have to deal with the complexities of things happening all at once.

References

[DDJ] http://www.ddj.com/architect/202802983

[MapReduce] http://en.wikipedia.org/wiki/MapReduce

References

Advertisement

Advertisement

Your Privacy