Concurrency has many different approaches. Lucian Radu Teodorescu clarifies terms, showing how different approaches solve different problems.
Most engineers today use concurrency – often without a clear understanding of what it is, why it’s needed, or which flavour they’re dealing with. The vocabulary around concurrency is rich but muddled. Terms like parallelism, multithreading, asynchrony, reactive programming, and structured concurrency are regularly conflated – even in technical discussions.
This confusion isn’t just semantic – it leads to real-world consequences. If the goal behind using concurrency is unclear, the result is often poor concurrency – brittle code, wasted resources, or systems that are needlessly hard to reason about. Choosing the right concurrency strategy requires more than knowing a framework or following a pattern – it requires understanding what kind of complexity you’re introducing, and why.
To help clarify this complexity, the article aims to map out some of the main flavours of concurrency. Rather than defining terms rigidly, we’ll explore the motivations behind them – and the distinct mindsets they evoke. While this article includes a few C++ code examples (using features to be added in C++26), its focus is conceptual – distinguishing between the flavours of concurrency. Our goal is to refine the reader’s taste for concurrency.
The core idea
Concurrency is about dealing with multiple activities that overlap in time. It is dealing with complexity on the time axis. Mathematically, concurrency corresponds to a strict partial ordering of work items; this is different from sequential execution where we have a (strict) total ordering of work items.
That is, if we have two distinct work items a and b, then, at execution time, we might have the following possibilities:
- a < b
- b < a
- neither of the above – the execution of the two work items overlap
By comparison, in sequential execution, either a happens before b, or b happens before a – they cannot overlap. Thus, concurrent execution adds complexity by adding a degree of freedom on the time axis of execution.
We use the < symbol to represent the happens-before relation between work items. Although the literature often denotes this relation with → [Lamport78], using < better emphasises that we are discussing the ordering between work items.
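For reference, here are the standard properties of a strict partial order, stated (in LaTeX form) for the happens-before relation between work items:
\forall a:\ \neg(a < a)                                % irreflexive: no work item happens before itself
\forall a, b:\ a < b \Rightarrow \neg(b < a)           % asymmetric
\forall a, b, c:\ a < b \land b < c \Rightarrow a < c  % transitive
% Sequential execution adds totality: for any distinct a and b, either a < b or b < a.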
This is it! That’s what concurrency is all about.
Understanding concurrency as partial ordering helps us see why it’s hard and why it matters. In the next section, we explore what motivates us to introduce this complexity in the first place.
Motivation
We would not introduce the complexities that come with concurrency if we didn’t gain anything from it. There are multiple reasons why we would want to use concurrency. The biggest classes of reasons are:
- Responsiveness: keep the system interactive.
- Performance: do more work or do it faster; this can be divided into:
- improving latency: completing work faster;
- improving throughput: doing more work per time unit.
- Decomposition: breaking the problem into independent parts.
- Coordination: handling many independent inputs or events.
Each of these motivations calls for a different way of thinking about concurrency – a different flavour, if you will.
These might be hard to grasp, so let’s illustrate them using a real-world analogy: a restaurant – The Burrs – owned by Charlie. If we were to consider a non-concurrent world, we would have just one employee (Charlie) who runs all the activities in the restaurant (greeting customers, taking orders, cooking, bringing food, taking payment, etc.).
One could model this situation in code as shown in Listing 1. All the activities are executed sequentially, in order. As one can imagine, running the restaurant this way would be a complete disaster. As soon as the restaurant has two customers, one of them will be starved – they would have to wait for the first customer to be completely served. No clever scheduling within a single thread can compensate for the need to serve multiple customers simultaneously. No matter how good Charlie is at his job, this organisation of work will not work.
void run_restaurant() {
  while (restaurant_is_open) {
    for (auto c : new_customers) {
      sit_customer(c);
    }
    for (auto c : seated_customers) {
      auto o = take_order(c);
      auto food = cook(o);
      serve(c, food);
      wait_for_completion(c);
      auto b = print_bill(c);
      cash_in(c, b);
    }
  }
}
Listing 1
Responsiveness
Let’s look first at responsiveness. If one person does the entire job, customers might need to wait for long periods for different activities. Imagine that Charlie is cooking for 30 minutes and customers cannot be seated while he cooks. That would make new customers run away and never come back.
In the restaurant world, it would be fair to have a rule like this: if there are empty tables, a new customer is seated in less than 2 minutes. In the software world, this translates into a requirement.
In order to implement such a rule, the restaurant needs a lead host who never engages in an activity that can’t be interrupted within 2 minutes. This ensures that the lead host can always greet customers within 2 minutes. To keep the lead host (let’s assume it’s our friend Charlie) available for new customers, the rest of the duties need to be performed by other employees. Thus, we have more than one employee, so different activities might overlap – we’ve introduced concurrency into the restaurant.
A possible encoding is shown in Listing 2. Here we are using coroutines (more specifically, C++26’s std::task [WG21Task]) to encapsulate concurrency concerns. The body of the lead_host coroutine can be executed concurrently with serve_flow, and multiple serve_flow instances may coexist at the same time. The spawn function will start the work described by serve_flow and will associate its completion with serving_scope, which keeps track of how many serving flows are still in progress. Through serving_scope, we use a counting_scope [WG21Scope] to ensure that we maintain a structured lifetime for all serving flows and can perform a clean shutdown (i.e., we cannot close the restaurant for the day while customers are still being served dinner).
std::task<> serve_flow(Customer, Table);

exec::counting_scope serving_scope;

std::task<> lead_host() {
  while (restaurant_is_open()) {
    auto event = co_await lead_host_events();
    if (event.type() == new_customer_arrived) {
      std::print("Good evening, and welcome to The Burrs. "
                 "I'm Charlie Burr and I will be your host tonight. "
                 "Allow me to confirm your table.");
      auto c = event.new_customer();
      auto r = get_or_create_reservation(c);
      auto t = acquire_table(r);
      // The serving flow is done by dedicated personnel
      exec::spawn(serve_flow(c, t), serving_scope);
    }
    else {
      // other host and leadership duties
    }
  }
}
Listing 2
Here, Charlie acts as the system’s reactor – the component that must remain unblocked and responsive – while serve_flow tasks represent background operations that can take longer but must not interfere with incoming events.
The same responsiveness principle applies beyond restaurants: in interactive software, we must ensure that the part of the system responsible for front-line interaction remains reactive and free to respond. This is especially evident in UI frameworks, where the main thread must remain responsive to user input. If a UI freezes while processing data or waiting on I/O, the system appears broken, even if it’s functioning correctly in the background.
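To ground this in code, here is a minimal sketch of the same principle using plain standard-library threads rather than the sender framework; Event, Result and the helper functions are hypothetical names introduced only for illustration, not a real UI API:

#include <chrono>
#include <future>
#include <vector>

// Hypothetical types and helpers - assumptions made for this sketch.
struct Event {};
struct Result {};
bool app_is_running();
Event poll_ui_event();                  // must return quickly
bool is_long_running(const Event&);
Result heavy_work(Event);               // may take a long time
void handle_quick_event(const Event&);
void apply_result(const Result&);

void ui_main_loop() {
  std::vector<std::future<Result>> pending;
  while (app_is_running()) {
    Event e = poll_ui_event();
    if (is_long_running(e))             // offload; never block the reactor
      pending.push_back(std::async(std::launch::async, heavy_work, e));
    else
      handle_quick_event(e);            // short, non-blocking work
    // Harvest any finished background work without blocking.
    for (auto& f : pending)
      if (f.valid() &&
          f.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
        apply_result(f.get());          // consumes the future
    std::erase_if(pending,
                  [](const std::future<Result>& f) { return !f.valid(); });
  }
}

The point is only structural: the reactor thread dispatches long work and keeps polling; it never waits for that work to finish.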
Improving latency
Let’s assume that the average time to cook a dish in the restaurant is 30 minutes. If there are four people at a table and we want to bring all the food at once, we don’t want to wait for 2 hours before serving them. We want to keep the waiting time for the table around 30 minutes, regardless of the number of people at it. For that, we could prepare the 4 dishes concurrently.
One way to prepare multiple meals simultaneously is to hire more cooks, each handling a separate dish. Another alternative would be to perform ‘cooperative cooking’. That is, a single cook can prepare more than one dish at the same time, by shifting their attention to different dishes and sequencing some of the activities that require their full attention. More on this later.
Improving latency often requires doing more work overall.
Let’s return to our restaurant analogy to illustrate this concept. If The Burrs has soup on the menu, then it needs a house-made base stock (in one or multiple flavours) that is prepared at the beginning of the day, from which the actual soups on the menu are derived. Preparing a tripe soup usually takes between 3.5 and 5 hours (and, in some cases, even more) from start to finish – obviously the restaurant cannot start this process when the order is placed. But in preparing the soup at the beginning of the day, we intentionally overproduce – accepting waste in exchange for responsiveness. In other words, optimising for latency can introduce inefficiencies elsewhere.
Another example is buying all the ingredients in bulk ahead of time. No matter how good the planning is, a typical restaurant often buys more ingredients than it needs, and some of those are just waste (they can’t be reused).
This contrasts with throughput-oriented design, where batching and reuse are prioritised over quick responses.
This principle carries over directly to software: to reduce latency, we often do more total work. A classic example is the parallel prefix sum algorithm [Wikipedia]. For a given array of numbers x₀, x₁, …, xₙ₋₁, the algorithm computes an array y₀, y₁, …, yₙ₋₁ such that yᵢ = x₀ + x₁ + … + xᵢ. For example, for the input array 1, 2, 3, 4, 5, the algorithm computes: 1, 3, 6, 10, 15. The sequential algorithm can be implemented with a simple for loop, and will execute n additions.
Parallelising this algorithm using Blelloch’s method [Blelloch90] yields a total work of 2n−2 additions, but with enough threads the algorithm can finish in 2 log n steps. Thus, we do more work to (hopefully) finish faster. Note: we have completely ignored the effects of synchronisation here, which can make the parallel algorithm perform worse in practice.
In the literature, this is often described as the difference between the work and the span. These two measures capture complementary aspects of execution. The work is the sum of all the operations that need to be completed for an algorithm. The span (also called critical path length or depth) represents the longest chain of dependencies between the operations – the minimum time to execute the algorithm if we had an infinite number of processors.
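As a small illustration of this trade-off, the standard library already exposes both flavours of prefix sum through std::inclusive_scan. Whether the parallel overload actually performs Blelloch-style extra work is up to the implementation, so treat this as a sketch of the idea rather than a guarantee:

#include <execution>
#include <numeric>
#include <vector>

std::vector<int> prefix_sums(const std::vector<int>& xs, bool parallel) {
  std::vector<int> ys(xs.size());
  if (parallel)
    // may perform extra additions internally in order to shorten the span
    std::inclusive_scan(std::execution::par,
                        xs.begin(), xs.end(), ys.begin());
  else
    // one pass, minimal work, but inherently sequential
    std::inclusive_scan(xs.begin(), xs.end(), ys.begin());
  return ys;
}
// prefix_sums({1, 2, 3, 4, 5}, ...) yields {1, 3, 6, 10, 15} in both cases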
Improving throughput
Improving latency is not always the goal. Often, if we improve latency too much, we decrease throughput – the number of operations we can perform per unit of time. In Charlie’s restaurant, from a business perspective, it’s not that important to serve one customer in the least amount of time; it’s more important to maximise the number of customers served per day.
Let’s assume that Charlie hires three cooks and that, if a cook dedicates their entire attention to a dish, they can finish it in 25 minutes – the best time we can get. But this arrangement means that The Burrs cannot serve more than 3 customers at once – one cook per customer. If a cook can divide their attention among several dishes (say, 5 at a time), each dish might take 30 minutes – but the restaurant could now serve 15 customers at once. It makes much more sense to increase the latency from 25 to 30 minutes while increasing the throughput from 3 to 15 customers.
In this example, we’ve introduced two levels of concurrency to be able to cook for 15 customers at once: we have more than one cook, and each cook can work on more than one dish at a given time.
Likewise in software, introducing concurrency – and adjusting how work is structured – can significantly improve throughput.
In CPU-bound applications, maximising throughput means avoiding oversubscription. That is, if the machine has 12 cores, we should avoid running more than 12 CPU-intensive threads in parallel. Creating more threads forces the scheduler to constantly switch between them, reducing overall performance. On a single core, it’s usually faster to run two CPU-intensive tasks sequentially than to run them on two threads. For some engineers this may seem counterintuitive, but running the two operations sequentially actually improves throughput.
This is the main reason behind system-wide thread pools. It’s best for an application to submit all its CPU-intensive work to a thread pool with a number of threads that matches the number of cores of the target machine. Other operations (e.g., I/O) can be put on a different thread pool.
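To make the idea concrete, here is a minimal hand-rolled sketch of such a pool, sized to the core count. This is purely illustrative – in practice we would rely on a library-provided pool rather than rolling our own:

#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal sketch: a fixed-size pool with as many threads as cores, so that
// CPU-intensive work never oversubscribes the machine.
class FixedPool {
  std::queue<std::function<void()>> tasks_;
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
  std::vector<std::jthread> workers_;  // declared last, so the threads join
                                       // before the mutex/cv are destroyed
public:
  FixedPool() {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] {
        for (;;) {
          std::function<void()> task;
          {
            std::unique_lock lock(m_);
            cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (done_ && tasks_.empty())
              return;                        // drain remaining work, then exit
            task = std::move(tasks_.front());
            tasks_.pop();
          }
          task();                            // run outside the lock
        }
      });
  }
  ~FixedPool() {
    { std::scoped_lock lock(m_); done_ = true; }
    cv_.notify_all();
  }
  void submit(std::function<void()> f) {
    { std::scoped_lock lock(m_); tasks_.push(std::move(f)); }
    cv_.notify_one();
  }
};

Funnelling all CPU-bound work through one such pool keeps the number of cores as the upper bound on simultaneously running CPU-intensive threads.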
In the upcoming C++26 execution model, there are two facilities targeting scenarios that require high throughput. One is parallel_scheduler [WG21ParSched], a global thread pool that aims to limit oversubscription. The other is the bulk family of sender algorithms, which aims to execute work concurrently.
Listing 3 provides a simple example of using these two facilities. For the job of chopping onions, productivity scales up linearly with the number of workers. But each worker should chop only one onion at a time. If they try to handle multiple onions simultaneously – constantly switching between them – throughput drops.
void chop_all_onions(std::vector<Onion> onions) {
  auto sched = exec::get_system_scheduler();
  auto snd = exec::schedule(sched)
           | exec::bulk(onions.size(), [=](size_t i) {
               chop_one_onion(onions[i]);
             });
  exec::sync_wait(std::move(snd));
}
Listing 3
Decomposition
Sometimes, we introduce concurrency not to boost performance, but simply to make the application easier to understand. In these cases, it’s often easier to think in terms of independent agents than to break the system into functional units.
Charlie might soon realise that, while one person can fill multiple roles, specialisation still helps. Specialisation enables clearer accountability, stronger role focus, better performance, and a more manageable business. The restaurant might have a manager, lead host, servers, bartenders, bussers, chefs, sous chef, cooks, dishwashers, etc. While roles can overlap, specialisation makes the system easier to reason about and manage.
In the previous example we had a split by role. Alternatively, we can imagine decomposing by location – say The Burrs becomes a chain with multiple restaurants. While a centralised manager is possible, there are clear advantages to assigning a manager to each location. A local manager can address specific issues relevant to that location, without misapplying solutions that worked elsewhere.
In this way, concurrency helps us focus on specific concerns – and structure systems in clearer, more maintainable ways.
A good example from the software industry is a web server. While it’s possible to write a web server with just one thread, giving each request its own thread often makes the logic easier to follow. Reasoning becomes simpler when each request runs independently, rather than interleaving unrelated tasks on a single thread.
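A sketch of this thread-per-request structure might look as follows; Connection, accept_connection, handle_request and server_is_running are hypothetical helpers, not a real networking API:

#include <thread>
#include <vector>

// Hypothetical helpers - assumptions made for this sketch.
struct Connection {};
Connection accept_connection();
void handle_request(Connection);
bool server_is_running();

void run_server() {
  std::vector<std::jthread> workers;
  while (server_is_running()) {
    Connection c = accept_connection();   // block until a client arrives
    // Each request gets its own thread; the per-request logic then reads
    // as straightforward sequential code.
    workers.emplace_back(handle_request, c);
  }
}  // jthreads are joined here

The price is one thread per request, which does not scale indefinitely – a tension we return to in the next section.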
Coordination
Sometimes, the temporal complexity of an application comes from the way it reacts to a variety of stimuli: events, inputs, messages. In such cases, the main concerns are correctness, consistency, and the timeliness of those responses. And concurrency helps in this case as well. While decomposition focuses on breaking work into pieces, coordination focuses on handling interactions between concurrent entities. Coordination is fundamentally about reactivity and synchronisation. This mindset forces us to think about causality, state changes, race conditions, and timing constraints.
In the restaurant business, a good example of using concurrency for coordination is the system put in place to ensure that the customer is served. The host might greet the customer and lead them to an empty table. Occupying an empty table is a signal to one of the waiters that they need to take care of the new customer (the table is a fixed shared structure through which both the host and the waiters interact). After taking the order, the waiter might pass a note to the kitchen with the dishes that need to be prepared – this is done through a system that resembles a queue. The cooks either poll the queue, or they get notified whenever there is a new entry in it. After cooking the needed dishes, the cooks typically signal back to the waiters that the food is prepared (perhaps with a note indicating which order was cooked). The waiter needs to bring the food back to the customer, without mixing up the tables. When the customer finishes, the waiter will provide the bill and accept payment. Depending on local customs and on the type of restaurant, this process might differ, but it’s important that a customer pays for the food they ordered.
In this restaurant system, multiple types of events occur independently of each other: customers come into the restaurant, customers order food, food is prepared, customers ask for the bill, customers pay, etc. While the flow must remain customer-centric, concurrency ensures that all these events are handled in a timely manner and that the overall process remains manageable.
The same coordination challenges arise when building software. Taking the previous example of a web server, let’s assume that handling one request implies communicating with multiple downstream services. Whether a request is handled on a single thread matters less than ensuring that its outgoing calls are matched correctly to their responses, and that continuations run as expected – even across threads. In terms of implementation, we can still handle one request on one thread, but this has some challenges: the thread will be blocked until the responses are received (and that might take a long time), and we might also have a limit on the number of threads we can create.
Another way to structure this type of problem is to use thread pools and let the continuation of a response be handled on a different thread than the one used to make the request. Listing 4 shows how easily this can be implemented with the help of coroutines.
std::task<MyResponse> request_handler(Request r) {
  validate(r);
  initial_processing(r);
  auto oc = prepare_outgoing_call(r);
  auto res = co_await outgoing_call(oc);
  continue_processing(r, res);
  co_return generate_response(r);
}
Listing 4
In a thread-per-request model, it’s easy to associate state with the thread. But once we switch to shared thread pools, we need to explicitly carry context (user ID, request ID, continuation function, etc.). This shift – from implicit context to explicit coordination – is one of the hardest conceptual changes in writing concurrent code.
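A minimal sketch of what carrying context explicitly might look like – RequestContext and its fields are assumptions made for illustration, not part of any particular framework:

#include <functional>
#include <string>

// All the state that used to live implicitly on the per-request thread
// now travels explicitly with the work.
struct RequestContext {
  std::string user_id;
  std::string request_id;
  std::function<void(std::string /*response*/)> on_complete;
};

// Hypothetical async API: the context is passed along instead of being
// recovered from thread-local state.
void start_outgoing_call(RequestContext ctx);
void on_response_arrived(RequestContext ctx, std::string response) {
  // possibly running on a different thread than the one that made the call
  ctx.on_complete(std::move(response));
}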
Concurrency becomes essential not to improve speed, but to preserve correctness in the face of overlapping stimuli.
To support such coordination patterns, modern languages and platforms offer a variety of frameworks/models:
- Event loops (e.g., GUI frameworks, Node.js)
- Actor models (e.g., Erlang, Akka)
- Reactive streams (e.g., Rx, ReactiveX)
- Async coroutines and futures (e.g., std::task)
Common distinctions in concurrency discussions
Now that we have covered the main types of motivation for adding concurrency to applications, let’s cover a few distinctions that are often made in the industry.
Concurrency vs parallelism
Concurrency and parallelism are often used interchangeably – but they describe different aspects of computation:
- Concurrency is about structure: tasks may overlap in time, but they do not necessarily execute simultaneously.
- Parallelism is about execution: tasks actually run at the same time.
Think of concurrency as a design-time concern – it’s about structuring systems to handle overlapping work. Parallelism, by contrast, is a run-time reality – it depends on physical resources and actual scheduling.
A common example is that interleaved execution (on a single core) is concurrency, but not parallelism.
Ultimately, this distinction may not always be useful in practice. For most programming environments, there is little we can control about the actual execution of work items. The compiler can reorder instructions, and even the hardware will execute instructions out of order. Even on a single thread, there are cases when instructions are executed in parallel. It is definitely useful to understand possible execution options, but we mostly should be concerned with the design of the application.
In practice, systems often involve both: concurrency in design, parallelism in execution. But because programmers directly control design but not the execution, I argue that we need to focus on concurrency. This is why I tend to exclusively use the term concurrency.
Concurrent vs parallel forward progress guarantees
The C++ standard makes a further distinction between concurrent and parallel progress guarantees [WG21Progress] – one that can often lead to confusion. This distinction matters as it allows us to discuss what types of concurrent algorithms we might execute on certain execution agents.
A thread of execution provides a concurrent progress guarantee if the implementation guarantees that it makes progress (unless it is stopped). That is, no matter how many other threads are in the system and no matter what they are doing, the thread will start and continue to make progress until it completes or is stopped.
A thread of execution provides a parallel progress guarantee if it is not required to begin executing immediately – but once it starts, it behaves as if it had a concurrent progress guarantee. That is, the thread might never get an execution resource, but if it gets one, it continuously makes progress.
A good example of a parallel progress guarantee is a thread pool with a fixed size. Let’s say that the pool has only 4 threads, and we want to execute 5 tasks. If these tasks keep executing work without completing, one task will never start. This is the expected behaviour under the parallel progress guarantee.
Algorithms using a latch or barrier with count N require at least N threads with concurrent progress guarantees – otherwise, some threads may never reach the synchronisation point. As all threads wait to meet at the latch/barrier, if fewer than N threads arrive at that point, the ones that do arrive will be deadlocked.
Moving to the restaurant analogy, we might have a rule that all the dishes for a table are brought at the same time, each person at the table being served by a waiter. The waiters lift their plates, wait until all required waiters are ready, and only then deliver them to the table. If the number of people that need to be served is greater than the number of waiters available, we reach a deadlock (the waiters are stuck with the plates in their hands, unable to do anything else). Parallel progress guarantees allow a fixed number of waiters to serve all tables, even if it means some customers may wait longer – and this can lead to deadlocks under the above rule. Concurrent progress guarantees imply that we would always hire enough waiters to avoid blocking – ensuring progress for every task.
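Back in code, here is a deliberately broken sketch of the same hazard: a barrier written for 5 participants, with only 4 execution agents ever reaching it (do not run this – it hangs by design):

#include <barrier>
#include <thread>
#include <vector>

int main() {
  std::barrier sync_point(5);          // the algorithm expects 5 participants
  std::vector<std::jthread> waiters;   // ...but only 4 agents ever run
  for (int i = 0; i < 4; ++i)
    waiters.emplace_back([&] {
      // do some per-waiter work, then wait for all 5 plates to be ready
      sync_point.arrive_and_wait();    // the 5th participant never arrives
    });
}  // the joins never complete: only 4 of the 5 required participants reach
   // the barrier - the same thing happens when the 5th task sits
   // unscheduled in a fixed-size pool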
To avoid these types of deadlocks, programmers should:
- avoid algorithms that require a concurrent progress guarantee (i.e., allow fewer waiters to serve more people at a table)
- avoid blocking the execution threads (allow the waiter to do other things while waiting for all the plates to be ready for delivery)
Reinforcing these recommendations: in C++ today, there is no portable way to create a thread pool that provides true concurrent progress guarantees.
Concurrency, multithreading, asynchrony
Since we’ve spent time exploring concurrency flavours, it’s also worth touching on the differences between concurrency, multithreading, and asynchrony.
Multithreading is an implementation technique in which we get concurrency by executing multiple threads inside a single process. We typically use OS threads as the main execution agents, and synchronisation primitives (mutexes, semaphores, etc.) for coordination. Communication typically happens via shared memory.
Asynchrony is about how to express the waiting, so its main concern is control flow. The typical goal is to avoid blocking and to simplify the asynchronous logic. In the upcoming C++26 standard, asynchrony may be expressed with coroutines or with the new senders/receivers framework.
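As a tiny taste of the latter – assuming the names land in std::execution as currently specified – expressing ‘work, then its continuation’ without blocking between the steps looks roughly like this:

#include <execution>  // assumed header for the C++26 std::execution facilities
#include <utility>

int demo() {
  namespace ex = std::execution;
  // Describe the work: produce 42, then continue with a transformation.
  auto work = ex::just(42)
            | ex::then([](int x) { return x + 1; });
  // Only at the very end do we block to collect the result (43).
  auto [result] = ex::sync_wait(std::move(work)).value();
  return result;
}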
If concurrency is more concerned with the overlapping execution of work, asynchrony is more concerned with the gaps between work. Asynchrony, in most frameworks, provides a form of concurrency – one that avoids blocking by explicitly modelling waiting.
One way to achieve conceptual unity is to focus on concurrency, and acknowledge that there might be multiple flavours of concurrency, based on the application domain and chosen technology.
Domain-specific concurrency
Beyond general-purpose concurrency patterns, some domains develop their own concurrency models tailored to their problem space. For example, GPU programming uses SIMD-style parallelism, requiring different thinking than thread-based concurrency. Similarly, dataflow programming (as seen in TensorFlow or LabVIEW) treats computation as a graph of flowing data, where concurrency is implicit in the structure.
While this article focuses on concurrency as commonly seen in systems and application programming, it’s worth remembering that domain constraints and idioms often give rise to specialised forms of concurrency that don’t map cleanly onto threads, futures, or actors.
These specialised models lie beyond the scope of this article, but they reinforce the broader message: concurrency adapts to the shape of the problem.
Wrapping it up
When we talk about concurrency, one size doesn’t fit all. We have different motivations, different terminology, and different problems to solve – and each leads us to different flavours of concurrency. Whether we care about responsiveness, throughput, structure, or coordination, our reasoning about concurrency should start with understanding why we’re introducing it in the first place.
This article has aimed to offer a conceptual map of those motivations – not to define terms rigidly, but to clarify the mindsets and trade-offs involved. From restaurant metaphors to C++26 features, we’ve looked at how concurrency emerges in different forms depending on what we value.
There’s more to say, of course. We haven’t covered key concerns like debuggability and observability, or the risks of over-engineering with concurrency when simpler designs suffice. Nor have we explored domain-specific models like cooperative multitasking, GPU scheduling, or distributed coordination. But even without going there, the main lesson remains:
Concurrency isn’t one thing. It’s a spectrum of strategies – each shaped by what you’re trying to achieve.
However, from a mathematical perspective, concurrency is deceptively simple: it’s just a partial ordering of work items. But that simplicity introduces a new dimension of complexity – one rooted in time. And, as Brooks warned us, there is no silver bullet [Brooks95]: this complexity cannot be abstracted away; it interacts with the existing complexity of the application and often amplifies it.
The key is not just to be concurrent, but to be deliberate – because taste matters. So, next time you add concurrency, don’t ask how first – ask why.
References
[Blelloch90] Guy E. Blelloch, ‘Prefix Sums and Their Applications’ Technical Report CMU-CS-90-190, School of Computer Science, Carnegie Mellon University, 1990.
[Brooks95] Frederick P. Brooks Jr., The Mythical Man-Month (anniversary ed.), Addison-Wesley Longman Publishing, 1995.
[Lamport78] Leslie Lamport. ‘Time, Clocks, and the Ordering of Events in a Distributed System’, Communications of the ACM 21, no. 7, 1978. https://lamport.azurewebsites.net/pubs/time-clocks.pdf.
[WG21Progress] WG21, ‘Forward Progress’ in Working Draft Programming Languages – C++, https://eel.is/c++draft/intro.progress, accessed Oct 2025.
[WG21Task] WG21, ‘execution::task’ in Working Draft Programming Languages – C++, https://eel.is/c++draft/exec.task, accessed Oct 2025.
[WG21Scope] WG21, ‘Counting Scopes’ in Working Draft Programming Languages – C++, https://eel.is/c++draft/exec.counting.scopes, accessed Oct 2025.
[WG21ParSched] WG21, ‘Parallel scheduler’ in Working Draft Programming Languages – C++, https://eel.is/c++draft/exec.par.scheduler, accessed Oct 2025.
[Wikipedia] Wikipedia: ‘Prefix sum’, https://en.wikipedia.org/wiki/Prefix_sum.
Lucian Radu Teodorescu has a PhD in programming languages and is a Staff Engineer at Garmin. He likes challenges; and understanding the essence of things (if there is one) constitutes the biggest challenge of all.