C++ Executors: the Good, the Bad, and Some Examples

Executors aim to provide a uniform interface for work creation. Lucian Radu Teodorescu explores the C++ executors proposal.

One of the most anticipated features in the upcoming C++ standard is the support for executors. Through executors, we hope to get a higher level set of abstractions for doing concurrent/parallel work. If everything goes well, the executors will land in C++23.

This article aims at providing a critical perspective on the proposed additions to the C++ standard. We specifically look at the proposal entitled P0443R14: A Unified Executors Proposal for C++ [P0443R14], but we touch on a few connected proposals as well.

The article provides a brief tour of the proposal with a couple of examples, and then jumps to a critical analysis of various points of the proposal. This critical analysis tries to bring forward the strong points of the proposal, as well as the weak points. The hope is that by the end of the article the reader will have a better understanding of the proposal, and of its pros and cons.

A brief tour of the proposal

P0443R14 has some sort of internal unity, but on a more careful reading one can divide the proposal into two main parts:

support for executors
support for senders and receivers

The executors part doesn’t need to be coupled with senders and receivers, while senders and receivers can be theoretically based on slightly different executor semantics. Furthermore, conceptually, they solve different problems. Moreover, the paper itself makes the distinction between these two parts. Thus, it makes sense for us to treat them separately as well.

The libunifex library [libunifex] is a prototype implementation for the proposal, containing almost everything from the proposal, and much more. The authors of the library were also contributors to the proposal. My own Concore library [concore] also has support for the main concepts in the proposal.

Executors support

An executor is a work execution interface [P0443R14]; it allows users to execute generic work. The following code shows a simple usage of an executor:

 executor auto ex = ...;
 execute(ex, []{ cout << "Hello, executors!\n"; });

If we have an executor object (i.e., matching the executor concept), then we can just execute work on it. The work is some form of an invokable entity. We have decoupled the work from the context in which it is executed.

That’s it! Things are that simple!

The P0443R14 paper describes an executor that can be obtained from a static_thread_pool object, proposed to be added to the standard library. As the name suggests, this would add support for thread pools. The users can then create thread pools and pass work to be executed on these pools. Here is one simple example that will execute work on one of the threads inside the thread pool:

  std::static_thread_pool pool(4);
  execution::execute(pool.executor(),
    []{ cout << "pool work\n"; });

The executor concept and the thread pool by themselves are directly usable for building concurrent applications. But, more importantly, it is easy to create other executors. The paper exemplifies how one could write an inline_executor that just executes the work on the current thread, similarly to calling a function. To define a new executor, the user must provide an execute method that takes work (technically this is a customisation-point-object that can take other forms too, but we’ll try to provide a simplified view) and a way to compare the executors for equality.

To showcase how easy it is to define an executor, we’ll just copy the definition of inline_executor given by P0443R14 here:

  struct inline_executor {
    // define execute
    template<class F>
    void execute(F&& f) const noexcept {
      std::invoke(std::forward<F>(f));
    }
    // enable comparisons
    auto operator<=>(const inline_executor&) 
      const = default;
  };

Piece of cake!
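The decoupling of work from its execution context can be sketched in plain standard C++, without any implementation of P0443R14. Everything in the block below (toy_inline_executor, toy_run_loop, toy_loop_executor, add_twice) is a hypothetical name invented for illustration; the member execute only imitates the shape used by the proposal, and the run loop merely stands in for a real execution context such as a thread pool.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical executor: runs the work immediately on the calling thread.
struct toy_inline_executor {
  template <class F>
  void execute(F&& f) const {
    std::forward<F>(f)();
  }
};

// Hypothetical run loop: work is queued and only runs when drain() is
// called, standing in for "some other execution context".
struct toy_run_loop {
  std::deque<std::function<void()>> queue;

  void drain() {
    while (!queue.empty()) {
      auto work = std::move(queue.front());
      queue.pop_front();
      work();
    }
  }
};

// Hypothetical executor handle that enqueues work on a toy_run_loop.
struct toy_loop_executor {
  toy_run_loop* loop;

  template <class F>
  void execute(F&& f) const {
    loop->queue.emplace_back(std::forward<F>(f));
  }
};

// An algorithm written against "anything with execute()"; it does not
// know where or when the work will actually run.
template <class Executor>
void add_twice(const Executor& ex, int& total, int amount) {
  ex.execute([&total, amount] { total += amount; });
  ex.execute([&total, amount] { total += amount; });
}

// Usage: the same algorithm on two different executors.
int demo_inline() {
  int total = 0;
  add_twice(toy_inline_executor{}, total, 5);
  return total;  // the work has already run
}

int demo_loop() {
  toy_run_loop loop;
  int total = 0;
  add_twice(toy_loop_executor{&loop}, total, 5);
  assert(total == 0);  // nothing has run yet
  loop.drain();        // now the queued work runs
  return total;
}
```

The algorithm add_twice is unchanged between the two calls; only the executor decides whether the work runs at once or later.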

An executor can have multiple properties attached to it. For example, an executor can be blocking to callers (like inline_executor) or non-blocking (like the executor from static_thread_pool). Another example would be the property that indicates the type of allocator that the executors use.

The support for properties is actually introduced by paper P1393R0: A General Property Customization Mechanism [P1393R0] (and the executors paper builds on it). At this point, there is no consensus on whether this would move forward or not. But, even if this proposal doesn’t move forward, executors are still usable.

Senders and receivers

Here, things get a bit more complicated. The paper defines the following (major) concepts:

sender: work that has not been scheduled for execution yet, to which one must add a continuation (a receiver) and then “launch” or enqueue for execution [P0443R14]
receiver: a callback object that receives the results from a sender object
scheduler: a factory of single-shot senders [P0443R14]
operation_state: the state of an asynchronous operation, the result of connecting a sender with a receiver

Besides this, the paper also defines a few customisation points (CPOs):

set_value: applied to a receiver object, it is called by the sender to notify the receiver about the completion of the work, with some resulting values; part of the semantics of being a receiver.
set_done: applied to a receiver object, it is called whenever the work in the sender has been cancelled; part of the semantics of being a receiver.
set_error: applied to a receiver object, it is called whenever there was an error with the work in the sender; part of the semantics of being a receiver.
connect: applied to a sender object and passing in a receiver, it is called to connect the two objects, resulting in an object compatible with operation_state; part of the semantics of being a sender.
start: applied to an operation_state object, it is called to logically start the operation (as resulting from connecting the sender and the receiver); part of the semantics of being an operation_state.
submit: applied to a sender object, it is called to combine it with a receiver object and immediately start (at least logically) the resulting operation.
schedule: applied to a scheduler object, it is called to return a ‘single-shot’ sender (this will call the receiver with no value); part of the semantics of being a scheduler.

Customisation point objects are generalisations over functions. The standard provides some free functions for them, but allows (and sometimes requires) the user to customise their behaviour. The key point is that one should distinguish between two variants of functions with the same name: one that the framework provides and one that the user needs to provide.

Let’s take a simple example. To define a receiver, the user is required to define a set_done method/function. The framework also defines an execution::set_done function that can be called by the senders. Something like the following:

  struct my_sender {
  ...
  execution::set_done(recv);
  ...
  };

Yes, things can be a bit confusing, but I think that with enough exposure people will get used to this. It’s similar to the std::begin() function versus the begin() method defined in containers like std::vector.
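The two variants of the same name can be illustrated with a heavily simplified sketch. The namespace toy, the type counting_receiver and the function cancel are hypothetical and only illustrate the idea; the real customisation point objects in P0443R14 also consider free functions found by argument-dependent lookup and are constrained with concepts, all of which is left out here.

```cpp
#include <cassert>
#include <utility>

namespace toy {
// Framework side: a function object that forwards to whatever the user
// wrote. A real CPO would also look for free functions found by ADL.
struct set_done_fn {
  template <class R>
  void operator()(R&& r) const noexcept {
    std::forward<R>(r).set_done();
  }
};
inline constexpr set_done_fn set_done{};
}  // namespace toy

// User side: a receiver-like type that provides the member function.
struct counting_receiver {
  int* done_calls;
  void set_done() noexcept { ++*done_calls; }
};

// A sender-like piece of code only ever calls the framework name.
template <class R>
void cancel(R&& r) {
  toy::set_done(std::forward<R>(r));
}

// Usage: the framework name ends up calling the user's member function.
int demo_cpo() {
  int calls = 0;
  cancel(counting_receiver{&calls});
  return calls;
}
```

Generic code calls toy::set_done, while the author of a receiver writes the set_done member; the function object is the bridge between the two.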

Using this, one can write code that represents asynchronous computations as chains between senders and receivers. Listing 1 presents an example.

struct my_recv0 {
  void set_value() { cout << "impulse\n"; }
  void set_done() noexcept {}
  void set_error(exception_ptr) noexcept {}
};
template <typename T>
struct my_recv {
  void set_value(T val) { cout << val << endl; }
  void set_done() noexcept {}
  void set_error(exception_ptr) noexcept {}
};
static_thread_pool pool{3};
auto sched = pool.scheduler();
// single-shot sender
auto sndr1 = schedule(sched);

auto op_state = connect(sndr1, my_recv0{});
start(op_state); // prints "impulse"

// computation with P1897R3 abstractions
auto f = [](int x) { return 3.141592 * x; };
auto print = [](double x) { cout << x; };
auto sndr = just(2)
          | on(sched)
          | transform(f);
// prints the result 2*3.141592 asynchronously
submit(move(sndr), my_recv<double>{});

Listing 1

At the beginning of the listing, there are two types that model the receiver concept; they can be used to be notified about the completion of some asynchronous operation. set_value() is called when the previous computation is successful, set_done() is called when the operation was cancelled, and set_error() is called whenever there was an error in the previous computation. One way to think of receivers is to consider them as generalised callbacks; they are called to announce the completion (successful or not) of another operation.

Schedulers are objects that can generate ‘single-shot’ senders. These single-shot senders are just sending impulses to receivers downstream, without sending any information to them. This is why we can connect such a sender (sndr1) with a receiver that doesn’t take any value.

A sender can be bound to a receiver only once, so they can be considered short-lived: they only live for one computation to go through them. This is why it is important for the framework to allow easy creation of senders.

The example in Listing 1 shows how a sender (sndr1) can be connected to a receiver (in our case, an object of type my_recv0). The connection between a sender and a receiver is captured by an object that models the operation_state concept. This concept corresponds to an asynchronous operation (i.e., a task). Senders on their own, and receivers on their own, cannot be considered tasks. The only thing that one can do with an operation_state object is to start it. This is done by calling the start() customisation-point-object.

In the last few lines of Listing 1, we present how a computation can be represented using the sender algorithms introduced by [P1897R3]. A sender algorithm is a function that returns a sender (or, something that, when combined with a sender, returns another sender). In our case, just(2) returns a sender that will just push the value 2 to its receiver. The on(sched) returns an object that when combined with the previous sender generates a sender that runs the previous computation on the given scheduler. In our case, we are indicating that everything needs to be executed on our thread pool. Finally, transform(f), when combined with the previous sender, will return a sender that will execute the given function to transform the received input.

Putting all these together, we obtain a sender that will multiply 2 with 3.141592 on our thread pool.

In the last line, instead of connecting the sender to a receiver and calling start on the resulting operation state, we call submit. This is a shorter version of the above.
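The relationship between connect, start and submit can be mimicked with a toy model in standard C++. All names below (int_recorder, toy_just, toy_just_op, toy_submit) are hypothetical; the sketch uses plain member functions instead of customisation point objects and completes synchronously, so it says nothing about how a real implementation keeps the operation state alive for asynchronous work.

```cpp
#include <cassert>
#include <exception>
#include <utility>

// Hypothetical receiver: records the value it is given.
struct int_recorder {
  int* slot;
  void set_value(int v) { *slot = v; }
  void set_done() noexcept {}
  void set_error(std::exception_ptr) noexcept {}
};

// Hypothetical operation state: the sender's data plus the receiver.
// Nothing happens until start() is called.
template <class R>
struct toy_just_op {
  int value;
  R receiver;

  void start() noexcept {
    try {
      receiver.set_value(value);
    } catch (...) {
      receiver.set_error(std::current_exception());
    }
  }
};

// Hypothetical sender: describes the work "produce this int".
struct toy_just {
  int value;

  template <class R>
  toy_just_op<R> connect(R r) const {
    return toy_just_op<R>{value, std::move(r)};
  }
};

// submit-like helper: connect, then start right away.
template <class S, class R>
void toy_submit(S s, R r) {
  auto op = s.connect(std::move(r));
  op.start();
}

// Usage: the two-step form.
int demo_connect_start() {
  int result = 0;
  auto op = toy_just{42}.connect(int_recorder{&result});
  assert(result == 0);  // connecting alone runs nothing
  op.start();
  return result;
}

// Usage: the one-step form.
int demo_submit() {
  int result = 0;
  toy_submit(toy_just{7}, int_recorder{&result});
  return result;
}
```

In this model, neither the sender nor the receiver does anything alone; the work becomes runnable only once the two are joined into an operation state.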

This was a quick summary of the important parts of the proposal. More examples and discussions can be found in other resources on the Internet. Two presentations that explain executors and senders/receivers in more detail can be found at [Hollman19] and [Niebler19].

A critique of the executors proposal

We will organise this section into a series of smaller inquiries into various aspects of the proposal. We will label each of these inquiries with Good and Bad. This labelling scheme is a bit too polarising, but I’m trying to convey a summary of the inquiry. The Bad label in particular may be too harsh. This label must definitely NOT apply transitively to the people behind this proposal; they worked really hard to create it, and this is not an easy endeavour (see [Hollman19] for some more insights into the saga of executors).

Good: Support for executors

I can’t find words to express how good it is to have executors as a C++ core abstraction. It allows one to design proper concurrent algorithms, while separating them from the actual concurrency primitives.

The same application can have multiple concurrency abstractions, and we can design algorithms or flows that work with any of them. For example, one might have one or more thread pools as described by this proposal, or can have executors from third-party libraries (Intel oneAPI, Grand Central Dispatch, HPX, Concore, etc.). Moreover, users can write their own executors, with reasonably low effort. For example, in my C++Now 2021 presentation [Teodorescu21b] I’ve showcased how one can build a priority serializer (a structure that allows executing one task at a time, but takes the tasks in priority order) – the implementation was under 100 lines of code.

The reader might know that I often talk about serializers as concurrent abstractions that help in writing better concurrent code, simulating the behaviour of locks while avoiding their pitfalls (see [Teodorescu21a], [Teodorescu20a] or [Teodorescu20b]). Serializers are also executors. Generalising, we can introduce concurrent abstractions as executors.

Moreover, the executors can be easily composed to provide more powerful abstractions. For example, a serializer (which is an executor) can be parameterised with one (or even two) executors, which specifies the actual mechanism to execute the tasks.
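A minimal sketch of such a composition is shown below, under simplifying assumptions. The names toy_run_loop, toy_loop_executor and toy_serializer are hypothetical and do not come from the proposal or from Concore. The base context is a single-threaded run loop standing in for a thread pool, and exceptions thrown by tasks are not handled.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical base context: queued work runs when drain() is called.
// (Not thread-safe itself; it only stands in for a real thread pool.)
struct toy_run_loop {
  std::deque<std::function<void()>> queue;
  void drain() {
    while (!queue.empty()) {
      auto work = std::move(queue.front());
      queue.pop_front();
      work();
    }
  }
};

struct toy_loop_executor {
  toy_run_loop* loop;
  template <class F>
  void execute(F&& f) const {
    loop->queue.emplace_back(std::forward<F>(f));
  }
};

// Hypothetical serializer: itself an executor, parameterised with the
// executor that actually runs the tasks. At most one task is handed to
// the base executor at a time.
template <class Base>
class toy_serializer {
  struct state {
    std::mutex m;
    std::deque<std::function<void()>> pending;
    bool running = false;
  };
  Base base_;
  std::shared_ptr<state> st_ = std::make_shared<state>();

  // Hands exactly one pending task to the base executor; when that task
  // finishes, the next one (if any) is handed over.
  static void run_next(Base base, std::shared_ptr<state> st) {
    base.execute([base, st] {
      std::function<void()> work;
      {
        std::lock_guard<std::mutex> lock(st->m);
        work = std::move(st->pending.front());
        st->pending.pop_front();
      }
      work();
      bool more = false;
      {
        std::lock_guard<std::mutex> lock(st->m);
        more = !st->pending.empty();
        if (!more) st->running = false;
      }
      if (more) run_next(base, st);
    });
  }

 public:
  explicit toy_serializer(Base base) : base_(base) {}

  template <class F>
  void execute(F&& f) const {
    bool start = false;
    {
      std::lock_guard<std::mutex> lock(st_->m);
      st_->pending.emplace_back(std::forward<F>(f));
      start = !st_->running;
      st_->running = true;
    }
    if (start) run_next(base_, st_);
  }
};

// Usage: three tasks go in, but the base executor sees one at a time.
std::vector<int> demo_serializer() {
  toy_run_loop loop;
  toy_serializer<toy_loop_executor> ser{toy_loop_executor{&loop}};
  std::vector<int> order;
  for (int i = 1; i <= 3; ++i)
    ser.execute([&order, i] { order.push_back(i); });
  assert(loop.queue.size() == 1);  // only one task reached the base
  loop.drain();
  return order;
}
```

Because toy_serializer has the same execute member as the executor it wraps, it can be passed to any code that expects an executor, and the base executor can be swapped without touching the serializer.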

If there is one thing that the readers remember from this article, I hope it is that executors are a good addition to the standard.

Good: Every concurrent problem can be specified using only executors

As argued in [Teodorescu20a], every concurrent problem can be expressed as a set of tasks and restrictions/dependencies between them. I’ll not try to give all the formal details here, but we can prove that the restrictions/dependencies of the tasks can be represented using different executors. Just like we use serializers to handle some types of restrictions, we can generalise them and use executors for solving all types of restrictions.

To achieve this, we can add various labels (with or without additional information) to the tasks, and we define rules that infer the restrictions/dependencies based on these labels. For example, dependencies can be encoded with a particular label that contains some ordering number. Restrictions like those found in serializers can be implemented with labels that mark that certain tasks are mutually exclusive.

For each type (or better, for each instance) of a label, we can create an executor that will encode the restrictions represented by that label. This way, for each type of restriction that we have, we will have an executor to encode it.

If we have all these, then one just needs to compose the executors in the proper way to ensure the safety of the application. The composition might pose some problems in terms of performance, but these performance problems can always be solved by specialising the executors.

Executors are fundamental building blocks for writing concurrent programs.

Good: Proposal provides a way to encode computations

Between a sender and a receiver pair, we can encode all types of computations. As an operation_state object can act like a task, and as we can represent all computations with tasks, it means that we can represent all types of computations with senders and receivers.

Beyond this, the proposal seems to encourage the expression of computations as chains of computations, in which values are passed from a source (an initial sender) to a final receiver. This indicates a tendency towards functional style expression of computations. This sounds good.

Bad: Proposal seems to restrict how computations can be expressed

The above point can be turned around as a negative. It feels awkward, in a predominantly imperative programming language, to express concurrency only in a functional style. Functional style sounds good to me, but there are a lot of C++ programmers that dislike functional style.

Probably my biggest complaint here is that it is unclear where computations should be placed: in the senders or in the receivers. And the naming here doesn’t help at all. Let’s say that one wants an asynchronous action to dump to disk the state of some object. How should one design this? The problem of dumping some state to disk doesn’t properly map to the sender and receiver concept; there is nothing to send or to receive.

There is no way one can directly put this computation into an operation_state object, so one needs to choose between a sender and a receiver. If we were to look where the proposal puts most of the computation, we would end up with the idea that the dumping code needs to be placed inside a sender; the proposal also states “a sender represents work [that has not been scheduled for execution yet]” [P0443R14], so this seems like the reasonable thing to do. But writing custom senders is hard (see below); moreover, we need to bind to it a dummy receiver for no purpose.

An easier alternative is to put the computation in the receiver. But that goes against the idea of “a receiver is simply callback” that is used by the proposal to describe receivers. Creating a receiver that dumps the data to disk is relatively easy. But, in addition to that, one also needs to connect the receiver with a single-shot sender and launch the resulting work for execution. The presence of the sender should not be needed.

One can easily solve the same problem with executors.
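As a rough sketch of the executor-based version, assuming a toy executor with an execute member: the names toy_inline_executor, app_state and dump_async are hypothetical, and the output goes to a generic std::ostream so that the example does not touch the file system.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical executor: runs the work immediately on the calling thread.
// Any other type with a compatible execute() could be used instead.
struct toy_inline_executor {
  template <class F>
  void execute(F&& f) const {
    std::forward<F>(f)();
  }
};

// Hypothetical application state to be saved.
struct app_state {
  std::string name;
  int counter;
};

// The whole "asynchronous dump": hand a closure to the executor.
// The state is copied into the closure, so the caller may keep changing
// its own copy; `out` must outlive the work.
template <class Executor>
void dump_async(const Executor& ex, app_state state, std::ostream& out) {
  ex.execute([state = std::move(state), &out] {
    out << state.name << ' ' << state.counter << '\n';
  });
}

// Usage: a string stream stands in for the file on disk.
std::string demo_dump() {
  std::ostringstream sink;  // a std::ofstream would work the same way
  dump_async(toy_inline_executor{}, app_state{"session", 3}, sink);
  return sink.str();
}
```

There is no sender, no receiver and no operation state to write here; the work is simply a closure given to an executor.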

Bad: The concepts introduced are too complex

Executors are simple: you have an executor object, and you provide work to it. This can be easily taught to C++ programmers.

On the other hand, teaching senders and receivers is much harder. Not only are there more concepts to teach, but the distinction between them is also unclear.

Here are a few points that can generate confusion:

schedulers seem to be an unneeded addition; we can represent the same semantics with just executors and senders
operation_state objects seem more natural (as they correspond to tasks), but there is no way for the user to directly write such objects
if operation_state objects cannot be controlled by the user, then maybe they shouldn’t be exposed to the user
submit seems to be a nice simplification of connect and start, but having both variants adds confusion
considering that both submit and the pair connect and start can be customised by the user, one can end up in cases in which submit is not equivalent with connect/start; this means that we have ambiguous semantics

Looking closely at the proposal, we find circular dependencies between the proposed concepts and customisation-point-objects (via wrapper objects). For example, the connect CPO can be defined via the as-operation wrapper in terms of the execute CPO. But then, execute is defined in terms of the submit CPO, which is also defined in terms of connect (via the submit-state wrapper object). Circular dependencies are a design smell.

The point is that it’s really hard to grasp senders and receivers.

Good: Receivers have full completion notifications

Many threading libraries are more concerned about ensuring the execution of asynchronous work, and don’t consider error cases much. The way that senders and receivers are conceived, if the user puts the work in the sender, there will be a notification that indicates in which way the work was completed: successfully, with an error, or it was cancelled.
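The three completion channels can be illustrated with a small sketch in standard C++. The names outcome_receiver and run_and_notify are hypothetical; the helper plays the role of a started operation and is not an API from the proposal, and cancellation is modelled by a plain boolean flag.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical receiver: records which completion signal it received.
struct outcome_receiver {
  std::string* outcome;
  void set_value(int v) { *outcome = "value " + std::to_string(v); }
  void set_done() noexcept { *outcome = "done"; }
  void set_error(std::exception_ptr) noexcept { *outcome = "error"; }
};

// Hypothetical helper standing in for a started operation: exactly one
// of the three completion signals is delivered to the receiver.
template <class Work, class R>
void run_and_notify(Work work, bool cancelled, R receiver) {
  if (cancelled) {
    receiver.set_done();  // the work was cancelled before it ran
    return;
  }
  try {
    receiver.set_value(work());  // successful completion
  } catch (...) {
    receiver.set_error(std::current_exception());  // failure
  }
}

// Usage: scenario 0 succeeds, scenario 1 throws, anything else cancels.
std::string demo_outcome(int scenario) {
  std::string outcome;
  outcome_receiver r{&outcome};
  if (scenario == 0)
    run_and_notify([] { return 4; }, false, r);
  else if (scenario == 1)
    run_and_notify([]() -> int { throw std::runtime_error("boom"); },
                   false, r);
  else
    run_and_notify([] { return 4; }, true, r);
  return outcome;
}
```

Whatever happens to the work, the receiver hears about it through exactly one of the three calls.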

Good error handling is always desired.

Good: Easy to write receivers

As shown in Listing 1, it’s relatively easy to write receivers. If one provides three methods – set_value, set_done and set_error – then one has defined a receiver. There are multiple ways in which the receiver can be defined, and there might be different types of receivers, but the idea remains simple.

If users only need to write receivers, then using senders and receivers would probably be an easy endeavour.

Bad: Hard to write senders

On the other hand, it’s hard to write even simple senders. For a sender, one needs to write a connect method/function (or maybe submit?). This gets a receiver and has to generate an operation_state object, which should be compatible with the start CPO. On top of these, templates, tag types, move semantics and exception specifications will make this much harder.

But this is not the complete picture. The proposal encourages composition of senders; thus one should write sender algorithms instead of simple senders. That is, algorithms that take one sender and return another sender. If one sender receives a signal from another sender, it should do its corresponding work and notify the next receiver. This means that the user also needs to write a receiver for the previous sender, and ensure that the flow is connected from the previous sender to the next receiver.

Listing 2 provides an example of a sender algorithm that just produces a value when invoked by the previous sender. I doubt that the reader would consider this easy to write, even with this relatively simple case we are covering.

// Receiver of void, and sender of int
template <typename S>
struct value_sender {
  int val_;
  S base_sender_;

  value_sender(int val, S&& base_sender)
    : val_(val)
    , base_sender_((S &&) base_sender) {}
  template <template <typename...> class Tuple,
    template <typename...> class Variant>
  using value_types = Variant<Tuple<int>>;
  template <template <typename...> class Variant>
  using error_types = Variant<std::exception_ptr>;
  static constexpr bool sends_done = true;

  template <typename R>
  struct op_state {
    struct void_receiver {
      int val_;
      R final_receiver_;

      void_receiver(int val, R&& final_receiver)
        : val_(val)
        , final_receiver_((R &&) final_receiver)
          {}
      void set_value() noexcept {
        execution::set_value((R &&)
        final_receiver_, val_); }
      void set_done() noexcept {
        execution::set_done((R &&)
        final_receiver_); }
      void set_error(std::exception_ptr e)
        noexcept { execution::set_error((R &&)
        final_receiver_, e); }
    };
    typename detail::connect_result_t<S,
      void_receiver> kickoff_op_;
    op_state(int val, R&& recv, S base_sender)
      : kickoff_op_(execution::connect((S &&)
        base_sender, void_receiver{
          val, (R &&) recv})) {}
    void start() noexcept {
      execution::start(kickoff_op_); }
  };
  template <typename R>
  op_state<R> connect(R&& recv) {
    return {val_, (R &&) recv
      , (S &&) base_sender_};
  }
};

Listing 2

Bad: Hard to extend the senders/receivers framework

As mentioned above, the proposal envisions extensibility through sender algorithms, similar to the one discussed above. Since such algorithms are hard to write, this directly implies that the framework is hard to extend.

This can be compared with executors, which are relatively easy to extend. It’s not hard to create a class that models the executor concept.

Bad: No monadic bind

One solution to the extensibility problem would be a monadic bind. That is, provide a way in which one can create new senders by providing a function with a certain signature. Although monads are sometimes considered hard to grasp, they have proven to be useful for extending the operations on certain structures.

It is worth noting that the senders framework has the potential to use monads as follows. First, msender<T> encodes the concept of a sender that connects to receivers that take object T as input. For the type converter part of the monad, it is easy to find a transformation from an object T to msender<T> – this is actually provided by the sender just() defined by [P1897R3]. The missing operation is something that would have a signature like:

  template <typename T1, typename T2>
  msender<T2> bind(const msender<T1>& x,
    function<msender<T2>(T1)> f);

The reader should note that [P1897R3] provides a relatively similar function (imperfect translation):

  template <typename T1, typename T2>
  msender<T2> transform(const msender<T1>& x,
    function<T2(T1)> f);

But this is not the same. In the first case the received function object is of type T1 → msender<T2>, while in the second case it has the type T1 → T2. One cannot provide the same extensibility with transform as with the bind function. If the bind function corresponds to monads, the transform function corresponds to functors (using terminology from category theory). Of the two, it is monads that are good at composition.
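The difference can be made tangible with a toy model in which a sender of T is just a function that accepts a callback taking T. This is an assumption made purely for illustration: msender here is a type alias invented for this sketch, m_just, m_transform and m_bind are hypothetical names, and errors and cancellation are ignored.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy model: a sender of T is something that, given a callback taking T,
// eventually calls it. Errors and cancellation are left out.
template <class T>
using msender = std::function<void(std::function<void(T)>)>;

// Lifts a plain value into a sender (the role played by just()).
template <class T>
msender<T> m_just(T value) {
  return [value](std::function<void(T)> k) { k(value); };
}

// Functor map: f has type T1 -> T2.
template <class T1, class T2>
msender<T2> m_transform(msender<T1> x, std::function<T2(T1)> f) {
  return [x, f](std::function<void(T2)> k) {
    x([f, k](T1 v) { k(f(v)); });
  };
}

// Monadic bind: f has type T1 -> msender<T2>, so f itself chooses which
// sender comes next.
template <class T1, class T2>
msender<T2> m_bind(msender<T1> x, std::function<msender<T2>(T1)> f) {
  return [x, f](std::function<void(T2)> k) {
    x([f, k](T1 v) { f(v)(k); });
  };
}

// Usage of transform: the shape of the chain is fixed up front.
double demo_transform() {
  double result = 0.0;
  auto s = m_transform<int, double>(m_just(2),
                                    [](int x) { return 1.5 * x; });
  s([&result](double v) { result = v; });
  return result;
}

// Usage of bind: the continuation picks the next sender from the value.
int demo_bind() {
  auto half_or_zero = [](int x) -> msender<int> {
    return (x % 2 == 0) ? m_just(x / 2) : m_just(0);
  };
  int result = -1;
  auto s = m_bind<int, int>(m_just(10), half_or_zero);
  s([&result](int v) { result = v; });
  return result;
}
```

In this model m_transform can be written using m_bind and m_just, by binding a function that wraps its result in m_just, but m_bind cannot be written using m_transform alone, which is the sense in which bind is the more powerful extension point.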

Bad: Cannot express streams with senders/receivers

One might think that senders and receivers are good at representing data streams (i.e., reactive processing, push model). In such a model, one would have sources of events (or values). Then, one can attach various transformations on top of these sources to create channels that transform the input events/values so that they can be properly processed.

The chains of transformations can also be created with senders and receivers, but unfortunately such a channel can only propagate one value through it. For each value that needs to be propagated, a new channel of transformation needs to be created.

This is unfortunate as data stream programming can be an efficient way of solving some concurrent problems.

Bad: Too much templatised code

I’m not going to dwell too much on this topic. The proposal advocates highly templatised code, which will result in increased compilation time for all the code that uses it. If most C++ software were to use this as the fundamental basis for concurrency, then the overall compilation time for all the programs would increase considerably.

The reader should note that, for proper concurrency, the executors would have to move work packages between threads. This implies that at some point there needs to be some type-erasure. It’s a pity that this type-erasure is not at a higher level.

Conclusions

This article tried to provide a critique of the executors proposal for the upcoming C++ standard. As with a lot of such critiques in our field, we cannot be fully objective. Software engineering is based on compromises, and the tendency to choose one alternative over another makes us more subjective than we would want to be. But, even if we can’t achieve objectivity, such a critique can serve to highlight some nuances of the critiqued object. I have tried to be as objective as I can in this article, but, perhaps, my biases found their way through. However, it is my hope that the reader will find some help in all this endeavour.

As already mentioned in the proposal, P0443R14 has two main parts: one that introduces executors, and one that introduces senders and receivers.

Overall, I find the executors part to be a needed addition to the C++ language. Moreover, it’s simple to use and very extensible.

For the senders and receivers part of the proposal, I have formulated some objections. They are more complex than they need to be and not as extensible. Probably the best way forward for the standard committee is to split the proposal into two parts and consider them separately for inclusion in the C++ standard.

Looking at the overall proposal, the simple presence of executors makes it worthwhile. C++ can move towards using higher level abstractions for concurrency, abstractions that need executors as their fundamentals. For example, the parallel algorithms would greatly benefit from executors.

References

[concore] Lucian Radu Teodorescu, Concore library, https://github.com/lucteo/concore

[Hollman19] Daisy Hollman, ‘The Ongoing Saga of ISO-C++ Executors’, C++Now 2019, https://www.youtube.com/watch?v=iYMfYdO0_OU

[libunifex] Facebook, libunifex library, https://github.com/facebookexperimental/libunifex

[Niebler19] Eric Niebler, Daisy Hollman, ‘A Unifying Abstraction for Async in C++’, CppCon 2019, https://www.youtube.com/watch?v=tF-Nz4aRWAM

[P0443R14] Jared Hoberock et al., ‘P0443R14: A Unified Executors Proposal for C++’, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0443r14.html

[P1393R0] David Hollman et al., ‘P1393R0: A General Property Customization Mechanism’, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1393r0.html

[P1897R3] Lee Howes, ‘P1897R3: Towards C++23 executors: A proposal for an initial set of algorithms’, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1897r3.html

[Teodorescu21a] Lucian Radu Teodorescu, ‘Threads Considered Harmful’, https://www.youtube.com/watch?v=_T1XjxXNSCs

[Teodorescu21b] Lucian Radu Teodorescu, ‘Designing Concurrent C++ Applications’, https://www.youtube.com/watch?v=nGqE48_p6s4

[Teodorescu20a] Lucian Radu Teodorescu, ‘The Global Lockdown of Locks’, Overload 158, August 2020, available from https://accu.org/journals/overload/28/158/teodorescu/

[Teodorescu20b] Lucian Radu Teodorescu, ‘Concurrency Design Patterns’, Overload 159, October 2020 available from https://accu.org/journals/overload/28/159/teodorescu/

Lucian Radu Teodorescu has a PhD in programming languages and is a Software Architect at Garmin. He likes challenges; and understanding the essence of things (if there is one) constitutes the biggest challenge of all.