Direct and continuation passing styles differ. Nick Weatherhead explains a continuation passing style for synchronous data flow.
Imperative code can be viewed in terms of routines that in turn call sub-routines before passing control back to the point at which they were initiated and proceeding from there; this is known as Direct Style programming. Command shells often have the facility to pipe the output from one utility into the input of another. Adjoining self-contained modules in this way promotes loosely coupled functionality with a single purpose and well insulated state. For example, instrumentation can be conveniently implemented by intercepting a call, inspecting it and passing it on unaltered. It also enables content to be recorded and played to create or restore the state of a program.
Procedures can also transfer control forward if their product is a further procedure to call, hence the moniker Continuation Passing Style (CPS). Instead of a function having no visibility of where it returns and what is done with the result, it knows of the continuation called and the parameters passed to it. Different continuations can be chosen for different conditions including exceptional ones. They represent a program from a point forth. In doing so the call-stack is reified enabling computation to be captured and resumed. This article is an introductory exploration of their application in synchronous data flows, although they are equally adept as asynchronous callbacks.
Trampoline style execution
Invoking a function places a frame containing variables local to it onto the runtime stack. Under normal circumstances this is removed once it returns. However, CPS logically flows forward so there are no returns in the traditional sense; instead a
return
is substituted by a function to
goto
next. In doing so, tail calls will accumulate until the stack overflows. Drawing an analogy to a trampoline this can be circumvented if, with each call, the stack cyclically goes up and comes back down again.
When parameters in the call before are not used again they can be replaced and the program counter sent back from whence it came. On other occasions the variables retained in outer frames are used once control returns. For example, the Quicksort is doubly recursive; repeatedly dividing partitions in two around a pivot point. Whilst the directives to partition one way, say left, need not be retained, those to the right need to be held until all the operations left of them have been completed. To accomplish this without use of the runtime stack they must be kept in auxiliary storage, nominally the heap, until required.
Figure 1 illustrates how a trampoline incorporating deferred computations can operate. Current points to a continuation to invoke and is repeatedly set as the result of its last operation and then called until the program aborts. Buffered continuations are written to a space set aside for their immediate use whilst deferred continuations are held in the heap for later. A continuation returns an opaque reference to one or other of these. So executing a buffered continuation results in it replacing itself or returning one that had been deferred. Similarly a deferred continuation may return. or create one that is buffered. Executing either may result in the creation of one or more deferred operations. With each iteration the call stack unwinds and a loop returns the program counter back to where the aforementioned continuation is now ready to perform the next operation.
Figure 1 |
Quicksort example
Utilising the runtime stack is an elegant way to implement the Quicksort; however, its recursive nature means that this will grow. Adapting it to use continuations demonstrates the elimination of tail recursive calls, known as Tail Call Optimisation (TCO), and the utilisation of deferred computation. An implementation is shown below.
Chain
(Listing 1) is the abstract base class for a continuation. It is composed of a single member, the function reference
onto_
, thereby avoiding the need for a virtual function table. This is initialised on construction and invoked via the function operator, which once called executes the current continuation and returns the subsequent one. The global pointer,
buffer_
, references space set aside for buffered continuations. This will later be sized to accommodate the largest one possible. Other strategies might arrange for the continuation object to be returned at the bottom of the call stack and proceed by advancing over it and on. While this may save space, manipulating the call stack adds complexity and must be done in a way that prevents corruption.
#ifndef CHAIN_H #define CHAIN_H #include <iostream> class chain { public: constexpr const chain* operator( )( ) const { return onto_( *this ); } protected: static void* const buffer_; using fn = const chain* ( & )( const chain& ); explicit constexpr chain( fn onto ) : onto_( onto ) { } constexpr chain( const chain& that ) : onto_( that.onto_ ) { } private: fn const onto_; const chain& operator=( const chain& ); }; … |
Listing 1 |
Buffered
(Listing 2) glues the definition of an abstract continuation to a derived class’s implementation. Static polymorphism is achieved by utilising the
Curiously Recurring Template Pattern
[
CRTP16
]. Here the principle of inheriting derived behaviour is similar, but instead of a class inheriting from a class template instantiation using itself, which in this case would be of the form
chain< buffered >
, it inherits from a regular class i.e. just
chain
. Thus
chain
is the base class from which both
buffered
and
deferred
objects derive and in turn means a
chain
pointer can be downcast to determine to which of these it refers. Variadic template arguments enable the creation of objects implementing a
chain
but which have different constructor signatures. Here a factory method,
create
, takes
args
to construct a derived continuation. This calls the derived class’s constructor and
placement new
writes the object directly into the continuation buffer.
… template< class Chain, typename... Args > class buffered : public chain { public: static constexpr const Chain* create( Args... args ) { return new( chain::buffer_ ) Chain( args... ); } protected: constexpr buffered( ) : chain( static_cast< fn >( buffered::onto ) ) { } private: static const chain* onto( const chain& that ) { const Chain& next = static_cast< const Chain& >( that ); std::cerr << "buffered(" << next << ")\n"; const chain* onto = next( ); next.~Chain( ); return onto; } }; … |
Listing 2 |
The
onto
function downcasts
chain
to the derived
Chain
; its function operator is then called. Before returning its destructor is explicitly called because of being placed in a buffer rather than on the call stack. It is these callbacks that are said to imitate ‘
goto
statements with arguments’. Whilst these jumps can make tracing code by hand more challenging, it need not make determining the execution path onerous. A continuation concerns itself with the content of the input rather than where it came from. Therefore, those that inspect input and output it unaltered can be injected between those that perform transformation without altering intent. Here, rather than injecting continuations, a
stderr
statement suffices for outputting trace. In production-like code, this could be replaced by categorised trace with each continuation having a bitmap of those categories to associate it with. This demonstrates that, unlike the traditional approach of peppering trace throughout a program, instrumentation can be achieved by observing what is passed between continuations.
Deferred
(Listing 3) is the heap allocated equivalent of
buffered
. Static polymorphism enables a continuation,
chain_
, to be embedded within a deferred object. This is as opposed to maintaining a reference to one passed in, thus keeping allocation contiguous. As a
deferred
object is itself a continuation it can use its own function,
onto
, as its chained functor. When this is called it invokes
chain_
from the heap and the memory is freed by the encompassing object deleting itself. In this way it is a one-time computation responsible for its own allocation and deallocation.
… template< class Chain, typename... Args > class deferred : public chain { public: static constexpr const chain* create( Args... args ) { return new deferred( args... ); } private: Chain const chain_; constexpr deferred( Args... args ) : chain( deferred::onto ), chain_( args... ) { } static const chain* onto( const chain& that ) { const deferred& next = static_cast< const deferred& >( that ); std::cerr << "deferred(" << next.chain_ << ")\n"; const chain* onto = next.chain_( ); delete &next; return onto; } }; #endif |
Listing 3 |
Bound
(Listing 4) uses a pair of pointers,
begin
and
end
, to demark an extent within an array.
Begin
points to the first element, and
end
just past the last element. From this its length can be calculated and there is an output operator that iterates over, and prints out, each element.
#ifndef QUICK_H #define QUICK_H #include <cstdlib> #include "chain.h" template< typename T > struct bound { T* const begin_; T* const end_; constexpr bound( T* begin, T* end ) : begin_( begin ), end_( end ) { } constexpr size_t length( ) const { return end_ - begin_; } friend std::ostream& operator<<( std::ostream& os, const bound& that ) { const T* itr = that.begin_; os << *itr; while( ++itr < that.end_ ) os << ' ' << *itr; return os; } }; … |
Listing 4 |
Terminate
(Listing 5) prints the elements of an array and aborts a program. When instantiating a Quicksort it is passed in as a deferred operation, hence the
friend class
declaration so that a cached instance can access the
private
constructor. It is the first continuation on the stack of these deferred operations and thus the last in the chain of execution.
… template< typename T > class terminator { friend class deferred< terminator, T*, T* >; public: friend std::ostream& operator<<( std::ostream& os, const terminator& that ) { return os << "terminator(" << that.bound_ << ")"; } const chain* operator( )( ) const { std::cout << bound_ << "\n"; exit( 1 ); } private: const bound< T > bound_; constexpr terminator( T* begin, T* end ) : bound_( begin, end ) { } }; … |
Listing 5 |
Quick
(Listing 6) implements a rudimentary Quicksort taking the middle element of an array, placing elements lower than it to its left and higher than it to its right. The left and right partitions are then taken and repeatedly divided until they can’t be partitioned any more, leaving the array in sorted order. Partitioning results in the left hand portion being written directly into the continuation buffer which is returned as the current continuation. The right hand portion references those already deferred, and adds itself to them, forming a stack of cached computation. If there are insufficient elements to partition then that most recently deferred is returned as the current continuation; and so it proceeds until the final deferred operation is reached and terminates the program. When pivoting left
quick
is created, by default, as a
buffered
object and when pivoting right as a
deferred
object. The
buffered
and
deferred
friend
class declarations are requires so that
quick
’s
private
constructor can be accessed via each one’s respective
create
factory method.
… template< class T > class quick : public buffered< quick< T >, T*, T*, const chain* > { friend class buffered< quick, T*, T*, const chain* >; friend class deferred< quick, T*, T*, const chain* >; public: friend std::ostream& operator<<( std::ostream& os, const quick& that ) { return os << "quick(" << that.bound_ << ")"; } const chain* operator( )( ) const { size_t length = bound_.length( ); if ( length < 2 ) return onto_; T mid = bound_.begin_[ length / 2 ]; T* begin = bound_.begin_ - 1; T* end = bound_.end_; for (;;) { while( *( ++begin ) < mid ); while( *( --end ) > mid ); if ( begin >= end ) break; T temp = *begin; *begin = *end; *end = temp; } return quick::create( bound_.begin_, begin, deferred< quick, T*, T*, const chain* >:: create( begin, bound_.end_, onto_ ) ); } static constexpr const quick* create_with_terminator( T* begin, T* end ) { return quick::create( begin, end, deferred< terminator<T>, T*, T* >:: create( begin, end ) ); } private: const bound< T > bound_; const chain* const onto_; constexpr quick( T* begin, T* end, const chain* onto ) : bound_( begin, end ), onto_( onto ) { } }; #endif |
Listing 6 |
Quick
’s constructor takes the continuation to move onto next as its last parameter. If there is no subsequent action to perform the program can exit, hence an overloaded constructor might be purposed to take just
begin
and
end
whilst defaulting the initialisation of
onto
to terminate. Nevertheless, when the compiler analyses the
create
factory method it continues to deduce that the constructor with more arguments, rather than those matching its signature, should be used. So, instead the call is wrapped in the aptly named
create_with_terminator
.
Finally, before starting the program (Listing 7) the continuation buffer is allocated of a size sufficient to store the largest continuation; in this case a
quick
sort operating on an array of integers. The
main
routine takes a space separated list of integer arguments from the command line and creates an array. The current continuation is defined as a
quick
sort on the entire array which, once complete, will execute
terminate
. Alternatively a continuation could be specified to go and use the sorted array in some other way. An infinite loop executes the program in trampoline style; the current continuation performing an operation and returning the next continuation in the chain.
#include <cstddef> #include "quick.h" alignas( max_align_t ) char buffer[ sizeof( quick<int> ) ]; void* const chain::buffer_ = buffer; int main( int argc, char* argv[] ) { int *data = ( int* ) calloc( --argc, sizeof( int ) ); for( int i = 0; i < argc; ++i ) data[i] = atoi( argv[i + 1] ); const chain* current = quick<int>:: create_with_terminator( data, &data[argc] ); for (;;) current = ( *current )( ); } |
Listing 7 |
Conclusion
As evidenced, by eliminating tail recursion in Quicksort, inductive calls and non-local control flows are good candidates for continuations. When flow is linear the active context is not revisited so can be overwritten with the next. This in combination with trampoline style execution ensures a compact stack. For flows parallel in nature the division of work, whether run separately or interleaved with others, needs to be captured. In the direct style the runtime stack implicitly suspends and resumes calls in the required order, but when using CPS these complexities are exposed and must be managed explicitly.
A detailed comparison of performance between direct and continuation passing styles isn’t examined here. There is some overhead in calling a continuation over a regular function call. Unlike regular functions they are polymorphic requiring an indirection to execute them. There is also the auxiliary storage required to hold those deferred. Despite this only a marginal increase in execution time was observed when comparing the Quicksort presented with a recursive implementation. This could well be accentuated if, by specifying smaller packets of work, a proliferation of continuations occurred.
Whilst it takes time to become accustomed to CPS, it affords a way to express tasks and handle events via callbacks. An application programmer is likely to encounter its use for this purpose. CPS is also relevant in the implementation of programming languages and their compilers. Constructs can be defined, and conversely programs can be described, in terms of it [ CPS16 ].
References
[CRTP16] Curiously recurring template pattern, Wikipedia 2016.
[CPS16] Continuation-passing style, Wikipedia 2016.
Further reading
Andy Balham, Tail Call Optimisation in C++, Overload 109, June 2012.
Cristina Videira Lopes. Exercises in programming style Chapter 8: Kick Forward, Chapman and Hall/CRC, November 2015.
Acknowledgments
Many thanks to the Overload review team for their tips and observations which have benefited this article and my own understanding.