In his quest to pass around values efficiently, Richard Harris concludes by looking at ways of transferring ownership, now and in the future.
Last time we took a look at string and discussed the implications of the copy-on-write, or COW, optimisation. I concluded that despite its many shortfalls, I was unwilling to dismiss this type of optimisation out of hand.
This time, I'll explain why.
auto_string
Recall that it's auto_ptr rather than shared_ptr that is designed to manage ownership transfer. So, why not consider auto_ptr rather than shared_ptr semantics to eliminate the copy?
Let's look at a slightly different definition of string (Listing 1).
class string { public: typedef char value_type; typedef char * iterator; typedef char const * const_iterator; typedef size_t size_type; //... string(); string(const char *s); string(const string &s); string(const auto_string &s); string & operator=(const string &s); string & operator=(const auto_string &s); string & operator=(const char *s); auto_string release(); const_iterator begin() const; const_iterator end() const; iterator begin(); iterator end(); //... private: size_type size_; scoped_array<char> data_; }; |
Listing 1 |
Here, the data is stored in a scoped_array (like scoped_ptr but uses delete[] ) rather than a shared_array and we've picked up a few extra functions, all of which refer to a new type, auto_string .
Let's have a look at the definition of auto_string (Listing 2).
class auto_string { friend class string; auto_string(string::size_type n, const auto_array<char> &s) throw(); string::size_type size_; auto_array<char> data_; }; |
Listing 2 |
So, auto_string is simply a wrapper for an auto_array (which also uses delete[] ) that hides its data from everything but string .
This gives us an explicit mechanism for stating that we wish to transfer ownership of a string , through the release member function (Listing 3).
auto_string string::release() { return auto_string(size_, data_.release()); } |
Listing 3 |
The constructor and assignment operator can now be overloaded to take advantage of the fact that we are transferring ownership (Listing 4).
string::string(const auto_string &s) : size_(s.size_), data_(s.data_) { } string & string::operator=(const auto_string &s) { string tmp(s); swap(tmp); return *this; } |
Listing 4 |
Now, when we assign an auto_string returned from a function to a string (Listing 5), we have the sequence of events shown in Listing 6 and at no point is the string data copied.
auto_string f() { string result("hello, world"); return result.release(); } void g() { string s; s = f(); } |
Listing 5 |
call g default construct s; call f construct "hello, world" into result transfer s into temporary auto_string transfer temporary auto_string into return value exit f transfer auto_string into tmp swap data with tmp exit g |
Listing 6 |
The advantage this has over the shared_array version is that at every step of copy construction or assignment, the string data is owned by one and only one object. In other words, we no longer have aliasing of the data and consequently don't have to deal with the headaches that that causes.
Of course, since the compiler can optimise the copy away anyway, all we have succeeded in doing is to create a slightly better version of a completely pointless optimisation.
Well, not quite.
Thank you for your patience, by the way.
vector
There is, in fact, another benefit to explicit lifetime control but this is much better illustrated with a class representing a mathematical vector.
To start, let's have a brief recap of valarray . Just kidding - see Listing 7.
class vector { public: typedef double value_type; typedef double * iterator; typedef double const * const_iterator; typedef size_t size_type; //... vector(); vector(const vector &v); vector(const auto_vector<T> &v); explicit vector(size_t n); vector & operator=(const vector &v); vector & operator=(const auto_vector &v); auto_vector release(); const_iterator begin() const; const_iterator end() const; iterator begin(); iterator end(); //... vector& operator+=( const vector& v ); //... private: size_type size_; scoped_array<double> data_; }; |
Listing 7 |
The principal difference between this and a std::vector<double> is that we want to support mathematical vector operations.
Let's use vector addition as an example (Listing 8).
vector operator+(const vector &l, const vector &r) { vector tmp(l); tmp += r; return tmp; } |
Listing 8 |
Now this isn't the most efficient implementation, creating an uninitialised vector and filling it with the result of the addition would have 3n read and write operations whereas this has 4n. Still, it does enable us to reuse the in-place addition operator.
Note that we are assuming that both in-place addition and out-of-place addition require 2n operations, with out-of-place addition incurring a further n to copy the results. Strictly speaking this is not the case since the processor must make 3n reads and writes from main memory in both cases.
However, for many processors in-place addition will make much better use of the processor cache and as a result will generally be more efficient.
A few crude experiments with my compiler supported this, showing that out-of-place addition did indeed take approximately 1.5 times as long as in-place addition, so we shall maintain these complexity assumptions as a convenient fiction.
We can effectively address the efficiency concern when l is bound to a function return value by redefining the operator as shown in Listing 9.
vector operator+(vector l, const vector &r) { l += r; return l; } |
Listing 9 |
Now we are exploiting copy elision to reduce the number of reads and writes to 2n, even better than if we were to use an uninitialised vector to store the result.
Unfortunately, we'll still pay for the extra copy when l is a variable. This is especially irritating if r is a function return value and hence a candidate for in-place addition. Since reference to T and value of T are afforded the same status during function overload resolution we can't use overloading to address this. Unless we use a different type for our temporaries. Like auto_vector , for example (Listing 10).
auto_vector operator+(const vector &l, const vector &r); auto_vector operator+(const auto_vector &l, const vector &r); auto_vector operator+(const vector &l, const auto_vector &r); auto_vector operator+(const auto_vector &l, const auto_vector &r); |
Listing 10 |
The implementation of each overload will look like our first version, except that we will prefer to construct the temporary from an auto_vector whenever possible, so that we can avoid copying one of the arguments (Listing 11).
auto_vector operator+(const auto_vector &l, const vector &r) { vector tmp(l); tmp += r; return tmp.release(); } |
Listing 11 |
So the long awaited advantage of this approach is that it allows us to indicate when we no longer care what happens to an object, which we can exploit both for transfer initialisation and for transforming out-of-place operations into in-place operations.
Note that by returning auto_vectors from the addition operators we can eliminate the construction of a series of temporaries in a compound expression (Listing 12).
auto_vector f() { vector x; //... return x.release(); } auto_vector g() { vector x; //... return x.release(); } auto_vector h() { vector x; //... return x.release(); } void i() { vector sum = f()+g()+h(); } |
Listing 12 |
With the original implementation of addition, the cost of the summation would have been:
- 2 copies to temporary values at 2n read/writes each
- 2 in-place additions at 2n read/writes each
This gives us a grand total of 8n read/write operations.
Recall that if we were willing to forego reuse of the in-place addition operator, we could remove one of the read/write operations from creating each of the temporary values, making the cost of summation:
- 2 additions at 2n read/writes each
- 2 copies to temporary values at n read/writes each
Reducing the total to 6n read/write operations.
With auto_vector , however, this becomes:
- 5 transfers to temporary values at O(1) read/writes each
- 2 in-place additions at 2n read/writes each
Giving us a final total of 4n + O(1) read/write operations. Not too shabby.
There are two principal disadvantages to this approach.
Firstly we have to write overloaded versions of every function and secondly we have to write a lot of boilerplate code.
Don't we?
The crux of the first problem is that we only want to overload a function if there is an advantage to us to do so. In other words, we only want to overload functions which can exploit the reuse of temporary objects. A lot of functions can't and we really don't want to make work for ourselves by having to overload them. See Listing 13, for example.
double abs(const vector &v) { double sum_square = 0.0; vector::const_iterator first = v.begin(); vector::const_iterator last = v.end(); while(first!=last) { sum_square += *first * *first; ++first; } return sqrt(sum_square); } |
Listing 13 |
Thankfully, in such cases it's perfectly OK to pass an auto_vector instead of a vector . This is because the compiler is allowed to bind a const reference to a temporary that is the result of a conversion. And a vector can be conversion constructed from an auto_vector .
The second problem is a little trickier to resolve.
auto_value
What we really need is a class that can manage the value transfer semantics for us. Much like auto_ptr does for pointers.
The class definition is actually pretty straightforward (Listing 14).
template<typename X> class auto_value { public: typedef X element_type; explicit auto_value(X &x) throw(); auto_value(const auto_value &x) throw(); auto_value & operator=(const auto_value &x) throw(); X & get() const throw(); private: mutable X x_; }; |
Listing 14 |
It's the implementation that's the problem.
What we really need to find is a generic way to transfer ownership of the controlled value to and from the auto_value .
Fortunately, for most of the types we are interested in, one already exists in swap .
Let's see how we can use it to implement the member functions of auto_value (Listing 15).
template<typename X> auto_value<X>::auto_value(X &x) { x_.swap(x); } template<typename X> auto_value<X>::auto_value(const auto_value &x) { x_.swap(x.get()); } template<typename X> auto_value<X> & auto_value<X>::operator=(const auto_value &x) { x_.swap(x.get()); return *this; } |
Listing 15 |
The chief problem with using swap rather than transferring the state directly is that we have to default construct a value before we can swap the transferred value in. This pretty much limits auto_value to types with relatively inexpensive default constructors.
Note that we've supplied an explicit transfer constructor for the original value. This is more in keeping with the auto_ptr interface and removes the need to add a release method to the class.
There's not much we can do about the conversion constructor and assignment operator that need to be implemented in the class itself.
Fortunately, these are pretty simple (Listing 16).
X::X(const auto_value<X> &x) { swap(x.get()); } X & X::operator=(const auto_value<X> &x) { swap(x.get()); return *this; } |
Listing 16 |
We could make it easier still if we were to abandon our sensibilities and define these functions inline using a macro.
I can't quite bring myself to write it though.
There's only one thing left that's really lacking from the auto_value interface and that's operator. . This would allow us to use the auto_value as a proxy for calls to the transferred object's member functions in the same way that operator* and operator-> do for auto_ptr .
Shame it doesn't exist.
Fortunately, there's another way to do this.
That other way is inheritance. If our auto_value were to inherit from rather than contain the value it controls, it would trivially be able to act as a proxy.
Let's have a look at the changes we need to make (Listing 17).
template<typename X> class auto_value : public X { public: typedef X element_type; explicit auto_value(X &x) throw(); auto_value(const auto_value &x) throw(); auto_value & operator=( const auto_value &x) throw(); }; |
Listing 17 |
The unfortunate side effect of this is that the auto_value and the value it controls are now the same entity, and hence a const auto_value implies a const value. Now, you'll recall that function return values can't be bound to non-const references and that swap is a non-const member function that has a non-const reference argument.
That's right. We can't use swap to implement our transfer semantics any more. Not unless we cast away the constness, that is (Listing 18).
template<typename X> auto_value<X>::auto_value(X &x) { swap(x); } template<typename X> auto_value<X>::auto_value(const auto_value &x) { swap(const_cast<auto_value &>(x)); } template<typename X> auto_value<X> & auto_value<X>::operator=(const auto_value &x) { swap(const_cast<auto_value &>(x)); return *this; } |
Listing 18 |
Pretty slick, I'm sure you'll agree.
There's just one tiny little problem with this code. Hardly even worth mentioning.
It's implementation-defined whether or not this will invoke the dreaded undefined behaviour.
There's a clause in the C++ standard stating that compilers are allowed to bind const references to function returns in one of two ways. They can either bind to the value itself, or they can copy it into a const value and bind to that.
Seems innocuous enough, but there's another clause that states that trying to modify a const value results in undefined behaviour. Which we all know could mean anything from doing exactly what you expect to washing away our coding sins with the cleansing fire of a thermonuclear explosion.
I can't think of any compiler vendors who'd go for the latter option though.
Well, on reflection...
Actually, no, not even them.
transfer
Fortunately we can avoid invoking undefined behaviour if we are willing to force users of auto_value to do a little additional work.
Specifically, we'll need them to implement an ownership transfer mechanism that will work for const objects. Let's call it transfer and have a look at how vector might implement it (Listing 19).
class vector { public: typedef double value_type; typedef double * iterator; typedef double const * const_iterator; typedef size_t size_type; typedef auto_value<vector> auto_vector; //... vector(); vector(const vector &v); vector(const auto_vector<T> &v); explicit vector(size_t n); vector & operator=(const vector &v); vector & operator=(const auto_vector &v); const_iterator begin() const; const_iterator end() const; iterator begin(); iterator end(); //... protected: void transfer(const auto_vector &v) const; private: mutable scoped_array<double> data_; }; |
Listing 19 |
Firstly, we've added a typedef that defines auto_vector as an auto_value of vector. Secondly, we've added a protected member function transfer to manage the ownership transfer. Finally, we've made the data_ member mutable so that the const transfer function can affect it.
A protected member function? Yeah, yeah, I know.
If you're really bothered by it you can grant friendship to auto_value<vector> and make it private instead. The point is that the transfer member must be accessible by the derived auto_value<vector> , but probably shouldn't be public since it will violate the perfectly reasonable expectation that const objects won't change.
Some of you may also balk at the thought of forcing client classes to make their state mutable.
Well, I can give two reasons why you shouldn't worry too much about it.
Firstly, we have no choice. By making the state mutable we are allowed to change it even if the object is really const. This enables us to neatly side step the clause in the standard that allows compilers to copy a return value into a const object.
Secondly, it doesn't matter. Const methods are already allowed to change the state of the object.
You may find the second point a little surprising, so I'll elaborate.
The vector class, like most container types, stores its state in a memory buffer. If you took a look at the definition of scoped_array , you'd notice that the access member functions are all const, but return non-const pointers or references. This means that the elements of the array can be changed even when accessed through a const method. Whilst this may seem a little counter intuitive, it's exactly how a pointer data member would behave. They key point to note is that constness doesn't propagate through the pointer.
Namely:
X * const x;
and:
X const * const x;
do not mean the same thing. The former defines an immutable pointer to a mutable value and the latter an immutable pointer to an immutable value, and it is the former that is implicitly applied to pointer data members in a const member function.
The behaviour we've come to expect from our container classes, that const member functions will not change the value of any items in the container, is enforced by the programmer rather than the compiler. When implementing container classes we must always be careful not to change the state through const member functions since the compiler is unlikely to warn us that we are doing so. Making the state mutable therefore has little effect on the effort we must spend designing the class.
Now we're finished discussing the design choices, let's take a look at the implementation:
void vector::transfer(const auto_value<vector> &v) const { data_.swap(v.data_); }
So transfer is actually just a swap for const objects, which shouldn't be particularly surprising since we've been using swap for ownership transfer from the start.
We'll need to provide new implementations of the conversion constructor and assignment operator used for ownership transfer (Listing 18).
Again, these shouldn't be particularly surprising. We've just replaced the calls to swap with calls to transfer .
Finally, we need to rewrite the auto_value member functions to make use of the transfer function (Listing 19).
Yet again, we have simply replaced the calls to swap with calls to transfer .
So, there we have it, a simple class implementing explicit ownership transfer that requires just a few trivial member functions and a relaxed attitude to constness in its client classes.
If we are willing to accept this compromise, auto_value provides the same functionality as a custom made value transfer type with a lot less work.
Recalling the auto_vector addition operators (Listing 20).
vector::vector(const auto_value<vector> &v) { transfer(v); } vector & vector::operator=(const auto_value<vector> &v) { transfer(v); return *this; } |
Listing 20 |
We can implement these operators in the same way as before, by constructing a temporary vector from an auto_vector argument (where there is one) and performing the addition in-place (Listing 21).
template<typename X> auto_value<X>::auto_value(X &x) { transfer(x); } template<typename X> auto_value<X>::auto_value(const auto_value &x) { transfer(x); } template<typename X> auto_value<X> & auto_value<X>::operator=(const auto_value &x) { transfer(x); } |
Listing 21 |
In fact, since the auto_value copy constructor transfers ownership, we can further simplify these operators by passing the auto_value arguments by value. For example:
auto_vector operator+(auto_vector l, const vector &r);
Now, instead of transferring the auto_vector into a temporary vector, we can exploit the fact that auto_vector inherits from vector to perform in-place addition directly (Listing 22), saving us a little bit of typing and a few calls to transfer.
typedef auto_value<vector> auto_vector; auto_vector operator+(const vector &l, const vector &r); auto_vector operator+(const auto_vector &l, const vector &r); auto_vector operator+(const vector &l, const auto_vector &r); auto_vector operator+(const auto_vector &l, const auto_vector &r); |
Listing 22 |
Once again, if we don't care about reusing temporaries we simply provide a single version of a function:
double abs(const vector &v);
As before, this function will work for both vectors and auto_vectors although now this is because auto_vector inherits from vector and can therefore be bound directly to the reference.
namespace mojo
In his article 'Generic<Programming>: Move Constructors' ( Dr Dobb's Journal , 2003), Alexandrescu describes the Mojo framework for automatically detecting temporaries and using move semantics for them.
By its very nature this technique must rely upon implicit, rather than explicit, conversion to transfer types and hence requires a little extra work to ensure that temporaries are bound to the correct transfer type.
The upshot of this is that three overloads, rather than two, must be provided for each function that exploits move semantics. Nevertheless, this is arguably a more sophisticated solution to the problem.
T &&t
To close, it's worth mentioning that the value transfer semantics we've worked so hard to implement are trivial using the above notation.
If you're wondering why on Earth I didn't just go ahead and use it instead of leading you down the garden path, it's because it isn't C++. Not yet.
The notation is for a new kind of reference proposed for the next version of C++, the rvalue reference.
The rvalue reference differs from the familiar reference (hereafter known as an lvalue reference) in eight ways. Paraphrasing the proposal slightly:
- A non-const rvalue can bind to a non-const rvalue reference.
- Overload resolution rules prefer binding rvalues to rvalue references and lvalues to lvalue references.
- Named rvalue references are treated as lvalues.
- Unnamed rvalue references are treated as rvalues.
- The result of casting an lvalue to an rvalue reference is treated as an rvalue.
- Where elision of copy constructor is currently allowed for function return values, the local object is implicitly cast to an rvalue reference.
- Reference collapsing rules are extended to include rvalue references.
- Template deduction rules allow detection of rvalue/lvalue status of bound argument.
For the purpose of eliminating copies, behaviours 1-3 are of particular interest. Put simply, they mean that a non-const temporary can be bound to a reference type through which we can legally modify them. Specifically, we now have four kinds of reference (Listing 23).
auto_vector operator+(const auto_vector &l, const vector &r) { vector tmp(l); tmp += r; return auto_vector(tmp); } |
Listing 23 |
In the same way that a non-const object is preferentially bound to a non-const reference, an rvalue is preferentially bound to an rvalue reference and an lvalue is preferentially bound to an lvalue reference.
Given the following function signatures:
f(T &&t); //1 f(T const &&t); //2 f(T &t); //3 f(T const &t); //4
and the following references:
T &&t1; T const && t2; T & t3; T const & t4;
The rules mean that:
t1 binds to overloads 1, 2, 3 and 4 in that order of preference
t2 binds to overloads 2 and 4 in that order of preference
t3 binds to overloads 3, 4, 1 and 2 in that order of preference
t4 binds to overloads 4 and 2 in that order of preference
The original justification for disallowing the binding of rvalues to non-const references was that it was generally a mistake to do so. Changing an object which is about to go out of scope and be destroyed will, after all, lose those changes.
This is especially nasty when the object in question is a temporary resulting from an implicit conversion (Listing 24).
auto_vector operator+(auto_vector l, const vector &r) { l += r; return l; } |
Listing 24 |
//non-const reference to rvalue typedef T && Ref1; //const reference to rvalue typedef T const && Ref2; //non-const reference to lvalue typedef T & Ref3; //const reference to lvalue typedef T const & Ref4; |
Listing 25 |
void f(long &i); void g() { char c = 0; f(c); //oops //... } |
Listing 26 |
Since the character c is promoted to a long with an implicit conversion, f receives a reference to the temporary long that is the result of that conversion. The upshot of which is that, generally to the surprise and consternation of the programmer, c never changes.
Move semantics, however, have brought to light a situation in which this is a valid thing to want to do. The new rvalue references provide a mechanism by which we can express that we deliberately wish to change the value of a temporary object whilst avoiding the original problem.
Specifically, overloading on non-const rvalue reference and const lvalue reference enables us to distinguish between destructively reusing a temporary object and non-destructively referring to a non-temporary object.
Which is, of course, exactly what we've been striving to achieve.
article::~ article()
So has this all been a tremendous waste of time?
I hope not.
Firstly, I hope that this has given you cause to think a little more about how you can exploit ownership control.
Secondly, I hope that you may find auto_value , or something like it, a useful stop gap.
And finally, I hope that you've enjoyed it.
If not, I refer you to Harris's Addendum:
Nyah, nyah. I can't hear you.
#include
[ Alexandrescu, 2003] Alexandrescu, 'Generic<Programming>: Move Constructors' ( Dr Dobb's Journal , 2003).
[ Hinnant, 2004] Hinnant, Abrahams and Dimov, 'A Proposal to Add an Rvalue Reference to the C++ Language' (ISO/IEC JTC1/SC22/WG21 Document Number N1690, 2004).
With thanks to Kevlin Henney for his review of this article and Astrid Osborn, Keith Garbutt and Niclas Sandstrom for proof reading it.