ACCU Home page ACCU Conference Page ACCU 2017 Conference Registration Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Google+ ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinauto_value: Transfer Semantics for Value Types

Overload Journal #80 - Aug 2007 + Programming Topics + Design of applications and programs   Author: Richard Harris
"Copy On Write" (COW) sounds like an ideal idiom for avoiding expensive copies. But care must be taken to avoid producing a "mad cow".

Last time we took a look at the various flavours of smart pointer and I suggested that auto_ptr could be simplified by separating the lifetime control and ownership transfer responsibilities into two classes, scoped_ptr and auto_ptr respectively.

I closed with a brief mention of the copy-on-write, or COW for short, optimisation for strings and the fact that it relies upon one of the smart pointers, shared_ptr.

This time, we'll take a detailed look at string.

string

Before we describe how COW works, let's take a look at a naïve implementation of a string class (Listing 1).

    class string  
    {  
    public:  
      typedef char         value_type;  
      typedef char *       iterator;  
      typedef char const * const_iterator;  
      typedef size_t       size_type;  
      //...  
 
      string();  
      string(const char *s);  
      string(const string &s);  
 
      string & operator=(const string &s);  
      string & operator=(const char *s);  
 
      const_iterator begin() const;  
      const_iterator end() const;  
      iterator begin();  
      iterator end();  
      //...  
 
    private:  
      size_type size_;  
      scoped_array<char> data_;  
    };  
Listing 1

A scoped_array (identical to scoped_ptr, except that it uses delete[] instead of delete) ensures that the data is deleted when a string goes out of scope.

The constructors are pretty straightforward (Listing 2).

    string::string() : size_(0), data_(0)  
    {  
    }  

    string::string(const char *s) : size_(strlen(s)),  
       data_(size)  
    {  
      std::copy(s, s_size_, data_.get());  
    }  
 
    string::string(const string &s) : size_(s.size_),  
       data_(new char[size_])  
    {  
      std::copy(s.data_.get(), s.data_.get()+size_,  
         data_.get());  
    }  
Listing 2

The copy and conversion constructors allocate new strings whose lifetimes are managed by the scoped_array member data_. The constructor bodies then simply copy the strings. Nothing particularly surprising.

The assignment operators are a little more complex, though. These days the recommended way to implement assignment is with the copy and swap idiom.

This is usually expressed as:

      T &  
      T::operator=(const T &t)  
      {  
        T tmp(t);  
        swap(tmp);  
        return *this;  
      }  

The chief advantage of this approach (besides its relative simplicity) is that it is guaranteed to leave the object in its original state if an exception is thrown during the copy operation. This is because we don't commit the change until we swap the object with the temporary copy and the call to swap is guaranteed not to throw.

In the case of string, the swap member function could be defined as:

      void  
      string::swap(string &s)  
      {  
        std::swap(size_, s.size_);  
        std::swap(data_, s.data_);  
      }  

Unfortunately std::swap will need to assign a new value to both data_ and s.data_ and since scoped_array doesn't allow us to rebind the pointer, it doesn't provide an assignment operator or reset member function. If we want scoped_array to consider classes as scopes, we'll need to provide some mechanism to enable assignment.

Adding assignment operators to scoped_array would dramatically weaken its guarantee that it will only ever point to one array.

Thankfully, we can achieve what we want by adding a swap member function instead, defined as follows:

      void  
      scoped_array::swap(scoped_array &a)  
      {  
        std::swap(x_, a.x_);  
      }  

This still weakens scoped_array's guarantee, but in a way that's less appealing to use for ownership transfer than an assignment operator or reset member function. Well, that's my story and I'm sticking to it.

We can now redefine string's swap member:

      void  
      string::swap(string &s)  
      {  
        std::swap(size_, s.size_);  
        data_.swap(s.data_);  
      }  

Now, the naïve approach is perfectly good for short strings, but those string copies in the constructors, and indirectly in the assignment operator, start to look pretty ominous in the face of very long strings.

For example, consider the sequence of events when storing the result of a function call (Listing 3).

    string  
    f()  
    {  
      string s("hello, world");  
      return s;  
    }  
 
    void  
    g()  
    {  
      string s = f();  
    }    
  
Listing 3

So what exactly happens when we call g?

      call g  
 
      call f  
      copy "hello, world" into s  
      copy s into return temporary  
      exit f  
 
      copy return temporary into s  
      exit g  

Yikes. I count two completely unnecessary copies of the data. Now that's not such a problem for "hello, world", but will hit hard for the King James Bible, for example.

So how does copy-on-write help?

Well, it seeks to eliminate unnecessary copies by deferring them until they can no longer be avoided. This is typically when an element of the string is about to be changed, hence the name. Until that moment, a 'copy' of a string simply holds a reference to the original.

This is related to, but subtly different from, reference counting for shared ownership. Unlike shared ownership, COW allows multiple references to the underlying data of an object only so long as the external behaviour is unaffected.

With shared ownership, we expect multiple references to be able to change the observed state of the object. With COW, this must be avoided at all costs. We use reference counting merely to avoid making multiple copies of provably identical data.

For example:

    void  
    f()  
    {  
      string s1("hello, world");  
      string s2(s1); //s2 can hold a reference to 
                     //s1's data here  
      std ::cout << s2 << std::endl;  
      s2.replace(0, 5, "goodbye");  //s2 must copy 
                                    //s1's data here  
    }  

Until we actually change the state of s2 in the last line of f, the behaviour of the function is unchanged whether s1 and s2 share their state or not.

Let's have a look at how we might implement a string class that supports COW (Listing 4).

    class string  
    {  
    public:  
      typedef char         value_type;  
      typedef char *       iterator;  
      typedef char const * const_iterator;  
      typedef size_t       size_type;  
      //...  
 
      string();  
      string(const char *s);  
      string(const string &s);  
 
      string & operator=(const string &s);  
      string & operator=(const char *s);  
 
      const_iterator begin() const;  
      const_iterator end() const;  
      iterator begin();  
      iterator end();  
      //...  
 
    private:  
      size_type size_;  
      shared_array<char> data_;  
    };  
  
Listing 4

The point to note about this definition is that the data is stored in a shared_array (like shared_ptr, except that it uses delete[]) rather than as a scoped_ptr. With this we will share, rather than copy, data for as long as possible.

To see how this works, let's take a closer look at a few member functions.

First, the constructors (Listing 5).

    string::string() : size_(0), data_(0)  
    {  
    }  
 
    string::string(const char *s) : size_( strlen(s)),  
                                    data_(new char[size_])  
    {  
      std::copy(s, s+size_, data_.get());  
    }
 
    string::string(const string &s) : size_(s.size_),  
                                      data_(s.data_)  
    {  
    }    
  
Listing 5

The sharp-eyed amongst you will have noticed that the copy constructor is superfluous since the compiler generated version would have done exactly the same thing.

The important point though is that when we construct a string with a C-style point to character array, the data is copied, whereas when we copy construct a string, the data is shared.

Now let's have a look at the two flavours of begin to see how we make sure that we don't accidentally change a string's value through this shared data:

    string::const_iterator  
    string::begin() const  
    {  
      return data_.get();  
    }  

Well, the first version is pretty simple. Since we're returning a const_iterator (defined as a pointer to const T), it's not possible to change the contents of the string so we can simply return the pointer to the start of the character data.

Granted, const_cast could throw a spanner in the works, but I doubt anyone would be too upset if we simply declared casting between iterator types undefined behaviour and ignored the problem.

Clearly, it's the non-const version of the function that is of interest:

    string::iterator  
    string::begin()  
    {  
      if(data_.get() && !data_.unique())  
      {  
        char *s = data_.get();  
        data_.reset(new char[size_]);  
        std::copy(s, s+size_, data_.get());  
      }  
 
      return data_.get();  
    }  

And here we see the mechanism at work. If more than one string is referring to the data, we copy it before we allow a means to change it to escape. The same check for uniqueness and subsequent copy must be present in every function that presents an opportunity to change the string.

In the following example:

    void  
    f()  
    {  
      const string s1("hello, world");  
      string s2(s1);  
      string::iterator i = s2.begin();  
      *i = 'y';  
    }  

we have the following sequence of events:

    construct s1  
    copy "hello, world"  
 
    construct s2  
    share "hello, world"  
 
    s2.begin  
    fail uniqueness check  
    copy data  
    return iterator  
 
    assign 'y' to start of s2  

Given that we have already established the sharpness of your eyes, I have no doubt that you have raced ahead of me and spotted that this code isn't remotely fit for purpose.

To hammer home why, consider the following example:

    void  
    f()  
    {  
      string s2("hello, world");  
      string::iterator i = s2.begin();  
      const string s1(s2);  
      *i = 'y'; //oops, s1 has changed too
    }  

Let's take a look at the sequence of events this time:

    construct s2  
    copy "hello, world"  
 
    s2.begin  
    pass uniqueness check  
    return iterator  
 
    construct s1  
    share "hello, world"  
 
    assign 'y' to start of s1 and s2  

It seems there was a potential alias that we overlooked, the iterator itself.

This is harder than I expected. Perhaps I should be more forgiving of my unnamed library vendor.

Worse still, even if we correctly identify and protect all possible aliases, we still have made a rather sweeping assumption. Namely, that strings will only be referred to by a single thread.

In a multi-threaded program, it is quite possible that a COW string's data could be manipulated by more than one thread at the same time. To illustrate the problems that this can lead to, consider what happens when two copies are made of a string.

    void  
    f(const string &s)  
    {  
      string s1(s);  
    }  
 
    void  
    g(const string &s)  
    {  
      string g1(s);  
    }  

In a single threaded program, f and g must be called sequentially:

        string s("hello, world");  
        f(s);  
        g(s);  

leading to the following sequence of events:

        construct s  
        copy "hello, world"  
 
        call f  
        share "hello, world"  
        release "hello, world"  
        exit f  
 
        call g  
        share "hello, world"  
        release "hello, world"  
        exit g  

and everything goes smoothly.

For multi-threaded programs, the problem is that sharing and releasing the string data are not atomic operations. Specifically, they must read the reference count, manipulate it and finally write it. Unfortunately a thread may be interrupted during these steps.

For example, if we were to call f and g on separate threads:

        string s("hello, world");  
        run_threaded(f, s);  
        run_threaded(g, s);  

we might observe the following sequence of events:

    construct s  
    copy "hello, world"  
 
    call f  
    call g  
 
    f: read reference count      (count == 1)  
    f: increment reference count (count == 2)  
    g: read reference count      (count == 1)  
    f: write reference count     (count == 2)  
    g: increment reference count (count == 2)  
    g: write reference count     (count == 2)  

oops

Because g read the reference count before f had committed its change, we've lost the record of f's interest in the string. This is certainly going to lead to interesting behaviour at some point in the program.

To protect ourselves from this eventuality, we need to ensure that only one thread can read or manipulate the reference count at any given time.

Fortunately there's a specific threading tool to do this, the mutex (or mutual exclusion) lock. Unfortunately it's not free. And since we need to lock each uniqueness check we'll need a lot of locks. So many that it's perfectly possible that our checks end up taking longer than making a copy of the string, rendering the entire approach rather pointless.

I've heard of at least one string implementation that sought to reduce the number of mutex locks by having a single mutex for every string in the program. This does cut down on the cost of creating locks, but has the unfortunate side effect that every access of every string is synchronised, rather defeating the point of multi-threaded string manipulation.

So is it worth it?

For my part, the chief justification for using COW is that I hate temporaries. Or, more accurately, I hate the expense of copying them left, right and centre. Winds me right up, it does.

Recall that with our naïve string, storing the result of a function call involved two unnecessary copies of the data.

Hold on a sec.

Did I say two unnecessary copies?

Doesn't the C++ language specifically allow compilers to avoid making unnecessary copies?

There I go with my little lies again. It's been quite a while since compilers would create any temporaries at all in this situation. C++ specifically allows such temporaries to be side-stepped ('elided', in the terminology of the standard) and the result of the call to f to be constructed directly into s.

There are two situations where a C++ compiler is legally allowed to avoid copying an object.

Firstly when a function return expression is the name of a local object of the same type as the return value, the object can be constructed directly into the return value rather than copied. For example:

    string  
    f()  
    {  
      string s;  
      s = "hello, world";  
      return s;  
    }  

In this case, rather than constructing s, manipulating it and copying it into the return value a C++ compiler can, by treating s as an alias for the return value, simply construct the return value and manipulate it directly, saving a copy.

Secondly when a temporary that has not been bound to a reference would be copied to an object of the same type, the temporary can be constructed directly into the object. For example:

    string t = f();  

In this case, the temporary return value of f can be treated as an alias of t, saving a copy.

Note that the two optimisations can be applied to the same statement. In the above example, this means that the string s in the function f can be treated as an alias for the string t, effectively eliminating two copies.

So are there any situations where unnecessary copies still exist?

Well, firstly there's the case when strings are passed by value, but are not consequently changed. But const references capture this behaviour perfectly, so this doesn't seem to be particularly compelling.

Secondly there's the case when a function has multiple points of return. For example:

    string  
    f(bool b)  
    {  
      string s, t;  
      s = "hello, world";  
      t = "goodbye, world";  
 
      if(b) return s;  
      return t;  
    }  

In such cases it can be difficult for the compiler to predict which of the return expressions should be treated as an alias for the return value. In this case, should the compiler pick s or t?

Of course, a simple restructuring of the code would make things easier for our beleaguered compiler:

      string  
      f(bool b)  
      {  
        string s;  
        if(b) s = "hello, world";  
        else  s = "goodbye, world";  
        return s;  
      }  

But there will inevitably be situations where such restructuring is difficult, or at least unwieldy.

Finally there's the case when a temporary is assigned to a string. In this case COW may save us a copy. (Listing 6.)

    string  
    f()  
    {  
      string s("hello, world");  
      return s;  
    }  
 
    string  
    g()  
    {  
      string s("goodbye, world");  
      return s;  
    }  
 
    void  
    h()  
    {  
      string s = f();  
      //...  
      s = g(); //COW may save us a copy here
    }  
  
Listing 6

The reason why COW won't always save a copy in this case is a little subtle.

As I mentioned before, the recommended way to implement assignment is with the copy and swap idiom:

    T &  
    T::operator=(const T &t)  
    {  
      T tmp(t);  
      swap(tmp);  
      return *this;  
    }  

In this form, we will definitely pay for a copy during assignment of a naïve string. This is because the compiler binds the temporary to a const reference, rather than to a newly constructed object so it can't be elided. On the following line we copy construct the temporary, but by then it's too late, since the function can't know if the reference is to a temporary or a named variable.

Fortunately, we can rewrite the code to take advantage of copy elision:

    T &  
    T::operator=(T t)  
    {  
      swap(t);  
      return *this;  
    }  

Now the temporary is passed directly to the copy constructor of the named argument, and is therefore a candidate for the copy elision rule. The result of a function call can be constructed directly into t which is then swapped with the object.

So that's that for COW then, isn't it?

Well, COW can also save us a copy or two when we might need to change a string, but aren't certain.

For example, consider a function to strip a trailing newline from a string (Listing 7).

    string   
    f(string s)   
    {   
      if(!s.empty() && *s.rbegin()=='\n')  
      {  
        return s.substr(0, s.size()-1);  
      }  
      else  
      {  
        return s;  
      }  
    }   
 
    void   
    g()   
    {   
      string s("Hello, world\n");   
      //...   
      std::cout << f(s) << std::endl;   
    }  
 
    void   
    h()   
    {   
      string s("Hello, world");   
      //...   
      std::cout << f(s) << std::endl;   
    }  
Listing 7

In the function g, string makes a copy since the original string has a trailing newline. In h, however, no change is required so no copy will be made.

We can get the same benefit, however, if we rewrite the code so that a copy is only made if we need one. For example, Listing 8 or Listing 9.

    void   
    f(const string &s)   
    {   
      if(!s.empty() && *s.rbegin()=='\n')  
      {  
        std::cout << s.substr(0, s.size()-1) << std::endl;  
      }  
      else  
      {  
        std::cout << s << std::endl;  
      }  
    }  
  
Listing 8

    void   
    g()   
    {   
      string s("Hello, world\n");   
      //...   
      if(!s.empty() && *s.rbegin()=='\n')  
      {  
        std::cout << s.substr(0, s.size()-1) << std::endl;  
      }  
      else  
      {  
        std::cout << s << std::endl;  
      }  
    }  
 
    void   
    h()   
    {   
      string s("Hello, world");   
      //...   
      if(!s.empty() && *s.rbegin()=='\n')  
      {  
        std::cout << s.substr(0, s.size()-1) << std::endl;  
      }  
      else  
      {  
        std::cout << s << std::endl;   
      }  
    }  
  
Listing 9

OK, I'll admit it, this isn't really a very compelling argument. We'd either need a different version of the function for every operation we want to perform on the resulting string or we'd have to let the newline stripping logic leak out into the calling function, neither of which are particularly attractive prospects.

Well, that and the fact that a more sophisticated COW string wouldn't make a copy if you were just creating a sub-string. Adding offset and length members to string would allow sub strings to share a reference into the original string, delaying the copy until we do something really destructive like change some characters.

Nevertheless, I contend that this aspect of COW does not confer that big an advantage. Consider Listing 10.

    void   
    f(const string &s)   
    {   
      if(!s.empty())   
      {   
        string::const_iterator first = s.begin();   
        string::const_iterator last  = s.end();   
 
        --last;   
        while(first!=last) std::cout << *first++;   
        if(*first!='\n')   std::cout << *first;   
      }  
    }  
  
Listing 10

This still has the unfortunate property that we'd need one function for each operation, but gains the significant advantage of making no copies whatsoever. Furthermore, with a few minor changes, we have something that looks suspiciously like a function from <algorithm> (Listing 11).

    template<class BidIt, class T, class UnOp>   
    void   
    f(BidIt first, BidIt last, T t, UnOp op = UnOp())   
    {   
      if(first!=last)   
      {   
        --last;   
        while(first!=last) op(*first++);   
        if(*first!=t)      op(*first);   
      }   
    }  
  
Listing 11

This being a generic algorithm to perform an operation, in our case printing, on every element in the iterator range except the last if it is equal to t.

Personally, I tend to view the STL less as a library than as a way of life, or at least I did until my girlfriend threatened to leave me if I didn't start washing once in a while. Now I guess I see it more as a way of programming.

If the STL doesn't have the algorithm or data structure I want I implement it myself and add it to my own extensions library. This can be a lot of work at the outset, but starts paying dividends fairly quickly.

I doubt I'd consider this a candidate for my extensions library, but my point is that if you really care about the efficiency of your string processing, I very much doubt that you would rely on delayed copy optimisation. A much better approach is to think very carefully about the algorithms you need to perform and to implement them in an efficient manner.

So, given my assertion that we can write our code in such a way that copy-on-write optimisation is unnecessary and that it has synchronisation problems to boot, is this kind of approach really worth further consideration?

Well, the growing opinion is no. Some string implementations (notably STLport) are turning to alternative optimisations such as the short string optimisation and expression templates.

The short string optimisation involves adding a small array member to the string class. (Listing 12, for example.)

    class string  
    {  
    public:  
      typedef char         value_type;  
      typedef char *       iterator;  
      typedef char const * const_iterator;  
      //...  
 
      string();  
      string(const char *s);  
      string(const string &s);  
 
      string & operator=(const string &s);  
      string & operator=(const char *s);  
 
      const_iterator begin() const;  
      const_iterator end() const;  
      iterator begin();  
      iterator end();  
      //...  
 
    private:  
      char data_[16];  
      char *begin_;  
      char *end_;  
    };  
  
Listing 12

When the string is less than 16 characters long the array data_ can be used to store it in its entirety, eliminating the need for a relatively expensive allocation when copying. Longer strings must be allocated from the free store as usual. The begin_ and end_ pointers are used both to manage the extent of the string and determine whether the internal array or the free store have been used to store it, enabling the destructor to ensure that the memory is correctly released when the string is destroyed.

Note that this optimisation does not reduce the number of characters that are copied when strings are copied. Instead it relies upon the speed of allocating and copying memory on the stack rather than the free store.

Expression templates are altogether more complicated beasts and their precise mechanics are beyond the scope of this article. Put simply they work by deferring string manipulation operations until the result is actually assigned to something. For example, in the code snippet:

    string s1 = "Hello";  
    string s2 = "world";  
    string s3 = s1 + ", " + s2;  

the strings s1, ", " and s2 are not actually concatenated until s3's constructor requires the result. At this point a triple concatenation is performed, rather than the two double concatenations that a simple string implementation would perform. This saves both the creation of a temporary sub string and copying it into the final string.

In fact, expression templates are a very powerful technique that can be used for far more sophisticated optimisations than simply eliminating copies.

Despite this, I'm not yet willing to dismiss COW like optimisations out of hand, although you'll have to wait until next time before I explain why.

Acknowledgements

With thanks to Kevlin Henney for his review of this article and Astrid Osborn, Keith Garbutt and Niclas Sandstrom for proof reading it.

Overload Journal #80 - Aug 2007 + Programming Topics + Design of applications and programs