ACCU Home page ACCU Conference Page
Search Contact us ACCU at Flickr ACCU at GitHib ACCU at Google+ ACCU at Facebook ACCU at Linked-in ACCU at Twitter Skip Navigation

pinA Review of cow_ptr<type>

Overload Journal #32 - Jun 1999 + Programming Topics   Author: Jon Jagger

Intro

In my previous article I looked at a copy-on-write pointer class. I had planned that this article would follow in the normal way. Things rarely happen exactly as you plan them. Instead, I'm going to use this article to review the previous article[1]. I think this is an honest approach. It is after all how most developers work. You don't write code in a straight line.

Style follows audience

Astute reader Mr Kevlin Henney of Bristol kindly read through the previous article before I submitted it. He asked why I had mixed the style of my const declarations. Ah yes. In the previous article at one point I wrote this

class string::char_reference
{
  ...
private:
  string * const target;
  size_t const index;
};

I think my train of thought for this experiment started with target. There is no const-choice for target. If I want target to be const it has to be as it is. Ok, but what about index. If I want index to be const there are two choices…

const size_t index;  // conventional
size_t const index;  // not-so-conventional

The first obvious point is that using the not-so-conventional style for index means that target and index acquire similar looking declarations. There's something else too though. It was pretty subconscious but looking at it now and trying to vocalise the difference I think I can see it. It's a matter of emphasis. The not-so-conventional style places the const near to the identifier (with the effect that the type-name is always the leftmost identifier), whereas the conventional-style places the const near to the type-name. The question is what is const? Is it the type or is it the entity referred to by index? It's both really, but it comes back to types and expressions. Do you want to focus on the type of index or on the use of index in expressions? Given that index is a private data member it's fair to say that the primary audience for index is the class-implementor. And of course they will be using index in expressions. Clear as mud?

However, at another point in the article I wrote…

cow_ptr(const cow_ptr & rhs); // c'tor declaration

To make this consistent with the index-style, surely it would need to be

cow_ptr(cow_ptr const & rhs);

I don't think so. This is a very different case. This is public method. I think it's fair to say that the primary audience for this is the class-clients. They won't be using rhs in expressions. They'll be supplying something for rhs to bind to. Their main consideration is the type of rhs. So, given that I've chosen to use the expression-style when I want to focus on the implementation it makes sense to use the type-style when I want to focus on the interface. It provides a nice interface/implementation distinction too.

There something else to add to this. It's the observation that these function declarations must have a matching function definition. Should the function definition use the type-style or the expression-style? I think that it should use the type-style…

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & rhs) 
…
{
  …
}

Note though that I'm happy for the names of the parameters to be different from declaration to definition. I think that's entirely reasonable. Again it's all about the target audience. For the declaration the audience are the users, and their focus should be what is happening. For the definition the audience is the implementor, and their focus is how it's happening. I think that's a healthy difference.

So there you have it; my limited excursion into expression-style. I don't go the whole hog with it. But some do. Dan Saks does. Francis uses it in his EXE columns too. I recall one reason for their enthusiastic embrace was based on reading declarations. The argument is that you can then read declarations from right to left.

size_t const index; 

Index is a const size_t. It does appear to work. Don't get me wrong now - I think reading code is very important. Code is rarely spoken aloud - it's read. Which is why how code looks is so important. Layout matters. It's just that I don't think that you read code right-to-left or left-to-right! I think an experienced C/C++ programmer slurps up whole statements at a time.

Style follows style

One more style issue. Member initialisation lists. In my original Overload 31 submission I wrote…

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & rhs)
  : ptr(rhs.ptr)
  , count(rhs.count)
{
  …
}

However, in the published article it ended up as…

template<typename type>
cow_ptr<type>::cow_ptr(const cow_ptr & rhs)
  : ptr(rhs.ptr), count(rhs.count)
{
  …
}

The former style, with each initialisation on its own line, is the style I prefer for member initialisation lists. There are a couple of reasons. Firstly it just fits on the page better. Particularly if I have a couple of longish identifiers. Secondly it separates the colon-comma punctuation from the essential initialisations. Thirdly it matches closely the declaration style I use in the class definition. In the class definition I declare identifiers one per line. I write….

template<typename type>
class cow_ptr
{
  …
private: // state
  type * ptr;
  size_t * count;
};

I don't write…

template<typename type>
class cow_ptr
{
  …
private: // state
  type * ptr;  size_t * count;
};

It all comes back to simplicity. See below.

Exception unsafety

Here's the constructor for cow_ptr<> again

template<typename type>
cow_ptr<type>::cow_ptr(type * p)
  : ptr(p), count(new size_t(1))
{
  // all done
}

and here's how it was used…

cow_ptr<type> unshared(new type(*ptr));

This is not exception safe. And remember this is a template class. We don't know how big the potential leak is because we don't know what type is. Ooops.

template<typename type>
cow_ptr<type>::cow_ptr(type * p)
{
  auto_ptr<type> safe(p);
  count = new size_t(1);
  ptr = safe.release();
}

That's better.

Refactor

The longer I work in software the more I become convinced that simplicity is the key. Yet somehow it's really hard to write really simple code. It's as though the programmer's psyche somehow rejects simplicity. I can imagine a hard-core hacker pointing to a particularly dense piece of code they've just written and proclaiming loudly "I wrote that. Bet you can't understand it. I can. I'm dead 'ard me". ;-> Simplicity is one of those things that is hard to define but you know it when you see it. To quote John Pawson,

"The minimum could be defined as the perfection that an artefact achieves when it is no longer possible to improve it by subtraction" [Pawson].

Code-simplicity and code-duplication don't sit well together. I'm afraid there was some code-duplication in my previous article. I wrote…

template<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
  ++*rhs.count;
  if (--*count == 0)
  {
    delete count;
    delete ptr;
  }
  ptr = rhs.ptr;
  count = rhs.count;
  return *this;  
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
  if (--*count == 0)
  {
    delete count;
    delete ptr;
  }
}

The duplication is plain to see. Let's try refactoring to avoid the duplication. One approach is to simply extract the common code and put it in a new private method.

template<typename type>
void cow_ptr<type>::decrement() const
{
  if (--*count == 0)
  {
    delete count;
    delete ptr;
  }
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
  ++*rhs.count;
  decrement();
  ptr = rhs.ptr;
  count = rhs.count;
  return *this;  
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
  decrement();
}

Now suppose we wanted to clear the cow_ptr pointer, that is to reset it's pointer to null. At the moment we would have to jump through a small hoop…

cow_ptr<type> ptr(new type(...));
...
ptr = cow_ptr<type>(0);  // clear pointer 

It's a small but perhaps not-so-obvious step to promote the private decrement() into a public clear().

template<typename type>
cow_ptr<type> &
cow_ptr<type>::~cow_ptr()
{
  clear();
}

template<typename type>
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
  ++*rhs.count;
  clear();
  ptr = rhs.ptr;
  count = rhs.count;
  return *this;  
}

template<typename type>
void cow_ptr<type>::clear()
{
  if (--*count == 0)
  {
    delete count;
    delete ptr;

    count = 0;
    ptr = 0;
  }
}

There's still a potential problem with the assignment operator though. It's not exception safe. To fix this we can use an approach based on the "Resource Acquisition is Initialisation" idiom[2]. This requires a new constructor, which would be private…

tempate<typename type>
cow_ptr<type>::cow_ptr(type * p, size_t * c) throw()
: ptr(p)
, count(c )
{
  // all done
}

template<typename type>
cow_ptr<type> &
cow_ptr<type>::operator=(const cow_ptr & rhs)
{
  cow_ptr old_self(ptr, count);
  ptr = rhs.ptr;
  count = rhs.count;
  ++*count;
  return *this;  
}

Refactor

I'm afraid there was also some duplication elsewhere...

string::body::body(const char * literal)
{
cached_length = ::strlen(literal);
  text = new char[cached_length + 1];
  ::strcpy(text, literal);
}

string::body::body(const body & rhs)
  : text(new char[rhs.cached_length + 1])
  , cached_length(rhs.cached_length)
{
  ::strcpy(text, rhs.text);
}

Code on a printed page feels different to code on the screen. These grate on my sensibilities. The first assigns to cached_length and then to text. The second initialises text and then cached_length. Refactor. Here's my first attempt…

string::body::initialise(const char * literal)
{
cached_length = ::strlen(literal) ;
  text = new char[cached_length + 1];
  ::strcpy(text, literal);
}

string::body::body(const char * literal)
{
  initialise(literal);
}

string::body::body(const body & rhs)
{
  initialise(rhs.text);
}

This is better. I can still see a flaw though. The copy constructor is needlessly recalculating the length of the text. Here's my second attempt…

string::body::initialise(const char * literal, size_t length)
{
cached_length = length;
  text = new char[length + 1];
  ::strcpy(text, literal);
}

string::body::body(const char * literal)
{
  initialise(literal, ::strlen(literal));
}

string::body::body(const body & rhs)
{
  initialise(rhs.text, rhs.cached_length);
}

Ban all duplicate code.

Explicit

Following on nicely let's have a careful look at those two string::body constructors. The sole purpose of these two constructors is to be called explicitly. There is no reason I can think of why you'd want to use string::body objects as value parameters to a function, or as value return types from a function. They can be made explicit.

struct string::body
{
  explicit body(const char * literal);  
  explicit body(const body & rhs);
  ...
};

Typos

There were a couple of typos in the previous article. A declaration somehow acquired an extra right-parentheses[3]. The compiler would have balked at that of course. The more insidious typos are always the ones that aren't compile-time error. Such as...

class string::char_reference
{
  ...
  char_reference operator=(char new_value);
  ...
};

This assignment operator returns a char_reference by value. It should of course return a char_reference by reference.

class string::char_reference
{
  ...
  char_reference & operator=(char new_value);
  ...
};

My thanks again to Kevlin for several comments that improved this article. That's all for now. Cheers, JJ.

References

[Pawson] Minimum. by John Pawson. Phaidon Press. 0 7148 3817 9. I first learned of Minimum in a reference in one of Kevlin's talks - The Road to Software Architecture. You can find all his talks at http://techland.qatraining.com/.



[1] I'm tempted to repeat this as a pattern: article,review; article,review; article,review...

[2] What a catchy name ;->

[3] Also known as "theses". And of course the left-parentheses is then called "paren".

Overload Journal #32 - Jun 1999 + Programming Topics