Blindspots

One problem with learning to program in any computer language is that of understanding the known idioms of the language. A subtler problem is that of recognising the implications of a technique or idiom.

Elsewhere in this issue you will find a strong criticism of one popular idiom of C++, the (ab)use of friend. The popularity of this idiom highlights one form of programmer blindspot. You can probably list many others of this kind where a faulty idiom has entered the language.

Main returns a value

Another example of a faulty idiom is that which is prevalent among programmers who seem to resent having the line:

return 0;

at the exit point of their main(). This problem has been aggravated by compiler implementors who support overloaded versions of main() with a void return so that programmers can now start their program with:

void main() {

This certainly makes the warning message go away but at the cost of introducing a defect in the program. Except in a freestanding environment (if you do not know what one of those is, don't worry about it) your program will be executed under the supervision of some other program - usually, but not always, an operating system.

This supervising program is outside your control as a programmer. It 'knows' that executables return values and may well make use of those returned values. A particular supervising program may be able to handle a program that does not return a value - most, if not all, operating systems can do so. But what right have you as a programmer to expect that behaviour in all environments where your program may run? None. You expect others to meet your expectations and you should reciprocate by meeting theirs: main() returns an int - always. If you don't do so explicitly it is being fixed up for you somewhere. Is it really worth a poor fix to save typing 5 characters ('void' versus 'return 0;')?

Note the difference between the return type and the parameter list. In the latter you are specifying what others must provide you while the former provides for the expectations of others.

The C++ committee recently decided to sanction this sloppiness regarding "main" - falling off the end of "main" is now equivalent to saying "return 0;" - but note that this is a special case for "main" only - Ed.
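
As a minimal sketch of meeting the supervisor's expectations explicitly, the fragment below maps an outcome onto the two portable status values from <cstdlib>. The names count_errors and exit_status are hypothetical, invented for this illustration, and the rule inside count_errors is only a placeholder for real work.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical worker standing in for the program's real job.
// Returns the number of errors it met.
int count_errors(int argc) {
    assert(argc >= 0);           // argc is never negative
    return argc > 2 ? 1 : 0;     // placeholder rule: extra arguments count as an error
}

// Translates the outcome into the status handed back to the supervising program.
int exit_status(int errors) {
    return errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

// main() then keeps its side of the bargain explicitly:
//
//   int main(int argc, char* argv[]) {
//       return exit_status(count_errors(argc));
//   }
```

EXIT_SUCCESS and EXIT_FAILURE are used rather than bare integers because they are the only failure and success values the standard promises every host will understand.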

An example from C

We all know (don't we?) that the Standard C Library function gets() is defective (by the way, when you use functions from this family, do you remember to check the return value?) because there is no way to prevent over-running the end of the buffer provided. I know a number of otherwise expert C programmers who, in their books, advocate a 'fix' by providing a buffer whose size is BUFSIZ. e.g.

char buffer[BUFSIZ];
gets(buffer);

Why they believe that a macro providing the preferred size of a stream buffer should have any connection with the number of characters read before a newline character or eof condition is beyond me. Of course, in practice, such code will work quite happily on most systems until the day when someone redirects stdin away from the keyboard. Then you have an obscure bug that will be very difficult to locate.
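
For comparison, here is a sketch of the bounded alternative using fgets(), which takes the buffer size and so cannot over-run it. The wrapper name read_line is an assumption made for this example; it is not a library function. Note that it checks the return value, which is the point raised above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Reads one line into buffer (at most size - 1 characters) and drops the newline.
// Returns false at end of file or on a read error.
bool read_line(std::FILE* in, char* buffer, std::size_t size) {
    assert(in && buffer && size > 0);
    if (std::fgets(buffer, static_cast<int>(size), in) == 0) {
        return false;                           // nothing read: EOF or error
    }
    buffer[std::strcspn(buffer, "\n")] = '\0';  // strip the newline, if one was read
    return true;
}
```

A line longer than the buffer is not lost: it is delivered in pieces over successive calls, so the caller can detect it and decide what to do, which is more than gets() ever allowed.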

Some members of X3J11 (the ANSI committee largely responsible for writing the ISO C standard) are very sensitive about criticism - they shouldn't be, they did a good job within the specifications laid down. They were supposed to standardise existing practice. They had little choice but to include the many defective functions (alongside many excellent ones) that you can find in the Standard C Library. The problem is not that they standardised existing practice but that most authors have continued to advocate such poor quality methods, presumably because they are blind to the problems and/or solutions.

Actually what happens is one of two things. Some simply blindly follow the current practice without considering whether there is a better option. Others just assume that the reader will see all the implications of what they are writing and so don't bother to emphasise important usages of techniques they describe.

A good example of this is a possible use of enum instead of #define in C to provide compile time integer constants that respect scope and as such are more robust when the code is ported to C++. I had entirely missed this point until the Harpist pointed it out to me. He had only noticed it because of a C++ idiom of using anonymous enums instead of static const int in classes. e.g.

class X {
  static const int maxsize;
  char buffer[maxsize];
  ...
};

requires that maxsize be initialised outside the class interface because C++ does not support in-class initialisation, so its value is not available at the point where buffer is declared.

On the other hand:

class Y {
  enum{maxsize=256};
  char buffer[maxsize];
  ...
};

works as is.
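
A slightly fuller sketch of the idiom follows, with the same trick applied at function scope, where a #define would leak into everything that follows it. The members capacity() and at(), and the names local_capacity and linesize, are hypothetical additions for illustration only. (Compilers conforming to the eventual C++ standard also accept an in-class initialiser for a static const int member, but the enum form works with either.)

```cpp
#include <cassert>
#include <cstddef>

class Y {
public:
    enum { maxsize = 256 };   // compile-time constant, scoped to Y
    std::size_t capacity() const { return sizeof buffer; }
    char& at(std::size_t i) {
        assert(i < maxsize);  // the constant is usable in ordinary expressions too
        return buffer[i];
    }
private:
    char buffer[maxsize];
};

// The same idea inside a function: the name is invisible outside it.
std::size_t local_capacity() {
    enum { linesize = 80 };   // hypothetical name, local to this function
    char line[linesize];
    return sizeof line;
}
```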

A conceptual blindspot

Because C lacks a bool type (now fixed in C++, very late) we have a most unfortunate conflict. '0' is taken to represent false and all other integer values represent true, at least when we are dealing in boolean logic terms.

When dealing with success or failure (conceptually 'succeeded ? true : false') we find that we want to distinguish the many potential modes of failure from the 'unique' success. So we find we represent success by '0' and various failure modes by distinct integers.

When dealing with comparisons on totally ordered values we have three values we wish to represent - 'equality', 'less than' and 'greater than' (actually there are six but mostly compare functions only distinguish 3). C, always a language for the fast optimisation, spotted that '-ve', '0' and '+ve' fitted the bill. So we have strcmp() returning zero when the strings are equal. A real 'gotcha' because most programmers are thinking 'are they equal?' when they call strcmp().
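
One way to sidestep the gotcha is to name the question being asked. The wrappers below are a sketch; str_equal and str_less are hypothetical names, not library functions.

```cpp
#include <cassert>
#include <cstring>

// Answers 'are they equal?' directly, hiding the zero-means-equal convention.
inline bool str_equal(const char* a, const char* b) {
    assert(a && b);
    return std::strcmp(a, b) == 0;
}

// Answers 'does a sort before b?' from the sign of the three-way result.
inline bool str_less(const char* a, const char* b) {
    assert(a && b);
    return std::strcmp(a, b) < 0;
}
```

With these in place, if (str_equal(s, t)) reads the way the programmer is thinking, whereas if (strcmp(s, t)) is true exactly when the strings differ.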

Had C provided a true bool some of the above conflicts would have gone away. Even C++'s late addition of bool does not tackle this problem because, to preserve existing code, it provides an automatic two-way conversion from 'true' to 1 and 'false' to 0.

Egg on face

The reason that I am writing this article is that I also suffer from serious blindspots (which is one reason why I continue to resist those that try to persuade me to write a book, but enjoy writing articles where my mistakes can rapidly be corrected before any lasting damage results) and sometimes invent overly complicated solutions. The most recent example of this can be found in the complimentary copy of EXE Magazine that you received in November. There is an excellent article on side-stepping passing by value from Crosbie Fitch (that is in so far as he writes about the solution to the problem - I abhor his abuse of friend, I think he has 'volatile' wrong and he misquotes me. For the record most books do not describe either implicitly or explicitly the solution he provides. He is lucky in that he has obviously only read the good ones.)

I have been familiar with handle classes for quite some time. I also know about reference counting. Perhaps those authors who I criticise also know about these things and simply suffer from a more severe blindspot than I do.

For some reason I had never put together the two concepts in the context of supporting arithmetic operators. Because I think it is important I want to highlight the key elements of Fitch's solution.

Basically he provides a handle class that contains a pointer (or a reference? Well no, because we couldn't then change it to a different body.) to the class providing the value based object body with a reference counter to count the number of references currently made to that body.

The copy constructor simply increases the counter and the destructor decrements it. If the counter reaches zero the destructor also explicitly destroys the object body.

This is an elegant way to avoid hanging pointers/references. For example, assuming no optimisation:

Handle_Large operator + (Handle_Large left,
                         Handle_Large right){
  return left += right;
}

This is not exactly the code Crosbie Fitch produced because I thought I might as well emphasise some of the possibilities and problems of applying value based semantics via this method.

My parameters 'left' and 'right' are value based so they are copied by the copy constructor. That is, the counters in their bodies are incremented.

The in-class operator += must check the counter of the left-hand argument and clone it if it is greater than 1. As Fitch mentions, you must check the counter whenever you modify a Handle_Large (similar to his CSimple).

Now the operator +() call will increment the counters in the bodies of the actual arguments passed in as 'left' and 'right'. The call to operator +=() will result in a new modified body for 'left'. The counter will be incremented by the return statement. At exit from operator +() the destructors for 'left' and 'right' will be called and in both cases that will result in simply decrementing the counters. 'left' will now continue to exist as an anonymous temporary until such time as it is finally destroyed in accordance with the rules of C++.
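
The sequence of events above can be traced in the following sketch. It is not Crosbie Fitch's code: Large_Body, its single long member, and the helpers release(), detach(), value() and use_count() are all assumptions made to keep the example small, and the counter is not safe for use from more than one thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical body: stands in for a large value-based object.
struct Large_Body {
    long value;        // placeholder for the expensive data
    std::size_t refs;  // number of handles currently sharing this body
    explicit Large_Body(long v) : value(v), refs(1) {}
};

class Handle_Large {
public:
    explicit Handle_Large(long v = 0) : body_(new Large_Body(v)) {}

    // Copying shares the body and increments its counter.
    Handle_Large(const Handle_Large& other) : body_(other.body_) { ++body_->refs; }

    // Destruction decrements the counter and destroys the body at zero.
    ~Handle_Large() { release(); }

    Handle_Large& operator=(const Handle_Large& other) {
        ++other.body_->refs;   // increment first so self-assignment is safe
        release();
        body_ = other.body_;
        return *this;
    }

    Handle_Large& operator+=(const Handle_Large& right) {
        detach();              // clone first if the body is shared
        body_->value += right.body_->value;
        return *this;
    }

    long value() const { return body_->value; }
    std::size_t use_count() const { return body_->refs; }

private:
    void release() {
        assert(body_->refs > 0);
        if (--body_->refs == 0) delete body_;
    }
    // Every modifying member must call this before touching the body.
    void detach() {
        if (body_->refs > 1) {
            Large_Body* copy = new Large_Body(body_->value);
            --body_->refs;
            body_ = copy;
        }
    }
    Large_Body* body_;
};

// Parameters are taken by value: each copy costs only a counter increment.
inline Handle_Large operator+(Handle_Large left, Handle_Large right) {
    return left += right;
}
```

After Handle_Large c = a + b; the bodies of a and b are unchanged and once again have a count of one, while c owns the freshly cloned and modified body.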

What cost?

Not much. The implementation is relatively clean, though functions encapsulating single actions such as the reference update and modify check might benefit from being inline.

Care, as always, will be needed by maintenance programmers to ensure that all non-const member functions clone the body before modification. Don't forget that the same rule will apply to all member functions that take a non-const Handle_Large parameter.

Of course, one cost is that the copy constructor cannot be used for true copying. Does that matter? Does anyone know of an example where true copying is essential and cannot be fixed up?

I do wish that there were an efficient (I know the inefficient one) method for blocking a class from being used as a base class.
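
Later revisions of the language (C++11 onwards) added a final specifier that appears to answer this wish at no run-time cost. The sketch below assumes a compiler supporting that revision; Sealed_Handle and sealed_demo are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical handle that must not serve as a base class.
class Sealed_Handle final {
public:
    explicit Sealed_Handle(int v) : value_(v) {}
    int value() const { return value_; }
private:
    int value_;
};

// class Derived : public Sealed_Handle {};  // rejected by the compiler: Sealed_Handle is final

// Usage: the class behaves like any other value type.
inline int sealed_demo() {
    Sealed_Handle h(7);
    assert(h.value() == 7);
    return h.value();
}
```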

Final thoughts

Obviously there are rather too many authors out there who need to do some studying. If the only solution had been the one I originally proposed then I could understand why they implemented ineffective methods. The method Crosbie Fitch has drawn our attention to only emphasises my gut reaction that there are too many people writing books who should be learning C++ first.

I took a long time trying to unravel Crosbie Fitch's CSimple::operator =(). It starts with a peculiar assert() which seems to require the CSimple object on the left of the assignment to already hold a non-null pointer. Frankly, I cannot understand why this should be required. He does not check for unnecessary assignment nor does he sign off that decision in a comment. I think he has tried to encapsulate that functionality in his Refer() but I am not sure he has done so correctly. This would lead to maintenance problems.

I am bound to say (and I hope none of you think this is sour grapes) that his commenting and coding styles leave me uncomfortable. What do you think? Please write to Overload (or even EXE Magazine) and let us know how easy you found understanding the code.