Quality Matters #6: Exceptions for Practically-Unrecoverable Conditions

Overload Journal #99 - October 2010 + Programming Topics   Author: Matthew Wilson
Being robust is harder than you think. Matthew Wilson analyses a classic program.

This is the second in a series of instalments on exceptions. In the last instalment [QM-5] I considered a taxonomic perspective of program states and actions, and suggested a new vocabulary for the four defined states: normative, recoverable, practically-unrecoverable, and faulted. In this instalment I'm going to focus on the simplest proper use of exceptions, for reporting practically-unrecoverable conditions.

It is currently envisaged that there will be two more instalments in this mini-series. The next will deal with the much more challenging situation of using exceptions for recoverable conditions, including the non-trivial issue of deciding whether a given exception should be treated as recoverable or practically-unrecoverable.

The fourth and (hopefully) final part will suggest good practices in exception definition and use, look at how exceptions and threads work together, and consider the effects of the use (or non-use) of exceptions on the software quality characteristics of software libraries and programs.

The major part of this instalment comprises a surprisingly involved look at the classic "Hello, World" program, illustrating how its implicit exception-handling is a bad example for any non-trivial program. I'll then proffer a practical example from my own work: a production-quality main() that adequately catches and processes exceptions representing practically-unrecoverable conditions. Finally, I'll look at how implementing C-APIs in C++ brings a blessed discipline to the catching of exceptions, albeit at a significant cost in effort and/or diagnostic flexibility.

Hello, World

In The C Programming Language [K&R], the eponymous hello-world is given as:

      // hello-world.0.c  
      #include <stdio.h>  
      int main()  
      {  
        printf("hello, world\n");  
      }  

It's expressive, transparent, portable, efficient, and it's almost correct. As a basis for all C programs of any sophistication it is reasonable, too. Execution passes to the program via entry to main() and, unless exit() (or equivalent) is called by the functions that are called from within main(), execution completes as main() returns.

However, when it comes to completeness, and specifically to correctness/robustness/reliability, there's a problem: what happens if printf() fails? (This could be the case if the program's output was redirected to a file that could not receive the 13 or 14 bytes of the message, for example because the disk is full.)

Let's consider the issue by expanding the example. First, let's deal with the implicit return. In case you're not familiar with this form (which I happen to hate with a passion), although main() must have a return type of int, it is allowed to have no explicit return statement. The C standard states, in clause 5.1.2.2.3, that 'reaching the } that terminates the main function returns a value of 0'. So, the above code is equivalent to:

      // hello-world.1.c  
      #include <stdio.h>  
      int main()  
      {  
        printf("hello, world\n");  
        return 0;  
      }  

For the more pedantic, such as your humble author, this should be written with more explicit meaning, as:

      // hello-world.2.c  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        printf("hello, world\n");  
        return EXIT_SUCCESS;  
      }  

The C standard (in clause 7.20.4.3;5) defines the macro EXIT_SUCCESS (in stdlib.h), and specifies that it and the value 0 both represent 'successful termination' of the process. (In every case where I've checked, EXIT_SUCCESS is defined as being 0, so you can safely ignore the possibility of two distinct successful termination values.) It also defines the macro EXIT_FAILURE (also in stdlib.h), whose value, like that of EXIT_SUCCESS, is implementation-defined, to represent 'unsuccessful termination'. In every case where I've checked, EXIT_FAILURE is defined as being 1, but that still does not make it appropriate to return 1 in your code: the standard guarantees that an unsuccessful status is returned to the program's calling environment only if EXIT_FAILURE is returned (or passed to exit(), which is equivalent).

Now we're getting somewhere. When main() returns the value EXIT_SUCCESS (or 0), that's an explicit statement to the calling environment - to the 'world' we're hailing, in fact - that everything succeeded. Unfortunately, the unconditional stipulation of success is unjustified, since there's no guarantee of success here.

We can assume correctness/robustness of the runtime and the implementation of standard library functions. (In fact, we must do so, otherwise we have an infinitely insoluble problem of recursive self-guessing; this is another aspect of irrecoverability [QM-2, QM-5] that will be dealt with when we get to contract programming. Probably around QM#9 at the going rate ...) Contrarily, we must not assume normative behaviour [QM-5] where that is not guaranteed, which includes cases, like this one, involving interaction with external entities such as the file-system.

So, strictly, the definitive hello-world program is wrong. Ouch! Now, it's entirely appropriate for its authors to claim that the requirements of hello-world allow for tacit failure when redirected, or when the kernel's running out of puff, or whatever. However, it is not appropriate to say that failing to account for non-normative action is justified for the purposes of pedagogy, or that failure is so unlikely in 99.9999% of use-cases that we don't need to bother. Unless that 1-in-a-million user who encounters a blank result can see in the program specification that such an eventuality is possible in certain circumstances (even when that circumstance is not precisely specified), the program is wrong, and its authors have failed in their task.

Note that it is possible to write correct implementations of main(), and therefore to return success unconditionally, as long as we restrict ourselves to code that can be asserted as correct, such as:

      // hello-world.3.c  
      #include <string.h>  
      int main()  
      {  
        return (int)strlen("");  
      }  

or:

      // hello-world.4.c  
      int lnot(int v)  
      {  
        return !v;  
      }  
      int main()  
      {  
        return lnot(1);  
      }  

But interacting with the file-system involves the possibility of non-normative behaviour, requiring contingent action. Here's an attempt at the simplest robust version involving printf():

      // hello-world.5.c  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        int n = printf("hello, world\n");  
        return (13 == n) ? EXIT_SUCCESS : EXIT_FAILURE;  
      }  

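Ugly, isn't it? A slightly nicer one is possible, using the standard global pseudo-variable errno (although, strictly, the C standard does not require printf() to set errno when it fails; POSIX does):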

      // hello-world.6.c  
      #include <errno.h>  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        errno = 0;  
        printf("hello, world\n");  
        return (0 == errno)   
           ? EXIT_SUCCESS : EXIT_FAILURE;  
      }  

The simplest one I can come up with is:

      // hello-world.7.c  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        return (EOF != puts("hello, world"))  
               ? EXIT_SUCCESS : EXIT_FAILURE;  
      }  

Unfortunately, this is still not enough. As I'm sure you're aware, gentle readers, the standard output stream is buffered (fully buffered when, as here, it is redirected to a file). The C standard (7.20.4.3;4) requires that, when the program exits, 'all open streams with unwritten buffered data are flushed', so it is entirely possible, indeed likely in all these example cases, that the salutation will not actually be written until after main() has returned, by which time the program can no longer influence its exit status. As a consequence, checking the functioning of (f)printf()/(f)puts() does not suffice. The smallest clear and robust implementation of hello-world in C looks like the following:

      // hello-world.8.c  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        if( EOF == puts("hello, world") ||  
            0 != fflush(stdout))  
        {  
          return EXIT_FAILURE;  
        }  
        else  
        {  
          return EXIT_SUCCESS;  
        }  
      }  

Perhaps it's no wonder that programming books don't trouble readers with correct/robust example programs!
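As a minimal sketch (in C++, but using only the C library facilities from <cstdio>), the logic of hello-world.8.c can be moved into a function that takes the stream as a parameter. The name say_hello() is hypothetical, and fputs() is used in place of puts() so the stream can be chosen, which means the newline must be written explicitly. Passing the stream in makes the failure path easy to exercise: on Linux, for example, writes to /dev/full fail with ENOSPC, typically only when the buffer is flushed, which is exactly the case that checking puts() alone would miss.

```cpp
#include <cstdio>
#include <cstdlib>

// Writes the greeting to 'out' and forces it through the buffer, so that a
// failure (e.g. a full disk) is detected here, rather than silently during
// the flushing that happens after main() has already chosen its exit status.
// Returns EXIT_SUCCESS only if both the write and the flush succeeded.
int say_hello(std::FILE* out)
{
    if (EOF == std::fputs("hello, world\n", out) || // may only fill the buffer
        0 != std::fflush(out))                      // the write really happens here
    {
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

A program would then simply `return say_hello(stdout);` from main().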

Reporting

So far our only contingent action has been to indicate to the caller, via the return code, that the program has failed. We can (and should) also report what has failed, to the degree we are able, via a very simple form of contingent reporting: writing to the standard error stream with perror() (see sidebar 'Printing Errors in C'):

Printing Errors in C

The C standard library provides two functions for mapping 'error' codes, maintained in the global pseudo-variable errno, into human-readable values. The first, strerror(), returns a non-NULL C-style string mapping any integer value, including all of those defined (both in the standard, and all implementation-defined ones) in errno.h, into a human-readable message. For example (the exact wording varies between implementations):

    strerror(ERANGE); → "Result too large"  
    strerror(EDOM);   → "Numerical argument out of domain"  
    strerror(EMFILE); → "Too many open files"  
    strerror(0);      → "No error detected"  
    strerror(123456); → "Unknown Error (123456)"  

It's common to pass the current value of errno, to get a string explaining what most recently behaved in a non-normative manner within (the currently executing thread of) your program. There are issues with re-entrancy in the use of strerror(); see [STRERROR] for more information.
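Because strerror() may return a pointer into storage that a later call can overwrite, one modest precaution is to copy the message out straight away. The following sketch assumes nothing beyond the standard library; the helper name describe_errno() is hypothetical, and copying does not make strerror() itself thread-safe (POSIX provides strerror_r() for that).

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Copies strerror()'s text for 'err' into a std::string immediately, so that
// a later strerror() call cannot overwrite the message we are holding.
// This does not make strerror() itself thread-safe.
std::string describe_errno(int err)
{
    return std::string(std::strerror(err));
}
```

Typical use is `std::string const why = describe_errno(errno);` immediately after the call that failed, before anything else has a chance to change errno.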

The second standard library function, perror(), writes to the standard error stream the given message, followed by ": ", then the message associated with the current value of errno, and finally a newline, as in:

    errno = ERANGE;  
    perror("oops");  → "oops: Result too large"

      // hello-world.9.c  
      #include <errno.h>  
      #include <stdio.h>  
      #include <stdlib.h>  
      int main()  
      {  
        if( EOF == puts("hello, world") ||  
            0 != fflush(stdout))  
        {  
          perror("failed to say hello");  
          return EXIT_FAILURE;  
        }  
        else  
        {  
          return EXIT_SUCCESS;  
        }  
      }  
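The report from hello-world.9.c does not say which program failed, which matters when several programs are chained together. Here is a sketch of the '<program>: <what>: <why>' style of contingent report; the function and parameter names are hypothetical, and it assumes (as POSIX requires, though ISO C does not) that the failing output call sets errno.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Writes the greeting to 'out'. On failure, writes a contingent report of the
// form "<program>: failed to say hello: <reason>" to 'err' and returns
// EXIT_FAILURE. errno is captured immediately after the failing call, before
// any further library call can change it.
// Assumption: the C library sets errno on output failure (POSIX does).
int say_hello_reporting(std::FILE* out, std::FILE* err, char const* program)
{
    errno = 0;
    if (EOF == std::fputs("hello, world\n", out) ||
        0 != std::fflush(out))
    {
        int const e = errno;
        std::fprintf(err, "%s: failed to say hello: %s\n", program,
                     (0 != e) ? std::strerror(e) : "unknown error");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

Called as `return say_hello_reporting(stdout, stderr, "hello-world");` from main(), a full disk would produce something like "hello-world: failed to say hello: No space left on device".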

Reporting can come in two flavours: contingent reports, and diagnostic log statements.

Definition: A contingent report is a block of information output from a program to inform its controlling entity (human user, or spawning process) that it was unable to perform its normative behaviour. Contingent reports are a part of the program logic proper, and are not optional.

Typical contingent reports include writing to the standard error stream, or opening a modal window containing a warning. They are almost always used for issuing important information about recoverable or practically-unrecoverable conditions.

Definition: A diagnostic logging statement is a block of information output from a program to an optional observing entity (human user/administrator, or monitor process) that records what it is doing. Diagnostic logging statements are optional, subject to the principle of removability [QM-1], which states: 'It must be possible to disable any log statement within correct software without changing the (well-functioning) behaviour'.

Similarly, diagnostic logging statements are often predominantly used for recording contingent events, but this need not be so. In principle, all interesting events should be subject to diagnostic logging, to facilitate detailed historical tracing of the program flow. A good diagnostic logging library should allow for statements of different severities to be selectively enabled with minimal intrusion on performance when disabled.

Even though it's occasionally useful to piggy-back one form of reporting on the mechanism of the other, it's crucial not to confuse or transgress their different requirements: contingent reporting is part of the program logic and may not be removed, whereas diagnostic logging is optional and may be disabled at compile/link/run-time at will.
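To make the distinction concrete, here is a hypothetical sketch in which a global flag stands in for whatever filtering a real diagnostic logging library provides. Switching diagnostic logging off changes what is logged, but neither the contingent report nor the exit status, which is what the principle of removability asks for. None of these names come from a real library.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical switch standing in for a logging library's
// compile/link/run-time filtering.
bool g_diagnostic_logging_enabled = true;

// Diagnostic logging: optional. Disabling it must not change the
// (well-functioning) behaviour of the program.
void log_diagnostic(std::FILE* log, char const* message)
{
    if (g_diagnostic_logging_enabled)
    {
        std::fprintf(log, "diagnostic: %s\n", message);
    }
}

// Contingent reporting: part of the program logic, always performed, and
// accompanied by a failure status for the calling environment.
int fail_with_report(std::FILE* log, std::FILE* err,
                     char const* program, char const* what)
{
    log_diagnostic(log, what);                    // removable
    std::fprintf(err, "%s: %s\n", program, what); // not removable
    return EXIT_FAILURE;
}
```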

Hello, world++

What has all this got to do with exceptions, you may wonder? Well, the C++ hello-world (this one extracted from The C++ Programming Language [TC++PL]) is functionally similar:

      // hello-world.0.cpp  
      #include <iostream>  
      int main()  
      {  
        std::cout << "Hello, new world!\n";  
      }  

Unsurprisingly, it has the same defect as the C version: it does not account for failure. Since IOStreams, like C's Streams library, uses buffered output, the first thing we need to do is to ensure that the standard output stream is flushed, in order that the program is in a position to detect whether the write was successful. That can be done by using the flush inserter, as in:

      // hello-world.1.cpp  
      #include <iostream>  
      int main()  
      {  
        std::cout << "Hello, new world!\n"   
           << std::flush;  
      }  

A more common way of doing this is to express the newline sequence and the flush operation in one, via the std::endl inserter:

      // hello-world.2.cpp  
      #include <iostream>  
      int main()  
      {  
        std::cout << "Hello, new world!" << std::endl;  
      }  

Now we have the stream flushed. Unfortunately, that's the least of our problems.

IOStreams is such a horrible, undiscoverable library that just making my simple point involves a heap of messing around; see the sidebar 'IOStreams Hello-Worlds' for the hurdle-jumping diatribe. Instead, I will use FastFormat [FF-1, FF-2, FF-3], which illustrates the point succinctly.

IOStreams Hello-Worlds

It's no secret that I'm not a fan of IOStreams, and I've written about its many undesirable features before [FF-1]. The one that pertains to our current concern is arguably one of the worst: by default, non-normative behaviour is not reported via exceptions. Instead, you have to use the call-then-test anti-idiom, explicitly calling the basic_ios::fail() method, as in:

      // hello-world.3.cpp  
      #include <cstdlib>  
      #include <iostream>  
      int main()  
      {  
        std::cout << "Hello, new world!" << std::endl;  
        return std::cout.fail()  
               ? EXIT_FAILURE : EXIT_SUCCESS;  
      }  

Of course, it's easy to understand how this was the pragmatic choice when moving from a world predominantly without exceptions to a standard-prescribed one with them. But the result is the mess we see before us. (And disrupting programmers during compilation is a lot cheaper than after product deployment ...)

Alternatively, you can instruct the stream to throw exceptions in the case where it encounters a non-normative condition:

      // hello-world.4.cpp  
      #include <iostream>  
      int main()  
      {  
        std::cout.exceptions(  
            std::ios_base::badbit |  
            std::ios_base::eofbit |  
            std::ios_base::failbit);  
        std::cout << "Hello, new world!" << std::endl;  
      }  

While this looks a lot worse, it's actually a lot better, as it applies for the lifetime of the stream, so, as long as you remember to set it early in its lifetime, at least you won't experience any silent failures.
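Enabling exceptions is only half the job, though: as hello-world.4.cpp stands, the exception simply escapes main(). A minimal sketch of catching it follows, with the stream passed in as a parameter so that the failure path can be exercised. The function greet() is hypothetical; the handler catches std::exception, of which std::ios_base::failure is a derived class.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>

// As in hello-world.4.cpp, exceptions are enabled on the stream, but here
// the resulting exception is caught, reported, and turned into EXIT_FAILURE.
// Note: exceptions() itself throws at once if the stream is already in a
// state that matches the new mask.
int greet(std::ostream& os, std::ostream& err)
{
    try
    {
        os.exceptions(std::ios_base::badbit | std::ios_base::failbit);
        os << "Hello, new world!" << std::endl;
        return EXIT_SUCCESS;
    }
    catch (std::exception const& x) // std::ios_base::failure derives from it
    {
        err << "hello-world: " << x.what() << '\n';
        return EXIT_FAILURE;
    }
}
```

A stream constructed with a null buffer, as in `std::ostream broken(nullptr);`, starts out with badbit set, so greet(broken, std::cerr) returns EXIT_FAILURE without writing anything to it.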

      // hello-world.5.cpp  
      #include <fastformat/ff.hpp>  
      #include <fastformat/sinks/ostream.hpp>  
      #include <cstdlib>  
      #include <iostream>  
      int main()  
      {  
        ff::flush(ff::writeln(std::cout,  
           "Hello, new world!"));  
        return EXIT_SUCCESS;  
      }  

This program is robust. The normative behaviour is to output the greeting. The non-normative behaviour, should the output fail to be written, causes an exception to be thrown (by FastFormat's std::ostream sink), and the program terminates with a non-zero exit code.

Bad reporting

The C++ hello-world sounds great, doesn't it? (At least it does once we get it to the point where exceptions are thrown on failure.) Robustness is achieved by the runtime library performing contingent action in response to the uncaught exception emanating from the ff::writeln() statement. The programmer doesn't have to lift a finger (to provide any contingent action) and it all just magically works.

This may get us over the line as far as robustness is concerned, but in terms of usability it stinks! The main problem is that the carefully prepared diagnostic information put into the thrown exception is not used. When an uncaught exception escapes main(), the language runtime invokes std::terminate() [TC++PL].

      // in namespace std  
      void terminate(void);  

Note that it takes no parameters: the gratuitous void is for emphasis. By default, std::terminate() calls abort(), which causes the process to exit with a non-0 exit code. You can install your own handler, if you wish, via std::set_terminate(). But your handler must also return void and take no arguments (and it must not return to its caller; it must end the program).

You might (reasonably) wonder why std::terminate() doesn't take an argument of type std::exception const&. Well, that's doubtless because in C++ it is permissible to throw instances of types not derived from std::exception; it's even possible to throw fundamental type instances: void*, int, char const*, double, etc. This allows for backwards compatibility with pre-standard exception mechanisms and hierarchies, but it's a pity nonetheless; see the sidebar 'Why Catch-All Clauses are Bad News' for my favourite (of many) reasons why this is a bad idea. Thankfully, newer languages have learned from C++'s experience, and mandate that thrown objects derive from a single, specific, class type.
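Since C++11, which postdates this article, there is a partial remedy: although std::terminate() still receives no argument, a terminate handler can call std::current_exception() and rethrow the result to find out what was thrown. A sketch follows; the function names are hypothetical, and bear in mind that it allocates memory (via std::string), which may itself fail if the program is in poor health.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>

// Describes the exception currently being handled, if there is one.
std::string current_exception_message()
{
    std::exception_ptr p = std::current_exception();
    if (!p)
    {
        return "no active exception";
    }
    std::string message;
    try
    {
        std::rethrow_exception(p);
    }
    catch (std::exception const& x)
    {
        message = x.what();
    }
    catch (...) // int, char const*, etc.: nothing more can be said
    {
        message = "exception not derived from std::exception";
    }
    return message;
}

// A terminate handler: reports what it can, then ends the program.
[[noreturn]] void reporting_terminate_handler()
{
    std::fprintf(stderr, "terminate: %s\n",
                 current_exception_message().c_str());
    std::abort(); // a terminate handler must not return
}
```

It would be installed early in main() via `std::set_terminate(&reporting_terminate_handler);`.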

Why Catch-All Clauses are Bad News

In C++, it's possible to catch all possible exceptions via the catch-all clause, as in:

      try  
      {  
        . . .  
      }  
      catch(...)  
      {  
        fputs("unknown exception\n", stderr);  
        throw;  
      }  

In principle, this is a great thing. In many cases it's desirable to temporarily intercept a thrown exception in order to issue diagnostic logging/contingent reporting, before rethrowing the exception to be caught by something that knows what to do with it.
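This intercept-report-rethrow pattern can be packaged as a small helper. In this hypothetical sketch, the bare throw; rethrows the original exception object, so an outer handler can still catch it by its real type, even when that type is a fundamental one such as int.

```cpp
#include <cstdio>

// Runs 'body'; if anything at all is thrown, writes one low-impact
// diagnostic line to 'report' and rethrows the same exception object.
// The bare "throw;" preserves the exception's original dynamic type.
template <typename F>
void run_and_report_escapes(F body, std::FILE* report)
{
    try
    {
        body();
    }
    catch (...)
    {
        std::fputs("exception passing through; rethrowing\n", report);
        throw;
    }
}
```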

Unfortunately, some compilers allow you to do more. On Windows, several compilers piggy-back the C++ exception mechanism on top of the Structured Exception Handling [Richter] mechanism, and, for reasons that must have seemed sensible to someone, somewhere, at some time, allow the user to catch operating system exceptions by C++ catch-all clauses. Microsoft's Visual C++ does this (with the /EHa option, and by default in versions prior to Visual C++ 2005), along with a number of others.

This is a terrible idea, for two reasons. First, and most important, this means it's possible to catch access violations, divide-by-zero, and a whole host of critical, and desirably fatal, conditions, and quench them. Obviously, it's then impossible to trust the program. So, use of a catch-all (that doesn't rethrow, or terminate the process) means that robustness cannot be adjudged.

Second, but also pretty important, this behaviour is not standard, and therefore code that relies on a catch-all clause catching operating system exceptions is not portable, either between compilers on a given operating system, or between operating systems.

My advice regarding catch-all clauses is: use them as little as possible, preferably never. If you do use one, it should (almost) always rethrow/terminate, after performing the smallest amount of work possible (i.e. a small impact diagnostic logging/contingent reporting statement). We'll see the ramifications of this later in this instalment.

Thus, we're not going to get the detailed diagnostics we want. Instead you might see a message such as the following, from hello-world.5.cpp compiled with GCC 3.4:

 
    This application has requested the Runtime to terminate it in an unusual way.  
    Please contact the application's support team for more information.  

Whoa, now Neddy! Have a sugar-lump and calm down. That's a pretty intimidating message. Programmers and non-programmers alike would be concerned to see their program having done that.

And it gets worse still. Some compilers take the minimal approach to fulfilling the standard's requirement for std::terminate(), and call abort() without issuing any output: CodeWarrior (version 8, on Windows) is one. So the program simply stops, and unless the user is testing the process exit status, he/she gets no indication whatsoever that anything has failed.

If you think about it, the minimal approach, while leaving the user none the wiser, is arguably the more correct. Since the exception is, literally, unexpected, the runtime cannot assume the program is in any kind of fit state, not even to issue diagnostic logging output or a contingent report.

Either way, the fact that abort() is called, and that any explanation is scant or missing, shows that letting exceptions escape main() is not an intended way of handling them. Any quality program will, therefore, contain at least one outer try-catch clause. So why don't we see it in C++ textbooks!?
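As a minimal sketch of such an outer try-catch, the work of main() can be delegated to another function, with every exception derived from std::exception caught, reported once, and converted into EXIT_FAILURE rather than being left to reach std::terminate(). The names run_main() and body_fn are hypothetical. In keeping with the advice on catch-all clauses, anything not derived from std::exception is deliberately left to propagate.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>

// Hypothetical signature for the program's real entry point.
typedef int (*body_fn)();

// Runs 'body' inside an outer try-catch. Any exception derived from
// std::exception is reported once, as "<program>: <what>", and turned into
// EXIT_FAILURE. Other exceptions are not caught here (no catch-all).
int run_main(body_fn body, char const* program, std::FILE* err)
{
    try
    {
        return body();
    }
    catch (std::exception const& x)
    {
        std::fprintf(err, "%s: %s\n", program, x.what());
    }
    return EXIT_FAILURE;
}
```

For example, `int main() { return run_main(&tool_main, "mytool", stderr); }`, where tool_main() and "mytool" are placeholders for the program's own entry point and name.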

Other languages

The foregoing exposition has fixed firmly on C and C++, for two reasons: they're the languages I know most about, and they're the closest to the metal in all this stuff, so they represent a very good place to start.

But having raised the spectre of wrongness in pretty much every C/C++ programmer's first introduction to the language, it behoves me not a little to see what's going on in other languages. Simply put, I want to see what happens when all the C#/Java/Python/Ruby hello-worlds are unable to write to stdout. Each will be judged by whether it:

  1. Appears to have any awareness that output has failed.
  2. Throws an exception to stop the process.
  3. Returns a non-zero exit code to the calling environment.

C#

The program used is as follows:

      class Program  
      {  
        static void Main(string[] args)  
        {  
          System.Console.Out.WriteLine(  
             "Hello, brave new world!");  
          System.Console.Out.Flush();  
        }  
      }  

How does it behave? The good news is that the .NET runtime does register that the write has failed, and throws an exception. Unfortunately, this is hardly handled in what you'd call a graceful way. On a machine where one or more debuggers are resident (including all those I have to hand while writing this article), it causes a 'Select Debugger' dialog (Figure 1) to appear.

Figure 1

If you select No, then you get the text shown in Figure 2 on the command-line:

    Unhandled Exception: System.IO.IOException: There is not enough space on the disk.  
       at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)  
       at System.IO.__ConsoleStream.Write(Byte[] buffer, Int32 offset, Int32 count)  
       at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)  
       at System.IO.StreamWriter.Write(Char[] buffer, Int32 index, Int32 count)  
       at System.IO.TextWriter.WriteLine(String value)  
       at System.IO.TextWriter.SyncTextWriter.WriteLine(String value)  
       at hello_world.Program.Main(String[] args) in H:\Publishing\Articles\accu\columns\QualityMatters\6-exceptions\code\hello-world\c#\hello-world\Program.cs:line 10
Figure 2

If you're a programmer this is ok. Well, no. Let me rephrase: if you're the programmer of this program, this is useful. If you're a user, it's unnecessarily horrible and scary.

Worse, much worse, is the fact that after all that, the process informs its calling environment that everything has gone swimmingly. Yes, hard as it is to believe, an uncaught exception in a .NET program - specifically, in a .NET 3.5 program, target runtime v2.0.50727 - results in a program exit code of 0, i.e. success! What a load of crap!

Java

You might think, ah well, .NET is really just Windows for people with large PCs and plenty of time to wait for programs to start up, do their thing, and then shut down again. Never intended for operating systems with sophisticated command-line processing anyway, so it's no loss. Now, Java, that'll show 'em how it's done.

Yeah? Well, prepare for some crow pie. Consider the following program.

      class HelloWorld  
      {  
        public static void main(String[] args)  
        {  
          System.out.println("Hello, JWorld!");  
          System.out.flush();  
        }  
      }  

With Java 1.6.0_05, this fails on all counts. It does not throw an exception when the flush() fails to write all 15 or 16 bytes to the standard output stream. Nor does it cause the java.exe process to return a non-zero exit code. Nor does it say anything at all: System.out is a java.io.PrintStream, which by design never throws IOException, but merely sets an internal error flag that the program must explicitly poll via checkError(), the call-then-test anti-idiom all over again. The Java program failed in the simplest way, and doesn't tell unless asked. It's a pathetic effort!

Mark both VM languages down as not intended for command-line programming (which, to be fair, we pretty much knew anyway, given the long load times for even the simplest programs).

Python

Thankfully, once we come back to the realm of languages that are intended for command-line programs, things get better. The following Python program

      import sys  
      print("hello, untyped world!")  
      sys.stdout.flush()  

gives the output in Figure 3 and the exit code 2.

    Traceback (most recent call last):
      File "H:\Publishing\Articles\accu\columns\QualityMatters\6-exceptions\code\hello-world\python\hello-world.py", line 4, in <module>
        sys.stdout.flush()
    IOError: [Errno 28] No space left on device

Figure 3

Ruby

Similarly, the following Ruby program

      puts "hello, bejewelled world!"  
      $stdout.flush  

gives the output in Figure 4 and the exit code 1.

    H:/Publishing/Articles/accu/columns/QualityMatters/6-exceptions/code/hello-world/ruby/hello-world.rb:2:in `flush': No space left on device (Errno::ENOSPC)
         from H:/Publishing/Articles/accu/columns/QualityMatters/6-exceptions/code/hello-world/ruby/hello-world.rb:2
Figure 4

Although neither of the script programs by default gives the totally nice and neat output I was seeking - something like '<script-name>: No space left on device' - they both get pretty close. All the compiled languages fail miserably.

A production-quality main()

Clearly, the idea of not explicitly handling unrecoverable conditions at main()-level in any major language is not going to cut the mustard. Even though it's virtually never covered in textbooks - which describe only chocolate worlds with marshmallow skies and champagne rain - we must now put aside childish things, and look at doing things properly.

Ok, ok, I'm laying on the opprobrium a bit thick. I understand the need to have succinct examples in textbooks, and I know how hard it is to pack information into a small space and still have it readable, but this whole situation is just not good enough.

In an attempt to redress the balance, and to put my money where my mouth is, I'm volunteering the main() I'm using with all the command-line tools I'm creating/enhancing in my main stream of work this year. It's not perfect - it highlights several issues I'll cover later in this instalment, and it also touches on software quality issues yet to be explored in Quality Matters - but it does address the major concern of diagnosis. Take a deep breath, then look at Listing 1.

    extern char const TOOL_NAME[]; // defined elsewhere in the tool  
    clasp::alias_t const aliases[] =  
    {  
     . . .  
    };  
    int tool_main(clasp::arguments_t const* args);  
    int main(int argc, char** argv)  
    {  
     struct clasp_log  
     {  
      static void CLASP_CALLCONV fn(  
        void*       /* context */
      , int         severity  
      , char const* fmt  
      , va_list     args  
      )  
      {  
       pan::pantheios_logvprintf(severity,  
          fmt, args);  
      }  
     };  
     try  
     {  
      clasp::diagnostic_context_t ctxt(NULL,  
         &clasp_log::fn, NULL);  
      clasp::arguments_t const* args;  
      int flags = 0;  
      int argsres = clasp::parseArguments(flags,  
         argc, argv, aliases, &ctxt, &args);  
      if(0 != argsres)  
      {  
       pan::log_ALERT("failed to process the command-line arguments: ",  
          stlsoft::error_desc(argsres));  
      }  
      else  
      {  
       stlsoft::scoped_handle<clasp::arguments_t  
          const*> scoper(args,  
          &clasp::releaseArguments);  
       pan::log_DEBUG("entering main(",  
          pan::args(argc, argv), ")");  
       return tool_main(args);  
      }  
     }  
     // 1. Always catch bad_alloc first  
     catch(std::bad_alloc&)  
     {  
      pan::logputs(pan::alert, "out of memory");  
      ::fputs("out of memory\n", stderr);  
     }  
     // 2. CLASP  
     catch(clasp::unused_argument_exception& x)  
     {  
      pan::log_INFORMATIONAL(  
         "unrecognised command-line argument: ", x.optionName);  
      ff::fmtln(std::cerr,  
         "{0}: invalid argument: {1}; use --help for usage",  
         TOOL_NAME, x.optionName);  
     }  
     catch(clasp::clasp_exception &x)  
     {  
      pan::log_INFORMATIONAL(  
         "invalid command-line: ", x);  
      ff::fmtln(std::cerr,  
         "{0}: invalid command-line: {1}; use --help for usage",  
         TOOL_NAME, x);  
     }  
     // 3. recls  
     catch(recls::recls_exception& x)  
     {  
      pan::log_CRITICAL("exception: ", x);  
      ff::fmtln(std::cerr,  
         "{0}: {1}, item={2}; use --help for usage",  
         TOOL_NAME, x, x.get_item());  
     }  
     // 4. Other standard exceptions  
     catch(std::exception& x)  
     {  
      pan::log_CRITICAL("exception: ", x);  
      ff::writeln(std::cerr, TOOL_NAME, ": ", x);  
     }  
    // 5. ...  
    #ifdef CATCH_UNHANDLED  
     catch(...)  
     {  
      pan::logputs(pan::emergency,  
         "unexpected unknown condition");  
      ::fputs("unexpected unknown condition\n",  
         stderr);  
     }  
    #endif /* CATCH_UNHANDLED */
     return EXIT_FAILURE;  
    }  
Listing 1

Obviously, there's quite a lot going on here, and some of the points encroach on issues that will be covered in later instalments, particularly the minutiae of diagnostic logging content, format and severity. So I'll focus solely on aspects that pertain to exceptions. Before I enumerate the points, I need to mention what is pretty obvious from the code: it uses the as-yet-unreleased CLASP library (Command Line Argument Sorting and Parsing), which I've mentioned a couple of times here (and in recent CVu articles). Without getting too much into it, the modus operandi of CLASP use is to parse the arguments into a (read-only) arguments structure, which is then passed to a 'real' entry point, which I tend to call tool_main(). All the code within the try-clause is pretty self-explanatory; the only thing worth mentioning is my favourite local-struct-static-method trick for defining (context-free) local functions, in this case allowing Pantheios' logging facilities to be used by CLASP.
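The local-struct-static-method trick can be sketched on its own, against a hypothetical C-style callback interface (log_fn_t and parse_with_logging() are invented for illustration, and are not CLASP's API). A static member function of a local struct has an ordinary function type, so its address can be passed wherever a plain function pointer is expected.

```cpp
// Hypothetical C-style callback type: a plain function pointer plus an
// opaque context pointer (not CLASP's actual API).
typedef void (*log_fn_t)(void* context, int severity, char const* message);

// Hypothetical library function that reports one problem via the callback.
void parse_with_logging(log_fn_t fn, void* context)
{
    fn(context, 3, "unrecognised argument");
}

// Returns how many times the library called back into our local function.
int count_log_calls()
{
    // C++ does not allow a function to be defined inside another function,
    // but it does allow a local struct with a static member function, whose
    // address is an ordinary function pointer.
    struct local_log
    {
        static void fn(void* context, int /* severity */, char const* /* message */)
        {
            ++*static_cast<int*>(context);
        }
    };

    int calls = 0;
    parse_with_logging(&local_log::fn, &calls);
    return calls;
}
```

Since C++11, a lambda with no captures also converts to a plain function pointer, which achieves the same end with less ceremony.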

Anyway, the important (and relevant) features are all in the catch handlers. In brief:

  • Always catch most-derived first, except for special cases.
  • All catch-clauses fall out to the single return EXIT_FAILURE at the end of main().
  • One special case always present is to catch std::bad_alloc before anything else. Diagnostic logging and contingent reporting are done in the least complex manner possible, since the program has just experienced an out-of-memory condition: note the use of C's Streams to avoid the C++ free store [TC++PL].
  • In this case, since I'm using CLASP for the tools, catch CLASP exceptions to deal with badly specified command-lines, including, specifically, unrecognised arguments. This relies on using a specific member of a derived CLASP exception class (optionName, of unused_argument_exception), one that is not part of the ancestor class(es), hence the need for specific catch clauses, in a specific order.
  • In this case, since I'm using recls for file-search, catch for recls exceptions to deal with unrecoverable file-system issues, including the specification by the user of invalid directories/patterns.
  • Catch std::exception last. In principle - on the assumption that every exception type thrown by the program or any of its constituent libraries derives from it std::exception - any other exception that is thrown anywhere in the program will be caught here, and a minimum amount of information output.
  • The catch-all clause is only included if CATCH_UNHANDLED is #defined, which it is not by default; see sidebar 'Why Catch-All Clauses are Bad News'. Like the catch for std::bad_alloc, very little is attempted, since it must be assumed that the program is faulted, and that nothing can be relied upon.
  • Diagnostic logging and contingent report statements are always provided, and their format and content may differ depending on the likely needs of their respective audiences.
  • Diagnostic logging always appears before contingent reporting, since it's possible that the contingent report statements may themselves fail. (Of course, it's also possible for the diagnostic logging statements to fail, but with any good diagnostic logging library (i) that is far less likely, and (ii) non-faulted failures are not reported, and therefore do not prevent continued program execution.)
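The catch-clause ordering described above can be sketched in a reduced, portable form. This is not Listing 1: diagnostic logging is omitted, and usage_error and guarded_main() are hypothetical names, with usage_error standing in for a domain-specific exception (such as a CLASP or recls one) that carries information not available through the std::exception interface.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical domain-specific exception: carries the offending argument,
// which is not reachable through the std::exception interface.
class usage_error
    : public std::runtime_error
{
public:
    usage_error(std::string const& message, std::string const& argument)
        : std::runtime_error(message)
        , m_argument(argument)
    {}

    std::string const& argument() const { return m_argument; }

private:
    std::string m_argument;
};

// Runs tool_main(); every exception that reaches here is treated as
// practically-unrecoverable, and all handlers fall out to EXIT_FAILURE.
template <typename F>
int guarded_main(F tool_main, char const* program_name)
{
    try
    {
        return tool_main();
    }
    catch (std::bad_alloc&)
    {
        // Out of memory: keep it minimal, and use C's Streams only.
        std::fputs("out of memory\n", stderr);
    }
    catch (usage_error& x)
    {
        // Most-derived (domain-specific) type before std::exception.
        std::fprintf(stderr, "%s: %s: %s\n", program_name, x.what(), x.argument().c_str());
    }
    catch (std::exception& x)
    {
        std::fprintf(stderr, "%s: %s\n", program_name, x.what());
    }
#ifdef CATCH_UNHANDLED
    catch (...)
    {
        std::fputs("unexpected unknown condition\n", stderr);
    }
#endif /* CATCH_UNHANDLED */

    return EXIT_FAILURE;
}
```

A real main() would then simply forward, for example with a C++11 lambda: return guarded_main([&] { return tool_main(argc, argv); }, argv[0]);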

There's an awful lot more here than in any hello-world you're ever likely to see in a C++ textbook. More than one reviewer complained that this example was too much, perhaps even that I am grandstanding. Well, I'm a programmer, so there's bound to be a little of that. But I am making a serious point here: real programs require a substantial amount of contingent logic, invariably involving general and domain-specific cases. Showing you the (only minimally simplified) implementation of a real program keeps it real.

There's actually a better way to do this, involving separation of the general from the domain-specific handling. But it's not as simple as might be imagined, and requires knowledge of issues not yet covered, so I'll leave that for the concluding fourth instalment. For now, Listing 1, while being elaborate, is a solid example of how to have your program handle practically-unrecoverable exceptions.

Implementing C-APIs in C++

Readers of part 2 of Imperfect C++ [IC++] may recall my assertion that C++ is a fine language for application code and for library implementations, but is often a poor choice for module interfaces, particularly so where programs may be composed of modules compiled by different compilers. This is the C++ ABI issue.

Distilled down to this subject, it's not valid to throw exceptions through C-APIs. A common example of this circumstance is the implementation of COM servers, such as Windows shell extensions. There's no good outcome of letting an exception leak out of any COM interface method: about the best you can hope for is to crash Windows Explorer when it's not doing anything useful.

Therefore, when writing COM in C++, the considerable challenge is to make sure that every possible exception is caught and translated into an appropriate HRESULT (the COM result type). Also important is to capture the non-normative action context information, whether for the purposes of diagnostic logging or contingent reporting. If you imagine a component involving several interfaces and many methods, applying try-catch-...-catch everywhere is a recipe for boredom, mistakes, defects, faults, crashes, career-impacts.
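Written out by hand, the translation for a single function looks something like the following sketch. It uses a plain C API rather than COM so that it is portable; the mylib_ names and result codes are hypothetical stand-ins for HRESULTs, and diagnostic logging is omitted. Multiply this by every method of every interface and the case for a reusable helper is clear.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical C API: result codes loosely mirroring S_OK, E_OUTOFMEMORY,
// E_INVALIDARG and E_FAIL.
extern "C"
{
    enum mylib_rc
    {
        MYLIB_OK            = 0,
        MYLIB_E_OUTOFMEMORY = 1,
        MYLIB_E_INVALIDARG  = 2,
        MYLIB_E_FAIL        = 3
    };

    int mylib_combine_paths(char const* path1, char const* path2,
                            char* buffer, std::size_t cchBuffer);
}

namespace
{
    // "Normal" C++ implementation: free to throw.
    std::string combine_paths_(char const* path1, char const* path2)
    {
        if (NULL == path1 || NULL == path2)
        {
            throw std::invalid_argument("null path");
        }
        std::string result(path1);
        if (!result.empty() && '/' != result[result.size() - 1])
        {
            result += '/';
        }
        return result += path2;
    }
}

// The C-API entry point: no exception may propagate past this function.
extern "C" int mylib_combine_paths(char const* path1, char const* path2,
                                   char* buffer, std::size_t cchBuffer)
{
    try
    {
        std::string const s = combine_paths_(path1, path2);
        if (NULL == buffer || cchBuffer < s.size() + 1)
        {
            return MYLIB_E_INVALIDARG;
        }
        std::memcpy(buffer, s.c_str(), s.size() + 1);
        return MYLIB_OK;
    }
    catch (std::bad_alloc&)
    {
        return MYLIB_E_OUTOFMEMORY;
    }
    catch (std::invalid_argument&)
    {
        return MYLIB_E_INVALIDARG;
    }
    catch (std::exception&)
    {
        return MYLIB_E_FAIL;
    }
    catch (...)
    {
        // Unknown type: assume the process is faulted rather than quench it.
        std::abort();
    }
}
```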

Because COM has a well-defined set of result codes, it is possible to prescribe the appropriate set of responses for a small number of high-level exceptions, covering all common possibilities. The Pantheios library provides a suite of function templates that combine these exception->catch->return-code translations with diagnostic logging statements, enabling the authoring of logged, exception-safe COM servers without being swamped with boilerplate. I'll illustrate with a short example from recls.COM, the COM mapping for recls (Listing 2). (This version is part of a back-burner rewrite, and not yet available. One day ...)

    // recls.COM.idl  
    interface IFileSearch3  
        : IFileSearch2  
    {  
      . . .  
      HRESULT CombinePaths(  
        [in, string] BSTR path1  
      , [in, string] BSTR path2  
      , [out, retval] BSTR *result);  
      . . .  
    // FileSearch.h  
    class FileSearch  
      : public IFileSearch3
    {  
      . . .  
      STDMETHOD(CombinePaths)(BSTR path1, BSTR path2,  
         BSTR *result);  
      . . .  
    private:  
      HRESULT CombinePaths_(BSTR path1, BSTR path2,  
         BSTR *result);  
    // FileSearch.cpp  
    STDMETHODIMP FileSearch::CombinePaths(BSTR path1,  
       BSTR path2, BSTR *result)  
    {  
      return pantheios::extras::com::  
         invoke_nothrow_method(this,  
         &FileSearch::CombinePaths_, path1, path2,  
         result, "CombinePaths");  
    }  
    HRESULT FileSearch::CombinePaths_(BSTR path1,  
       BSTR path2, BSTR *result)  
    {  
      . . . do "normal" C++, incl. exceptions  
    }  
Listing 2

This is all straightforward COM/C++, with the exception of the use of pantheios::extras::com::invoke_nothrow_method() in FileSearch::CombinePaths(). This function template (Listing 3) is one of several overloads that cope with different numbers of parameters, providing a set of catch-clauses similar to that shown earlier for the 'production-quality' main(). Although it looks like a complex affair, it's actually pretty simple: call the given member function within a try-catch block (if the compiler's exception-handling support is not disabled), and deal with any exceptions that are thrown. Three conditions are discriminated, via the catch clauses:

  • Out of memory. If the exception is std::bad_alloc, or the COM component itself returns the E_OUTOFMEMORY status code, then a basic log statement is issued and E_OUTOFMEMORY is returned to the caller. When compiling in the presence of MFC, it also catches CMemoryException* and treats it in the same manner.
  • A general exception, caught as std::exception, and, in the presence of MFC, CException*. The exception details (implicitly obtained from the exception instances via string access shims [IC++, XSTLv1, FF-2] by the Pantheios application layer) are included in the diagnostic log statement.
  • Everything else, via the catch-all clause. A suitably troubling diagnostic log statement is issued. As discussed previously, catching 'everything' is fraught with danger, so the programmer must intentionally opt in, via conditional compilation, either to convert the exception into a return code (E_UNEXPECTED) or to rethrow it; by default, the process is terminated with a call to ExitProcess(). Severe, but the only sensible default.

    template<  
      typename R  
    , typename C  
    , typename A0  
    , typename A1  
    , typename A2  
    >  
    inline R invoke_nothrow_method(  
      C *pThis  
    , R (C::*pfn)(A0, A1, A2)  
    , A0 a0  
    , A1 a1  
    , A2 a2  
    , char const* functionName  
    )  
    {  
    #ifdef STLSOFT_CF_EXCEPTION_SUPPORT  
      try  
      {  
      #endif /* STLSOFT_CF_EXCEPTION_SUPPORT */
        HRESULT hr = (pThis->*pfn)(a0, a1, a2);  
        if(E_OUTOFMEMORY == hr)  
        {  
          goto out_of_memory;  
        }  
        return hr;  
      #ifdef STLSOFT_CF_EXCEPTION_SUPPORT  
      }  
      catch(std::bad_alloc&)  
      {  
        goto out_of_memory;  
      }  
 
    PANTHEIOS_EXTRAS_COM_EXCEPTION_HELPERS_CUSTOM_CLAUSE_0  
      catch(std::exception& x)  
      {  
        log(alert, functionName, ": exception: ", x);  
        return E_FAIL;  
      }  
      # ifdef __AFX_H__  
      catch(CMemoryException* px)  
      {  
        px->Delete();  
        goto out_of_memory;  
      }  
      catch(CException* px)
      {
        log(alert, functionName, ": exception: ",
           *px);
        px->Delete();
        return E_FAIL;
      }
    # endif /* __AFX_H__ */
    PANTHEIOS_EXTRAS_COM_EXCEPTION_HELPERS_CUSTOM_CLAUSE_1
      catch(...)  
      {  
        log(critical, functionName,  
           ": unexpected exception!");  
    # if defined(  
      PANTHEIOS_EXTRAS_COM_ABSORB_UNKNOWN_EXCEPTIONS)  
        return E_UNEXPECTED;  
    # elif defined(  
     PANTHEIOS_EXTRAS_COM_RETHROW_UNKNOWN_EXCEPTIONS)  
        throw;  
    # else  
        ::ExitProcess(EXIT_FAILURE);  
    # endif  
      }  
    #endif /* STLSOFT_CF_EXCEPTION_SUPPORT */
    out_of_memory:
      log(alert, functionName, ": out of memory");
      return E_OUTOFMEMORY;
    }
Listing 3

There's also the facility for allowing user-defined catch clauses, via the tersely named PANTHEIOS_EXTRAS_COM_EXCEPTION_HELPERS_CUSTOM_CLAUSE_0/1 macros.
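The general mechanism behind such hooks can be sketched with a single macro. The names CUSTOM_CATCH_CLAUSE, quota_exceeded and call_translating() are hypothetical, not Pantheios' own; the idea is that the helper supplies an empty default, and a user who defines the macro first gets extra handlers spliced in ahead of the generic ones.

```cpp
#include <exception>

// --- User code, placed before the helper is seen ---
// Hypothetical exception type that does not derive from std::exception.
struct quota_exceeded {};

#define CUSTOM_CATCH_CLAUSE \
    catch (quota_exceeded&) \
    {                       \
        return -2;          \
    }

// --- Helper code: an empty default, so the hook costs nothing if unused ---
#ifndef CUSTOM_CATCH_CLAUSE
# define CUSTOM_CATCH_CLAUSE
#endif

// Calls f(), translating exceptions into negative return codes.
template <typename F>
int call_translating(F f)
{
    try
    {
        return f();
    }
    CUSTOM_CATCH_CLAUSE // user-defined handlers slot in here, before the generic ones
    catch (std::exception&)
    {
        return -1;
    }
}
```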

Given that I argued strongly earlier against quenching exceptions in catch-all clauses, you might wonder why I give users a choice of behaviour. Well, it's just pragmatism, I suppose: it's not possible to know the nature of every use case. For example, a COM component's methods may be unable to emit any operating-system exceptions because they are already using structured exception handlers [Richter], in which case a programmer may wish to capture other (C++) exception types via the catch-all clause.

Other than the restriction that the implementing method must have the exact same signature as the interface method, and that overloads can be a bit of a hassle, this is a pretty big gain for almost no pain.
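The per-arity overloads are a product of the pre-C++11 compilers such libraries must support. As a sketch only (not Pantheios' implementation), a single variadic template can cover every arity. The status enumeration is a hypothetical stand-in for HRESULT, logging goes to stderr in place of Pantheios, and unknown exceptions terminate the process via the portable std::abort(), mirroring the ExitProcess() default of Listing 3.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <utility>

// Hypothetical status codes standing in for S_OK, E_OUTOFMEMORY and E_FAIL.
enum status
{
    status_ok            = 0,
    status_out_of_memory = 1,
    status_fail          = 2
};

// Invokes (pThis->*pfn)(args...), translating any exception into a status.
// Params is deduced from the member function pointer and Args from the call,
// so one template covers every arity (C++11 or later).
template <typename C, typename... Params, typename... Args>
status invoke_nothrow(C* pThis, status (C::*pfn)(Params...),
                      char const* functionName, Args&&... args)
{
    try
    {
        status const s = (pThis->*pfn)(std::forward<Args>(args)...);
        if (status_out_of_memory != s)
        {
            return s;
        }
        // The method reported out-of-memory itself: fall through to the report below.
    }
    catch (std::bad_alloc&)
    {
        // Fall through to the out-of-memory report below.
    }
    catch (std::exception& x)
    {
        std::fprintf(stderr, "%s: exception: %s\n", functionName, x.what());
        return status_fail;
    }
    catch (...)
    {
        std::fprintf(stderr, "%s: unexpected exception!\n", functionName);
        std::abort();
    }

    std::fprintf(stderr, "%s: out of memory\n", functionName);
    return status_out_of_memory;
}
```

Placing functionName before the forwarded arguments is what allows the argument pack to come last; the same-signature restriction remains, since pfn must match exactly.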

Furthermore, the destination of the diagnostics here is, in common with any Pantheios client code, independent of the server code; output decisions can be made, for each link-unit, at compile, link, or even run-time. You can log to a file (via the back-end be.file) and/or the Windows system debugger (be.WindowsDebugger) and/or the Windows Event Log (be.WindowsEventLog), and so on. What's especially useful when writing COM servers is to also use be.COMErrorObject, which writes the details of the diagnostic log statement to the COM Error Object, a per-thread global 'error' context that can be queried by any part of the program. This is a standard mechanism for COM Automation servers to pass so-called 'rich error information' to clients. Clients of any COM servers written using Pantheios can receive such detailed context information about non-normative conditions with virtually no additional effort from the programmer, even when that information is contained within a thrown exception.

Summary

In this instalment we have begun the exploration of the use of exceptions in software programs and plug-in components. We've examined in detail the effect of uncaught exceptions reporting unrecoverable conditions, and shown that all quality programs must use explicit try-catch code at the application's top level. Specifically, we've considered the ubiquitous hello-world program in a variety of languages, and seen a number of inadequacies with regard to whether failure to write to standard output is detected and, if so, whether it's reported and reflected in the program's exit code.

In C, it's necessary to explicitly test and report. In other languages that use exceptions, relying on the implicit handling of an exception thrown by the chosen output library is inadequate, to differing degrees. In C++, execution is terminated and the program exit code reflects the failure, but no (precise) report is provided. Also, if you're using IOStreams you must do an explicit check, or remember to enable exceptions on the stream. C# and Java both record an epic fail. Only Python and Ruby could claim to satisfy the basic requirements of software quality, although even then one would prefer to explicitly handle the exception for the sake of neatness.
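The IOStreams point can be illustrated with a minimal sketch; the function name say_hello() is hypothetical. Enabling exceptions on the stream turns a failed write or flush into a std::ios_base::failure, which the explicit try-catch then reports and reflects in the return value. Without the exceptions() call, the same failure would merely set the stream's state bits, and an explicit check of the stream would be needed instead.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <iostream>
#include <ostream>

// Writes the greeting to os, returning EXIT_SUCCESS only if both the write
// and the flush succeeded; any failure is reported on stderr via C's Streams.
int say_hello(std::ostream& os)
{
    try
    {
        // Ask the stream to throw std::ios_base::failure instead of silently
        // setting badbit/failbit. This call itself throws if the stream is
        // already in a failed state, hence its place inside the try.
        os.exceptions(std::ios_base::badbit | std::ios_base::failbit);

        os << "Hello, World" << std::endl; // std::endl flushes, so write errors surface here

        return EXIT_SUCCESS;
    }
    catch (std::exception& x)
    {
        std::fprintf(stderr, "hello-world: output failed: %s\n", x.what());
        return EXIT_FAILURE;
    }
}
```

The program's main() then becomes simply: int main() { return say_hello(std::cout); }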

In all cases, it's clear that a program that does not have an explicit top-level try-catch is inadequate. A production-quality main() is non-trivial, involving generic and domain-specific aspects. The order of its catch clauses is important. It should provide adequate and appropriately-targeted reporting of the non-normative conditions that have resulted in the caught exceptions.

We've looked at two different types of reporting - contingent reports and diagnostic log statements - and seen how derived exception classes can (and should) carry additional information to assist with detailed reporting, both for contingent reporting and for diagnostic logging. We'll follow up on this point further in the coming instalments.

We've seen that when implementing C-APIs, exceptions must be caught and translated into return codes, with appropriate contingent reporting and diagnostic logging. This must be done even in the case where an exception represents an unrecoverable condition (e.g. out-of-memory), and the implementer of such an API must trust that its clients faithfully examine and respond to its return codes.

We've considered the use of catch-all blocks in C++, which are intended to catch any unhandled exception of whatever type, and seen that, because on some compilers they are 'enhanced' to catch operating-system exceptions as well, throwing (and therefore having to catch) any type not derived from std::exception is a problem.

In the next instalment we'll consider the use of exceptions for recoverable conditions, using some real world examples from my recent work. With one C++ program, we'll see the complexities involved in working with cache memory allocation failures, and the difficulties this brings in deciding what is recoverable and what is practically-unrecoverable. With another, I'll demonstrate that .NET programming for networking software is not the least bit as easy as 1-2-3, and have a big swipe at the shipwreck that is .NET's exception hierarchy and the unjustifiable difficulties it imposes on programmers attempting to write robust and cleanly abstracted software.

Parting twist

There's one little perverse aspect to note about the abysmal performance of all the different languages in adequately recognising and/or reporting a failure of hello-world. At least with C, nothing (save for the flushing/closing of files and the release of resources back to the operating system) is implicit, so no implicit help is expected. By contrast, the fact that so much is implicit in more expressive languages such as C++, C#, and Java has, I believe, led us to a false expectation, albeit not an unreasonable one in the case of hello-world. I conjecture that experienced C programmers may be less caught out by this than experienced programmers in other languages, precisely because their expectations are so much lower. I'd be interested to hear opinions on this, perhaps on the ACCU general mailing list after this is published.

Acknowledgements

Thanks to Chris Oldwood, Ric Parkin, and the members of the Overload review team, for helping me out despite another hair's breadth skirting of the deadline.

References and asides

[FF-1] An Introduction to FastFormat, part 1: The State of the Art, Matthew Wilson, Overload 88, February 2009

[FF-2] An Introduction to FastFormat, part 2: Custom Argument and Sink Types, Matthew Wilson, Overload 89, April 2009

[FF-3] An Introduction to FastFormat, part 3: Solving Real Problems, Quickly, Matthew Wilson, Overload 90, June 2009

[IC++] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004

[K&R] The C Programming Language, Brian Kernighan and Dennis Ritchie, Prentice-Hall PTR, 1988

[QM-2] Quality Matters, Part 2: Correctness, Robustness and Reliability, Matthew Wilson, Overload 93, October 2009

[QM-5] Quality Matters, Part 5: Exceptions: The Worst Form of 'Error' Handling, Except For All The Others, Matthew Wilson, Overload 98, August 2010

[Richter] Advanced Windows, Jeffrey Richter, Microsoft Press, 1997.

[STRERROR] Safe and Efficient Error Information, Matthew Wilson, CVu, July 2009

[TC++PL] The C++ Programming Language, Special Edition, Bjarne Stroustrup, Addison-Wesley, 2000

[XSTLv1] Extended STL, volume 1: Collections and Iterators, Matthew Wilson, Addison-Wesley, 2007
