
Quality Matters #7 Exceptions: the story so far

Overload Journal #114 - April 2013 + Programming Topics   Author: Matthew Wilson
Exception handling is difficult to get right. Matthew Wilson recaps the story so far.

This is the long-awaited next instalment of the Quality Matters column in general, and of the exceptions series in particular. In it, I recap the previous thinking in regard to exceptions, update some information presented in the previous instalment, and present some material that leads into the rest of the series.

Introduction

As the last three years have been extremely involved for me – I’ve been acting as an expert witness in a big software copyright case here in Australia – I’ve been pretty unreliable at meeting publishing deadlines: the Quality Matters column has suffered just as much as my other article- and book-writing endeavours, not to mention my various open-source projects. (We’ve even had a new language standard in the interim!) Since finishing the latest big batch of work I’ve been ‘reclaiming my life’, including spending heaps of time riding my bike, during which the subject of software quality, particularly failure and exceptions, has kept my brain occupied while the roads and trails worked on the rest of me. Unsurprisingly, I have come up with even more questions about exceptions, and therefore more material to cover, which will likely require extending the series beyond the four instalments previously planned.

Given the extended break in this column – it’s been 15 issues! – I thought it appropriate to start back with a recap of the thinking so far, rather than jump straight into what I’d previously intended to be an instalment on ‘Exceptions for Recoverable Conditions’, particularly to ensure that the vocabulary that’s been introduced is fresh in your mind as we proceed. I also want to revisit material from the previous instalment – ‘Exceptions for Practically-Unrecoverable Conditions’ [QM-6] – insofar as it’s affected by the release of new (open-source) libraries that simplify the definition of program entry points and top-level exception-handling boilerplate. And to give you some confidence that a lot of new material is on its way in the coming months, I will also present an examination of exception-handling in a recent Code Critique that provides an insight into some of the deficiencies of exceptions for precisely reporting failure conditions (whether that’s for better contingent reporting or for use in determining recoverability), which will be covered in more detail as the series progresses.

New Vocabulary: Is it worth it?

One of the esteemed ACCU reviewers (and a friend of mine), Chris Oldwood, offered strong criticism of the vocabulary that I have devised and employed in this column. Chris (quite rightly) points out (i) that the language seems precise, possibly to the point of pedantry, and (ii) that no-one else is (yet) writing about these issues in these terms. He is right on both counts.

The problem is hinted at in Chris’ second objection: no-one else is (yet) writing about these issues in these terms. Indeed, as far as I have been able to research, no-one is writing about these issues at all. And that’s the real problem. I am not omniscient in any Google-like sense, but I have not been able to find any (other) books or articles that dissect the notion of failure in as detailed a manner as I feel compelled to do in this column and, indeed, as I have felt compelled to do in my work ever since beginning this column.

So, gentle readers, I’m afraid the high-falutin’ language stays, simply because there’s no other way I know of to precisely impart what I have to say. If it puts you off, well, I’m sorry (and I guess you won’t be alone), but I’m neither able nor willing to attempt to write about such important and nuanced material without having recourse to precisely defined terms, even if I have to invent half of them myself.

One thing I will do in moving towards Chris’ position: I will ensure that definitions and exemplifying material for all the terms used are gathered and available in a single place on the Quality Matters website: http://www.quality-matters-to.us.

Nomenclature

Although writing software is a highly creative process, it still behoves its practitioners to be precise in intent and implementation. In common with many, no doubt, I struggle greatly with the prevalent lack of precision in discussion about software development in general and in written form in particular: for example, common use of the terms ‘error’ and ‘bug’ is replete with ambiguity and overloading, making it very difficult to convey meaning precisely. Therefore, in order to be able to talk precisely about such involved concepts, I have found it necessary to provide precise definitions for some existing terms, and even to introduce some new terms.

In the first – ‘Taking Exceptions, part 1: A New Vocabulary’ [QM-5] – I suggested a new vocabulary for discussing programming conditions and actions, as follows (and as illustrated in Figure 1):

Figure 1
  • normative conditions and actions are those that are the main purpose and predominant functioning of the software; and
  • non-normative conditions and actions are everything else.

Non-normative conditions and actions split into two: contingent and faulted conditions:

  • Contingent conditions and actions are associated with handling failures that are accounted for in the design of the software. They further split into two types:
    • practically-unrecoverable conditions and actions are associated with failures that prevent the program from executing to completion (or executing indefinitely, if that is its purpose) in a normative manner. Examples might include out-of-memory, disk-full, no-network-connection; and
    • recoverable conditions and actions are associated with failures from which the program can recover and execute to completion (or indefinitely, if that is its purpose) in a normative manner. Examples might be user-entered-invalid-number, or disk-full (if a user/daemon is present and able to free/add additional capacity).
  • Faulted conditions and actions are associated with the program operating outside of its design: the classic ‘undefined behaviour’.

These terms will be used (and their definitions assumed) in the remaining instalments of this series and, in all likelihood, most/all future topics of Quality Matters.

Essence of exceptions

Also in [QM-5] I discussed the various uses and meanings of exceptions – some desired; some not – and came to the conclusion that the only definitive thing we can say about exceptions is that they are (an alternative) flow-control mechanism. This might seem pointlessly pedantic, but it’s a necessary guide to issues I’ll get to in the coming instalments. Promise.

Stereotypes

Also in [QM-5] I discussed exception-use stereotypes, such as exceptions-are-evil, exceptions-for-exceptional-conditions, and so on. I do not reference them directly in this instalment, and will assume you will man the browser if you want to (re)visit them in detail. One aspect of this discussion that does bear repetition here, however, is the set of obvious problems attendant on the use of exceptions, namely:

  • Exceptions break code locality: they are invisible in the (normative) code, and create multiple (more) function exit points. Consequently, they significantly impact on code transparency. This occurs in both positive and negative ways, depending on conditions/actions, as we will see in later instalments; and
  • Exceptions are quenchable: it is possible (with very few exception(!)s) to catch and not propagate those exceptions that are being used to indicate practically-unrecoverable conditions. (The same applies to exceptions that are used to indicate faulted conditions, though this is predominantly an unwise application of them.) This leaves higher levels of the program unaware of the situation, with obvious concomitant impacts on program reliability.

Reporting

In the second – ‘Exceptions for Practically-Unrecoverable Conditions’ [QM-6] – I considered the necessity for reporting to a human (or human-like entity) as part of contingent action, in two distinct forms: contingent reports, and diagnostic log statements.

Definition: A contingent report is a block of information output from a program to inform its controlling entity (human user, or spawning process) that it was unable to perform its normative behaviour. Contingent reports are a part of the program logic proper, and are not optional.

Definition: A diagnostic logging statement is a block of information output from a program to an optional observing entity (human user/administrator, or monitor process) that records what it is doing. Diagnostic logging statements are optional, subject to the principle of removability [QM-1], which states: “It must be possible to disable any log statement within correct software without changing the (well-functioning) behaviour.”

Typical contingent reports include writing to the standard error stream, or opening a modal window containing a warning. They are almost always used for issuing important information about recoverable or practically-unrecoverable conditions.

Similarly, diagnostic logging statements are often predominantly used for recording contingent events, but this need not be so. In principle, all interesting events should be subject to diagnostic logging, to facilitate detailed historical tracing of the program flow. A good diagnostic logging library should allow statements of different severities to be selectively enabled, with minimal impact on performance when disabled.
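To make the last point concrete, here is a minimal sketch of severity filtering, assuming a simple process-wide threshold. The severity names, g_log_threshold, log_is_enabled() and the LOG_STMT macro are all hypothetical and are not the Pantheios API; the point is only that the enablement test is made before the statement's arguments are evaluated.

```cpp
#include <cstdio>

// Hypothetical severity levels (lower value = more severe), loosely modelled
// on syslog-style numbering. This is NOT the Pantheios API.
enum severity
{
    sev_critical      = 2,
    sev_error         = 3,
    sev_warning       = 4,
    sev_informational = 6,
    sev_debug         = 7
};

// Runtime threshold: statements less severe than this are disabled.
static int g_log_threshold = sev_warning;

inline bool log_is_enabled(int sev)
{
    return sev <= g_log_threshold;
}

// The enablement test happens *before* the arguments are evaluated, so a
// disabled statement costs one comparison rather than the formatting (or any
// side effects) of its arguments.
#define LOG_STMT(sev, ...)                          \
    do {                                            \
        if (log_is_enabled(sev))                    \
            std::fprintf(stderr, __VA_ARGS__);      \
    } while (0)
```

Because the arguments of a disabled statement are never evaluated, the program must not rely on them for side effects, which is exactly what the principle of removability demands.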

Exception-hierarchies

Also in [QM-6] I discussed the conflation, by some compiler manufacturers, of the C++ exception-handling mechanism and operating-system ‘exceptions’/faults, which can allow the quenching of unequivocally fatal conditions via a catch(...) clause! The consequence of this is that use of catch(...) should be eschewed in the broad, which, in combination with the fact that programs need a top-level exception-handler ([QM-6]; discussed in the next section), means that all thrown things should be derived from std::exception.

Recommendation: All thrown entities must (ultimately) be derived from std::exception.
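A sketch of what the recommendation means in practice, assuming a hypothetical library exception type config_exception and a hypothetical helper describe_failure() (neither comes from any library discussed here): derived types can carry extra, condition-specific information for inner handlers that know about them, while the outermost handler need only catch std::exception, with no recourse to catch(...).

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical library-specific exception. Deriving (via std::runtime_error)
// from std::exception means an outermost catch(std::exception&) can always
// report it.
class config_exception : public std::runtime_error
{
public:
    config_exception(std::string const& message, std::string const& key)
        : std::runtime_error(message)
        , m_key(key)
    {}

    // Extra, condition-specific information for inner handlers that know
    // about this type.
    std::string const& key() const { return m_key; }

private:
    std::string m_key;
};

// Outermost handling needs to know only the root of the hierarchy.
inline std::string describe_failure(void (*fn)())
{
    try
    {
        fn();
        return "ok";
    }
    catch (std::exception& x)
    {
        return x.what();
    }
}
```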

Boilerplate handling

Also in [QM-6] I examined the inadequacy of the hello world introductory program totem: in all compiled languages examined (C, C++, C#/.NET, Java) the classic form is inadequate, and will indicate success even in some failure conditions! The starkly obvious conclusion is that programs written in all these languages require some explicit highest-level exception-handling. I then proffered as an example a ‘production-quality main()’ used in several of the software analysis tools contemporaneously (re)developed for the case I’ve been working on, within which a caller-supplied ‘main’ was executed within the scope of initialisation of some third-party libraries and subject to a rich set of catch clauses.

You may recall that the rough algorithm was as follows:

  1. Catch std::bad_alloc (and any equivalent, e.g. CMemoryException*) first, and exit;
  2. Catch CLASP – a command-line parsing library [ANATOMY-1] – exceptions. Though CLASP is a general-purpose (open-source) library, its use in the analysis tool suite is specific, so CLASP exceptions were caught within the context of expected program behaviour, albeit that the condition of an invalid command-line was in this case – and usually will be in the general case – deemed to be practically-unrecoverable;
  3. Next were exceptions from the recls library. Again, recls is a general-purpose (open-source) library but its use within the analysis tool suite was specific and could be treated in the same manner as CLASP;
  4. A ‘standard catch-all’ clause, catching std::exception; and
  5. Optionally (i.e. under opt-in preprocessor control), a genuine catch all (catch(...)) clause.

In all cases, appropriate diagnostic log statements and contingent reports were made (or, at least, attempted), balancing the extremity of the situation (e.g. out-of-memory) with the requirement to provide sufficient diagnostic and user-elucidatory information.
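The rough algorithm above might be sketched with the standard library alone as follows. This is not the production main() from [QM-6]: cmdline_exception is a hypothetical stand-in for a library's exception types (such as CLASP's or recls'), and MYTOOL_USE_CATCHALL is an invented opt-in preprocessor symbol for the genuine catch-all.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for a library's exception type; not CLASP's or
// recls' actual classes.
struct cmdline_exception : std::runtime_error
{
    explicit cmdline_exception(char const* msg) : std::runtime_error(msg) {}
};

// Runs a caller-supplied "main" within the ordered catch clauses.
inline int invoke_with_handlers(int (*main_proper)(), char const* toolName)
{
    try
    {
        return main_proper();
    }
    catch (std::bad_alloc&)
    {
        // Minimal report: avoid anything that might itself allocate.
        std::fputs("out of memory\n", stderr);
    }
    catch (cmdline_exception& x)
    {
        std::fprintf(stderr, "%s: invalid command-line: %s\n", toolName, x.what());
    }
    catch (std::exception& x)
    {
        // The 'standard catch-all'.
        std::fprintf(stderr, "%s: %s\n", toolName, x.what());
    }
#ifdef MYTOOL_USE_CATCHALL  /* hypothetical opt-in symbol */
    catch (...)
    {
        std::fprintf(stderr, "%s: unknown failure\n", toolName);
    }
#endif
    return EXIT_FAILURE;
}
```

The ordering of the clauses matters: a handler for a base class also matches derived classes (std::bad_alloc is itself derived from std::exception), so catch(std::exception&) must come after the more specific clauses, or they would never be reached.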

I also discussed a header file supplied with the Pantheios diagnostic logging API library [PAN] that abstracted the boilerplate exception-handling for COM components.

Since that time, I’ve continued to work on the subject, and have now released two new libraries relevant to the subject:

  • Pantheios::Extras::xHelpers (C++-only) – provides general exception-catching and translation-to-failure-code services for any components/libraries that need to exhibit a C-API, along with a COM-specific set implemented in terms of the general, and logs the conditions using Pantheios; and
  • Pantheios::Extras::Main (C, C++) – provides both automatic (un)initialisation of the Pantheios library (otherwise required explicitly in C code) and, for C++, handling of exceptions as follows:
    • Out-of-memory exceptions – issues minimal appropriate contingent reports and diagnostic log statements;
    • ‘Root’ exceptions (std::exception; and, with MFC only, CException*) – issues contingent reports and diagnostic log statements including exception message; and
    • Catch-all clause – issues minimal appropriate contingent reports and diagnostic log statements; not enabled by default (for the reasons explained in [QM-6]), but can be enabled under direction of pre-processor symbol definitions.
  • (All other application/library-specific exceptions (such as those shown in Listing 1 of [QM-6]) are to be handled in a deeper, application-supplied handler, thereby achieving a clean(er) separation of the specific and the generic.)

I’ll provide more details for these libraries (all downloadable from the Pantheios [PAN] project) at a future time, when their specific features and utility are worth mentioning explicitly. Pantheios::Extras::Main will certainly appear in upcoming issues of CVu when I continue – hopefully next month – my series looking at program anatomies [ANATOMY-1], and again in this series, when I consider the benefits of a dumped core (pointed out to me by readers of [QM-5]).

Use of these libraries now changes significantly the previous production main(), discussed and listed in [QM-6], to that shown in Listings 1 and 2.

There were three significant problems with the previous ‘production quality main()’:

  1. It was large and unwieldy;
  2. It mixed library initialisation in with contingent action; and
  3. Most seriously from a design perspective, it failed to distinguish, ascribe, or respect any abstraction levels of the sub-systems for which the contingencies were provided.

As I look back on the proffered main() from some distance in time, it is the third issue that strikes me as most grievous. Since that time I have done much work on program structure – discussed in one prior [ANATOMY-1] and several forthcoming articles about ‘program anatomy’ in CVu – supported by several new libraries, including those mentioned above.

Instead of a single initialisation utility function approach, I now layer the initialisation+failure-handling as follows.

First (i.e. outermost), comes the initialisation of the diagnostic logging facilities, and handling of otherwise-uncaught generic exceptions. This is achieved by the use of Pantheios::Extras::Main’s invoke(), in Listing 1’s main(), which:

  1. (un)initialises Pantheios, ensuring that the caller-supplied inner main is always executed within an environment where diagnostic logging facilities are available; and
  2. catches those exceptions that are appropriately top-level:
    1. std::bad_alloc for out-of-memory conditions. This exception can and should be readily ignored by all inner-levels of the program, and represents a bona fide practically-unrecoverable condition; the exception to that will be discussed in – you guessed it! – a later instalment;
    2. std::exception for the necessary outer-catch behaviour required of us due to the latitude in the standard discussed in [QM-6]. It may be that such general catching at this outer level represents valid behaviour; I tend to think it will rather indicate a failure on the part of the programmer to account for all facets of behaviour in the design: i.e. the more specific (though still std::exception-derived) exceptions should have been caught in the inner, application-specific, logic levels of the program. (Once again, this being a rather huge and contentious issue, it’ll have to wait for a further instalment for further exploration.); and
    3. catch(...), but only if the preprocessor symbol PANTHEIOS_EXTRAS_MAIN_USE_CATCHALL is defined, because this should almost always be eschewed.

Next, I chose to employ another of Pantheios’ child libraries: Pantheios::Extras::DiagUtil, to provide memory leak detection, in Listing 1’s main1(). (The naming has to be worked on, I know!) Importantly, this happens after diagnostic logging is initialised and before it is uninitialised, so that (i) these facilities are available for logging the leaks, and (ii) there are no false-positives in the leak detection due to long-lived lazily-evaluated allocations in the diagnostic logging layer. If you don’t want to do memory-leak tracing you can just leave this out (and wire up main() to main2()).

// in common header

extern
int main_proper_outer(
  clasp::arguments_t const* args
);
extern clasp_alias_t const aliases[];

. . .

// in file main.cpp
//
// common entry point for all tools

static
int main2(int argc, char** argv)
{
 return clasp::main::invoke(argc, argv,
    main_proper_outer, NULL, aliases, 0, NULL);
}

static
int main1(int argc, char** argv)
{
 return ::pantheios::extras
            ::diagutil::main_leak_trace
                ::invoke(argc, argv, main2);
}

int main(int argc, char** argv)
{
 return ::pantheios::extras
            ::main::invoke(argc, argv, main1);
}
			
Listing 1

Last, comes the command-line handling, in the form of CLASP’s child library CLASP::Main, in Listing 1’s main2().

Note that so far we’ve not seen any explicit mention of exceptions, even though all three libraries perform exception-handling internally:

  • pantheios::extras::main::invoke() catches std::bad_alloc (and CMemoryException* in the presence of MFC);
  • pantheios::extras::diagutil::main_leak_trace::invoke() catches any exceptions emitted by its ‘main’ and issues a memory leak trace before re-throwing to its caller; and
  • clasp::main::invoke() catches CLASP-specific exceptions, providing the pseudo-standard contingent reports along the lines of ‘myprogram: invalid command-line: unrecognised argument: --oops=10’ (and, of course, diagnostic logging statements in the case where the CLASP code detects it is being compiled in the presence of Pantheios); all other exceptions are left uncaught, to be dealt with by the outer layers.
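The essence of this layering can be sketched with the standard library alone. In the following, invoke_outer(), invoke_middle(), invoke_cmdline(), run_layered(), usage_exception and g_failure_reports are all hypothetical names that merely mimic the roles of the three libraries; they are templates (rather than taking function pointers, as in Listing 1) only so that lambdas can be composed conveniently.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical library-specific exception for the innermost layer.
struct usage_exception : std::runtime_error
{
    explicit usage_exception(std::string const& m) : std::runtime_error(m) {}
};

// Counts reports issued by the middle layer (stands in for, say, a leak trace).
int g_failure_reports = 0;

// Outermost layer: generic, top-level handlers only.
template <typename F>
int invoke_outer(int argc, char** argv, F inner)
{
    try
    {
        return inner(argc, argv);
    }
    catch (std::bad_alloc&)
    {
        std::fputs("out of memory\n", stderr);
    }
    catch (std::exception& x)
    {
        std::fprintf(stderr, "%s\n", x.what());
    }
    return EXIT_FAILURE;
}

// Middle layer: acts on the failure path, then rethrows the same exception
// object with "throw;". Nothing is quenched here.
template <typename F>
int invoke_middle(int argc, char** argv, F inner)
{
    try
    {
        return inner(argc, argv);
    }
    catch (...)
    {
        ++g_failure_reports;
        throw;
    }
}

// Innermost generic layer: handles only its own exception type; anything
// else propagates outwards untouched.
template <typename F>
int invoke_cmdline(int argc, char** argv, F inner)
{
    try
    {
        return inner(argc, argv);
    }
    catch (usage_exception& x)
    {
        std::fprintf(stderr, "invalid command-line: %s\n", x.what());
        return EXIT_FAILURE;
    }
}

// Composition, outermost first, mirroring main() -> main1() -> main2().
template <typename F>
int run_layered(int argc, char** argv, F app)
{
    return invoke_outer(argc, argv, [&](int c, char** v) {
        return invoke_middle(c, v, [&](int c2, char** v2) {
            return invoke_cmdline(c2, v2, app);
        });
    });
}
```

Note that the catch(...) in the middle layer does not quench anything: the bare throw; rethrows the original exception object, so the outer layer still sees, and reports, its actual type. A usage_exception, by contrast, is fully handled by the innermost layer and never reaches the other two.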

Each of these is designed to be entirely independent of the others, and it’s entirely up to the programmer whether or not to incorporate them, and any similar layering facilities, in whatever combination is deemed appropriate. All that’s needed is to supply whatever declarative information is required – the example shown just provides 0s and NULLs for default behaviour – and rely on a sensible, predictable, and dependable behavioural foundation for the program.

With the (boring) boilerplate out of the way in this fashion, the programmer may then focus on the more interesting aspects of failure (as reported by exceptions) that his/her program may encounter. In a simple standalone program, the next and final stage might be the ‘main proper’. However, more sophisticated programs that depend on other libraries that may themselves throw exceptions to report practically-unrecoverable and recoverable conditions may elect to have a shared, program-suite-specific ‘outer main proper’, as I have done with the analysis tool suite; it looks something like that shown in Listing 2. (This illustrates use of a program_termination_exception class, which is yet another thing I’ll discuss later. Prizes for the first reader to write in with why such an exception might be used in preference to a simple call to exit().)

int
main_proper_outer(
   clasp::arguments_t const* args
)
{
  try
  {
    return main_proper_inner(args);
  }
  // 0
  catch(program_termination_exception& x)
  {
    pan::log_INFORMATIONAL(
     "terminating process under program direction; "
     "exit-code=", pan::i(x.ExitCode));
    return x.ExitCode;
  }
  // 1. out-of-memory failures now caught by
  //    Pantheios::Extras::Main
  // 2. CLASP failures now caught by CLASP::Main
  // 3. recls
  catch(recls::recls_exception& x)
  {
    pan::log_CRITICAL("exception: ", x);

    ff::fmtln(cerr, "{0}: {1}; path={2}; "
       "patterns={3}", ST_TOOL_NAME, x,
       x.get_path(), x.get_patterns());
  }
  // 4. standard exception failures now caught by
  //    Pantheios::Extras::Main
  return EXIT_FAILURE;
}
			
Listing 2

Code Critique #73

One of the things I did to keep my programming mojo idling over during my long engagement was to solve the 73rd Code Critique [CC-73].

The first thing to note is that I misread the program description, and so worked to slightly wrong requirements in my own implementation! The actual requirement was to read from standard input, whereas I understood it to mean that the program should read from a named file. Fortunately, the difference turns out to be useful in illustrating the points I wish to make. Other than that, I think I have it right, as follows:

  • obtain name of input text file from command-line (this is the requirement I misread);
  • open the source text file, and read lines, then for each line:
    • upon finding a line – control line – with the format (in regex parlance) /^--- (.+) ---$/, close the previous output file, if any, and open a new output file with the name specified in (regex parlance again) group $1;
    • upon reading any other kind of line – data line – write it out to the currently-open output file;
  • when all lines are done, exit.
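For comparison with the string-searching test used in the listings, the control-line rule can be expressed directly as a regular expression. This is a sketch assuming C++11's <regex> (which postdates the C++03 compilers discussed here), and parse_control_line() is a hypothetical helper. std::regex_match() must match the whole line, so the ^ and $ anchors are implicit.

```cpp
#include <regex>
#include <string>

// Returns true, and sets 'path' to group $1, if 'line' is a control line of
// the form "--- <path> ---"; otherwise returns false and leaves 'path' alone.
inline bool parse_control_line(std::string const& line, std::string& path)
{
    static std::regex const control("--- (.+) ---");
    std::smatch m;
    if (std::regex_match(line, m, control))
    {
        path = m[1].str();
        return true;
    }
    return false;
}
```

Against the input in Listing 4, this accepts "--- outputfile-a.txt ---" and "--- outputfile b.txt ---", and treats "--- abc" as a data line.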

The reason this example sparked my interest in cross-pollination with QM is that it exercises one of my interests with respect to software quality:

  1. The impact of process-external interactions – in this case the file-system – on the software quality of a program;

along with two of my frequent preoccupations with C++:

  1. When is it more appropriate to use another language to solve a problem; and
  2. The IOStreams library, and the myriad reasons why I hate it.

When working with the file-system, we must care deeply about failure, since failure awaits us at every turn: capacity; hardware failures; input; output; search; user directions; and so on. I’ll cover some of the relevant issues (as they pertain to failure detection and handling, and exceptions) in the remainder of this instalment. (Note that the steps are written for maximum revisibility [ANATOMY-1], and don’t necessarily reflect how I would proceed normally.)

Let’s start with a fully working version – in the sense that it addresses the normative requirements outlined above – and then see how it behaves when presented with an imperfect execution environment. This is given in Listing 3.

#include <cstdlib>
#include <fstream>
#include <string>

int main(int argc, char** argv)
{
 std::ifstream ifs(argv[1]);
 std::ofstream ofs;
 std::string           line;

 for(; std::getline(ifs, line); )
 {
  if( line.find("--- ") == 0 &&
      line.find(" ---") == line.size() - 4)
  {
   std::string const path =
      line.substr (4, line.size() - 8);
   if(ofs.is_open())
   {
    ofs.close();
   }
   ofs.open(path.c_str());
  }
  else
  {
   ofs << line << "\n";
  }
 }
 return EXIT_SUCCESS;
}
			
Listing 3

When operated with the input file shown in Listing 4 – where ¶ denotes an end-of-line sequence, missing on the last line – it experiences normative-only conditions and produces the intended two output files (outputfile-a.txt and outputfile b.txt).

--- outputfile-a.txt ---¶
abc    ¶
--- abc¶
def    ¶
¶
--- outputfile b.txt ---¶
ghi¶
jklm¶
¶
nop
			
Listing 4

However, we don’t have to try very hard to break it. Simply omitting the input-file command-line argument will cause a dereference of NULL (argv[1]), and undefined behaviour (in the form of a segmentation fault / access violation => crash). This is easily fixed by the addition of a test in Step 1 (Listing 5), causing issue of a contingent report and return of EXIT_FAILURE; I’m omitting diagnostic logging from these examples for brevity.

int main(int argc, char** argv)
{
 if(NULL == argv[1])
 {
  std::cerr << "step1 : <inputfile>" << std::endl;
  return EXIT_FAILURE;
 }

 std::ifstream ifs(argv[1]);

 . . . // as before

 return EXIT_SUCCESS;
}
			
Listing 5

It’s not just omitting the input-file command-line argument. Equally wrong is to supply two arguments. This is easily addressed (Step 2; listing not shown) by changing the test expression from:

  if(NULL == argv[1])

to:

  if(2 != argc)

Furthermore, we learned last time that any production-quality program requires an outer try-catch, since otherwise any exception thrown by the program (including std::bad_alloc, which is the only one that may be thrown by the versions so far) will cause an unexplained – by dint of a lack of a guaranteed (i.e. standard-prescribed) contingent report – abort() (via std::terminate()). Fulfilling this gives Step 3, shown in Listing 6; all subsequent steps will assume main_inner().

static
int main_inner(int, char**);

int main(int argc, char** argv)
{
 try
 {
  return main_inner(argc, argv);
 }
 catch(std::bad_alloc&)
 {
  fputs("step3 : out of memory\n", stderr);
 }
 catch(std::exception& x)
 {
  fprintf(
    stderr
  , "step3 : %s\n"
  , x.what()
  );
 }
 return EXIT_FAILURE;
}

int main_inner(int argc, char** argv)
{
 if(2 != argc)
 {
  std::cerr << "step3 : <inputfile>" << std::endl;
  return EXIT_FAILURE;
 }

 . . . // as before

 return EXIT_SUCCESS;
}
			
Listing 6

With just a modicum of consideration, I’m able to come up with eight potential problems that would result in failure of the program, including the first two:

  1. Input file path argument not specified by user;
  2. 2+ arguments specified by user;
  3. Input file contains data line before the first control line;
  4. Input file does not exist;
  5. Input file cannot be accessed;
  6. Control line specifies output path that cannot be created (because directory component does not exist);
  7. Control line specifies output path that cannot be created (because path contains invalid characters); and
  8. Control line specifies output path of file that exists and is read-only.

I’ll measure the normative version, and the other versions I’ll introduce in response, against these eight problems. Note that all eight are eminently reasonable runtime conditions: none is caused by conditions, or should invoke responses, that are (or should be) outside the design. In our parlance: none of these should result in a faulted condition.

The first two problems we’ve already encountered; the last five pertain to the file-system, and will be the main points of interest in this section. Before that, however, I must address a key piece of functionality, which is to handle the case where the input file contains data lines before the first control line (problem #3). Let’s assume an input file, inputfile-3.txt, that is the same as inputfile-0.txt with a single data line "xyz" before the first control line. The current version, Step 3 (Listing 6), silently ignores it, ‘successfully’ creating the two (other) output files. This has to be caught, as shown in Step 4 (Listing 7): the ‘un-const-ising’ (forgive me!) of path, somewhat regrettable in its own terms, and moving it outside of the loop allow it to be used as an indicator of whether an output file is open, with the corresponding contingent report and EXIT_FAILURE if not. Note the precise and sufficiently-rich explanation provided in the contingent report, which will allow a user (who knows how the program is supposed to work) to correct the problem with the input file.

int main_inner(int argc, char** argv)
{
 if(2 != argc)
 {
  std::cerr << "step4: <inputfile>" << std::endl;
  return EXIT_FAILURE;
 }
 char const* const inputPath = argv[1];
 std::ifstream ifs(inputPath);
 std::ofstream ofs;
 std::string line;
 std::string path;
 for(; std::getline(ifs, line); )
 {
  if( line.find("--- ") == 0 &&
      line.find(" ---") == line.size() - 4)
  {
   path = line.substr(4, line.size() - 8);
   if(ofs.is_open())
   {
    ofs.close();
   }
   ofs.open(path.c_str());
  }
  else
  {
   if(path.empty())
   {
    std::cerr << "step4: invalid input file '"
         << inputPath
         << "': data line(s) before control line"
         << std::endl;
    return EXIT_FAILURE;
   }
   ofs << line << "\n";
  }
 }
 return EXIT_SUCCESS;
}
			
Listing 7

Let’s now turn our attention to problems 4 and 5. As discussed in [QM-6], the IOStreams do not, by default, throw exceptions on failure conditions. In this mode, to detect whether an input file does not exist or cannot be accessed, the programmer must explicitly test state member functions. Absent such checks, as in Step 4 (Listing 7), the program will silently fail (and indicate success via EXIT_SUCCESS!).

We have two choices: either test the state of the streams at the appropriate point, or cause the input stream to throw exceptions (and catch them). Step 5 (Listing 8) takes the former approach. While this detects the failure event – the fact that the input file cannot be opened – it does not provide any reason as to what caused the failure: for our purposes there is no discrimination between problems 4 and 5. At a minimum, we would want to include some information in our contingent report that indicates to the user which one it is.

int main_inner(int argc, char** argv)
{
 . . .
 char const* const inputPath = argv[1];
 std::ifstream ifs(inputPath);
 std::ofstream ofs;
 std::string line;
 std::string path;
 if(ifs.fail())
 {
  std::cerr << "step5: could not open '"
            << inputPath << "'" << std::endl;
  return EXIT_FAILURE;
 }
 for(; std::getline(ifs, line); )
 {
  . . .
 }
 . . .
}
			
Listing 8

We might consider the following logic: all C++ standard library implementations (that I know of) are built atop the C standard library’s Streams library (FILE*, fprintf(), etc.), which uses errno, so I can use errno and strerror() to include the reason for the failure in the contingent report. This might take us to Step 6 (Listing 9).

#include <fstream>
#include <iostream>
#include <string>
#include <cerrno>
#include <cstdlib>
#include <cstring>

 . . .

 if(ifs.fail())
 {
  int const e = errno;
  std::cerr << "step6: could not open '"
            << inputPath << "': "
            << std::strerror(e) << std::endl;
  return EXIT_FAILURE;
 }

 . . .

Listing 9

This appears to work for me with VC++, for both Problem 4:

  step6: could not open 'not-exist.txt': No such file or directory

and Problem 5:

  step6: could not open 'inputfile-0-NO-READ.txt': Permission denied

Similar results are obtained with the CodeWarrior 8 compiler (on Windows):

  step6: could not open 'not-exist.txt': No such file or directory

and:

  step6: could not open 'inputfile-0-NO-READ.txt': Operation not permitted

The problem is that nowhere in the C++03 standard (or nowhere that I could find, anyway) is it mandated that the IOStreams will be implemented in terms of C’s Streams, nor that they will faithfully maintain and propagate errno. (All that is mandated is that the standard input, output, and error streams will be synchronisable with C++’s IOStreams’ cin, cout, and cerr.) Compiling Step 6 with the Digital Mars compiler results in:

  step6: could not open 'not-exist.txt': No error

and:

  step6: could not open 'inputfile-0-NO-READ.txt': No error

indicating that errno is 0 in those cases.

Clearly this is not a portable strategy for propagating the reason for failure to the user via the contingent report (or via diagnostic logging). Nor is it a reliable mechanism for making programmatic decisions regarding what contingent actions to take. In one of the next instalments I’m going to show just how much rubbishy hassle is attendant on this problem, particularly (though perhaps not exclusively) in .NET.
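If, despite this, errno is to be included in a contingent report, it is worth at least guarding against the two failure modes seen above: a stale value left over from some earlier call, and no value at all. The following minimal sketch assumes only that a failed open may set errno; the helper open_input() is hypothetical and not part of the original program.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper (not part of the original program): opens 'path' into
// 'ifs' and, on failure, describes the reason as best it can.
bool open_input(std::ifstream& ifs, char const* path, std::string& reason)
{
    errno = 0;              // discard any stale value left by earlier calls
    ifs.open(path);
    if(!ifs.fail())
    {
        return true;
    }
    int const e = errno;    // capture before anything else can change it
    // A non-zero value is only a hint: the standard requires neither that
    // IOStreams set errno nor that any value present relates to this call.
    reason = (0 != e) ? std::string(std::strerror(e))
                      : std::string("reason unknown");
    return false;
}
```

With the Digital Mars behaviour shown above, this would yield "reason unknown" rather than the misleading "No error"; it cannot, of course, supply the missing reason.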

Let’s now consider the alternative approach of using exceptions, as in Step 7 (Listing 10). At face value, it appears that this is an improvement: we’ve certainly improved the transparency (with respect to the normative behaviour of the program).

int main_inner(int argc, char** argv)
{
 . . .
 char const* const inputPath = argv[1];
 std::ifstream ifs(inputPath);
 std::ofstream ofs;
 std::string line;
 std::string path;

 ifs.exceptions(std::ios::failbit);

 for(; std::getline(ifs, line); )
 {
  . . .
 }
 return EXIT_SUCCESS;
}
			
Listing 10

However, when we run this (with VC++) we get the following output for Problem 4:

  step7 : ios::failbit set

and for Problem 5:

  step7 : ios::failbit set

That’s pretty depressing. By setting the stream to emit exceptions we’ve gained the automatic and unignorable indication of failure, which is good. But we’ve failed to gain any useful qualifying information. Even to a programmer, being told “ios::failbit set” is pretty useless; to a user it’s arguably less than that. Furthermore, since the (C++03) standard defines std::ios::failure to have only a constructor, a destructor, and an override of what() – there’s no underlying_failure_code() accessor method or any equivalent – the exception cannot provide any information that could be used programmatically.

But hold on. It gets (much) worse. If we now supply a valid input file we still get the same contingent report, but after it has produced the requisite output file(s). What gives?

Sadly, the (C++03) standard prescribes (via convoluted logic including clauses 27.6.1.1.2;2 and 21.3.7.9;6) that the call to std::getline() that finds no further input sets std::ios::failbit (along with std::ios::eofbit), and so throws an exception if the stream is marked to throw on std::ios::failbit, even when, as in our case, it is not marked to throw on std::ios::eofbit. I’ve little doubt that there’s a necessary backwards-compatibility reason for this, but in my opinion it leaves us with ludicrous behaviour. Ignoring the almost equally ludicrous set-exceptions-after-construction anti-idiom, what could be more transparent than the following:

  std::ifstream ifs(path);
  ifs.exceptions(std::ios::failbit);
  for(std::string line; std::getline(ifs, line); )
  {
    . . . // do something with line
  }
  . . . // do post-read stuff

You don’t get to write that. At ‘best’, you must write something like the following, which is the approach taken in Step 8 (Listing 11):

int main_inner(int argc, char** argv)
{
 . . .
 char const* const inputPath = argv[1];
 . . .
 ifs.exceptions(std::ios::failbit);

 try
 {
  for(; std::getline(ifs, line); )
  {
   . . .
  }
 }
 catch(std::ios::failure&)
 {
  if(!ifs.eof())
  {
   throw;
  }
 }

 return EXIT_SUCCESS;
}
			
Listing 11

  std::ifstream ifs(path);
  ifs.exceptions(std::ios::failbit);
  try
  {
    for(std::string line; std::getline(ifs, line);)
    {
      . . . // do something with line
    }
  }
  catch(std::ios::failure&)
  {
    if(!ifs.eof())
    {
      throw;
    }
  }
  . . . // do post-read stuff

Are we having fun yet?
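For comparison, here is a minimal sketch of the exception-free route taken in Step 5, extended to cover reading as well as opening: leave the exception mask alone, test fail() after opening, and, after the loop, use eof() and bad() to tell running out of input from a genuine read failure. The names read_lines() and ReadStatus are hypothetical. Like Step 5, it can say where a failure occurred, but not why.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result codes: they distinguish where reading failed.
enum ReadStatus { RS_OK, RS_OPEN_FAILED, RS_READ_FAILED };

// Reads all lines of the file at 'path' without enabling stream exceptions.
ReadStatus read_lines(char const* path, std::vector<std::string>& lines)
{
    std::ifstream ifs(path);
    if(ifs.fail())
    {
        return RS_OPEN_FAILED;  // problems 4 and 5 still look identical here
    }
    for(std::string line; std::getline(ifs, line); )
    {
        lines.push_back(line);
    }
    // The final getline() sets failbit at end-of-file too, so fail() alone
    // tells us nothing. Stopping at eof() without bad() means the input
    // simply ran out; anything else is treated as a genuine read failure.
    if(!ifs.eof() || ifs.bad())
    {
        return RS_READ_FAILED;
    }
    return RS_OK;
}
```

The cost is that the post-loop check is easy to forget, which is precisely the ignorability that exceptions were meant to remove.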

Let’s now consider the final three problems (6–8), all of which pertain to the ability to create the specified output file. If we do this manually, it’ll involve testing ofs.fail() after the call to ofs.open() inside the loop. (For completeness we’d probably also want to test after the insertion of line into ofs, but I’m trying to be brief here …) But as we’ve seen with the input file/stream, we’re still out of luck in discriminating which of conditions 6–8 (or others) is responsible for the failure: even if errno were reliably set, the only errno values prescribed by the (C++03) standard are EDOM and ERANGE, neither of which is applicable here, and we might not even get a representative (albeit platform/compiler-specific) ‘error’ string from strerror().
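The manual checks just described might look something like the following minimal sketch. The helper names open_output() and write_line() and the WriteStatus enumeration are hypothetical. The call to clear() before open() is there because, in C++03, a successful open() does not reset error state left over from a previous file (C++11 changed open() to do so).

```cpp
#include <fstream>
#include <ostream>
#include <string>

// Hypothetical status codes: they record where writing failed, not why.
enum WriteStatus { WS_OK, WS_OPEN_FAILED, WS_WRITE_FAILED };

// (Re)opens 'ofs' on 'path', testing fail() manually instead of relying on
// stream exceptions. Problems 6-8 all surface identically as WS_OPEN_FAILED.
WriteStatus open_output(std::ofstream& ofs, std::string const& path)
{
    if(ofs.is_open())
    {
        ofs.close();
    }
    ofs.clear();    // C++03 open() does not reset state left by a prior file
    ofs.open(path.c_str());
    return ofs.fail() ? WS_OPEN_FAILED : WS_OK;
}

// Inserts one line. Output is buffered, so a failure may only show up later
// (on flush or close); this check is necessary but not sufficient.
WriteStatus write_line(std::ostream& os, std::string const& line)
{
    os << line << "\n";
    return os.fail() ? WS_WRITE_FAILED : WS_OK;
}
```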

Similarly, if we call exceptions() on ofs without more fine-grained catching, we have two obvious problems:

  • loss of precision as to location of exception, and, hence, the cause; and
  • no more informative (i.e. not!) messages than we’ve seen provided with input file/stream failure.

There’s nothing we can do about the latter, but we can address the former by catching with greater granularity to identify the failure locations, as in Listing 12. I hope you agree with me that the implementation as it now appears is patently inferior: we’re polluting the straightforward functionality of this simple program with too much failure-handling, a clear loss of transparency. Furthermore, there is no clear delineation between application-specific failure – e.g. the presence of data lines before the first output-file-name control line in an input file – and general (in this case file-system) failure. This is hardly the promise of clarity in failure-handling that exception proponents might have us believe.

int main_inner(int argc, char** argv)
{
 if(2 != argc)
 {
  std::cerr << "step9: <inputfile>" << std::endl;

  return EXIT_FAILURE;
 }

 char const* const inputPath = argv[1];
 std::ifstream  ifs(inputPath);
 std::ofstream  ofs;
 std::string   line;
 std::string   path;

 ifs.exceptions(std::ios::failbit);
 ofs.exceptions(std::ios::failbit);

 try
 {
  for(; std::getline(ifs, line); )
  {
   if( line.find("--- ") == 0 &&
    line.find(" ---") == line.size() - 4)
   {
    path = line.substr(4, line.size() - 8);
    if(ofs.is_open())
    {
     ofs.close();
    }
    try
    {
     ofs.open(path.c_str());
    }
    catch(std::ios::failure&)
    {
     std::cerr
        << "step9: could not open output file '"
        << path << "'" << std::endl;

     return EXIT_FAILURE;
    }
   }
   else
   {
    if(path.empty())
    {
     std::cerr << "step9: invalid input file '"
        << inputPath
        << "': data line(s) before control line"
        << std::endl;

     return EXIT_FAILURE;
    }

    try
    {
     ofs << line << "\n";
    }
    catch(std::ios::failure&)
    {
     std::cerr
      << "step9: could not write to output file '"
      << path << "'" << std::endl;

     return EXIT_FAILURE;
    }
   }
  }
 }
 catch(std::ios::failure&)
 {
  if(!ifs.eof())
  {
   throw;
  }
 }

 return EXIT_SUCCESS;
}
			
Listing 12
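Listing 12’s inner try blocks exist only to recover where a failure happened. One possible alternative, sketched here with hypothetical exception types file_failure and invalid_input (not part of the original program or of any library), is to have small helpers test the stream state themselves and throw exceptions that carry the operation and path, so that a single top-level handler can report them while application-specific failure stays distinguishable from file-system failure. Note that this recovers location, not cause: the reason for the failure is still unavailable.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical exception for general (file-system) failure: it records which
// operation failed and on which path, but, like std::ios::failure, not why.
class file_failure
    : public std::runtime_error
{
public:
    file_failure(std::string const& operation, std::string const& path)
        : std::runtime_error("could not " + operation + " '" + path + "'")
        , path_(path)
    {}

    std::string const& path() const { return path_; }

private:
    std::string path_;
};

// Hypothetical exception for application-specific failure, e.g. data
// line(s) appearing before the first control line.
class invalid_input
    : public std::runtime_error
{
public:
    explicit invalid_input(std::string const& message)
        : std::runtime_error(message)
    {}
};

// Opens 'path' for writing, testing fail() manually rather than enabling
// stream exceptions, and throws a file_failure naming the operation.
void open_output_or_throw(std::ofstream& ofs, std::string const& path)
{
    if(ofs.is_open())
    {
        ofs.close();
    }
    ofs.clear();    // reset any state left over from a previous file
    ofs.open(path.c_str());
    if(ofs.fail())
    {
        throw file_failure("open output file", path);
    }
}
```

The loop body then shrinks back towards its normative form, at the price of coupling the throw sites to whatever the top-level handler expects to find in these types.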

This conflict is something I will be exploring further next time, including considerations of precision and accuracy of identifying and reporting (sufficient information about) the failure, use of state, issues of coupling between throw and call sites, and further conflicts in software quality characteristics arising from the use, non-use, and misuse of exceptions.

References

[ANATOMY-1] ‘Anatomy of a CLI Program Written in C’, Matthew Wilson, CVu volume 24, issue 4

[CC-73] CVu volume 23, issue 6

[PAN] http://www.pantheios.org/

[QM-1] ‘Quality Matters 1: Introductions and Nomenclature’, Matthew Wilson, Overload 92, August 2009

[QM-5] ‘Quality Matters 5: Exceptions: The Worst Form of “Error” Handling, Apart from all the Others’, Matthew Wilson, Overload 98, August 2010

[QM-6] ‘Quality Matters 6: Exceptions for Practically-Unrecoverable Conditions’, Matthew Wilson, Overload 99, October 2010
