Quality Matters #8: Exceptions for Recoverable Conditions

Overload Journal #120 - April 2014 + Programming Topics   Author: Matthew Wilson
Too many programs deal with exceptions incorrectly. Matthew Wilson suggests practical steps to improve your code.

In this instalment I turn to the use of exceptions with (potentially) recoverable conditions, and examine what characteristics of exceptions are necessary in order to support recovery. I also consider what may be done when the information to support recovery decisions is not provided, and introduce a new open-source library, Quench.NET, that assists in dealing with such situations. (To start with, though, I provide a demonstration of an earlier assertion: exceptions are essentially nothing more than a flow-control mechanism and we should be careful not to forget it.)

Introduction

This is the eighth instalment of Quality Matters, the fourth on exceptions, and the first since I averred in the April 2013 edition of Overload that I would be both regular and frequent from here on in. The reason I’d been so sketchy 2010–2013 was that I’d been on a very ‘heavy’ engagement as an expert witness on an all-consuming intellectual copyright case. Since then, I’ve been doing double duty as a development consultant/manager/software architect for a large Australian online retailer and working on a start-up, the latter of which is now occupying me full time (which means 70+hr weeks, as is the way of these things…). Both of these latter activities have significantly informed my views on software quality in general and exception-handling in particular, and so I come back full of ideas (if not completely fresh and full of energy) and I intend to get back to more writing and even meet the odd publishing deadline.

In this instalment, I consider a little more about what exceptions are, how they’re used in recoverable situations, and what characteristics are required specifically to facilitate the discrimination between recoverable and practically-unrecoverable conditions. This is a much bigger sub-subject than I’d ever anticipated when I started on the topic those several years ago, and I now expect to be covering recoverability alone for several instalments. I won’t make any further predictions beyond that. I examine problems both with the design and use of exceptions (and exception hierarchies) in reporting failure, and the misapprehension regarding and misuse of thrown exceptions by programmers. I finish by introducing a new open-source library, Quench.NET, that is designed to assist with the hairy task of retrofitting fixes to the former, and tolerating the latter.

Exception-class Split Personality Syndrome

System.ArgumentException serves double duty, both as a specific exception class that is used to report specific conditions, and as a base class of a group of (argument-related) exception classes. This overloading of purpose raises two problems.

First, it means that one must be careful to remember to include the required derived types in a ‘multiple catch handler block’ in order to catch the specific types, and to order them correctly (most specific classes first).

Second, and more importantly in my opinion, it makes the intent (of the type) ambiguous:

  • Am I catching ArgumentException as ArgumentException, or am I catching all ‘argument exceptions’?; and
  • Why does condition A have its own specific exception type (e.g. System.ArgumentOutOfRangeException) and condition B does not and instead uses the general one (ArgumentException)?

For example, why can we not have an ArgumentInvalidValueException, maybe even an ArgumentEmptyException, and so forth?

The same problem occurs with System.IO.IOException and its parsimonious stable of derived classes – you get a FileNotFoundException and a DirectoryNotFoundException OK, but some failures that occur when trying to access a network path yield a plain-old IOException! Unlike argument exceptions, I/O exceptions are likely to be involved in making decisions about recoverability, so this design flaw has much more serious consequences.
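The ordering and ambiguity concerns can be sketched as follows. This is a minimal illustration rather than code from any real component: the Validate() and Classify() methods and their parameter names are hypothetical, and only the framework exception types are real.

```csharp
using System;

static class CatchOrderingExample
{
  // Hypothetical validation routine, for illustration only
  public static void Validate(string name, int count)
  {
    if(null == name)
    {
      throw new ArgumentNullException("name");
    }
    if(count < 0)
    {
      throw new ArgumentOutOfRangeException("count");
    }
    if(0 == name.Length)
    {
      // No specific type exists for this condition, so the
      // general class does double duty
      throw new ArgumentException("name cannot be empty", "name");
    }
  }

  // Reports which handler was selected for the given arguments
  public static string Classify(string name, int count)
  {
    try
    {
      Validate(name, count);
      return "ok";
    }
    catch(ArgumentNullException) // most specific classes first
    {
      return "null";
    }
    catch(ArgumentOutOfRangeException)
    {
      return "out-of-range";
    }
    catch(ArgumentException) // 'empty name', or any other argument failure?
    {
      return "other-argument";
    }
  }

  static void Main()
  {
    Console.WriteLine(Classify(null, 1));
    Console.WriteLine(Classify("abc", -1));
    Console.WriteLine(Classify("", 1));
    Console.WriteLine(Classify("abc", 1));
  }
}
```

The final handler cannot tell, from type alone, whether it has received the one condition that uses ArgumentException directly or some derived type that nobody thought to list above it. (The C# compiler does reject a base-class handler placed before a derived-class one, which helps with ordering but not with omissions.)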

Prelude of the pedantic contrarian

As I expressed at the end of the first instalment on exceptions, exceptions are not an error-reporting (or, more properly, a failure-reporting) mechanism per se; all they are intrinsically is a flow-control mechanism. It just so happens that they can be (and, indeed, were conceived to be and should be) used as a failure-reporting mechanism. But just because that’s how they’re often used does not mean that that’s what they are. I’ve been surprised lately just how hard a concept this is to convey to programmers, so I’m going to hammer it home with a simple sample (see Listing 1).

namespace ExceptionFlowControl
{
  using System;
  using System.IO;

  class FindCompletionException
    : Exception
  {
    public string FileDirectoryPath 
      { get; private set; }
    public DateTime FileDate { get; private set; }
    public long FileSize { get; private set; }

    public FindCompletionException(
      string   directoryPath
    , DateTime dateTime
    , long     size
    )
    {
      FileDirectoryPath = directoryPath;
      FileDate = dateTime;
      FileSize = size;
    }
  }

  static class Program
  {
    private static void FindFile
       (string directory, string fileFullName)
    {
      System.IO.DirectoryInfo di = 
         new DirectoryInfo(directory);
      foreach(FileInfo fi in
         di.EnumerateFiles(fileFullName))
      {
        if(fi.Name == fileFullName)
        {
          throw new FindCompletionException
           (fi.DirectoryName
          , fi.LastWriteTime, fi.Length);
        }
      }
      foreach(DirectoryInfo subdi in
         di.EnumerateDirectories())
      {
        FindFile(subdi.FullName, fileFullName);
      }
    }

    static void Main(string[] args)
    {
      if(2 != args.Length)
      {
        Console.Error.WriteLine("USAGE: " +
          "program <root-directory> <file>");
        Environment.Exit(1);
      }

      Environment.ExitCode = 1;
      try
      {
        FindFile(args[0], args[1]);
        Console.Error.WriteLine("could not find " +
          "'{0}' under directory '{1}'",
          args[1], args[0]);
      }

      catch(FindCompletionException x)
      {
        Console.WriteLine("found '{0}' in '{1}': " +
          "it was modified on {2} and is {3} " +
          "byte(s) long", args[1],
          x.FileDirectoryPath, x.FileDate,
          x.FileSize);

        Environment.ExitCode = 0;
      }

      catch(Exception x)
      {
        Console.Error.WriteLine("could not find " +
          "'{0}' under directory '{1}': {2}",
          args[1], args[0], x.Message);
      }
    }
  }
}
			
Listing 1

Now, there are a whole lot of simplifications and compromises in this code for pedagogical purposes (such as use of private-set properties rather than immutable fields – yuck!) since I’m trying to make my point in the smallest space. (The fact that I cannot force myself to skip the failure-handling code adds greatly to the evident size of this simple code, but I hope you understand why I cannot do so, gentle reader.)

I believe it’s self-evident that the FindCompletionException type is used exclusively within the normative behaviour of the program. It is not used to indicate failure; rather it is used specifically to indicate that the program has achieved its aim, which is to find and report the location, size and modification time of the first file matching the given name. Should I choose to do so, I could employ this tactic in real production code, and the program would not be in any sense ill-formed for the baroque nature of its implementation. (I confess that I have used exceptions as a short-cut normative escape from deep recursion, but it was a very, very long time ago, when I was only a neophyte C++ programmer, just out of college ... for shame! :$)

Please note, still-gentle reader, that I’m not saying this is good design, or a good use of exceptions; indeed, I’m saying it’s poor design, and a foolish use of exceptions. But it does illustrate what is possible, and in a way that is not a perversion of the language or runtime.
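For contrast, the same search can be written so that the normative result travels back as a return value, leaving exceptions to report failure only. This is a sketch of one conventional formulation, not a prescription: the FindFirst() name and the choice of null to signal ‘not found’ are assumptions made for illustration.

```csharp
using System;
using System.IO;

static class FindFileByReturn
{
  // Returns the first file named fileName at or below directory,
  // or null if there is none. Failures (e.g. an invalid directory)
  // are still reported by exceptions.
  public static FileInfo FindFirst(string directory, string fileName)
  {
    DirectoryInfo di = new DirectoryInfo(directory);
    foreach(FileInfo fi in di.EnumerateFiles(fileName))
    {
      if(fi.Name == fileName)
      {
        return fi;
      }
    }
    foreach(DirectoryInfo subdi in di.EnumerateDirectories())
    {
      FileInfo found = FindFirst(subdi.FullName, fileName);
      if(null != found)
      {
        return found;
      }
    }
    return null;
  }

  static int Main(string[] args)
  {
    if(2 != args.Length)
    {
      Console.Error.WriteLine("USAGE: program <root-directory> <file>");
      return 1;
    }
    try
    {
      FileInfo fi = FindFirst(args[0], args[1]);
      if(null == fi)
      {
        Console.Error.WriteLine("could not find '{0}' under directory '{1}'",
          args[1], args[0]);
        return 1;
      }
      Console.WriteLine("found '{0}' in '{1}': it was modified on {2} and is {3} byte(s) long",
        fi.Name, fi.DirectoryName, fi.LastWriteTime, fi.Length);
      return 0;
    }
    catch(Exception x)
    {
      Console.Error.WriteLine("could not find '{0}' under directory '{1}': {2}",
        args[1], args[0], x.Message);
      return 1;
    }
  }
}
```

Here the only catch clause left is the one that deals with failure, so a reader no longer has to work out which handlers are part of the normal flow.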

Since I have become incapable of writing code that is knowingly wrong/inadequate – especially in an article espousing quality – I have perforce included a catch(Exception) handler to deal with failing conditions (such as specification of an invalid directory). Consequently, this program not only illustrates my main point that exceptions are a flow-control mechanism, but also makes a secondary illustration that the use of the exceptions is overloaded – in this case we’re doing both short-circuiting normative behaviour and failure-handling.

Had I been so foolish I could have gone further in overloading the meaning applied to the use of exceptions by eschewing the if-statement and instead catching IndexOutOfRangeException and issuing the USAGE advisory there, as shown in Listing 2. Alas, I’ve seen exactly this perversion in production code recently.

. . .
static void Main(string[] args)
{
  Environment.ExitCode = 1;

  try
  {
. . . // same
  }
  catch(IndexOutOfRangeException)
  {
    Console.Error.WriteLine("USAGE: program " +
      "<root-directory> <file>");
  }
  catch(FindCompletionException x)
  {
. . . // same
  }
  catch(Exception x)
  {
. . . // same
  }
}
			
Listing 2

The problems of such a choice are described in detail in [QM#5]; in summary:

  • IndexOutOfRangeException is taken by many programmers to be an indication of programmer error, not some valid runtime condition; and, given that
  • In a more complex program there may be a bona fide coding error that precipitates IndexOutOfRangeException, which is then (mis)handled to inform the surprised user that (s)he has misused the program.

Thus, we would see a conflation of (and confusion between) three meanings of the use of exceptions:

  • for short-circuit processing (normative);
  • for reporting runtime failure (recoverable and, as in this case, practically-unrecoverable); and
  • for reporting programmer error (faulted).
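The second of the two problems above is easy to reproduce. In the following sketch, SumLengths() is a hypothetical helper containing a deliberate off-by-one error; the handler intended for missing command-line arguments swallows the resulting exception and blames the user.

```csharp
using System;

static class MisreportedBugExample
{
  // Hypothetical helper containing a bona fide coding error:
  // the loop bound should be '<', not '<='
  static int SumLengths(string[] items)
  {
    int total = 0;
    for(int i = 0; i <= items.Length; ++i)
    {
      total += items[i].Length;
    }
    return total;
  }

  public static string Run(string[] args)
  {
    try
    {
      string root = args[0]; // throws if too few arguments are given
      string file = args[1];
      return "total=" + SumLengths(new string[] { root, file });
    }
    catch(IndexOutOfRangeException)
    {
      // Intended for missing arguments, but it also catches the bug
      return "USAGE: program <root-directory> <file>";
    }
  }

  static void Main()
  {
    // Two perfectly good arguments, yet the user is told off
    Console.WriteLine(Run(new string[] { "C:\\temp", "a.txt" }));
  }
}
```

With the explicit length check of Listing 1 in place instead, the same coding error would surface as an unexpected exception rather than masquerading as user error.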

Recoverable

When we looked at the business of handling practically-unrecoverable conditions [QM#5], we utilised two aspects of the exceptions that were being used to report those conditions. First, we relied on the type of the exception to indicate generally the broad sweep of the failure: a std::bad_alloc was interpreted as an out-of-memory condition in the runtime heap; a clasp::clasp_exception indicated that an invalid command-line argument (combination) had been specified by the user; a recls::recls_exception indicated that a file-system search operation had failed.

A second level of interpretation was then done in some of those handlers to inform (the user, via contingent reports; the administrator/developer/power-user, via diagnostic log-statements) of the nature of the failure by utilising state ‘attributes’ of the exception types: the unrecognised command-line option/flag for a clasp_exception; the missing/inaccessible item for a recls_exception.

Information can be conferred from the site of failure to the site of handling in three ways. Two are obvious: type and state; the third is context, which can be so obvious as to escape observation.
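The first two of these can be sketched in C#. The two exception classes below are hypothetical, loosely analogous to the C++ classes just mentioned and not part of any real library: type selects the handler, and state refines what the handler reports.

```csharp
using System;

// Hypothetical types, for illustration only
class InvalidOptionException : Exception
{
  public string Option { get; private set; }

  public InvalidOptionException(string option)
    : base("unrecognised option")
  {
    Option = option;
  }
}

class SearchFailedException : Exception
{
  public string Path { get; private set; }

  public SearchFailedException(string path)
    : base("search failed")
  {
    Path = path;
  }
}

static class TypeAndStateExample
{
  public static string Report(Action action)
  {
    try
    {
      action();
      return "ok";
    }
    catch(InvalidOptionException x) // type selects the handler ...
    {
      // ... and state refines the report
      return string.Format("{0}: {1}", x.Message, x.Option);
    }
    catch(SearchFailedException x)
    {
      return string.Format("{0}: {1}", x.Message, x.Path);
    }
  }

  static void Main()
  {
    Console.WriteLine(Report(() => { throw new InvalidOptionException("--colour"); }));
    Console.WriteLine(Report(() => { throw new SearchFailedException("/no/such/dir"); }));
  }
}
```

Context is whatever the position of the try block itself tells us: everything thrown inside Report() is known to have come from the supplied action, whatever its type or state.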

Type, state, and context

In the examples examined in [QM#5], only type was used in discriminating between how to handle the practically-unrecoverable conditions, and only insofar as it determined what type-specific state information could be presented in the diagnostic log statement and contingent report in addition to (or, in some cases, instead of) the what()-message information provided by every C++ exception class (that derives from std::exception).

Handling (potentially) recoverable conditions requires more than this simple and somewhat vague recipe. This is because a decision has to be made by a chunk of decision-making software coded by us humans, and since (non-faulting) software only does what we tell it to do (however indirectly and/or inchoately), we are going to need information with a whole bunch of exacting characteristics.

Before I define them – and I won’t provide a fully-fledged definition until a later instalment – I would like to consider an example drawn from recent practical experience, which illustrates nicely issues of type, state, and context. Consider the C# code in Listing 3, simplified for pedagogical purpose. There are four important normative actions to consider:

string portName = . . .
int baudRate = . . .
Parity parity = . . .
int dataBits = . . .
StopBits stopBits = . . .
byte[] request = . . .
byte[] response = . . .

try
{
  SerialPort port = new SerialPort(portName,
    baudRate, parity, dataBits, stopBits);
  port.Open();
  port.Write(request, 0, request.Length);
  int numRead = port.Read(response, 0,
    response.Length);
  . . .
}

catch(Exception x)
{
  Console.Error.WriteLine("exception ({0}): {1}",
    x.GetType().FullName, x.Message);
  Environment.Exit(1);
}
			
Listing 3

  1. Construct an instance of the port (System.IO.Ports.SerialPort);
  2. Open the port instance;
  3. Write the request to the port;
  4. Read the response from the port.

The constructor of System.IO.Ports.SerialPort has the following signature:

  SerialPort(
    string    portName
  , int       baudRate
  , Parity    parity
  , int       dataBits
  , StopBits  stopBits
  );

Open() has no parameters and a return type of void. Read() and Write() have the following signatures:

  int
  Read(
    byte[]  buffer
  , int     offset
  , int     count
  );
  
  void
  Write(
    byte[]  buffer
  , int     offset
  , int     count
  );

There are numerous ways in which this set of statements can fail, ranging from programmer error through to hardware failures, and they can be said to fall into one of two failure classes: unacceptable arguments, and runtime failures.

Unacceptable arguments

This class of failures pertain to ‘unacceptable arguments’, and comprises predominantly those that arise from conditions that could, in theory, be dealt with by other means (without incurring thrown exceptions); in some cases failure to do so can be deemed reasonably to be programming error; in others a judgement has to be made as to the best manner to deal with the possible condition; in only a minority is an exception the only credible alternative. The findings are summarised in Table 1.

Condition                Method           Exception                           ParamName
Null Port Name           ctor             System.ArgumentNullException        PortName
Null Request             Write()          System.ArgumentNullException        buffer
Null Response            Read()           System.ArgumentNullException        buffer
Empty Port Name          ctor             System.ArgumentException            PortName
Invalid Port Name        Open()           System.ArgumentException            portName
Zero Baud Rate           ctor             System.ArgumentOutOfRangeException  BaudRate
Negative Baud Rate       ctor             System.ArgumentOutOfRangeException  BaudRate
Zero Data Bits           ctor             System.ArgumentOutOfRangeException  DataBits
Negative Data Bits       ctor             System.ArgumentOutOfRangeException  DataBits
Out-of-range Data Bits   ctor             System.ArgumentOutOfRangeException  DataBits
Unnamed Parity Values    ctor             System.ArgumentOutOfRangeException  Parity
Unnamed StopBits Values  ctor             System.ArgumentOutOfRangeException  StopBits
Invalid Buffer Length    Read(), Write()  System.ArgumentException            (null)
Table 1

Null port name; null request; null response

If portName is null, the constructor throws an instance of System.ArgumentNullException (which derives from System.ArgumentException) with the ParamName property value (string) "PortName", and the Message property (string) value "Value cannot be null.\r\nParameter name: PortName".

When considering the constructor alone, this is definitive, because no other parameter of the constructor can (conceivably) throw this exception. If we consider all four statements, though, it is clear that an ArgumentNullException can also come from Read() and Write(): if request is null, the Write() method throws an instance of ArgumentNullException, with ParamName value "buffer" and the Message value "Buffer cannot be null.\r\nParameter name: buffer"; if response is null, the Read() method throws an instance of ArgumentNullException with exactly the same properties.

Were we to need to identify programmatically the specific source of the problem given the try-catch block structure of Listing 3, we would not be able to discriminate between null argument failure to Write() and Read() at all, and to do so between either of these methods and the constructor would require reliance on the value of the ParamName property. Thankfully, we do not have to worry about this, since it is hard to conceive of any scenario where we might want to wrest recovery from such a circumstance. In my opinion, passing a null value for portName, request, or response is a programming error, plain and simple, and all thoughts of recovery should be forgotten.
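To make the limitation concrete, here is a sketch that uses a hypothetical stand-in class, FakePort, which mimics only the null-argument checks and ParamName values reported above. It is emphatically not System.IO.Ports.SerialPort, and the Diagnose() method is likewise invented for illustration.

```csharp
using System;

// Hypothetical stand-in that mimics only the argument checks
// reported above; it is NOT System.IO.Ports.SerialPort
class FakePort
{
  public FakePort(string portName)
  {
    if(null == portName)
    {
      throw new ArgumentNullException("PortName");
    }
  }

  public void Write(byte[] buffer, int offset, int count)
  {
    if(null == buffer)
    {
      throw new ArgumentNullException("buffer");
    }
  }

  public int Read(byte[] buffer, int offset, int count)
  {
    if(null == buffer)
    {
      throw new ArgumentNullException("buffer");
    }
    return 0;
  }
}

static class ParamNameExample
{
  public static string Diagnose(string portName, byte[] request, byte[] response)
  {
    try
    {
      FakePort port = new FakePort(portName);
      port.Write(request, 0, 0);
      port.Read(response, 0, 0);
      return "ok";
    }
    catch(ArgumentNullException x)
    {
      // Relies on a string property: fragile at best
      if("PortName" == x.ParamName)
      {
        return "constructor";
      }
      // Type and state are identical for both methods
      return "Write() or Read()";
    }
  }

  static void Main()
  {
    Console.WriteLine(Diagnose(null, new byte[1], new byte[1]));
    Console.WriteLine(Diagnose("COM1", null, new byte[1]));
    Console.WriteLine(Diagnose("COM1", new byte[1], null));
  }
}
```

With a single try block, the best the handler can manage is to separate the constructor from the other two calls, and only by trusting a string.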

Empty port name

If portName is the empty string, "", the constructor throws an instance of ArgumentException, with the ParamName value "PortName", and the Message value "The Portname cannot be empty. \r\nParameter name: PortName".

Invalid port name

If portName is "C:\Windows", the constructor completes, but the Open() method throws an instance of ArgumentException, with the ParamName value "portName", and the Message value "The given port name does not start with COM/com or does not resolve to a valid serial port.\r\nParameter name: portName".

Zero baud rate

If baudRate is 0, the constructor throws an instance of System.ArgumentOutOfRangeException, with the ParamName value "BaudRate", and the Message value "Positive number required.\r\nParameter name: BaudRate".

Negative baud rate

If baudRate is -1, the behaviour is exactly the same as with a zero baud rate.

Zero data bits

If dataBits is 0, the constructor throws an instance of ArgumentOutOfRangeException, with the ParamName value "DataBits" and the Message value "Argument must be between 5 and 8.\r\nParameter name: DataBits".

Negative data bits; out-of-range data bits

If dataBits is -1, or any number outside the (inclusive) range 5–8, the behaviour is exactly the same as with zero data bits.

Unnamed parity values

If parity is (Parity)(-1), the constructor throws an instance of ArgumentOutOfRangeException, with the ParamName value "Parity" and the Message value "Enum value was out of legal range.\r\nParameter name: Parity".

It has the same behaviour for other unnamed Parity values [ENUMS].

Unnamed StopBits values

The constructor exhibits the same behaviour as for unnamed values of Parity (except that the parameter name is "StopBits").

Invalid buffer length

If we specify, say, response.Length + 10 in our call to Read() (and having written enough such that it will attempt to use the non-existent extra 10), then we will be thrown an instance of ArgumentException, with ParamName value null and the Message value "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection.".

Analysis

Unlike the case with null port name, it may be argued that empty and invalid port names are legitimate, albeit practically-unrecoverable, runtime conditions: a user may have entered either in a dialog, or they may be obtained via configuration file settings (that can be wrong). Clearly, being able to filter out an empty name at a higher level is both easy and arguably desirable, and it depends on the details of your design as to whether you choose to do so.

It is clear that zero and negative baud rates are incorrect, and specification of either could be stipulated to be programmer error (and therefore could have been prevented outside the purview of this class, i.e. in the client code’s filtering layer). There are several widely-recognised baud-rates, but, as far as I know, there is no final definitive list of serial port baud rates. Hence, we have to be able to supply a (positive) integer value, and we need to be able to account both for reading this from somewhere (e.g. config., command-line) at runtime and for the fact that the value presented may be rejected by the device (e.g. as being outside its minimum/maximum range).

In most respects, data-bits may be considered in the same way as baud rate, just that the possible range is (by convention) small and finite, i.e. between five and eight.

Where things get more interesting is in the two enumeration-type parameters, parity and stopBits. By providing an enumeration, the API implies that the set of options is small, known exhaustively, and fixed, and that there should be no way to specify programmatically an invalid value. Furthermore, since .NET has reasonably good (not great!) facilities for helping interconvert between enumeration values and strings, we should be able to rely (at the level of abstraction of SerialPort) on receiving only valid values.
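A filtering layer that guarantees only named values reach the component might look something like the following sketch. The Flavour enumeration is a hypothetical stand-in for Parity or StopBits; Enum.IsDefined() and Enum.TryParse() are the real framework facilities.

```csharp
using System;

// Hypothetical enumeration standing in for Parity/StopBits
enum Flavour
{
  Vanilla = 0,
  Chocolate = 1,
  Strawberry = 2
}

static class EnumCheckExample
{
  // True only for values that have a name in the enumeration
  public static bool IsNamed(Flavour value)
  {
    return Enum.IsDefined(typeof(Flavour), value);
  }

  public static bool TryParseNamed(string text, out Flavour value)
  {
    // Enum.TryParse() accepts numeric strings such as "-1",
    // so a separate IsDefined() check is still required
    return Enum.TryParse(text, out value) && IsNamed(value);
  }

  static void Main()
  {
    Flavour f;
    Console.WriteLine(IsNamed((Flavour)(-1)));
    Console.WriteLine(TryParseNamed("Chocolate", out f));
    Console.WriteLine(TryParseNamed("-1", out f));
  }
}
```

That the conversion facilities will happily produce an unnamed value from a numeric string is one reason for the ‘not great!’ qualification.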

There are three important specific points to make:

  1. The port name is not validated (completely) until the port is opened. I suggest that this is reasonable, albeit not necessarily desirable;
  2. The value of the ParamName property, "portName", differs from that in the first two cases, where it was "PortName". Clearly there’s an inconsistency in the implementation, and I suggest that this means we cannot rely on the values contained in ParamName as being definitive (which I doubt anyone would be contemplating anyway). I further suggest that we must distrust any framework exception string properties as having accurate, precise, and reliable values;
  3. While it’s easy to imagine how it may have come to be implemented this way, it is nonetheless inconceivable to me that a parameter-less method – Open() in this case – can throw an instance of ArgumentException (or any of its derived types)! Perhaps I’m being precious, but I find this tremendously confidence-sapping when faced with using this component.

Runtime failures

This class of failures comprises those that arise entirely through circumstances in the runtime environment, and could be experienced by even the ‘best designed program’ (whatever that means). The findings are summarised in Table 2.

Condition                         Method   Exception                         [HRESULT]   Message
Unknown Port Name                 Open()   System.IO.IOException             0x80131620  "The port 'COM9' does not exist."
Device Disconnected Before Write  Write()  System.IO.IOException             0x80070016  "The device does not recognize the command."
Device Disconnected Before Read   Read()   System.InvalidOperationException  0x80131509  "The port is closed."
Table 2

Unknown (but valid) port name

If portName is "COM9", which does not exist on my system, the constructor completes but the Open() method throws an instance of System.IO.IOException, and the Message (string) property has the value "The port 'COM9' does not exist."; the InnerException property is null.

The HRESULT value associated with the exception is 0x80131620, which is COR_E_IO, as documented in MSDN for System.IO.IOException. Note that this is not available via any public field/property/method, and requires reflection to elicit (when possible).

Device disconnected before write

In the case where portName is of valid form and corresponds to an existing and available port on the current system, the call to Open() may return (indicating success). If the port device becomes subsequently unavailable – e.g. by depowering it or unplugging the device from the computer – a call to Write() results in the throwing of an instance of System.IO.IOException, with the Message value "The device does not recognize the command."; the InnerException property is null.

The HRESULT value associated with the exception is 0x80070016, which is no well-known constant (of which I’m aware) in its own right. However, since it uses the well-known FACILITY_WIN32 (7), the lower 16 bits should correspond to a Windows ‘error’ code (defined in WinError.h). The Windows constant ERROR_BAD_COMMAND (22L == 0x16) is associated with the message "The device does not recognize the command.", so that seems like our culprit. Clearly, some .NET exceptions carry Windows failure codes (wrapped up in HRESULT values).
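The decomposition just described can be sketched as a few lines of bit-twiddling. The HResultDecoder class and its method names are hypothetical; the masks follow the conventional HRESULT layout (severity in the top bit, facility above the low 16 bits, code in the low 16 bits).

```csharp
using System;

static class HResultDecoder
{
  public const int FACILITY_WIN32 = 7;
  public const int ERROR_BAD_COMMAND = 22;

  // The top (severity) bit is set for failures
  public static bool IsFailure(int hr)
  {
    return hr < 0;
  }

  // The facility occupies the bits immediately above the code
  public static int Facility(int hr)
  {
    return (hr >> 16) & 0x1FFF;
  }

  // The code occupies the lower 16 bits
  public static int Code(int hr)
  {
    return hr & 0xFFFF;
  }

  static void Main()
  {
    int hr = unchecked((int)0x80070016);

    Console.WriteLine("failure={0}, facility={1}, code={2}",
      IsFailure(hr), Facility(hr), Code(hr));
  }
}
```

For 0x80070016 this yields facility 7 and code 22; for 0x80131509 it yields facility 19 (0x13) and code 0x1509, which is not a Windows ‘error’ code at all.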

Device disconnected before read

In the case where the Write() succeeds but the device then becomes unavailable, the subsequent call to Read() results in the throwing of an instance of System.InvalidOperationException, with the Message value "The port is closed."; the InnerException property is null.

The HRESULT value associated with the exception is 0x80131509, which is COR_E_INVALIDOPERATION, as documented in MSDN for System.InvalidOperationException. Note that, just as with IOException, this is not available via any public field/property/method, and requires reflection to elicit (when possible).

Furthermore, in some circumstances (e.g. when performing reads on another thread via SerialPort’s DataReceived event, which may have ‘first bite’ at the device failure, resulting in a change of state, such that) a call to Write() may also result in a thrown InvalidOperationException, rather than IOException.

Analysis

There are clear problems presented by these runtime failures for arbitrating successfully between recoverable and practically-unrecoverable conditions, and for providing useful information in any diagnostic log statements/contingent reports in either case.

Perhaps the least important problem is that two of the three messages border on the useless:

  • "The port is closed" is not likely to enlighten the normal or power user much beyond the self-evident insight that ‘the system is not working for me’;
  • "The device does not recognize the command" sounds like the basis of something useful, but it neglects to specify (or even hint at) which command: does it mean the .NET class method just invoked, or the underlying system device command? Whichever, it seems bizarre, since the device must at some other times recognise the ‘command’; isn’t the real message that ‘the device cannot fulfil the command <COMMAND> in the current state’, or some such?;
  • Only the message "The port 'COM9' does not exist" is adequate, insofar as it will provide something genuinely meaningful to whomever will read the diagnostic log/contingent report.

All the remaining problems are more serious, and reflect what I have experienced to be a very poor standard of design in the .NET exception hierarchy and of application of the exception types in other components, particularly those that interact with the operating system.

I also note here that the Exception.Message property documentation is unclear, particularly with respect to localisation, which means we cannot rely on the message contents with any precision such as we would need were we to think of, say, parsing information in exception messages in order to make recoverability decisions. The information contained therein can only be relied upon to be potentially helpful in a diagnostic log statement / contingent report.

First, although I’ve mentioned somewhat casually the HRESULT values associated with the exceptions, these are not publicly available. They are stored in a private field _HResult within the Exception type, which is read/write-accessible to derived types via the HResult protected property. Some exception classes, such as IOException, provide the means to supply an HRESULT value in the constructor that directly pertains to the condition that precipitated the exception; when not provided a stock code is used that represents the subsystem or exception class (e.g. COR_E_IO for IOException), as seems always the case for those exception types that do not provide such a constructor (e.g. COR_E_INVALIDOPERATION for InvalidOperationException).

The only way to access this value is to use reflection, whether directly (e.g. using Object.GetType(), Type.GetField(), FieldInfo.GetValue()) or via System.Runtime.InteropServices.Marshal.GetHRForException(). In either case, this will succeed only in execution contexts in which the calling thread has the requisite rights. Absent that, we cannot assume we’ll be able to access the value, which pretty much kills it for general-purpose programming.
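A sketch of the Marshal route follows. It assumes an execution context in which the call is permitted, which, as just noted, cannot be taken for granted. The exceptions are constructed by hand here (no serial port is involved), and the HResultAccessExample class is hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class HResultAccessExample
{
  // Assumes the calling context has the rights needed for this call
  public static int GetHResult(Exception x)
  {
    return Marshal.GetHRForException(x);
  }

  static void Main()
  {
    // Condition-specific code supplied via the constructor
    Exception specific =
      new IOException("device failure", unchecked((int)0x80070016));

    // No such constructor: a stock, class-specific code is used
    Exception stock =
      new InvalidOperationException("The port is closed.");

    Console.WriteLine("0x{0:X8}", GetHResult(specific));
    Console.WriteLine("0x{0:X8}", GetHResult(stock));
  }
}
```

Readers on .NET Framework 4.5 or later may find that the HResult property getter is public there, which removes the need for this detour; the inconsistency in what the value means remains either way.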

And it gets worse. The meaning ascribed to the HRESULT value is not consistent. In the above three cases it is condition-specific only in the Device Disconnected Before Write case, which is reported by an instance of IOException. In the other two cases it is sub-system/exception-class specific, one reported by InvalidOperationException and the other by IOException! We cannot expect condition-specific information even within one (IOException-rooted) branch of the Exception family tree.

Third, given that (i) .NET does not have checked exceptions and documentation must always be taken with a grain of salt, and (ii) I’ve discovered the above behaviour through testing, which will necessarily be inexhaustive, we must pose the obvious question: How do we know there aren’t others? Indeed, the documentation states that: Open() may also throw System.UnauthorizedAccessException; Read() may also throw System.TimeoutException; Write() may also throw System.ServiceProcess.TimeoutException. Note the two different TimeoutException types. If the documentation is correct, it’s bad design, which does not engender confidence. If the documentation is wrong, it’s bad documentation, which does not engender confidence.

Imagine now, if you will, how we might actually use the serial port in the wild. In one of the daemons that we’re developing we are interfacing to an external hardware device via a serial port. The device runs continually, and the daemon must (attempt to) maintain connectivity to it on an ongoing basis. In such a case it is essential to be able to react differently to a practically-unrecoverable condition, such as the wrong port name being specified in the daemon configuration information (Unknown Port Name), and a (potentially) recoverable loss of connectivity to the device (Device Disconnected ...). In the former case, we want to react to the reported condition by issuing diagnostic and contingent report information and terminating the process (since there’s absolutely no sense in continuing); in the latter we want the daemon to issue diagnostic information (which will raise alarms in the wider system environment) but retry continually (until it reconnects or until a human tells it to stop). In order to write that system to these requirements, we need to be able to distinguish between the failures.

Thus, the final and most compelling, disturbing, and disabling problem is that the .NET SerialPort component does not support our eminently sensible failure behaviour requirements, because the Unknown Port Name condition and the Device Disconnected ... conditions are reported by the same type of exception, IOException, whose messages we must not parse (since we do not know if we can trust them), and whose putatively definitive discriminating HRESULT information is assigned inconsistently and may not even be accessible!

There is no way, with the given type and state information provided, to discriminate between these two conditions.

Contrast this with the situation in C, programming with the Windows API functions CreateFile(), WriteFile(), and ReadFile(): if we pass an unknown COM port we get ERROR_FILE_NOT_FOUND; if we pass "C:\Windows" we get ERROR_ACCESS_DENIED; if we have a comms failure we get ERROR_BAD_COMMAND. It’s certainly arguable that the latter two constants’ names do not directly bear on the conditions that precipitated them, but the point is that programming at the C-level allows us to discriminate these conditions by state, relying on accessible and (as far as I can tell) predictable and reliable values; programming at the C#-level (using the .NET standard library) does not.
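Given codes of that kind, the recoverability decision reduces to a switch on state. The sketch below expresses the idea in C# for consistency with the other listings; the numeric values are those defined in WinError.h, while the Disposition enumeration and the policy itself are assumptions drawn from the daemon requirements described above.

```csharp
using System;

static class Win32Codes
{
  // Values as defined in WinError.h
  public const int ERROR_FILE_NOT_FOUND = 2;
  public const int ERROR_ACCESS_DENIED = 5;
  public const int ERROR_BAD_COMMAND = 22;
}

// Hypothetical outcome of a recoverability decision
enum Disposition
{
  Terminate,
  Retry
}

static class StateDiscrimination
{
  public static Disposition Decide(int win32Code)
  {
    switch(win32Code)
    {
      case Win32Codes.ERROR_BAD_COMMAND:
        // comms failure: (potentially) recoverable
        return Disposition.Retry;
      case Win32Codes.ERROR_FILE_NOT_FOUND:
      case Win32Codes.ERROR_ACCESS_DENIED:
        // unknown or invalid port name: practically-unrecoverable
        return Disposition.Terminate;
      default:
        // anything unrecognised: do not guess
        return Disposition.Terminate;
    }
  }

  static void Main()
  {
    Console.WriteLine(Decide(Win32Codes.ERROR_BAD_COMMAND));
    Console.WriteLine(Decide(Win32Codes.ERROR_FILE_NOT_FOUND));
  }
}
```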

Resort to context

This only leaves context. Our only recourse is to wrap separately the calls to Open() and Write() in try-catch in order to intercept the useless-in-a-wider-context exceptions and translate them into something definitive, along the lines shown in Listing 4.

SerialPort port = new SerialPort(portName,
  baudRate, parity, dataBits, stopBits);

try
{
  port.Open();
}
catch(IOException x)
{
  throw new UnknownPortNameException(portName,
    x);
}

try
{
  port.Write(request, 0, request.Length);
}
catch(IOException x)
{
  throw new PortWriteFailedException(x);
}
catch(InvalidOperationException x)
{
  throw new PortWriteFailedException(x);
}

try
{
  int numRead = port.Read(response, 0,
    response.Length);
  . . .
}
catch(InvalidOperationException x)
{
  throw new PortReadFailedException(x);
}
			
Listing 4
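Once failures are translated into distinct types, as in Listing 4, the daemon requirements described earlier can be met by discriminating on type alone. The following is a minimal sketch under stated assumptions, not the actual daemon code: the two exception classes are hypothetical definitions matching the names thrown in Listing 4, and RunUntilConnected(), its exchange, report, and stopRequested parameters are invented for illustration.

```csharp
using System;

// Hypothetical exception types, named after those thrown in Listing 4.
public class UnknownPortNameException : Exception
{
    public UnknownPortNameException(string portName, Exception inner)
        : base("unknown port name: " + portName, inner) {}
}

public class PortWriteFailedException : Exception
{
    public PortWriteFailedException(Exception inner)
        : base("port write failed", inner) {}
}

public static class DaemonLoop
{
    // Invokes 'exchange' until it succeeds, and returns the number of
    // retries that were needed. A write failure is reported and retried,
    // unless 'stopRequested' says a human has told us to stop, in which
    // case it is rethrown. UnknownPortNameException is deliberately not
    // caught: it percolates to the caller, which is expected to terminate.
    public static int RunUntilConnected(
        Action exchange
    ,   Action<Exception> report
    ,   Func<bool> stopRequested)
    {
        for(int retries = 0; ; ++retries)
        {
            try
            {
                exchange();
                return retries;
            }
            catch(PortWriteFailedException x)
            {
                report(x); // diagnostic: raises alarms in the wider system
                if(stopRequested())
                {
                    throw;
                }
            }
        }
    }
}
```

A real daemon would presumably also pause between attempts; that is omitted here to keep the sketch short.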

Let’s be clear about this: what we’re trying to do with the serial port is a combination of good practices – simplicity, abstraction, transparency – in so far as we’re focusing on the normative code, and relying on the ‘sophistication’ of exception-handling to allow us to deal with failures elsewhere. Unfortunately, the .NET exception hierarchy is poorly designed and badly applied, so we’ve been forced to pollute the code with low-level try-catch handlers, using context to rebuild the missing type/state information before passing on the reported failure condition; it’s arguable that using P/Invoke to get at the Windows API calls and using return codes would have been better, which is quite a reversal!

We’ve had to violate one of the important purposes/advantages of the exception-paradigm, the separation of normative code from failure-handling code, for the purposes of increased transparency and expressiveness. In this case, making the code adequately robust results in a serious detraction from both. Thankfully, we can rely on old reliable ‘another level of indirection’ by wrapping all the filth in a class, although we cannot hide from the burden of creating appropriate exception types, nor hide our client code completely from the burden of understanding and coupling to them. All the details of which I now cunningly leave until next time. And now for something slightly different ...

Quench.NET

Quench is an open-source library designed to:

  1. Facilitate diagnosis and correction of badly-written application code that inappropriately quenches exceptions; and
  2. Provide assistance in the use (and correction of that use) of badly-designed standard and third-party components that indicate failure, via thrown exceptions, without providing adequate information to delineate unambiguously between practically-unrecoverable and recoverable conditions.

In both regards, Quench facilitates post-hoc discovery and adjustment of application-behaviour, while ensuring that ‘safe’ defaults are applied, at varying levels of precision.

Quench.NET is the first (and currently only) application of the Quench design principle; others will follow in due course.

There be dragons!

Consider the (C#) code fragments in Listings 5–8. I find it deeply concerning to see such code in production software. Over the last couple of years I’ve had occasion to work with codebases containing literally thousands of inappropriate exception quenches; indeed, the extent of such constructs in one codebase meant that a case-by-case remediation was quite impractical. (I find it even more disheartening to see code such as this in strongly-selling and widely-recommended text books – I’ve encountered the second in one such book I read last year. There is a considerable challenge in writing worthwhile material about programming because publishers – and readers, according to publishers – want only pithy tomes that can fit in a pocket and be read in a day. As a consequence, many books show simplistic views of real programs (and program fragments) that may not reflect fairly their authors’ practice in order to focus on their subject and to present digestible quanta of material to the reader. As I have already acknowledged in [QM#5], this is a hard conflict to redress satisfactorily: certainly, I am not sure that I have not erred previously in this way myself. Nonetheless, now having realised the full import and complexity of failure-handling, I can’t forget it, and I can’t forgive books and articles that mislead, since I meet far too many programmers who have been misled.)

In the first case (Listing 5), the user intends to try an operation that may fail (and will indicate its failure by throwing an exception), and to provide some reasonable default instead if it does so. This is a perfectly reasonable intent, just realised badly. The problem is, catching Exception (rather than the appropriate precise expected exception type) is far too broad: every exception in the .NET ecology derives from Exception, and this code will quench (almost) all of them, including those that are practically-unrecoverable (such as System.OutOfMemoryException). Dragon!

try
{
  . . . // something important
}
catch(Exception /* x */)
{
  return someValue;
}
			
Listing 5

The way to do this properly is as shown in Listing 9: it’s hardly any more effort to catch only the exception representing the failure we want to intercept, rather than (almost) all possible failures. (Note: prior to .NET 2 this Parse()&catch was the way to try-to-parse a number (or date, or IP address, or ...) from a string. Thankfully, this ugly and inefficient technique was obviated with the introduction of the TryParse() methods, which return true or false, and do not need to throw System.FormatException.)
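For comparison with Listing 9, the TryParse() form mentioned in the note might look like the following minimal sketch. The class and method names are hypothetical; the default of -1 is the one used in Listing 9. Note one behavioural difference: Int32.TryParse() also returns false for null and for out-of-range input, cases in which Int32.Parse() throws ArgumentNullException and OverflowException respectively, neither of which Listing 9 catches.

```csharp
using System;

public static class ParseExamples
{
    // Returns the parsed value, or -1 (the default used in Listing 9)
    // if 's' cannot be parsed as a 32-bit integer. No exception is
    // thrown or caught.
    public static int ParseOrDefault(string s)
    {
        int i;
        if(!Int32.TryParse(s, out i))
        {
            i = -1;
        }
        return i;
    }
}
```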

The second case (Listing 6) also represents good intentions, and has a veneer of robustness insofar as it issues a diagnostic log statement. Alas, this is specious comfort. First, diagnostic log statements are subject to the principle of removability [QM#6], so provision of a diagnostic log statement alone is an invitation to do nothing. Rather, the (removable) diagnostic log statement should be associated with additional (non-removable) action, whether that be to set a flag, throw a different exception, return, or whatever. Second, we still have the huge but subtle issue that we’re catching everything, including those things that should denote practically-unrecoverable conditions.

try
{
  . . . // something important
}
catch(Exception x)
{
  LogException(x);
}
			
Listing 6

Furthermore, the text book to which I’ve alluded above has a construct like this – albeit that it’s notionally a contingent report, in the form of Console.WriteLine(x); – at the outermost scope in Main(), so it fails the requirement to indicate failure to the operating environment [QM#6]: any exception is caught and yet the program returns 0, indicating successful execution, to the operating environment. Dragon!
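By way of contrast, a minimal sketch of an outermost handler that does indicate failure to the operating environment is given below. It is an illustration only: ProgramShell, Run(), and the exit-code constants are hypothetical names, and whether particular exception types deserve separate treatment at this level is outside the scope of the sketch.

```csharp
using System;

public static class ProgramShell
{
    public const int ExitSuccess = 0;
    public const int ExitFailure = 1;

    // Hypothetical outermost handler. The real Main() would consist of
    // 'return ProgramShell.Run(RealMain, args);' so that any exception
    // that escapes results in a non-zero exit code.
    public static int Run(Func<string[], int> main, string[] args)
    {
        try
        {
            return main(args);
        }
        catch(Exception x)
        {
            // Contingent report, followed by the non-removable action:
            // telling the operating environment that we failed.
            Console.Error.WriteLine("program failed: {0}", x.Message);

            return ExitFailure;
        }
    }
}
```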

The third case is not even defensible from the position of good intentions gone bad. This is flat-out, unequivocal, inexcusable, malpractice. If you write code such as this, you don’t deserve to earn a crust as a programmer. If you encounter code such as this and don’t raise the alarm to your team lead, manager, head of IT..., then you’re being incredibly reckless and risking having to deal with system failures that you may be literally clueless to diagnose. The problem is a distilled version of the slip we’ve seen in the first two cases: when you write code such as this you are asserting ‘I know everything that can possibly happen and I deem it all to be of no consequence’. Apart from the most trivial cases, I doubt anyone can stand behind either half of that assertion. Dragon!

The fourth case (Listing 8) is the same as the third (Listing 7) – just a bit of syntactic sugar(!) provided by the C# language, to avoid the unused reference warning that would result from compiling the code from Listing 7 at warning level 3 or 4. Here, the language designers have gone out of their way to ease the application of a total anti-pattern. Honestly! (More a case of Bats in the Belfry than Dragon!)

try
{
  . . . // something important
}
catch(Exception x)
{}
			
Listing 7

try
{
  . . . // something important
}
catch
{}
			
Listing 8

Insufficient information to determine recoverability

As I mentioned earlier, it is conceivable that other exception types may be thrown in circumstances not yet considered; indeed, in circumstances never encountered before the system goes into production. As we’ve already seen with the ambiguity in the serial port Write() operation, sometimes exceptions may be thrown that should be quenched and handled in ways that are already established for other types.

But a key plank of the exception paradigm is that if you don’t know of/about an exception type, your code cannot be written to expect it and should instead allow it to percolate up to a layer that can handle it, albeit that that handling may mean termination.

How do we deal with this situation?

int i;
try
{
  i = Int32.Parse(s);
}
catch(System.FormatException)
{
  i = -1;
}
			
Listing 9

Quench(.NET) to the rescue

In the first situation – the carelessly high-level Exception quenching – the answer was to rewrite all the offensive constructs in terms of Quench.NET + diagnostic logging facilities + a throw statement, as shown in Listing 10, which illustrates Quench’s (attempt at a) fluent API. Due to the sheer number of cases, and the fact that most of them followed just a few simple forms, more than 90% of the cases were effected automatically by a custom-written Ruby script; the remainder by hand and/or IDE macro.

try
{
  . . . // something important
}
catch(Exception x)
{
  LogException(x);
  if(Quench.Deems.CaughtException.MustBeRethrown
     (x))
  {
    throw;
  }
}
			
Listing 10

Changing the behaviour of a large commercial production system with a mountain of such technical debt, even if the changes are to stop doing the wrong thing (of ignoring important failures), is a sensitive and risky undertaking. Even though theoretically changing things for the best, the complex behaviour profile is something to which the ‘organisational phenotype’ – the business systems, the admin and support function, the development team, the users, the customer assistance operatives – has become accustomed. Because of this, it was important to allow the system to carry on precisely as is, and to tune its behaviour in a slow, incremental, methodical, and observed way. Quench supports this because its default response to the (fluent API) question Quench.Deems.CaughtException.MustBeRethrown() is ‘yes’ (true). This method, like others in the API, is overloaded in order to allow the caller to specify more precisely the catch context, allowing fine-grained tuning of recoverability for exception types and/or catch contexts; I’ll provide more details next time.

In the second situation – the use of ‘surprising’ APIs, usually (in my experience) on operating system façades – the solution looks much the same. The difference is that this is not retrofitted in extremis, but is designed and coded up-front. Listing 11 is an extract of a daemon control routine from one of our current systems under development. Since I (have learned to) mistrust expectations (and documentation) about what .NET APIs will throw, I pre-empt any possible surprises by using Quench. I want practically-unrecoverable exceptions (such as OutOfMemoryException, already acknowledged with its own catch) that don’t directly pertain to a failure of registration per se to be handled at a higher level (and stop the program), and since I do not (yet) know all the exceptions that may emanate from the system registration, I employ Quench to allow me to tune this at runtime (likely via configuration); integration testing (and, unfortunately, actual use) may inform more definitely, in which case the code can be refactored to catch (and rethrow) specific exceptions rather than use Quench.

private static bool
DoInstallationOperation(
string installationOperationName
, Action<AssemblyInstaller, IDictionary> func)
{
  using(Pantheios.Api.Scope.MethodTrace
    (Severity.Debug))
  {
    try
    {
      AssemblyInstaller installer = 
        new AssemblyInstaller
        (Assembly.GetEntryAssembly(),
        new string[0]);
      IDictionary state = new Hashtable();
      func(installer, state);
      installer.Commit(state);
      Pantheios.Api.Flog(RegistrationLog,
        Severity.Notice,
        "{0} service installed successfully",
        Program.Constants.ProcessIdentity);
      Console.Out.WriteLine(
        "{0} service installed successfully",
        Program.Constants.ProcessIdentity);

      return true;
    }
    catch(OutOfMemoryException)
    {
      throw;
    }
    catch(Exception x)
    {
      Pantheios.Api.Log(Severity.Alert,
        "Could not ", installationOperationName,
        " service ",
        Program.Constants.ProcessIdentity,
        ": ", Pantheios.Api.Insert.Exception(x));
      if(Quench.Deems.CaughtException
        .MustBeRethrown(x, typeof(Program)))
      {
        throw;
      }

      Console.Error.WriteLine(
        "Could not {2} service {0}: {1}",
        Program.Constants.ProcessIdentity,
        Pantheios.Api.Insert.Exception(x),
        installationOperationName);
    }
  }

  return false;
}
			
Listing 11

There’s a third, lesser motivation for using Quench, albeit one that, based on my experience, I think is particularly relevant with .NET. Consider the case where we’ve done the mooted experiential learning in testing/production and now wish to remove Quench, since its use rightly gives us an uneasy feeling that we’ve somehow done the wrong thing. However, it can be the case that we’ve identified a large number of such exceptions, and either they do not share a base class that we could catch in their stead, or some of their peer classes are ones that we do not wish to catch. Whatever the reason, we’re now left with a large number of catch-handlers, several of which we wish to take the same action, as in Listing 12. In this case, use of Quench may be the lesser of two evils, since the list of which exceptions can be quenched (and, by inference, which others must be rethrown) can be maintained, either in code or in configuration, more flexibly and neatly. This is even more advantageous where we may find ourselves having the same list of quench-vs.-throw rules in several similar contexts.

public static void Blah1()
{
  Exception y = null;

  try
  {
    . . . // complex operation,
          // can throw many exceptions
  }
  catch(OutOfMemoryException)
  {
    throw;
  }
  catch(SomeLeafException x)
  {
    y = x;
  }
  catch(AnotherLeafException)
  {
    throw;
  }
  catch(SomeParentException x)
  {
    y = x;
  }
  catch(SomeEntirelySeparateException x)
  {
    y = x;
  }
  catch(AnotherEntirelySeparateException x)
  {
    y = x;
  }
  catch(Exception)
  {
    throw;
  }
  System.Console.WriteLine("{0}: ", y);
}
			
Listing 12
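To illustrate why a maintained list can be neater than the cascade of handlers in Listing 12, here is a minimal sketch of a table-driven quench-vs.-throw rule set. It is emphatically not the Quench.NET API: QuenchRules, AllowQuench(), QuenchRulesDemo, and TryOperation() are hypothetical names invented for this example, and matching on the exact exception type (rather than on base classes) is an assumed design choice, made so that unlisted peer or derived classes are never quenched by accident.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a quench-vs.-throw rule list.
// This is NOT the Quench.NET API.
public sealed class QuenchRules
{
    private readonly HashSet<Type> quenchable = new HashSet<Type>();

    // Registers an exact exception type that may be quenched.
    public QuenchRules AllowQuench<T>() where T : Exception
    {
        quenchable.Add(typeof(T));
        return this;
    }

    // Safe default: anything not explicitly registered must be rethrown.
    public bool MustBeRethrown(Exception x)
    {
        return !quenchable.Contains(x.GetType());
    }
}

public static class QuenchRulesDemo
{
    // Runs 'operation'. Returns null on success, or the quenched
    // exception if the rules allow it; otherwise the exception is
    // rethrown to the caller.
    public static Exception TryOperation(Action operation, QuenchRules rules)
    {
        try
        {
            operation();
            return null;
        }
        catch(Exception x)
        {
            if(rules.MustBeRethrown(x))
            {
                throw;
            }
            return x; // quenched: handed back for reporting
        }
    }
}
```

With rules built once, for example new QuenchRules().AllowQuench<FormatException>(), the same object can be shared between several catch contexts, which is the situation in which the cascade of handlers becomes hardest to keep consistent.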

In whichever case, Quench is not a library to be applied lightly, and it should never be used as a licence to slack off on thinking, reading (documentation), or testing. But, when you have absolutely, positively got to handle every exception in the room, accept no substitutes!

In the next issue

I’m somewhat gun-shy about making predictions of when, but I do feel certain that the next instalment, when it comes, will consider more definitively the kinds of information that exceptions should contain, including how to (re)define exception hierarchies that offer rich information on which one can base solid recovery decisions, along with some/all of the following:

  • the design principles, implementation, and customisation and use of Quench.NET. In the meantime, please check it out (at http://www.libquench.org/);
  • details of the serial port abstraction, and the supporting exception hierarchy; and
  • maybe the new STLSoft C++ exception hierarchy (if I get time between all my C# coding).

References

[ENUMS] Enumerating Experiences, Matthew Wilson, CVu, September 2011

[QM#5] Quality Matters 5: Exceptions: The Worst Form of ‘Error’ Handling, Apart from all the Others, Matthew Wilson, Overload 98, August 2010

[QM#6] Quality Matters 6: Exceptions for Practically-Unrecoverable Conditions, Matthew Wilson, Overload 99, October 2010
