C-side Re-sort

By Kevlin Henney

Overload, 13(68), August 2005


In spite of the attention given to object-oriented development, TDD and modern testing frameworks, it is worth understanding how and why unit testing has an important role to play in general, regardless of the technologies or broader development philosophy involved. This article walks through a simple example, highlighting some of the motivation behind basic unit testing and the practices involved but without following through into TDD.

Testing is one way of making the essentially invisible act of software development more visible [ Henney05_1 ]. Testing can occur at many different levels of a system's architecture, from whole system down to individual code modules, and in many different stages of a system's development. In the context of modern agile software development, testing is normally associated with Test-Driven Development (TDD), an approach that subsumes conventional unit-level testing, complementing its traditionally quantitative role with a qualitative practice of design.

In spite of the attention given to object-oriented development, TDD and modern testing frameworks, it is worth understanding how and why unit testing has an important role to play in general, regardless of the technologies or broader development philosophy involved. This understanding also applies to TDD when considered solely in terms of its testing aspect.

This article walks through a simple example, highlighting some of the motivation behind basic unit testing and the practices involved [ Henney05_2 ], but without following through into TDD. It is a programmer testing responsibility to carry out code-centric tests; automated tests ensure that unit-level tests are executed as code on code rather than by programmer on code; and example-based test cases, as opposed to exhaustive tests, offer a sustainable black-box approach to testing.

Standard portable C is used in the narrative example to remind and emphasise that the applicability of unit-testing techniques is not restricted to object-oriented programming. It also demonstrates that, at the very minimum, nothing more than the humble assert is needed to realise test cases. The narrative also highlights the relationship between the contract metaphor and testing, and the tension between programming by contract as an assertion style and the use of contracts to inform test cases. Thus the story also helps to clarify some of the common misunderstandings concerning the relationship between programming by contract and testing.

A Sort of Beginning

Consider the following situation: a cross-platform C program whose logic involves, amongst other things, the sorting of a large number of integers; furthermore, sorting is found to dominate the performance and needs to be optimised.

The sorting function used is the C standard library qsort function. In spite of its name, it is not required to be implemented as quicksort. However, it is not the time complexity that is considered to be the issue - the implementations on the supported platforms appear to be fairly optimal in this respect - but the cost of comparison. The qsort function is generic in the sense that it works with untyped chunks of memory, i.e. void * to pass an array and size_t to indicate element count and size, but this means that the comparison must be passed in as a pointer to a function that performs the comparison correctly based on the actual types:

int int_comparison(const void *lhs_ptr,
                   const void *rhs_ptr)
{
  int lhs = *((const int *) lhs_ptr);
  int rhs = *((const int *) rhs_ptr);
  return lhs < rhs ? -1 : lhs == rhs ? 0 : 1;
}

This function is based on strcmp semantics, so that the result is less than zero, equal to zero or greater than zero depending on whether the left-hand argument compares less than, equal to or greater than the right-hand argument. The cost of continually calling a function through a pointer to perform what is essentially the single machine instruction for comparing integers is what is incurring the overhead.
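
In use, the untyped array, the element count, the element size and the comparison are passed together, along these lines (values and length here are assumed names for an int array and its element count):

qsort(values, length, sizeof(int), int_comparison);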

The Programmer's Responsibility

The favoured solution is to replace the use of qsort with a custom-written function - let's call it quickersort - that is written directly in terms of int:

void quickersort(int base[], size_t length);
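
For illustration only, one possible shape for such a function is a naive recursive quicksort using the first element as pivot - a sketch standing in for whichever published algorithm is actually chosen, not a tuned implementation:

void quickersort(int base[], size_t length)
{
  size_t i, last = 0;
  int pivot, tmp;

  if (length < 2)
    return;
  pivot = base[0];
  for (i = 1; i != length; ++i)   /* partition: values below the pivot move left */
  {
    if (base[i] < pivot)
    {
      ++last;
      tmp = base[last]; base[last] = base[i]; base[i] = tmp;
    }
  }
  tmp = base[0]; base[0] = base[last]; base[last] = tmp;
  quickersort(base, last);                           /* sort values below the pivot */
  quickersort(base + last + 1, length - last - 1);   /* sort values above the pivot */
}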

It is not necessarily hard to find a good algorithm - there is no shortage of published examples available - but who is responsible for ensuring that the function is correctly implemented? It is supposed to be a drop-in replacement and an improvement: a function that does not work or whose performance is worse than with qsort is unlikely to impress.

Of course, it is a programmer's responsibility to be diligent in implementation, but care in coding is not enough: sorting algorithms are notorious for defects that arise from typos, thinkos, assumptions and oversights; second guessing a would-be optimisation's effectiveness is often a fool's game; someone else may know a more effective solution. Peer review and static analysis of code can help - and should be employed - but the ultimate proof of the pudding is in the eating: the code needs to be tested. But with whom does the responsibility for testing lie? Ultimately a test of the software as a whole will pick up many problems, but this system view is somewhat second hand, removed from the point of the change that has been so precisely specified. Given that system-level testing is often a responsibility separated from that of code development, leaving problems to be discovered so indirectly introduces long feedback loops and a greater element of chance into the process of development. And given the precise nature of the requirement - to speed up sorting - and of the change - replace qsort with a hand-crafted alternative that is functionally equivalent but faster - to hand off a slower, defective piece of code as a fait accompli can be considered less than professionally responsible.

It is therefore a programmer testing responsibility to ensure a basic level of confidence in the code. Programmers are in the best position to catch a large class of possible problems before they introduce friction into the development process by causing others and themselves further problems. However, no programmer is perfect and programmers do not have unlimited time to spend crafting and checking their code. Some defects may remain, appearing only in a larger, more integrated context. This system perspective can be considered a separate responsibility, owned and exercised by others.

Test Automation

Returning to the programmer perspective, the next question is what to test for and how? Perhaps the most obvious feature of the requirement, as related to the code, is that quickersort should run faster than qsort. To do this effectively requires more data than can be typed in conveniently at a command-line driven test harness. Other than being arbitrary, the specific values don't matter, so a function can be used to populate an array of given length at runtime:

int *fill_unsorted(int base[], size_t length);
// returns base

A simple implementation can be written in terms of the standard rand function, but alternative, more application-specific distributions may be appropriate. For example, a minimal sketch might look like the following:
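
int *fill_unsorted(int base[], size_t length)
{
  size_t i;
  for (i = 0; i != length; ++i)
    base[i] = rand();   /* arbitrary values; the distribution is not important here */
  return base;
}

Given this, it is possible to write a simple function to compare qsort and quickersort against one another on the same arbitrary data set: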

void a_test_of_sorts(size_t length)
{
  size_t byte_length = length * sizeof(int);
  int *for_qsort = fill_unsorted((int *) malloc(byte_length), length);
  int *for_quickersort =
      (int *) memcpy(malloc(byte_length), for_qsort, byte_length);
  clock_t start;

  start = clock();
  qsort(for_qsort, length, sizeof(int), int_comparison);
  print_result("qsort", length, clock() - start);

  start = clock();
  quickersort(for_quickersort, length);
  print_result("quickersort", length, clock() - start);

  free(for_quickersort);
  free(for_qsort);
}

Arrays of the specified length are allocated and populated with the same data set, and then the times for qsort and for quickersort are taken and reported. The following reporting function presents results in comma-separated format with times in milliseconds:

void print_result(const char *label,
      size_t length, clock_t elapsed)
{
  printf("%i, %i, %s\n", (int) length,
      (int)(elapsed * 1000 / CLOCKS_PER_SEC),
      label);
}

This simple infrastructure allows simple and repeatable automated tests. The code above can be wrapped in an outer loop driven either from outside the C code in a script or from within main. The outer loop can step through different array lengths, generating results that can be piped into a file for analysis. This performance test can be included for automatic execution as part of the check-in process or the integration build.
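
For example, a minimal driver might step through a range of lengths - the specific lengths here are illustrative rather than taken from the original setup:

int main(void)
{
  size_t length;

  /* rand's default seeding keeps successive runs repeatable */
  for (length = 100000; length <= 1000000; length += 100000)
    a_test_of_sorts(length);
  return 0;
}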

A Contractual Dead End

However, having demonstrated that quickersort is indeed quicker than qsort, the programmer still has unfinished business. Mindful of producing an algorithm that is fast but wrong, the programmer needs to check that the resulting array satisfies the functional contract for sorting. In this case the functional contract can be phrased in terms of pre- and postconditions, but it is perhaps more subtle than many might first assume [ Henney2003 ]:

  • precondition: base is a valid non-null pointer to the initial element of an array of initialised ints at least length elements long.

  • postcondition: The values in the array defined by base and length are sorted in ascending order, with equal values adjacent to one another, and they are a permutation of the values in the array before it was sorted.

It is not uncommon for programmers to assume that sorting operations have no precondition and have only the requirement for the array to be sorted as the postcondition.

Given a contract phrased in assertive form, there is a common assumption that the "right thing to do" with it is to place corresponding assertions within the production code and check the pre- and postcondition directly. It is worth noting that, with the exception of base being non-null, the truth or falsehood of the precondition cannot be established within a function written in standard C. It is also worth noting that attempting to check the postcondition in the function has some unfortunate consequences:

void quickersort(int base[], size_t length)
{
  #ifndef NDEBUG
  size_t byte_length = length * sizeof(int);
  int *original = (int *) malloc(byte_length);
  assert(base);
  memcpy(original, base, byte_length);
  #endif

  ... // sorting algorithm of choice
  #ifndef NDEBUG
  assert(is_sorted(base, length));
  assert(is_permutation(base, original, length));
  free(original);
  #endif
}

Perhaps the most visible effect is the reduced readability of the code: the plumbing needed to support the postcondition check is anything but discreet. The infrastructure code is also error prone and somewhat incomplete: although undoubtedly useful, is_sorted and is_permutation are not standard functions. They must be written by the programmer especially for this task... and they need to be tested. They are sufficiently complex that to be sure of their correctness more than a walkthrough is needed: is_sorted is not so much of a challenge, although it is easy to introduce off-by-one errors; an effective version of is_permutation is more demanding, even with the simplifying assumption that base is already sorted; getting either of them wrong leads to wasted time and effort spent determining whether it is the quickersort algorithm or the implementation of the postcondition check that is incorrect - or, in the worst case, both.
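
For a sense of the code involved, a plausible is_sorted is only a few lines - a sketch, assuming the ascending order described by the postcondition - whereas is_permutation is the harder of the two for the reasons just given:

int is_sorted(const int base[], size_t length)
{
  size_t i;
  for (i = 1; i < length; ++i)
    if (base[i - 1] > base[i])   /* a descending neighbour means not sorted */
      return 0;
  return 1;
}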

The resulting complexity rivals and typically exceeds that of the actual quickersort implementation. Indeed, this observation can be taken quite literally: it is not only the complexity of the code, the coding and the testing that have risen; it is also the operational complexity. The complexity of is_sorted is O(n) but, unless more heap memory is used, that of is_permutation is O(n²). It is unlikely that the programmer, whose goal is to produce a faster sorting operation, would be happy to decelerate the performance to O(n²). Even with an O(n log n) implementation of is_permutation, the effect of the extra algorithms and the additional dynamic memory usage will trounce any performance advantage that quickersort might otherwise have delivered.

This issue serves to highlight the difference between production and debugging builds. By default, the standard assert macro is enabled, but when NDEBUG is defined it compiles to a no-op, hence the use of the NDEBUG conditionally compiled code in the previous sketch of a self-checking quickersort. A release build should ensure that NDEBUG is defined for compilation; any bugs in the implementation of quickersort can be trapped only in non-release builds.

There is also duplicity in this attempt at writing self-checking code. Conditional compilation often causes more problems than it resolves [ Spencer- ]. In this case it leads to a style of development perhaps best described as debug by contract. The operational test, a_test_of_sorts, can only be run meaningfully with a release build of quickersort, so no faults can be detected by this active use of quickersort. A released version of the whole program will likewise not detect any defects, so those that remain in the final version of quickersort will emerge only as incorrect results at runtime after the program has been deployed - assuming that its users notice.

What about system-level testing? Those responsible for testing the system as a whole - whether another programmer on the same team, a dedicated tester on the same team or a separate testing team - will have to be supplied with two builds of the system: a release version and a debug-enabled version with different operational behaviour, but (one hopes) identical functional behaviour. This is more often a recipe for extra work than it is one for assuring quality. Assuming that system regression testing misses something as fundamental as incorrectly sorted data, if the debug-enabled version trips an assertion in a faulty quickersort, the tester must now feed the problem back to the programmer to fix... wait for the fix... and then proceed. This longer feedback cycle was precisely one of the issues supposed to be addressed by introducing programmer testing responsibility. The code is fast but wrong, and wrong in a way that could have been easily detected by the programmer.

Here's One I Prepared Earlier

What is lacking is some basic sign-off on the functionality of the code. The programmer could poke values in at a simple command line test harness to see whether they were sorted correctly or not. Or the programmer could use automated tests.

There are two refactoring steps that provide a (very) basic reality check on the functional behaviour.

The first is to do away with the need for is_sorted and is_permutation:

void quickersort(int base[], size_t length)
{
  #ifndef NDEBUG
  size_t byte_length = length * sizeof(int);
  int *sorted = (int *) malloc(byte_length);
  assert(base);
  memcpy(sorted, base, byte_length);
  qsort(sorted, length, sizeof(int), int_comparison);
  #endif

  ... // sorting algorithm of choice
  #ifndef NDEBUG
  assert(memcmp(base, sorted, byte_length) == 0);
  free(sorted);
  #endif
}

In other words, compare the result against the expected result produced by another means - a luxury that is not always available, but is on hand in this particular example. The next step is to recognise and capitalise on the similarity between this new assertion code and the functionality of a_test_of_sorts:

void a_test_of_sorts(size_t length)
{
  ...
  assert(memcmp(for_quickersort, for_qsort,
      byte_length) == 0);
  free(for_quickersort);
  free(for_qsort);
}

For the scenario defined by a_test_of_sorts, the introduction of this single assert and the elimination of all the infrastructure in quickersort is a great simplification that achieves the same effect as before. The assertion for a non-null base in quickersort can be retained, but the benefits of keeping it appear somewhat marginal - most modern operating systems will trap incorrect uses of null pointers and, unless an operation's implementation actually requires a non-null pointer, it is entitled to weaken its precondition and therefore accommodate a null.

Test by Example

However, although the performance-related automated tests now have the additional effect of checking the functional correctness of their results - and at no extra cost - the programmer can only confirm that quickersort passes for large sets of randomly generated data. Certain scenarios and boundary cases may or may not have been checked as a result. Without checking or rigging the generated data sets, the programmer cannot be sure that a_test_of_sorts is providing suitable coverage of these.

The programmer can make these situations explicit by writing example-based test cases that check particular scenarios with specific values. For example, the following test ensures that sorting an empty array does not make any off-by-one memory accesses:

int empty[] = { 2, 1 };
quickersort(empty + 1, 0);
assert(empty[0] == 2);
assert(empty[1] == 1);

In a similar vein, the following test does the same for sorting a single-element array, also ensuring that the single element's value remains unchanged:

int single[] = { 3, 2, 1 };
quickersort(single + 1, 1);
assert(single[0] == 3);
assert(single[1] == 2);
assert(single[2] == 1);

The following ensures that sorting an array of identical elements is an identity operation:

int identical[] = { 3, 2, 2, 2, 1 };
quickersort(identical + 1, 3);
assert(identical[0] == 3);
assert(identical[1] == 2);
assert(identical[2] == 2);
assert(identical[3] == 2);
assert(identical[4] == 1);

And likewise for an ascending sequence of elements:

int ascending[] = { 2, -1, 0, 1, -2 };
quickersort(ascending + 1, 3);
assert(ascending[0] == 2);
assert(ascending[1] == -1);
assert(ascending[2] == 0);
assert(ascending[3] == 1);
assert(ascending[4] == -2);

Of course, it is worth testing that quickersort actually does something, given that all the previous test cases could be met successfully with an empty implementation! Here is a test on a reverse-sorted array:

int descending[] = { 3, 2, 1, 0, -1 };
quickersort(descending + 1, 3);
assert(descending[0] == 3);
assert(descending[1] == 0);
assert(descending[2] == 1);
assert(descending[3] == 2);
assert(descending[4] == -1);

Further tests, based on other sample data characteristics, can be added in this style: ascending with some adjacent identical values, arbitrarily ordered distinct values, arbitrary values including some duplicates, arbitrary values including INT_MIN, etc.
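
For instance, a test for arbitrary values including some duplicates might look like the following - the specific values are illustrative only, following the same guard-element pattern as before:

int mixed[] = { 9, 3, -1, 3, 0, 7 };
quickersort(mixed + 1, 4);
assert(mixed[0] == 9);
assert(mixed[1] == -1);
assert(mixed[2] == 0);
assert(mixed[3] == 3);
assert(mixed[4] == 3);
assert(mixed[5] == 7);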

At this point programmer testing responsibility can be said to have been reasonably fulfilled: both operational and functional aspects of quickersort are checked in automated tests, with example-based test cases providing basic unit-testing coverage of the functionality.

Sortie

The mistaken notion that testing is someone else's problem is unfortunately quite common, regardless of development process [ Maguire ]:

Programmers today aren't sure their code is bug-free because they've relinquished responsibility for thoroughly testing it. It's not that management ever came out and said, "Don't worry about testing your code - the testers will do that for you." It's more subtle than that. Management expects programmers to test their code, but they expect testers to be more thorough; after all, that's Testing's full-time job.

The notion that testing is part of programming rather than something foreign to it has support both from perspectives aligned with agile development [ Coplien- ]:

Responsibilities of developers include understanding requirements, reviewing the solution structure algorithm with peers, building the implementation, and performing unit testing.

And from dedicated testing practitioners [ Black ]:

I find the projects I work on usually go more smoothly when programmers do some unit and component testing of their own code. Through the ascendance of approaches like Extreme Programming, such a position is becoming less controversial. [...] So, a good practice is to adopt a development process that provides for unit testing, where programmers find bugs in their own software, and for component testing, where programmers test each other's software. (This is sometimes called 'code swapping.') Variations on this approach use concepts like pair programming and peer reviews of automated component test stubs or harnesses.

Systems should be tested at the many levels at which they are conceived, from the system level down to the function level. These different views often suggest a division of responsibility and ownership with respect to different kinds of testing [ Booch ]:

Unit testing involves testing the individual classes and mechanisms; is the responsibility of the application engineer who implemented the structure. [...] System testing involves testing the system as a whole; is the responsibility of the quality assurance team.

Programmers inevitably write code and scripts to test parts of their system, but these are often ad hoc fragments scattered around personal directories rather than versioned along with other project artefacts and integrated into system builds. There are good grounds for aiming for a higher level of automation [ Poppendieck- ]:

As long as you've got them, developer and customer tests should be automated as much as possible and run as part of the daily build. If the tests are not automated or if they take too much time, they won't be run often enough. Big batches of changes will be made before testing, which will make failure much more likely, and it will be much more difficult to tell which change caused the tests to fail.

We can also appreciate automation from the human as well as the technological perspective [ Cockburn ]:

Teams do deliver successfully using manual tests, so this can't be considered a critical success factor. However, every programmer I've interviewed who once moved to automated tests swore never to work without them again. I find this nothing short of astonishing.

Their reason has to do with improved quality of life. During the week, they revise sections of code knowing they can quickly check that they hadn't inadvertently broken something along the way. When they get code working on Friday, they go home knowing that they will be able on Monday to detect whether anyone had broken it over the weekend - they simply rerun the tests on Monday morning. The tests give them freedom of movement during the day and peace of mind at night.

Assertions in production code often end up encouraging a code-and-fix mindset rather than a more considered one. A failed production-code assertion in an arbitrary situation gives you something - but not a great deal - to work with. A failed assertion in a specified test case gives you a precise scenario that defines the context for failure.

The relationship between testing and contracts is often misunderstood, and often in a way that creates extra work without offering significant improvement in design quality. There is a far higher probability that a programmer will incorrectly specify a postcondition than a unit test. For example, at the time of writing, the first entry that Google brings up for "design by contract" is an explanation of how to write bug-free software in Eiffel, a language that famously supports pre- and postcondition checking [ DBC ]. It can be considered ironic that this paper contains poorly and incorrectly specified pre- and postconditions. Tests would have uncovered these defects immediately.

The specification of a precondition and a postcondition for a sorting operation is subtle but not necessarily hard - just harder than many expect. However, converting those conditions into practical assertions raises a good many questions and generates a great deal of code complexity. By contrast, it is trivial to test such an operation and there is little complexity involved - literally: the example-based test cases in the sorting example have O(1) operational complexity and a McCabe cyclomatic complexity value of 1, whereas the attempt to write a correct postcondition in code involved greater complexity on all fronts. Thus contracts lay down the lines that tests can follow, and vice versa [ Hunt- ]:

We like to think of unit testing as testing against contract. We want to write test cases that ensure that a given unit honors its contract. This will tell us two things: whether the code meets the contract, and whether the contract means what we think it means. We want to test that the module delivers the functionality it promises, over a wide range of test cases and boundary conditions.

And this relationship between contracts and testing provides us with a hint and a useful bridge to expanding the role of testing into design. The example in the sorting story was tightly scoped in terms of interface and implementation, and it was strictly focused on demonstrating the responsibilities and practice of more conventional unit testing. Test-Driven Development takes this a step further, empowering unit testing to support and contribute to design. But that's another story.

References

[Black] Rex Black, Critical Testing Processes, Addison-Wesley, 2004.

[Booch] Grady Booch, Object-Oriented Analysis and Design with Applications, 2nd edition, Benjamin/Cummings, 1994.

[Cockburn] Alistair Cockburn, Crystal Clear: A Human-Powered Methodology for Small Teams, Addison-Wesley, 2005.

[Coplien-] James O Coplien and Neil B Harrison, Organizational Patterns of Agile Software Development, Prentice Hall, 2005.

[DBC] "Building bug-free O-O software: An introduction to Design by Contract™", http://archive.eiffel.com/doc/manuals/technology/contract/

[Henney2003] Kevlin Henney, "Sorted", Application Development Advisor, July 2003, available from http://www.curbralan.com

[Henney05_1] Kevlin Henney, "Five Considerations", keynote at ACCU Spring Conference, April 2005.

[Henney05_2] Kevlin Henney, "Driven to Tests", Application Development Advisor, May 2005.

[Hunt-] Andrew Hunt and David Thomas, The Pragmatic Programmer, Addison-Wesley, 2000.

[Maguire] Steve Maguire, Writing Solid Code, Microsoft Press, 1993.

[Poppendieck-] Mary Poppendieck and Tom Poppendieck, Lean Software Development, Addison-Wesley, 2003.

[Spencer-] Henry Spencer and Geoff Collyer, "#ifdef Considered Harmful, or Portability Experience with C News", USENIX, June 1992, http://www.literateprogramming.com/ifdefs.pdf





