Vorsprung Durch Testing

At times it might seem as if the T in TDD stands for Trendy, but there is more to Test-Driven Development than just a statement of fashion. There is also more to it than just testing.

It is possible to identify a subset of three motivating practices in TDD that characterise a fairly conventional and uncontentious form of unit testing [ Henney1 ]: programmer testing responsibility, automated tests and example-based test cases. These form a unit-testing base that can be employed in the context of both static and agile development macro processes, and were motivated and demonstrated previously on the humble but surprisingly rich example of a sorting function in C [ Henney2 ]. Thus, programmers are responsible for unit testing their work, with system-level testing a separate and complementary role and activity; tests should be executed automatically - execution of code by code - rather than manually; tests are black-box tests expressed as specific examples of typical or edge cases of using the unit under test.
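
As a rough illustration of example-based, automated, black-box test cases, the following sketch tests a small sorting function. It is written in Python rather than the C of the earlier article, and the names insertion_sort and run_all_tests are hypothetical, chosen only for this example.

```python
def insertion_sort(values):
    """Unit under test (hypothetical): returns a new list in ascending order."""
    result = []
    for value in values:
        index = len(result)
        # Walk back past every element that is greater than the new value.
        while index > 0 and result[index - 1] > value:
            index -= 1
        result.insert(index, value)
    return result


# Each test is one specific example of a typical or edge case.
def test_empty_input_gives_empty_output():
    assert insertion_sort([]) == []


def test_single_element_is_unchanged():
    assert insertion_sort([42]) == [42]


def test_already_sorted_input_is_unchanged():
    assert insertion_sort([1, 2, 3]) == [1, 2, 3]


def test_reversed_input_is_sorted():
    assert insertion_sort([3, 2, 1]) == [1, 2, 3]


def test_duplicates_are_preserved():
    assert insertion_sort([2, 1, 2, 1]) == [1, 1, 2, 2]


def run_all_tests():
    """Execution of code by code: run every example and report how many ran."""
    tests = [
        test_empty_input_gives_empty_output,
        test_single_element_is_unchanged,
        test_already_sorted_input_is_unchanged,
        test_reversed_input_is_sorted,
        test_duplicates_are_preserved,
    ]
    for test in tests:
        test()
    return len(tests)
```

The tests refer only to inputs and expected outputs, so they would apply equally to any other implementation behind the same interface.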

The next step is to recognise that effective testing can be more than just bug hunting. In TDD, unit testing helps to support and drive design, and vice versa. Three more practices can be identified that build on the core unit-testing foundation to provide us with a micro-process component that also supports design: active test writing, sufficient design and refactoring. These design-focused practices expand the role of the basic unit-testing practices: examples drive the scope of design [ Marick ], programmer responsibility extends to the suitability and quality of code over time - not just at a single point in time - and automation underpins the practical execution of this approach.

Active Test Writing

Black-box testing by example is not just limited to exploring the correctness of an implementation against an interface contract: it is also useful for framing and presenting it, and for formulating and exploring the contract itself. In other words, design.

Passive testing is essentially the process whereby the feedback of tests is limited to defect detection. Tests are typically written some time after the code they test, where they play what is essentially a destructive role: they cannot confirm total correctness, only the presence of incorrectness. Although such an approach to testing has obvious value, it can encourage an approach to both design and testing that is overly formal and sequential. The opportunity to learn about what is being designed, and how to design it better as a whole, is missed [ Henney3 ]. Defects lead to localised fixes, but the test-writing process does not influence the key decisions in a design, which in effect is considered frozen. The feedback loop is too long, so there is less motivation to change things because of the feeling of "what's done is done". The code has effectively gone into conventional maintenance mode early, even though initial development may be ongoing.

Active test writing adopts a more balanced perspective, using the act of test writing as a creative exercise to balance the more destructive intentions of test execution. Tests represent a first point of use of an interface, and the ease or difficulty of writing test cases gives instant feedback on the qualities of the interface and the implementation behind it.

High coupling manifests itself in tests that are difficult or - in simple unit-testing terms - impossible to write. For example, an object may depend on data that could have been passed in, but has instead ended up coupled to a configuration file or registry, a database connection or some global variable (whether expressed obviously as such a variable or disguised as a singleton object).

Low cohesion manifests itself in supernumerary test cases that test quite unrelated features, suggesting that inside a given unit there are smaller units struggling to get out. For example, the standard C realloc function expresses three quite distinct behaviours: malloc, free and, err, realloc [ Henney4 ]. The standard java.util package contains miscellaneous unrelated facilities - collections, event-handling models, date and time handling, internationalisation features... and further miscellaneous miscellanea. It also stands as a caution to anyone who might consider util, utils, utilities, utility, etc., to be a clear and cohesive name for a header, a package, a library, etc.
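
To make the symptom concrete, here is a hypothetical Python analogue, loosely modelled on the three behaviours attributed to realloc above. The function resize_block is invented for this sketch and is not a rendering of the C function or its exact rules.

```python
def resize_block(block, size):
    """Hypothetical realloc-like function: one name, three jobs.

    block is None -> acts as an allocation of `size` zeroed bytes
    size == 0     -> acts as a release and returns None
    otherwise     -> returns a resized copy, keeping the common prefix
    """
    if block is None:
        return bytearray(size)
    if size == 0:
        return None
    resized = bytearray(size)
    kept = min(size, len(block))
    resized[:kept] = block[:kept]
    return resized


# Group 1: allocation behaviour.
def test_acts_as_allocation_when_no_block_is_given():
    assert resize_block(None, 4) == bytearray(4)


# Group 2: release behaviour.
def test_acts_as_release_when_size_is_zero():
    assert resize_block(bytearray(b"abcd"), 0) is None


# Group 3: the resizing that the name actually promises.
def test_growing_preserves_existing_contents():
    assert resize_block(bytearray(b"ab"), 4) == bytearray(b"ab\x00\x00")


def test_shrinking_truncates_contents():
    assert resize_block(bytearray(b"abcd"), 2) == bytearray(b"ab")
```

The test names fall into three groups that have little to say to one another, which hints that three smaller functions are hiding behind one interface.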

In terms of organising the active part of active test writing, there are many options. The bottom line is that writing of test code is carried out in close proximity - both space and time - to writing of production code. The writing of test cases and corresponding implementation can be interleaved, with one following or preceding the other closely, or stepped a little further apart. Being able to write a test case first is a useful and helpful discipline, but only dogma would suggest that its exclusive use is an absolute requirement and a necessary prerequisite of TDD. However, although writing test cases much later than the target code can work, both the quality of the feedback and the motivation to do so are weaker.

Sufficient Design

This continuous and reflective view of design at the code face may raise another question in some minds about the whole nature of developing iteratively and incrementally: why not just "do the right thing first time"? Perhaps surprisingly, I have heard this question posed as a serious criticism, but the question itself raises more questions about the meaning of the question and the questioner's assumptions than it does about agile development techniques at any level. It assumes that the "right thing" is in some way knowable "first time" and constant thereafter. However, the "right thing" is dependent on time and is anything but constant, so both "right thing" and "first time" lose their simple interpretations. The learning nature of software development pretty much guarantees that the knowledge of what is to be built and how it can be built are moving targets. While they may not necessarily be wild and erratic, their variability stands to undermine any approach that is based on constancy and precognition. The difference between a process with no variables and one with some is the difference between defined and empirical processes. Treating an empirical process as a defined process is a problem waiting to happen [ Schwaber ].

Yet there can still be a lingering sense that sorting everything out up front is both reasonable and do-able, leading one way or another to a big up-front design (BUFD) phase (see sidebar, "Big", as in "a Lot of", not just "a Bit of"). This inevitably leads to overdesign. Design based on assumptions that turn out to be incorrect needs to be reworked, often quite late. Design that tries to tackle uncertainty by being less specific becomes lost in technical detail focused on generality rather than on the actual problems that need to be addressed. At the opposite end of the spectrum is no up-front design (NUFD), which represents a failure to exercise, in a timely manner, even the most basic knowledge about what is to be developed. An approach based on a view that accepts change but seeks stability is likely to be a more reasoned one, albeit a little rougher in its detail up front, where roughness implies sketched rather than shoddy. An approach based on what I have referred to in recent years as rough up-front design (RUFD) can steer this middle path. Establish a stable baseline architecture that expresses a common vision and a sketch of what is to be worked on, without wasting time on details that are better expressed and handled in code or that are best left until more concrete knowledge is available. Note that stable is not the same as static, so the architecture is open to change rather than being frozen. This approach can also be dubbed sufficient up-front design (SUFD).

"Big", as in "a Lot of", not just "a Bit of"

It is worth clarifying what BUFD (or BDUF, as it is also known) entails, because this appears to be an occasional source of confusion. For example, misunderstanding its meaning can lead to proclamations such as the following [ Spolsky ]:

I can't tell you how strongly I believe in Big Design Up Front, which the proponents of Extreme Programming consider anathema. I have consistently saved time and made better products by using BDUF and I'm proud to use it, no matter what the XP fanatics claim. They're just wrong on this point and I can't be any clearer than that.

And, to demonstrate the point, Joel Spolsky makes available for download a so-called functional spec of a commercial product, codenamed Aardvark. However, the deeds do not support the words. The document may have been written up front, but hunt all you like for big design because you won't find it. Strong belief and pride appear to have clouded correct use of accepted terminology.

The accepted archetype of BUFD arises from the strict waterfall approach of defining development as a precisely phased pipeline of activities, so that requirements analysis strictly precedes design activity, which strictly precedes coding, which strictly precedes testing. In a bid to reduce risk from unknowns later in the lifecycle, a BUFD approach doesn't just do a bit of design up front, it does a lot. Hence the use of the term "big" rather than "a bit of" or "some". The BUFD path is paved with good intentions - even if somewhat suspect - but the idea is that the design goes into a lot of detail, specifying internal structure to the nth degree - from packages and classes right down to private methods and private data. In essence, a blueprint that supports a plan-driven model of development.

However, at the beginning of the Aardvark spec is the following note:

This specification is simply a starting point for the design of Aardvark 1.0, not a final blueprint. As we start to build the product, we'll discover a lot of things that won't work exactly as planned. We'll invent new features, we'll change things, we'll refine the wording, etc. We'll try to keep the spec up to date as things change. By no means should you consider this spec to be some kind of holy, cast-in-stone law.

So, of all the things this spec might be, a big, up-front design document is not one of them. It makes this quite clear to the reader by describing itself as "a starting point for the design" not "the design". Reading further into the spec uncovers frequent use of words such as "maybe", "probably" and "possibly" to describe certain technical decisions. And then there is the length of the document itself: twenty pages. When you strip away the extraneous details, such as the front cover, preamble and the neo-Hungarian coding conventions, you are left with a shorter document that outlines some of the core requirements, proposes a user interaction model and sketches a few features of the architecture. The document is also not heavy on text and is fairly generous with its use of spacing. Whichever way you look at it, this is not big design. Which all comes as a welcome relief, but does rather undermine the claim of its author.

Advocates of genuine BUFD would regard the Aardvark spec as incomplete and insubstantial, lacking detailed specifications of code structure or the look and feel of the application. They would tar it with the same brush that the article uses to daub XP. I believe that the contrast the article is trying to make is to compare no up-front design with some up-front design, not with big up-front design. Joel Spolsky is actually advocating a design approach based on sufficiency, exploration and incrementalism. So although he may not be on the same page as XP advocates, he is many pages short of being a fully paid-up BUFD practitioner.

Sufficient design in TDD manifests itself in test-bounded design increments, where tests describe the scope of what is being worked on at any point in time. This moderates creeping featurism, cuts extraneous code and encourages incremental and measured progress. Active testing supports the goal of sufficient design by keeping the role of functions, classes and packages clearly defined. Tests bound the functional behaviour of these units, keeping them 'honest' with respect to their current role in the enclosing system.

Driving the design from the baseline architecture through tests leads to more cleanly separated units with a close dependency horizon (a dependency occurs where one unit, e.g. class or header, depends on another unit for its definition, e.g. inheritance or inclusion, and the dependency horizon for a given unit is where its dependencies end, i.e. where its immediate dependencies, and their immediate dependencies in turn, and so on, have no further dependencies). Of course, there needs to be some coupling at certain levels otherwise, by definition, no coupling results in no system.
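
One possible way to make "close" measurable is to follow a unit's dependencies outward until they run out and count the steps. The sketch below does this over a small hypothetical dependency graph; the unit names, the DEPENDENCIES table and the choice of shortest distance as the measure are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical units mapped to their immediate dependencies.
DEPENDENCIES = {
    "report": ["formatter", "date"],
    "formatter": ["date"],
    "date": [],
    "tangled": ["report", "database"],
    "database": ["config"],
    "config": [],
}


def transitive_dependencies(graph, unit):
    """Map every unit reachable from `unit` to its shortest distance in steps."""
    distances = {}
    queue = deque((dep, 1) for dep in graph.get(unit, ()))
    while queue:
        current, distance = queue.popleft()
        # Skip units already seen, and the starting unit itself if a cycle leads back.
        if current in distances or current == unit:
            continue
        distances[current] = distance
        queue.extend((dep, distance + 1) for dep in graph.get(current, ()))
    return distances


def horizon_distance(graph, unit):
    """Steps to the furthest dependency reached; 0 for a unit with none."""
    return max(transitive_dependencies(graph, unit).values(), default=0)


def horizon_units(graph, unit):
    """Reachable units that have no further dependencies of their own."""
    reachable = transitive_dependencies(graph, unit)
    return sorted(name for name in reachable if not graph.get(name))
```

Under these assumptions, horizon_distance(DEPENDENCIES, "report") is 1 while horizon_distance(DEPENDENCIES, "tangled") is 2, and a unit such as "date" sits on the horizon itself with a distance of 0.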

Refactoring

Mrs Beeton's Victorian domestic advice [ Beeton ] is surprisingly relevant to modern code:

A dirty kitchen is a disgrace to all concerned. Good cookery cannot exist without absolute cleanliness. It takes no longer to keep a kitchen clean and orderly than untidy and dirty, for the time that is spent in keeping it in good order is saved when culinary operations are going on and everything is clean and in its place. Personal cleanliness is most necessary, particularly with regard to the hands.

This is the very motivation and essence of refactoring. Refactoring preserves the functional behaviour of a piece of code while changing - and, one hopes, improving - its developmental qualities. Refactoring is a stable and local change, typically motivated by a required change in functionality. Operational behaviour, such as performance or memory usage, may change, but improvement of operational qualities rather than developmental qualities is the focus of the similar but distinct activity of optimisation.

Changes to functionality may follow the line of the existing code easily, requiring no more than a consistent extension or in-place modification of the code. At other times a change in functionality may also suggest a change in implementation of an interface. An existing implementation may be OK in other respects, but may support the functionality change poorly, requiring undue effort to implement it. For example, the need to perform general date arithmetic on an existing date representation that favours presentation over calculation, such as YYYY-MM-DD, suggests that a change in representation may be appropriate before extending the functionality [ Henney5 ]. Alternatively, the quality of an existing piece of code may generally be poor, caught in a tangle of spaghetti flow or spaghetti inheritance. For example, a self-aware class hierarchy, where the root of the class hierarchy depends on other classes in the hierarchy, can be a troublesome knot in the dependency graph of a program, rather than an exemplary pattern to be followed elsewhere.
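
The date example can be sketched as two interchangeable classes checked by the same tests. TextDate, OrdinalDate and check_date_behaviour are hypothetical names for this illustration, and both classes lean on Python's standard datetime module for the calendar rules rather than implementing them.

```python
from datetime import date, timedelta


class TextDate:
    """Before: the representation favours presentation (a YYYY-MM-DD string)."""

    def __init__(self, text):
        self._text = text

    def iso(self):
        return self._text

    def plus_days(self, days):
        # Every calculation needs a round trip through parsing and formatting.
        moved = date.fromisoformat(self._text) + timedelta(days=days)
        return TextDate(moved.isoformat())

    def days_until(self, other):
        return (date.fromisoformat(other.iso()) - date.fromisoformat(self._text)).days


class OrdinalDate:
    """After: the representation favours calculation (a day number)."""

    def __init__(self, text):
        self._ordinal = date.fromisoformat(text).toordinal()

    @classmethod
    def _from_ordinal(cls, ordinal):
        return cls(date.fromordinal(ordinal).isoformat())

    def iso(self):
        return date.fromordinal(self._ordinal).isoformat()

    def plus_days(self, days):
        return OrdinalDate._from_ordinal(self._ordinal + days)

    def days_until(self, other):
        return other._ordinal - self._ordinal


def check_date_behaviour(make_date):
    """The same examples apply to either representation."""
    start = make_date("2005-12-31")
    assert start.iso() == "2005-12-31"
    assert start.plus_days(1).iso() == "2006-01-01"
    assert make_date("2004-02-28").plus_days(1).iso() == "2004-02-29"
    assert start.days_until(make_date("2006-01-07")) == 7
```

Calling check_date_behaviour(TextDate) and then check_date_behaviour(OrdinalDate) exercises both versions against identical examples, which is the sense in which the change of representation leaves functional behaviour alone.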

Refactoring acknowledges that we can lay down code in confidence but still learn better ways of achieving the same end. Indeed, it is more than this: the learning is not simply passive; it is put into practice and draws from practice. Of course, there is a risk that making such a change is not necessarily an improvement: any modification runs the risk of introducing a bug. Therefore, practise with a safety net: refactoring should be undertaken with a clear head, with another pair of eyes, with tools, with tests, or with any suitable combination of these. In the context of a test-driven approach, test cases offer a regression test suite that acts as a baseline for both refactoring and optimisation.

Given that the inevitability of change is one of the few constants in software development, this active acknowledgement and positive support of change through tests is reassuring. Refactoring is the other side of the design coin from what we might consider to be prefactoring. Refactoring adjusts the design vision and detail after the fact to balance the formulation beforehand.

Test Match Report

Test-Driven Development is a bar-raising, learning process. Removing the tests leaves the safety net at ground level and knowledge localised, isolated and transitory. A TDD approach offers more than just a pile of tests: it offers specification as well as confirmation. Both of these reasons are sufficient to justify writing tests that sometimes apparently test the trivial. And specifying even the trivial to be sure that it always works means that regression testing comes for free as part of the deal.

Another consequence of TDD is the resolution of an imbalance in the traditional view of testing. Testing is often characterised as a destructive activity, and one that is predominantly quantitative in its feedback. TDD makes testing a constructive activity, with qualitative feedback on design, not just defect reports.

TDD is not a total process: you need other complementary drivers to move development forward. For example, an incremental macro process where each increment is scoped with respect to functional or technical objectives provides a good backdrop to the code-facing emphasis of TDD. Likewise, practices such as reviewing, joint design meetings and continuous integration support and are supported by TDD. It is also important to distinguish TDD from XP: although historically it emerged from XP, TDD is neither a synonym nor a metonym for XP. Implementing XP necessitates employing TDD, but the converse is not true. TDD fits with many different macro-process models. There are many more programmers practising TDD in other processes than are using it in a strict XP environment.

References

[Henney1] Kevlin Henney, "Driven to Tests", Application Development Advisor, May 2005, available from http://www.curbralan.com.

[Henney2] Kevlin Henney, "C-side Re-sort", Overload 68, August 2005.

[Marick] Brian Marick, Driving Software Projects with Examples, http://www.exampler.com.

[Henney3] Kevlin Henney, "Learning Curve", Application Development Advisor, March 2005, available from http://www.curbralan.com.

[Henney4] Kevlin Henney, "No Memory for Contracts", Application Development Advisor, September 2004, available from http://www.curbralan.com.

[Schwaber] Ken Schwaber and Mike Beedle, Agile Development with Scrum, Prentice Hall, 2002.

[Beeton] Isabella Beeton, Mrs Beeton's Every-Day Cookery and Housekeeping Book, Ward, Lock & Co Ltd, 1872.

[Henney5] Kevlin Henney, "The Taxation of Representation", artima.com, July 2003, http://www.artima.com/weblogs/viewpost.jsp?thread=8791.

[Spolsky] Joel Spolsky, "The Project Aardvark Spec", August 2005, http://www.joelonsoftware.com/articles/AardvarkSpec.html.