The Case Against TODO

TODO - a neat way to label work in progress or an easy way to disguise the flaws in a codebase?

TODO, TO_DO, TO DO, @todo, FIXME, FIX_ME, FIX ME, HACK
No other keyword has so many aliases. No other keyword is quite as open to interpretation.

Don't worry, our compiler hasn't gone soft on us. TODO isn't really a keyword: it lives in comments and can therefore take whatever form a programmer chooses, safe in the knowledge it won't cause trouble at compile or run time.

On the surface, TODO seems both useful and inevitable. No piece of code is ever really finished: there will always be something more to do, and where better to record this information than in the code itself? If we think more carefully though, we realise that TODO actually indicates a point at which a decision was made - a decision to defer action, a decision, in fact, to not do something. Clearly this decision is less than ideal.

This article investigates the use of TODO and friends more closely.

First, we shall consider what it is meant to mean and what it often turns out to mean. Next, we'll search through some code, uncover some use cases, and think about the alternatives.

These alternatives are usually better. TODO, it turns out, is not so innocent after all. When used as a shorthand for "more work required" it tells us too little - and often too late - and when used as a convenient label for broken code, it can cause serious damage to a codebase.

What does 'TODO' mean?

As stated in the introduction, TODO isn't a keyword, it's a comment. Its exact meaning will depend on the local coding culture and often individual programming style. For example, the Sun Java programming conventions [ Java Conventions ] state:

10.5.4 Special Comments

Use XXX in a comment to flag something that is bogus but works. Use FIXME to flag something that is bogus and broken.

Already we've found a shocking new member of our TODO set: XXX!

I think it's clear, though, that the phrase TODO in a comment indicates something more is required. With any luck, the rest of the comment will indicate exactly what that something is.

FIXME often seems to be used interchangeably with TODO but carries a stronger suggestion that something is broken - if TODO is a feature request, then FIXME is a defect report. Again, with luck, the rest of the comment will explain what needs fixing.

Unfortunately the rest of the comment is often inadequate. Sometimes we'll find a bare:

    // TODO

sometimes an initialled note-to-self:

    // FIXME TAG

sometimes a plea for attention:

    // TODO Fix this hack !!!

and sometimes even a garbled attempt to cover all bases:

    // TODO FIXME XXX HACK

Of course, TODO isn't meant to be pretty. The capital letters shout at the reader, drawing attention to the deficiency. The noise will continue until something gets done.

Ask the code

Maybe I'm being unfair. Let's take a look at TODO in action by searching some source code. On a Unix platform the following command should output matches in any files beneath the current working directory:

    grep -ERi "TO[_ ]?DO|FIX[_ ]?ME|HACK|XXX" *

I cannot publish the output of this search on any of the proprietary code I work on. It's confidential information. (The small amount of open source code I have written or contributed to is TODO free.) Nor am I going to publish the output of this search on any of the open source code I use - I do not think it would be fair, since this article questions the use of TODO. I should stress, though, that I'm grateful to have source access, so that any such comments are at least visible to me.

I would be interested to know how useful the output of your search is. Does the TODO list correlate with work-in-progress? Are the FIXMEs actively being fixed? Or have we merely generated a list of half-baked ideas, abandoned experiments and neglected suggestions?

The next section considers some specific use cases which I hope overlap with our search results.

Use cases and alternatives

Place holders

In another article [ Guest ], I describe my use of an Emacs elisp program to generate a skeleton C++ file, which - amongst other things -inserts placeholders for `Doxygen' comments.

These placeholders had TODOs in them for me to fill in. Or at least they used to! I have now decided I would rather leave a class undocumented than have it ever look like this:

    /**
     * TODO write a description of MyNewClass.
     */
    class MyNewClass {
        ...
    };

The TODO in the example above serves no useful purpose - I'm well aware the new class needs describing, `Doxygen' will warn me should I forget - and as a consequence my elisp now generates smaller but better-formed skeleton files. This reflects more accurately the way I aspire to work: no broken code.

While we're on the subject, note that editor macros make it all too easy to create unfinished or extravagent comments. A block of asterisks is a poor wayto fill a source file:

  /* * * * * * * * * * * * * * * * * ** * * * * * * *
  *                                                 *
  *           M Y   N E W    C L A S S              *
  *                                                 *
  * * * * * * * * * * * * * * * * * * * * * * * * * */

Generated code

My elisp metaprogram is of course a code-generator. When I realised it was producing broken output I stepped in and fixed it. With third-party code generators we may not be so lucky.

Consider a GUI builder which allows us to design our user interface and map buttons events to actions. The output of this builder is some auto-generated code with placeholders:

    void MyDialog::OnButtonClick(Button button) {
        // TODO. Insert button action code here.
    }

Clearly there's a potential problem if the auto-generation is repeated. Will any callbacks we implement be replaced with TODOs?

There's not much we can do about this - unless we are GUI builder writers, in which case we might consider better ways to separate the generated code from the application-specific implementation. (See [ Brown ] for a more detailed discussion of this issue.)

Perhaps this is why many seasoned GUI developers drop the framework early on, preferring the control they get from hand-crafted code?

Work-in-progress

Suppose a programmer creates a new class, NewClass, whose interface offers clients, amongst other things, a pair of serialization functions:

    NewClass {
        ...
    public: // Serialization
        void put(std::ostream & put_to) const;
        void get(std::istream & get_from);
        ...
    };

The output function is quickly coded up. The input function turns out to be rather trickier. Now, the programmer sensibly wishes to put her work-in-progress into the source management system as soon as possible. She checks in an empty implementation:

    void NewClass::get(std::istream & get_from) {
        // TODO
    }

Here, we understand the TODO to mean that, although the code compiles, the implementation is incomplete. Client code shouldn't try to `get' instances of this new class just yet.

The trouble is, as far as client code is concerned - at both compile and run time - the comment has no effect. It's not hard to imagine a scenario where we accidentally end up reading in a NewClass , leading to some unwanted effect downstream, possibly in an apparently unrelated piece of code.

Any time spent tracking down such a problem has been wasted. The moment the TODO was written, we were aware of its exact location and cause.

A better technique is to call a function:

    void NewClass::get(std::istream & from) const {
        NotYetImplemented();
    }

Here, NotYetImplemented() might fire an assertion, raise an exception, or log an error message. At the very simplest, we could put a rough and ready macro into service:

    #define NotYetImplemented()
       assert(!"Not yet implemented!")

As usual, the move from comment to code improves things. Now the system offers more useful diagnostics in the event of an unimplemented function being called: but it won't save us from trouble if this event happens once the software has been shipped. We are still reliant on someone remembering to finish what got started. Here, test-driven development techniques are invaluable. They are so important they merit fuller discussion later in this article.

Notes To self

Consider the following struct:

   struct FileSize { // TODO 64 bit
       unsigned long ms_4_bytes;
       unsigned long ls_4_bytes;
   };

I classify this TODO as a note-to-self since it may not be obvious to anyone but the programmer who wrote the comment exactly what should be done to the code on a 64 bit platform. If we grep around surrounding source files, we'll probably find a few similar comments and get a better idea of what's required.

Sometimes it's even more clear that we're dealing with a note-to-self.

   struct FileSize { // TODO 64 bit. TAG
       ...
   };

Here, the author uses his initials to stake a claim on the struct . He seems to be saying, "I know what needs doing, and I'll probably be the one who does it". Ideally, of course, when we undertake a 64 bit port, he will still be around to tie up these loose ends; even if he isn't, he has left us some useful pointers.

This isn't a bad use of TODO. It's certainly preferable to the programmer keeping the TODO list in his head, or even in his log book. The information is where it needs to be, often down to individual lines of code. My only quibble is with the note-to-self attitude. When we check in code, we release it to a wider audience; we are publishing our work. These notes-to-self should really be notes-to-everyone.

In an ideal world, then, there will be a little more documentation explaining the porting task in more detail. Pair-programming, peer review and open source development can help us maintain a disciplined approach.

Latent defects

Suppose we're digging around some legacy code and we discover the following bizarre integer literal in an assignment:

    flags &= 0xFFFFFFF7;

What to do? Evidently bit three of the unsigned int flags variable is being cleared, but so are bits thirty-two and above. Is this the intended behaviour?

Now, the code in question is regarded as sound. It works. Target platforms have always had thirty-two bit int s, and will continue to do so for the forseeable future.

We could make a note in the code and move on:

  flags &= 0xFFFFFFF7;//FIXME this assumes 32 bit ints

If we're feeling less confident, the note might read:

  flags &= 0xFFFFFFF7;
  // TODO doesn't this assume 32 bit ints !?

Or, if we're feeling more confident, we make the fix directly:

    flags &= ~(1U << 3);

Personally, I dislike adding comments as shown above. Yes, it's better than ignoring the problem, but only marginally. By adding a comment we're ducking the issue. On the other hand it's also risky to modify legacy code even when we think we're fixing it (see Sidebar, Legacy Code). Who knows what compensating code there might be elsewhere in the system?

Legacy Code

I haven't defined what I mean by legacy code here - but the risk associated with change is surely a defining characteristic. In his book Working Effectively with Legacy Code , Michael Feathers [ Feathers ] chooses a specific and objective definition: legacy code is code which has no unit tests. He goes on to offer some sound advice on how to work with such code: that is, how to put it under test. Once the tests are in place, the risks associated with change reduce.

A more responsible reaction is to dig a little deeper and find exactly how pervasive the problem is. The assumption that int s occupy thirty-two bits may cut across many lines in many files - and yet, as stated earlier, this assumption may actually be reasonable, at least for the forseeable future. Not all code needs to be portable.

Even if we are prepared to continue with this assumption about target platforms, the important thing is not to throw away our insight. With barely any extra effort we can replace the TODO with a compile-time assertion [ Boost ]:

    BOOST_STATIC_ASSERT(sizeof(int) * CHAR_BIT == 32);

With this assertion in place, the code won't even compile if we try and port to a platform which doesn't meet this assumption.

On the other hand, if digging deeper reveals more latent defects, and if we already have reason to believe the legacy code is in poor shape, then more radical action may be needed (see Legacy Code).

Grand designs

At the other end of the scale are the TODOs which claim a glorious future for a sound - if straightforward - block of code.

  // TODO use an adaptive search algorithm:
  // 1) keep a log of past commands
  // 2) when the system is idle, review this log to
  //    predict the next command
  // 3) pre-fetch the results of the predicted
  //    command.
  //    This should make the UI more responsive.

Let's hope this ingenious scheme isn't attempted until careful analysis shows that it really is necessary and calibrated test runs prove that it really can improve response times.

Until then, this particular TODO is simply an incitement to over-engineering.

Future optimisations

Closely related to Grand Designs are those points in a code-base where a progammer uses TODO to indicate where a sub-optimal solution has been used.

Maybe a hand-crafted container might offer client code quicker lookups than the one which was picked, for convenience, from the C++ Standard Library.

    typedef std::map<person, phone_number> phone_book;
    // TODO replace with a hash map for efficiency

This TODO looks reasonable enough, but again I think it may lead us astray.

Why should we consider making this replacement? Why should the suggestion be repeated every time we look at this code?

If we do find our code runs too slowly we need to measure first. The hand-rolled hash map might make no perceptible difference; but simplifying the arithmetic in that innocent looking loop the profiler warned us about might realise a 50% speed-up with far less effort.

The hideous hack

Consider the following scenario. Testing reveals that a peculiar - but reproducible - combination of events can lead to deadlock. A SHOWSTOPPER defect is raised and assigned to some unfortunate programmer. The programmer must delay her current task to investigate. After a couple of frustrating days of poring over log files she narrows the problem down. It looks very like a race condition in a particular function.

To test her suspicions, she injects a ten millisecond delay into one of the calling threads. The defect goes away!

Armed with this evidence, she consults the programmers involved. The original author of the function has nothing helpful to say - clearly it's the client code (which does the synchronisation) which is at fault. Equally unhelpfully, the authors of the client code suggest the proper solution is to move responsibility for synchronisation into the function itself.

All this blame-storming is holding up development. The project manager makes the call: check in the hack, change the defect to LOW priority, get on with new features.

I don't think it will come as a surprise to find out that the code still reads as follows:

    Sleep(10); // HACK. Workaround a race condition.
               // See DEFECT 5678 for details.

Nor should it surprise us to learn that the defect hasn't really gone away, it has just gone under cover. The software still deadlocks. Less often, less reproducibly, but just as disastrously.

This article is about the use of TODO, not dysfunctional development teams. A proper solution will involve the organisation as much as the code-base, and will have to remain beyond the scope of this article.

Clearly, though, something very wrong has happened. I'm not claiming that TODO - or, in this case, its unsavoury relative, HACK - is to blame.

What I do suggest is that this use of HACK opens the door (breaks the window perhaps? see bottom of page) to similar abuse in future. When we introduce such code into our system we sanction the approach it takes, inviting more of the same.

Sure enough, as features continue to be added, we find more and more Sleep() s, HACKs and TODOs attempting to disguise a broken threading model.

Test frameworks

Remember the programmer who checked a stubbed function into the source management system?

    void NewClass::get(std::istream & get_from) {
        // TODO
    }

In a test-first environment, the TODO is superfluous. The accompanying unit tests show exactly what needs doing. In a console window, we see something like:

-----------------------------------------------------
   Test    : testNewClassGet
   testNewClass.cpp(57): Expected "foo" but got "bar"
------------------------------------------------------
   Ran 13 tests, 12 Passed, 1 Failed.

Here, the feedback is swift and accurate, and continues to be so even once the class is complete (that is to say, once it passes its tests). Should something cause NewClass to regress, a good set of tests will isolate the error even before the offending code gets checked in.

In other words, the TODO list and the FIXME list have been replaced by test results. We have done the best we can to ensure our NewClass does not end up being yet another LegacyClass .

Now remember the HACK which covered up a broken threading model. We will need to work much harder to stop the rot advancing any further (in both the code-base and the organisation), but again, my main recommendation would be to invest in a test framework - a system test framework in this case.

If we can create a suite of tests which exercise the code to systematically expose the threading problems, we may have a chance of understanding them. If we can automate these tests and publish results in a user-friendly form - bearing in mind that users are not just engineers, but everyone with an interest in the code-base - then we may yet fix them.

The case against TODO

This article has covered the use and abuse of TODO. In some cases, it is redundant; in others inadequate; in others misleading; and in yet others it could more usefully be replaced by code. TODO is sometimes a note written in some personal shorthand, which, like many such notes, is in danger of becoming meaningless to even its originator.

I have no major problem with any of these uses, though personally I avoid them. It's the times when TODO (or FIXME or HACK) gets roped in to defer the proper treatment of a defect which make it suspect.

In Overload 68, Alan Griffiths [ Griffiths ] writes:

The worst thing that can be done on encountering a problem is to ignore it on the basis that "someone else" should deal with it. The next worst thing is to bury it in a write-only "issues list" in the hope that one day someone will deal with it. If everyone behaves like that then nobody deals with anything.

Griffiths is talking about problems in a wider sense, perhaps, than this article, but he expresses my frustration with TODO perfectly. A search through code for TODOs is all too likely to reveal a "write only issues list". Too often, FIXME silently marks a place where we know something is wrong, but we haven't bothered to do anything about it. Worst of all, HACK gets deployed when we know something is wrong and we fear we might have made it worse. n

Acknowledgements

I would like to thank Dan Tallis and the editorial teams at The C++ Source and Overload for their help with this article.

References

[Boost] http://www.boost.org

[Brown] `Automatically-Generated Nightmares', Silas S Brown, CVu 16.6, ACCU

[Feathers] Working Effectively With Legacy Code, Michael C. Feathers, Prentice Hall, ISBN: 0131177052

[Griffiths] `Editorial: Size Does Matter', Alan Griffiths, Overload 68 , ACCU

[Guest] `Metaprogramming is Your Friend', Thomas Guest, Overload 66 , ACCU, Also available at: http://www.wordaligned.org/

[Hunt & Thomas] The Pragmatic Programmer: From Journeyman to Master , Andrew Hunt and David Thomas, Addison-Wesley Oct 1999, ISBN: 020161622X

[Java Conventions] `Code Conventions for the Java Programming Language', http://java.sun.com/docs/codeconv/

Broken Windows

In The Pragmatic Programmer: From Journeyman to Master [ Hunt, Thomas ] Andrew Hunt and Dave Thomas advise us not to live with broken windows:

One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment - a sense that the powers that be don't care about the building. So anothern window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner's desire to fix it, and the sense of abandonment becomes reality.

Hunt and Thomas suggest the same is true of software: when we discover something is broken, we must repair it promptly. Deferring the repair work with a TODO or a FIXME risks making things worse. The next programmer to visit the code is unlikely to make the fix, but he may well be encouraged to adopt TODO to defer the treatment of similar problems elsewhere.