
Failure is an option

Overload Journal #129 - October 2015 + Journal Editorial   Author: Frances Buontempo
Motivational speeches and aphorisms are clichés. Frances Buontempo wonders if they sometimes do more harm than good.

At the risk of sounding like a broken record, and sticking in a rut I have dug for myself, I still do not have a proper editorial for you. Given my persistent failure this should come as no surprise to you. Despite several ways of trying to motivate myself, musing on how to do this has distracted me as ever. Unfortunately, no snappy one-liner managed to snap me out of it. Indeed, the first statement of Hippocrates’ Aphorisms could, and did, leave one thinking for hours:

Life is short, art long, opportunity fleeting, experience deceptive, judgement difficult. [Aphorisms]

A most pertinent adage, ‘Failure is not an option’, was impotent. As with many sayings in common parlance this stems from a film. Many proverbs originate from films, books or plays. In order to become well-known they need a vector to disseminate. Failure not being a possibility was used as a title of an autobiography by Gene Kranz, director of Mission Control for the Apollo 13 team. The phrase itself has seeds in an interview with people involved in the team. We are told:

One of their questions was “Weren’t there times when everybody, or at least a few people, just panicked?” My answer was “No, when bad things happened, we just calmly laid out all the options, and failure was not one of them. We never panicked, and we never gave up on finding a solution.” [Failure]

Staying calm in the face of difficulties and trying to find a way to fix things is honourable, though I suspect many of us over our careers have been told “Failure is not an option” as a deadline attempts to go whooshing by, wherein we discover this is being misused as a code-phrase meaning you have to stay all night to make some software work. Though this has failed to motivate me to write an editorial, it did spark a train of thought about failure and if it is an option after all.

Consider for a moment how to get a grade A in an exam. Though a complete success, with a mark of 100%, would be expected to gain an A, less than perfection will usually do. A measly 80% will often prove sufficient for a top grade. The extent to which an accomplishment must be a total triumph can vary with context. It is not uncommon to watch a student, heart-broken because they only got a B, interact with a supportive parent who is delighted they passed with ‘flying colours’. The different perspectives and hopes shade the result in different tones. Sometimes 80% sounds splendid, while at other times 4 out of 5 doesn’t sound so good. If my mobile phone sends texts, allows internet access, has a camera and an alarm clock but will not make phone calls anymore, 80% is not good enough – this might indicate it’s time for an upgrade. We can conclude “Failure is sometimes an option, or even considered success, but it depends on context and the person involved.” This is probably not pithy enough for a proverb or succinct enough for a saying, though.

If we cannot fail, how do we practise test-driven development (TDD)? Writing a failing test first is an important part of this discipline, even if just to make sure you get a clear and precise failure message. I have seen many people, yours truly included, write a test which happens to pass first time. Much later, when the code gets changed and the test fails, they discover they need to break open the debugger rather than just seeing everything they need to know in the test failure message. People not used to TDD are often surprised by how frequently the practitioner might use the compiler – if the language is compiled – feeling their approach of coding for a couple of days before resorting to kicking off the compiler and hoping for the best is vastly superior. The compiler is a tool of last resort, and perhaps can’t really be leant on? (Oh yes it can – see [Feathers04].) The initial failure is important, though transitional. Sometimes you discover a bug as you add tests to legacy code and manage to write a test that characterises the problem. If you don’t have time to fix the bug – for example it may be long and involved, or you would rather do one thing at a time to avoid getting distracted – most testing frameworks will allow you to mark the failing test as ignored. Ignoring failure is a transient option. Mind you, all code goes away in the end, so perhaps all code failures are transient. Avoid saying this out loud to your managers or customer though.
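As a minimal sketch of both ideas, a failure message that explains itself and a known failure parked for later, here is one way it might look using Python's standard unittest module. The function days_in_february and its century-year bug are hypothetical, invented purely for illustration.

```python
import unittest


def days_in_february(year):
    """Hypothetical legacy function with a deliberate bug for century years."""
    # Bug kept on purpose: 1900 is divisible by 4 but was not a leap year.
    return 29 if year % 4 == 0 else 28


class DaysInFebruaryTest(unittest.TestCase):
    def test_ordinary_leap_year(self):
        # The msg argument puts the context into the failure message,
        # so nobody needs a debugger if this ever breaks.
        self.assertEqual(days_in_february(2016), 29,
                         msg="2016 is divisible by 4, so February has 29 days")

    @unittest.expectedFailure
    def test_century_year_is_not_leap(self):
        # Characterises the known bug. The test still runs, but its failure
        # is recorded as expected until someone has time to fix the code.
        self.assertEqual(days_in_february(1900), 28,
                         msg="1900 is divisible by 100 but not by 400")


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DaysInFebruaryTest)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_tests()
print(f"run={result.testsRun} failures={len(result.failures)} "
      f"expected_failures={len(result.expectedFailures)}")
```

Using expectedFailure rather than an outright skip is a design choice: once the bug is fixed the test is reported as an unexpected success, which is a reminder to remove the marker.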

If you consider the use of the word failure in software development, you could conclude we expect it. We design failover clusters, fault tolerant systems, and checksums so we can detect something went wrong. We catch exceptions – sometimes just logging them and carrying on regardless, though not always. Most ‘automatic’ algorithmic trading systems have a kill-switch, just in case something goes awry. In fact I can’t think of a machine without an off-switch in my house. Admittedly, the off switch on my phone no longer works; however, I digress. Nature appears to build in some degree of fail-over. Most organisms have two lungs, kidneys and so on. If one fails they can still survive. On a smaller scale, biochemical functions are often encoded in two or more genes. This means things can function normally in the face of some mutation. Now, evolution would tell us that mutations cause the new normal. Genetic changes that start out as perceived failures can end up becoming the new norm provided that the mutant individuals do not die out. Apparent failure can lead to greater success. This may lurk behind the famed Silicon Valley phrase “Fail fast, fail often”, which is not to be confused with fail-fast and may be closer to the idea of fail-safe. At the very least we can say failure is interesting.

Having been considering my pension recently I noticed I might manage to retire by 2038, which is disappointing because I was looking forward to seeing how the next Y2K-style apocalypse/bug pans out. Some of us will remember the hard work put in by programmers to avoid the so-called Y2K bug – the media reported potential Armageddon when the clocks rolled over from 1999 to the year 2000 since many dates were just recorded with two digits. Many of us will be aware of the upcoming problem with storing time as a count of seconds since 1 January 1970 UTC in a signed 32-bit integer – this will roll over on a Tuesday in January 2038. I observe there are a few web pages up and ready for this. For example feel free to fill in the surveys (e.g. http://2038bug.com/) to let them know how aware the mass populace is of the problem. I hope we don’t skew the results. At the start of the new millennium, many people complained that nothing went wrong. If consultants have been paid to fix a potential problem and they appear to have done so, complaining seems churlish. Perhaps public relations would have been better had one or two catastrophic failures been left behind, or sneaked into the system. Perhaps deliberately failing in order to look good is taking it too far.
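The arithmetic behind the 2038 roll-over can be checked in a few lines. This is only a sketch: it assumes the common convention of counting seconds from 1 January 1970 UTC, and the helper wrap_int32 is an illustrative name that simulates two's-complement overflow rather than anything a real clock library provides.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(value):
    """Simulate two's-complement wrap-around of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31


# The last second that fits in the counter.
last_moment = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the counter wraps to the most negative value,
# which lands before the epoch rather than after it.
after_rollover = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_moment.isoformat())     # 2038-01-19T03:14:07+00:00
print(last_moment.weekday())       # 1, i.e. Tuesday (Monday is 0)
print(after_rollover.isoformat())  # 1901-12-13T20:45:52+00:00
```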

Along with the idea of evolution continuing due to constant seeming failures or mutations, many inventors have succeeded at the wrong thing. For example post-it notes are fabled to have come from an attempt to build something else. Penicillin is often attributed to Fleming’s discovery of mould on a petri dish, though ancient Egyptians are purported to have used mouldy bread in poultices on wounds [Penicillin]. Sometimes history has lessons of people succeeding for the wrong reasons. A case in point is the plague mask, which doctors wore when ‘treating’ people suffering from the bubonic plague in Europe. The mask was designed to stuff the beak with various herbs to hide the smell, the thinking being that the smell carried the disease. It seems that actually the material of the whole outfit was heavily waxed, which provided a hard barrier to the infectious fleas. I suspect you can think of many other examples of either outright failure or success for the wrong reasons.

Failure is not something to be feared. This is not an endorsement of deliberately sabotaging things, but an encouragement to try new things. As children we tend to be excited by new things, but as some people get older they become more anxious about trying new things. However, not all childhood experience is completely fearless, and failure seems less harsh if a supportive adult is to hand to smooth things over. As Batman’s father says to him in Batman Begins when he falls down deep into the bat cave, “And why do we fall, Bruce? So we can learn to pick ourselves up,” [Batman]. I hope to remain excited about trying out new programming languages or trying new technologies but do sometimes experience a twinge of worry when reading the documents or trying out a new machine for the first time. I choose to interpret this as adrenaline and carry on regardless. It is nice to have a supportive adult to share the experience with. If, no longer crippled by fear, you continue to practise TDD, try out new technologies or allow yourself to make mistakes, how do you respond to people around you when you think they are ‘doing it wrong’, to coin yet another phrase?

Before Cassandra became widely known as a database [Apache] she was more commonly known in Greek mythology as a woman cursed by never being believed, after fighting off an attempt at seduction. There are other stories, such as the boy who cried wolf, where the main character is not believed, having lied on previous occasions. Cassandra’s curse lay in her foretelling truthfully what would come to pass. If Batman’s father had stopped him playing near the end of the garden in case he fell down, we would not have Batman. If my parents had warned me I would fall off my bicycle when I tried to learn, that may have stopped me trying. Being a constant voice of gloom saying how things will probably go wrong is likely to lead to being ignored, but saying nothing if a path is fraught with danger is unproductive and probably cruel. If you are pair programming or code reviewing you need to choose a balance between banning someone from doing things their way and warning about unusual or downright dangerous approaches. If you resort to check-in gates – various automatic ways of enforcing code standards – people might rebel and try to find workarounds. Sometimes there is a genuine need to do something unusual. In this case a conversation rather than a convoluted hack might be more productive. If you warn somebody that recording every function call, with the precise parameters used, as an expectation in a mock might lead to brittle unit tests which need to change each time the code changes you might be proved right and might be listened to in the future. Banning such tests may have a different outcome. It is important to allow people space to fail while they learn. We learn from our mistakes.
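To make the warning about mocks concrete, here is a small sketch using Python's unittest.mock. Everything in it (the catalogue, price_of, and the two total functions) is hypothetical and invented for illustration. Both implementations give the same answer, but a test that records every call with its precise parameters only accepts the first.

```python
from unittest.mock import Mock, call

PRICES = {"apple": 2, "pear": 3}  # made-up data for the sketch
BASKET = ["apple", "pear", "apple"]


def total_v1(catalogue, items):
    """Original implementation: one price lookup per item."""
    return sum(catalogue.price_of(item) for item in items)


def total_v2(catalogue, items):
    """Refactored implementation: one lookup per distinct item."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return sum(catalogue.price_of(item) * n for item, n in counts.items())


def make_catalogue():
    """Build a mock catalogue whose price_of answers from PRICES."""
    catalogue = Mock()
    catalogue.price_of.side_effect = lambda item: PRICES[item]
    return catalogue


def behaviour_ok(total):
    """Robust check: only the observable result matters."""
    return total(make_catalogue(), BASKET) == 7


def interactions_match(total):
    """Brittle check: every call, in order, with exact parameters."""
    catalogue = make_catalogue()
    total(catalogue, BASKET)
    expected = [call("apple"), call("pear"), call("apple")]
    return catalogue.price_of.call_args_list == expected


print(behaviour_ok(total_v1), interactions_match(total_v1))  # True True
print(behaviour_ok(total_v2), interactions_match(total_v2))  # True False
```

The refactoring did not change what the code does, yet the interaction-recording test would have to be rewritten, which is the brittleness in question.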

Bad Unit Tests (BUTs [Henney15]) are one matter; they will usually become clear in the long run and can easily be deleted just like any other code. Other potentially dangerous behaviour might need ‘nipping in the bud’ or stopping quickly. I recall a colleague pointing out I might like to run a couple of SQL commands in a transaction, and though I muttered under my breath, they had a point. The SQL wouldn’t have done quite what I intended and it would have taken a long while to sort out, had I not been able to simply roll back to the previous state. The suggestion to use a transaction allowed me to fail, but safely. The support team who didn’t use a transaction and managed to delete all of the risk figures for a year were a different matter. They were subsequently banned from running any of the C, U or D commands of CRUD – strictly read-only SQL for them from then on, with more dangerous commands saved for those who used transactions properly. Of course, if you want to get banned from overnight support this might give you ideas.
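Failing safely inside a transaction might look something like the following sketch, which uses Python's built-in sqlite3 module and an in-memory database. The risk_figures table and its contents are made up for illustration; the connection is opened with isolation_level=None so that the BEGIN and ROLLBACK statements are explicit rather than issued implicitly by the driver.

```python
import sqlite3

# In-memory database; isolation_level=None leaves transaction control to us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE risk_figures (trade_date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO risk_figures VALUES (?, ?)",
    [("2014-12-31", 0.5), ("2015-01-02", 1.5), ("2015-06-30", 2.5)],
)


def count_rows(connection):
    """Return the number of rows currently visible in risk_figures."""
    return connection.execute("SELECT COUNT(*) FROM risk_figures").fetchone()[0]


rows_before = count_rows(conn)

conn.execute("BEGIN")
# The mistake: the WHERE clause was forgotten, so every row goes.
conn.execute("DELETE FROM risk_figures")
rows_after_delete = count_rows(conn)

# Spot the damage, then undo it instead of restoring from backups.
conn.execute("ROLLBACK")
rows_after_rollback = count_rows(conn)

print(rows_before, rows_after_delete, rows_after_rollback)  # 3 0 3
```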

Failure has a time and a place. Perhaps you would rather not ‘die on your feet’ in a public situation, for example at a demo to an important, or even unimportant, customer. [Are any customers unimportant?] Even if you run through first, check your specialist hard work works at their site, have two working laptops and so on, unexpected errors can still occur. [Are any errors expected?] The mark of your professionalism might be how you deal with the unexpected. I am not suggesting deliberately writing buggy software in order to look heroic by fixing the bug within hours of its discovery. Such an approach is bound to be discovered quicker than an un-needed sleep hiding in a loop, just so you can speed things up easily. Don’t be afraid of failure, but rather try to create a safe place to fail while you learn. To end with another aphorism,

“Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.” [Beckett83]

References

[Apache] http://cassandra.apache.org/

[Aphorisms] Hippocrates. ‘Aphorismi’ according to the internet – see (for example) http://www.perseus.tufts.edu/hopper/text?doc=Perseus%3Atext%3A1999.01.0248%3Atext%3DAph

[Batman] Batman Begins Film, 2005 http://www.imdb.com/character/ch0000246/quotes

[Beckett83] Worstward Ho Samuel Beckett, 1983

[Feathers04] Working Effectively with Legacy Code Michael Feathers, 2004.

[Henney15] ‘What we talk about when we talk about testing’ ACCU Conference 2015, see http://www.infoq.com/presentations/unit-testing-tips-tricks

[Failure] https://en.wikipedia.org/wiki/Failure_Is_Not_an_Option

[Penicillin] http://www.acs.org/content/acs/en/education/whatischemistry/landmarks/flemingpenicillin.html
