Don’t Design for Performance Until It’s Too Late

People claim optimisation can cause unreadable code. Andy Balaam argues good performance should be at the core of code design.

There is a piece of ancient wisdom which states:

Premature optimisation is the root of all evil

This ancient wisdom [ Knuth74 ] is, like all ancient wisdom, correct.

However.

It appears to have been reinterpreted as essentially meaning:

Don’t design for performance until it’s too late

which is clearly, and very importantly, very wrong.

Performance is a feature

Before I begin I want us all to agree that performance is a feature. [ Atwood09 ]

I work on a real-life ‘enterprise’ [ Vinh07 ] application. Its features are entirely driven by the need for immediate cash, not by developers following pipe dreams. And yet, for the last 6–12 months the majority of my time has been spent trying to retrofit performance into this application. Believe me, this is not because we have users who are obsessive about wasting valuable seconds – it’s because our performance sucks so hard it’s deeply embarrassing.

What is your favourite program? How well does it perform? What is your least favourite? Why?

For me, and many other people, their answers to those questions demonstrate the importance of performance. Firefox was launched to improve the performance of Mozilla. People love git because of how fast it is. Lotus Notes is hated so much partly because of its performance. My main complaints about programs I use involve performance (e.g. Thunderbird is too slow for IMAP email).

A fast response to the user is one of those crucial inches [ Spolsky07 ] on the journey to software that makes people happy. Making people happy gives you the kind of scary fanboyism that surrounds git. Wouldn’t you like that for your product?

What is optimisation?

When my hero said that premature optimisation was the root of all evil, he was talking in the days when you had to hand-optimise your C in assembly language. Or, more likely in his case, you had to hand-optimise your assembly language into faster assembly language. Optimisation like that very often obfuscates your code.

These days, 99% of the time, your compiler does all of this work for you, so you can have relatively comprehensible code in whatever trendy language you like, and still have the fastest possible local implementation of that in machine code.

Meanwhile, Knuth knew that 99% of your code is simply not performance-critical – it only runs a few times, or it’s just so much faster than some other bit that it doesn’t matter. The lesson we all learn eventually is that the slow bit is never quite what you thought, and you have to measure to find out where to concentrate your effort.

So, if optimisation is obfuscation, and 99% of your code isn’t the bit you need to make faster, it becomes clear that premature optimisation is the root of much evil.

But optimisation is not designing for performance.

Design for performance

Fundamentally, to get good performance, you are going to need to measure the time spent in various parts of your code (I suggest Very Sleepy [ VerySleepy ] if you’re on Windows) and make the slow bits faster (or happen less often). However, there are still some principles you can follow that will mean you need to spend less time doing this (which is a pity, because I really love doing it).

If you don’t design for performance you are almost certainly going to need to restructure large parts of your program later, which is very difficult and time-consuming.

There are two aspects to designing for performance: writing good local code, and creating good global structure.

Write good local code

Before you write an algorithm, think for a few minutes about how to make it work efficiently. e.g. if you’re writing C++, consider whether a deque or a list would be better than a vector for how you’re going to use it.

Always consider using vector in C++

C++’s standard containers are specified to fulfil certain ‘performance’ criteria, and also to give certain guarantees (e.g. about how long iterators stay valid) that are influenced by the algorithms that are expected to be used. The ‘performance’ criteria specify algorithm complexity, which means they are important when the number of elements in a container is very large, but real-world performance when numbers are smaller may be very different. Bjarne Stroustrup (among others) has observed [ Stroustrup12 ] that even with thousands of elements, vectors can comprehensively outperform the other containers, even for operations that sound like the kinds of operations more suited to lists or maps.

Stroustrup provides a rule of thumb he calls ‘Use compact data’, which reflects the fact that often the most performance-critical aspect of code is the locality of the data being operated on. The enormous differences in memory access speed between different levels of cache and the system memory mean that the effect of keeping data close together (as a vector does) can swamp any effects of algorithmic complexity. [ Wicht12 ]

When choosing a C++ container, normally the most important consideration will be simplicity and comprehensibility of the code, but when performance is the primary consideration, you should measure, and you may be surprised how often vector is the right choice. Many argue vector should be your default choice.

Think about what is idiomatic for your language and why. Think about what the computer really has to do to produce the results you are asking for. Are there going to be a lot of objects about? Maybe you can avoid copying them too many times. Are you creating and deleting a lot of objects? Can you reuse some instead? (Exercise caution with that one, though – if you start obfuscating you come into conflict with the ancient wisdom.)

Often, if you think through what you are doing, and the most efficient way to do it, you will end up with a faster and more memory-efficient algorithm, that expresses your intention better than if you’d written the first thing that came into your head. There is no downside to that.

Try to minimise the number of times you need to ask the operating system for a chunk of memory: this is surprisingly slow. E.g. in C++, prefer creating by-value data members instead of pointers to objects allocated with their own call to new.

By the way, don’t worry if this sounds intimidating. The way to learn this stuff is to measure what you have done and then work out why it is slow. Next time you’ll jump straight to the fast solution without the detour to the slow one.

Of course, none of this will matter if you don’t have good global structure.

Create good global structure

The hardest and most important work you need to do to have good performance is to have good structure in the ways the different parts of your program interact.

This means thinking about how classes and components communicate with and control each other.

It may be helpful to use a streaming style of communication – can you send little chunks of information to be processed one by one instead of a huge great blob?

Try to make sure your components to use a common infrastructure: if different parts use different string classes you are going to spend most of your time copying from one to the other.

The hardest and deepest mystery in getting good performance (and in programming generally) is choosing the right fundamental data structures. I’ll never forget the lesson I learnt when a friend of mine had a conversation with me about a toy project I was doing (that was particularly focussed on trying to be fast) and then went away and produced code that was orders of magnitude faster, simply because he had chosen the right data structure. The lesson I learnt was that I am not as good as I think I am.

To be honest this section is a little shorter than I’d like because I know I don’t have a lot of answers about how to do this well. I do know, though, that if you don’t think about it now you will have the pain of restructuring your program later, when it’s full of bug fixes that are going to get re-broken by the restructuring.

Of course, if you do think about it now you’re still pretty likely to need to change it later…

Ancient wisdom

Ancient wisdom is usually right, but misinterpreting it and using it as a license to write bad code is a bad idea. Carry on.

References

[Atwood09] Atwood, Jeff ‘A Few Speed Improvements’ http://blog.stackoverflow.com/2009/08/a-few-speed-improvements/

[Knuth74] Knuth, Donald ‘1974 Turing Award Lecture’ from Communications of the ACM 17 (12), (December 1974), p 671

[Spolsky07] Spolsky, Joel ‘A game of inches’ http://www.joelonsoftware.com/items/2007/06/07.html

[Stroustrup12] Stroustrup, Bjarne ‘Why you should avoid Linked Lists’ from GoingNative 2012: https://www.youtube.com/watch?v=YQs6IC-vgmo

[VerySleepy] ‘Very Sleepy’ http://www.codersnotes.com/sleepy

[Vinh07] Vinh, Khoi ‘If It Looks Like a Cow, Swims Like a Dolphin and Quacks Like a Duck, It Must Be Enterprise Software’ http://www.subtraction.com/2007/10/19/if-it-looks-/

[Wicht12] Wicht, Baptiste ‘C++ benchmark – std::vector VS std::list’ https://dzone.com/articles/c-benchmark-%E2%80%93-stdvector-vs