What are you optimizing for?

By Frances Buontempo

Overload, 30(167):2-3, February 2022


Sometimes attempts to improve things make them worse. Frances Buontempo encourages you to think about what you’re doing when you try to optimise, and to check it really is working.

Welcome to the first issue of Overload, 2022. Before you ask, I haven’t written an editorial. I have been fighting a battle with the number of tabs open in my PC’s browser. I might be winning, with only 67, after a steady 69. I did try closing several, but unfortunately opened other links while tidying up and kept ending up back at 69. Clearly, for me 69 is the optimal number of tabs and 67 is just showing off. That only wasted a few hours of my life. Advent of Code [Advent] took even more time, and is hogging a couple of tabs as we speak.

If you’ve not come across Advent of Code before, I did a short write-up for our members’ magazine CVu [Buontempo22]. Puzzles are set during December, each having two parts. The first part tends to nudge you towards a simple way to solve the problem, and the second part then slaps you in the face by blowing out RAM or similar. There are often warning signs in the description, such as ‘exponential’ getting a mention, or simply a gut feeling that this might get really big, really quickly. Knuth told us “premature optimisation is the root of all evil.” Though this quote is about how long something might take to run, rather than memory, it can apply to both. However, speed and RAM tend to tug in opposite directions – it can be hard to do things quickly while using little memory. Caching can avoid duplicate calculations, but requires somewhere to store the cache, and may even slow down your code if it introduces cache misses.

Many attempts to solve problems tug in opposing directions. To find out if you have covid, you could take a lateral flow test. The results are available quickly, but may not be so accurate. Instead, you could take a PCR test, but have to wait longer for the results. The PCR – a polymerase chain reaction – runs a few iterations, splitting up DNA and making copies of the target of interest if present, making it much more sensitive than the lateral flow test, which relies on cells in the sample hitting detector molecules, leading to a colour change. (Forgive me if my summary of this year’s Royal Institution Christmas Lecture is a bit mis-remembered and only vaguely correct. There are some helpful links on their website [Royal Institute21].) Speed versus accuracy is a common optimisation tradeoff.
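The caching tradeoff mentioned above is easy to demonstrate. Here is a minimal sketch in Python, using the standard library’s functools.lru_cache – Fibonacci is just a toy stand-in for any calculation with repeated subproblems:

```python
from functools import lru_cache

def fib_naive(n):
    # Recomputes the same subproblems over and over:
    # exponential time, no extra memory.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n):
    # Each value is computed once and stored:
    # linear time, linear memory.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

print(fib_cached(90))   # returns immediately; fib_naive(90) would outlive us all
```

The cache buys speed with memory: maxsize=None lets it grow without bound, so in a long-running process you might want to bound it and accept some recomputation instead.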

If you are optimizing, you need to decide your requirements. RAM, speed and accuracy might matter, but comprehensibility and even happiness might come into play as well. Loop unrolling tricks may speed up your code, but confuse your colleagues (or your future self) and cause unhappiness. You could experiment with -funroll-loops if you are using GCC, rather than try to do this by hand; however, the manual says “This option makes code larger, and may or may not make it run faster.” [GCC] The question lurking in the background for any optimization is always, “Has this worked?” For a one-off task, it’s all too common to spend more time attempting to automate it than the task would have taken manually. XKCD [XKCD] reminds us that writing code to automate something might not be the time saver we hoped for. More than that, if you are trying to speed up your code, measure and see what happens. Your instincts may be wrong.
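‘Measure and see’ needn’t be a big job. A quick sketch with Python’s timeit module – the two contestants here are arbitrary; the point is that the stopwatch, not intuition, gets the final say:

```python
import timeit

# Two ways to build the same list of squares; which is faster? Measure it.
loop_stmt = """
out = []
for i in range(1000):
    out.append(i * i)
"""
comp_stmt = "out = [i * i for i in range(1000)]"

loop_time = timeit.timeit(loop_stmt, number=1000)
comp_time = timeit.timeit(comp_stmt, number=1000)
print(f"for-loop:      {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```

Run it on your own machine and compiler-of-the-day: the numbers, and sometimes the winner, will differ.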

Machine learning frequently involves optimization in one form or another. A fundamental part of the process involves checking the algorithm is working by calculating a score in a so-called fitness or cost function. Bigger scores are better if you want to maximize something, like profit or happiness, and smaller scores are better for minimization problems, such as travel time or fuel used. The algorithm makes course corrections to get a better score in the fitness function, usually by selecting different, randomly chosen inputs. Using randomness to solve a problem might seem odd on the face of it. If there is time to brute-force a solution, that might be better. However, some situations have so many possible combinations of inputs that it’s quicker to try a few and see what happens. One selection might be good enough – job done. Otherwise, try, try, try again. Now, pure randomness might never work. First, the same inputs may be selected several times over, wasting time. Many machine learning algorithms actually do this, though not deliberately. For example, genetic algorithms don’t conventionally track what they have tried before. If the algorithm gets stuck in a rut, trying the same thing over and over again, you can try again from the top with different parameters or a new fitness function. Second, a purely random algorithm might never solve a problem: the so-called bogosort springs to mind [Wikipedia]. This algorithm checks if the inputs are sorted. If they are, great: return them. Otherwise, randomly permute them and try again. Potentially forever. The internet [Morgan-Mar] assures me there are other, even worse, variants of this algorithm. In order to optimize my tab count, I have closed that link, so I’ll leave you to find out about the sorting algorithm that violates the second law of thermodynamics and report back. Randomness can work, but might not.
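Bogosort, as just described, fits in a few lines of Python – run it on anything longer than a hand of cards at your peril:

```python
import random

def is_sorted(xs):
    # True if each element is no bigger than its successor.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogosort(xs):
    # Shuffle until sorted: expected O(n * n!) work, and no upper bound at all.
    xs = list(xs)
    while not is_sorted(xs):
        random.shuffle(xs)
    return xs

print(bogosort([3, 1, 2]))   # [1, 2, 3] -- eventually
```

The ‘fitness function’ here is the sortedness check, and the ‘course correction’ is a shuffle that remembers nothing about previous attempts – which is exactly why it can repeat itself forever.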

Machine learning can also be used to make predictions. There are many ways to do this, but each tries to minimize the difference between predicted values and actual values, giving another optimisation problem. Often this requires a gradient calculation, in order to move ‘downhill’, closer to the required outputs. This number crunching can take a while, so you can speed up the optimisation by using stochastic gradient descent. This “inexact but powerful technique” [Stojiljković21] finds the gradient on a small subset of the training data – clearly an attempt to optimise the optimisation. Where will this meta-optimising end? That randomness can be used to solve problems may be a surprise. It shouldn’t be: trial and error is a sensible way to explore a problem. Furthermore, stochastic (random) processes underpin many machine learning algorithms. In fact, they also underpin much of finance, and possibly some rocket science too. The c2 wiki does question the connection [c2-09] but quickly devolves into flippancy, quoting words allegedly overheard at NASA: “Come on! You make things sound so difficult. Sending probes to Mars isn’t rocket science, you know!”
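A minimal sketch of the idea (mine, not taken from [Stojiljković21]): fit a slope w to noise-free data whose true slope is 3, estimating the gradient of the squared error from a small random batch at each step rather than from all 100 points:

```python
import random

random.seed(42)
data = [(x, 3.0 * x) for x in range(1, 101)]   # y = 3x: the answer we hope to recover

w, rate = 0.0, 1e-4
for _ in range(1000):
    batch = random.sample(data, 10)   # a small random subset, not the full data set
    # Gradient of the mean squared error (w*x - y)^2 with respect to w,
    # estimated on this batch only.
    grad = sum(2 * x * (w * x - y) for x, y in batch) / len(batch)
    w -= rate * grad                  # step downhill

print(w)   # converges to the true slope, 3.0
```

Each step uses a noisy estimate of the true gradient, so the path downhill wanders, but each step is ten times cheaper than using the full data set – the optimised optimisation.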

Now, in order to optimize, we may need to think beyond our initial requirements. Norbert Wiener, a mathematician who made some fundamental steps towards modern artificial intelligence, warned us to be very thoughtful about the “purpose put into the machine” [Wiener60]. He reminds us of the tale of the sorcerer’s apprentice, who automates a broom to carry water, which it dutifully does, nearly drowning the apprentice in the process. He then says,

If we use, to achieve our purposes, a mechanical agency with whose operation we cannot efficiently interfere once we have started it, because the action is so fast and irrevocable that we have not the data to intervene before the action is complete, then we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colorful imitation of it.

There are many similar thought experiments concerning potential problems with automation, AI and optimization. The paperclip maximizer springs to mind. If we task an AI to collect paperclips, and leave it to its purpose, it might put “first all of earth and then increasing portions of space into paperclip manufacturing facilities” [LessWrong]. Fear not: not all attempts to optimize pose an existential risk to the human race, the planet or the universe. 2021’s Reith Lectures were given by Stuart Russell and explored ‘Living with Artificial Intelligence’ [Russell21]. He mentioned Wiener’s concerns about fully specifying our objectives. Asimov’s Three Laws of Robotics are an attempt to constrain AI, covering us when we fail to give a clear purpose to a robot, and allowing many interesting stories to develop. They are interesting, but fictional. Russell suggested a better solution involves assistance games [Shah19], where a human provides feedback while the AI learns, rather than giving a fixed objective up-front. I suspect an analogy about waterfall versus agile is lurking in there somewhere. I’ll leave that for you to think about.

When we optimize, we need to keep an eye on our solution, to ensure it is solving our problem and not causing other complications in the process. We may not find the best possible approach and that’s OK. Trial and error experiments might find an acceptable algorithm or set of inputs. Acceptable might be good enough. If you need your code to run faster, quick enough is fine. Spending weeks trying to find the quickest possible code might take longer than running a slightly slower version that works. As we try to improve our code’s performance, we might need to consider its comprehensibility and the happiness, or otherwise, of future developers. Many clever techniques can speed up code, but might make your head hurt each time you encounter them. Duff’s device, a loop unrolling technique, always makes my eyes glaze over. It’s also important to realise that a clever technique that works in one situation may not work in another. If your target architecture changes or new floating point operations become available, or you change compiler, old optimization techniques may in fact pessimise your performance. Things change.

What happens if we abandon our quest to optimize and try to pessimise instead? There are various ways to achieve this, for example, the ‘Multiply and Surrender’ strategy, which

consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. [c2-14]

This may seem like a case of chronic procrastination, but sometimes switching one problem for another works. However, this can lead to analysis paralysis, where you freeze because every possible route forward appears to cause further problems. Multiply and surrender falls into a category known as reluctant algorithms. ‘Pessimal Algorithms and Simplexity Analysis’ by Broder and Stolfi [Broder84] explores other ways to solve various problems slowly. They open with: “Consider the following problem: we are given a table of n integer keys A1, A2, ..., An and a query integer X. We want to locate X in the table, but we are in no particular hurry to succeed; in fact, we would like to delay success as much as possible.” They also consider the sloppiest path problem for graphs, slowsort, and many other problems. Deliberately writing slow algorithms may seem foolish, but it’s a great read and might get you thinking.
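Their slowsort translates into Python in a handful of lines (my sketch, not the paper’s own pseudocode):

```python
def slowsort(a, i=0, j=None):
    # Multiply and surrender: split the range in two, sort each half, move the
    # maximum to the end, then sort everything except the last element. Again.
    if j is None:
        j = len(a) - 1
    if i >= j:
        return
    m = (i + j) // 2
    slowsort(a, i, m)          # sort the first half
    slowsort(a, m + 1, j)      # sort the second half
    if a[m] > a[j]:            # the larger of the two maxima belongs at position j
        a[m], a[j] = a[j], a[m]
    slowsort(a, i, j - 1)      # surrender: do nearly all of it over again

xs = [5, 1, 4, 2, 3]
slowsort(xs)
print(xs)   # [1, 2, 3, 4, 5]
```

It is correct – the maximum really does land last on every pass – but its running time is worse than polynomial, which is exactly the point.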

Sometimes trying to speed up makes things worse. How often have you watched a driver on a crowded motorway switching lanes to get one car ahead? Usually you pass them several times over if you stick to one lane. They have a greater speed than you, because they have travelled a larger distance in the same period of time, but have failed to make better progress towards their final destination. ‘Slow is fast’, as the saying goes. In fact, traffic flow models are fascinating, which reminds me I should draw to a close because I have an ACCU conference talk to prepare. On a final note, don’t forget Goldratt’s theory of constraints. All processes have constraints or bottlenecks. If you try to optimise other parts of the process, you are unlikely to make any difference. Counterintuitively, you might find slowing down gives a steady input to the bottleneck, which might speed things up overall. The Theory of Constraints “shifts the focus of management from optimizing separate assets, functions and resources to increasing the flow of throughput generated by the entire system” [Constraints]. Here’s to a slow and steady 2022.

References

[Advent] https://adventofcode.com/

[Broder84] Andrei Broder and Jorge Stolfi (1984) ‘Pessimal Algorithms and Simplexity Analysis’, available at https://www.mipmip.org/tidbits/pasa.pdf

[Buontempo22] Frances Buontempo (2022) ‘Advent of Code’, CVu 33.6, available from: https://accu.org/journals/cvu/33/6/buontempo/

[c2-09] ‘Rocket Scientist’ (last updated 18 October 2009): https://wiki.c2.com/?RocketScientist

[c2-14] ‘Multiply and Surrender’ (last updated 26 October 2014): http://wiki.c2.com/?MultiplyAndSurrender

[Constraints] ‘Theory of Constraints (TOC) of Dr. Eliyahu Goldratt’, available at https://www.tocinstitute.org/theory-of-constraints.html

[GCC] Using the GNU Compiler Collection (GCC) ‘Options that Control Optimization’ available at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

[LessWrong] ‘Paperclip Maximiser’, Less Wrong, available at https://www.lesswrong.com/tag/paperclip-maximizer

[Morgan-Mar] David Morgan-Mar, ‘DM’s Esoteric Programming Languages’ available at https://www.dangermouse.net/esoteric/

[Royal Institute21] Royal Institution Christmas Lectures 2021, ‘Going viral: How Covid changed science forever’, available at https://www.rigb.org/christmas-lectures/2021-going-viral-how-covid-changed-science-forever

[Russell21] Stuart Russell (2021) Living with Artificial Intelligence (a set of 4 talks), The Reith Lectures, available at https://www.bbc.co.uk/programmes/m001216k/episodes/player

[Shah19] Rohin Shah (2019) ‘Human-AI Collaboration’, published on Less Wrong, 22 Oct 2019, available at https://www.lesswrong.com/posts/dBMC63hjkc5wPqTC7/human-ai-collaboration

[Stojiljković21] Mirko Stojiljković (2021) ‘Stochastic Gradient Descent Algorithm With Python and NumPy’ on Real Python, available at https://realpython.com/gradient-descent-algorithm-python/

[Wiener60] Norbert Wiener (1960) ‘Some Moral and Technical Consequences of Automation’, Science 131(3410): 1355–1358, available at https://nissenbaum.tech.cornell.edu/papers/Wiener.pdf

[Wikipedia] Bogosort: https://en.wikipedia.org/wiki/Bogosort

[XKCD] ‘Automation’, available at https://xkcd.com/1319/

Frances Buontempo has a BA in Maths + Philosophy, an MSc in Pure Maths and a PhD technically in Chemical Engineering, but mainly programming and learning about AI and data mining. She has been a programmer since the 90s, and learnt to program by reading the manual for her Dad’s BBC model B machine.