# Predictions and Predilections

Journal Editorial + Overload Journal #153 - October 2019   Author: Frances Buontempo
Forecasting the future is difficult. Frances Buontempo has a foreboding sense that a lack of impartiality makes things even harder.

Predicting whether or not Overload will have an editorial while I am the editor is easy. I attended the Agile 2019 conference this year, co-chairing the Dev Practices and Craft track with Seb Rose [Agile]. This long trip, just for a few days, has increased my carbon footprint and decreased the time I had to think of an editorial. This means that, yet again, I haven’t written one. Your prediction was correct: Overload is editorial-free yet again.

How do you make a prediction? Listing all the possible outcomes and assigning a probability to each gives a sense of the likelihood of a specific outcome. This approach has two problems: listing the outcomes and getting accurate probabilities. There are several formal approaches, such as Bayesian methods, for working out these probabilities. In one sense the probability represents the uncertainty of a given event. Uncertainty can come in two flavours: epistemological, or due to lack of knowledge, and aleatory, or due to inherent randomness (see ‘Bayesian statistics’ [Spiegelhalter09] for more background). If you are sure you are uncertain, are you sure how uncertain you are? In other words, how do you decide how accurate these probabilities are? Some experiments can help, but you may not be able to try something a statistically valid number of times first. Furthermore, how do you predict how likely something is that has never happened before? Tough question to answer, but people try to do this. If you have no empirical data (what other types of data are there?), how can you guess how often something might happen? One approach is a forecasting model [Gelman98]. This requires a model, obviously. This might fit known cases, but I still find it difficult to accept forecasts around previously unseen events. I do understand the maths, but it all seems inherently odd.
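To make the idea of listing outcomes and assigning probabilities concrete, here is a minimal sketch of Bayes' rule applied to a finite list of hypotheses. The function name `bayes_update`, the two hypotheses and the even starting belief are all assumptions made up for illustration; they are not taken from any of the cited references.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Return the posterior P(hypothesis | evidence) via Bayes' rule.

    prior: dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(evidence), the normalising constant
    if evidence == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: p / evidence for h, p in unnormalised.items()}

# Hypothetical setup: a coin is either fair or always lands heads,
# and we start with no reason to prefer either explanation.
belief = {"fair": Fraction(1, 2), "always_heads": Fraction(1, 2)}
heads = {"fair": Fraction(1, 2), "always_heads": Fraction(1)}

for _ in range(3):  # observe three heads in a row
    belief = bayes_update(belief, heads)

print(belief)
```

Each observed head shifts belief towards the always-heads coin, but never to certainty. The starting belief is itself a guess, which is exactly where the question of how sure you are about your uncertainty comes in.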

Physics models, though often inspired by data, usually take the form of a closed-form formula, rather like a function, taking inputs and returning a single output, rather than several outputs with confidence intervals. Sometimes models have been calibrated to data at some point, to find parameters, such as acceleration due to gravity. Some models are based on a combination of other known models. If all the forces acting on a body can be calculated, the total force can be deduced. In other domains, the idea of a straight summation breaks down. For example, if two chemicals have a known toxicity, the level of harm cannot be worked out by adding the two numbers. They may interact, and be less toxic overall, or even worse in conjunction. Going back to physics, though the motion of two bodies, such as the sun and a planet, can be calculated, the three-body problem [Wikipedia] has no general closed-form solution for the motion of three, or more, objects, given their starting velocities and positions. Numerical methods are required instead. Or as I put it, “left a bit, right a bit” until you get something close enough to an answer. For some definition of close.
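As a rough illustration of what a numerical method looks like here, the sketch below nudges three point masses forward in small time steps using semi-implicit Euler integration, a simple scheme chosen for brevity rather than accuracy. The masses, starting positions, velocities, step size and the choice of G = 1 in arbitrary units are all invented for the example.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (an assumption for this sketch)

def accelerations(masses, positions):
    """Sum the pairwise gravitational pulls on each body (2D)."""
    acc = [[0.0, 0.0] for _ in masses]
    for i in range(len(masses)):
        for j in range(len(masses)):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dist = math.hypot(dx, dy)
            acc[i][0] += G * masses[j] * dx / dist**3
            acc[i][1] += G * masses[j] * dy / dist**3
    return acc

def step(masses, positions, velocities, dt):
    """Semi-implicit Euler: update velocities first, then move using the new velocities."""
    acc = accelerations(masses, positions)
    for i in range(len(masses)):
        velocities[i][0] += acc[i][0] * dt
        velocities[i][1] += acc[i][1] * dt
        positions[i][0] += velocities[i][0] * dt
        positions[i][1] += velocities[i][1] * dt

# Three hypothetical bodies: every number here is made up.
masses = [10.0, 1.0, 0.5]
positions = [[0.0, 0.0], [5.0, 0.0], [0.0, 8.0]]
velocities = [[0.0, 0.0], [0.0, 1.4], [-1.1, 0.0]]

for _ in range(1000):
    step(masses, positions, velocities, dt=0.01)

print(positions)
```

A smaller `dt` gets closer to the true motion at the cost of more steps, which is one definition of close. Since the pulls between each pair of bodies are equal and opposite, total momentum should stay the same from step to step, which makes a handy sanity check.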

Not all prediction systems are based on statistics or models. Decision trees fall under the umbrella term machine learning. They give classifications of new data based on summaries of training data, in the form of flow charts or lists of rules. Something like:

If Utility module updated on a Monday then build broken all week.

In order to find the tree or rules, some decision tree algorithms use entropy, which is formally an Information Theory idea. At a high level, it measures the chaos present in a system. If you toss a fair coin, you would expect it to be heads about half the time, and tails the rest of the time. This is higher entropy, or more chaos, making it hard to predict what the next toss will be. In contrast, if an unfair coin always comes down heads, every single time, it is much easier to predict accurately what will happen. Less chaos means you can compress this down very easily. In the first case, of a fair coin, writing a function to predict what happens next is harder and needs more lines of code. In the second case, the function need only return “Heads” each time. In a sense, this is still based on counting possible outcomes, but is taking a different perspective. I recently wrote a short blog post about decision trees [Buontempo19a]. At the risk of repeating myself, armed with rows of data, each with features and a category (either yes/no, heads/tails or one of many classes), you can build a classifier, which will tell you which category new data falls into. There are various ways to decide how to split up the data, including entropy. Regardless of the method, each algorithm follows the same overall process. Start with a root node, then:

1. If all the data at a node is in the same category (or almost all in the same category) form a leaf node.
2. For a non-leaf node, pick a feature, according to your chosen method.
3. Split the data at this node, some to the left branch, and some to the other branch (or branches) depending on the value of the chosen feature.
4. Continue until each node is a leaf node.
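The entropy measure and the four steps above can be sketched in a few lines. This is one possible reading, assuming categorical features and picking the feature that leaves the lowest weighted entropy after the split; the function names and the tiny build log are invented for illustration and are not the code from the blog post or the book.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def split(rows, feature):
    """Group rows by their value for one feature: one branch per value."""
    branches = {}
    for features, label in rows:
        branches.setdefault(features[feature], []).append((features, label))
    return branches

def best_feature(rows, features):
    """Pick the feature whose split leaves the lowest weighted entropy."""
    def remaining_entropy(feature):
        return sum(len(branch) / len(rows) * entropy([label for _, label in branch])
                   for branch in split(rows, feature).values())
    return min(features, key=remaining_entropy)

def build_tree(rows, features):
    labels = [label for _, label in rows]
    # Step 1: a pure node (or one with nothing left to split on) becomes a leaf.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: choose a feature. Step 3: split the data on its values.
    feature = best_feature(rows, features)
    rest = [f for f in features if f != feature]
    # Step 4: repeat for each branch until every node is a leaf.
    return {feature: {value: build_tree(branch, rest)
                      for value, branch in split(rows, feature).items()}}

def classify(tree, features):
    """Walk the tree; raises KeyError for a feature value never seen in training."""
    while isinstance(tree, dict):
        feature, branches = next(iter(tree.items()))
        tree = branches[features[feature]]
    return tree

print(entropy(["Heads", "Tails"] * 50))  # fair coin: one bit of entropy
print(entropy(["Heads"] * 100))          # always heads: zero bits

# A made-up build log: (features, category) pairs.
builds = [
    ({"module": "Utility", "day": "Monday"}, "broken"),
    ({"module": "Utility", "day": "Friday"}, "ok"),
    ({"module": "Parser", "day": "Monday"}, "ok"),
    ({"module": "Parser", "day": "Friday"}, "ok"),
    ({"module": "Utility", "day": "Monday"}, "broken"),
    ({"module": "Parser", "day": "Monday"}, "ok"),
]
tree = build_tree(builds, ["module", "day"])
print(tree)
print(classify(tree, {"module": "Utility", "day": "Monday"}))
```

On this data the tree splits on the module first and then on the day, recovering the rule that a Utility update on a Monday breaks the build. Real data is rarely this tidy, which is why step 1 allows for nodes that are only almost all in one category; this sketch keeps splitting until the features run out.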

This is a bit like a sorting algorithm: in quicksort, you choose a pivot value and split the data down one branch or the other, until you have single points at nodes. Here we don’t choose a pivot value but a feature. For example, is the coin heads or tails? The way to pick a feature can be based on statistics, information theory or even at random. At each step, you want to know if all the items in one category tend to have the same value or range of values of a feature. Once you are done you have a tree (or flow chart) you can apply to new data. Each way to split has various pros and cons. You can even build several trees. A random forest will build lots of trees and they vote on the class of new, unseen data. You could build your own voting system, using a variety of tree induction techniques. This might avoid some specific problems, like over-fitting from some techniques. Decision trees can be used to spot what features problematic scenarios have in common. Maybe all your bug reports end up with a fix in the same module. That might not be immediately clear until you analyse the data. If you want to know what certain things have in common, a decision tree is worth a try. My Genetic Algorithms and Machine Learning for Programmers book [Buontempo19b] has a chapter on building a decision tree from scratch if you want some details, but there are plenty of frameworks out there that will automatically build one for you. You may not be able to predict the future with your tree, and your machine may learn nothing, as is often the case in Machine Learning. However, you may spot something of interest.
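The voting idea can be sketched on its own, leaving aside how each tree is built. The three hand-written rules below are hypothetical stand-ins for trees produced by different induction techniques. A real random forest would also train each tree on a random sample of the data, which this sketch does not attempt.

```python
from collections import Counter

def majority_vote(classifiers, features):
    """Ask every classifier for a category and return the most common answer."""
    votes = Counter(classifier(features) for classifier in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for three differently built trees.
def tree_a(f):
    return "broken" if f["module"] == "Utility" else "ok"

def tree_b(f):
    return "broken" if f["day"] == "Monday" else "ok"

def tree_c(f):
    return "broken" if f["module"] == "Utility" and f["day"] == "Monday" else "ok"

forest = [tree_a, tree_b, tree_c]
print(majority_vote(forest, {"module": "Utility", "day": "Friday"}))
```

With an odd number of voters and two categories there are no ties. Otherwise `Counter.most_common` favours whichever answer it saw first, so a tie-breaking rule would be worth adding.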

Every classifier, be that a decision tree or not, has a set of possible outcomes, classes or categories. In various disciplines, including machine learning, the possible outcomes are described as a “search space”. The idea is useful beyond classifiers. When we moved house, we kept our cat shut in for a bit, so he could get the hang of his new home first, before exploring the great outdoors. He explored by trying a corridor and then returning to his starting point. Then he went a bit further, but always went back to the starting point. Each iteration added a small new part of his search space. In this case, the constant returning to the base station wasn’t a bias, it was sensible. In a fit of laziness, we bought a Roomba robot vacuum cleaner. This uses a similar algorithm. It has a base station, which it tends to stick near initially, gradually adding paths round the room, often making its way round the outside edges, just like our cat. I wonder if we could use the cat to sweep the floor. Nah, bad idea. I think I can see a few potential problems with that. This exploration of possibilities, though not predicting the future, includes an element of premonition. “Going forwards here means hitting a wall.” Does learning mean you think you know what outcome is likely, given an initial set of conditions and a specific choice? Maybe. What do you think learning means? That’s quite a big thing to think about.

Now, a spatial search brings various extra ways to make predictions. Sometimes you can guess where something might be, like mugs in a kitchen. They are often within reaching distance of a kettle. A combination of logic and expectations can be used to make an initial guess. I wonder if we tend to apply the same heuristics when dealing with code. Which header is `std::vector` in? What about `std::map`? Easy. What about `std::less`? That’s another matter.

Predictions are often driven by some kind of bias. Why would you look in the fridge for dishwasher tablets? Because the light comes on when you open the door, so it’s easier to see. Sense does not always prevail. More sensibly, if you plan a journey and hate public transport, you are more likely to consider driving, cycling, getting a taxi or walking over some other modes of transport. I wonder if all the recent AI research into self-driving cars is somewhat biased. I have maintained for a long time that Star Trek’s transport technology would be far better. I presume this might be less polluting. It certainly wouldn’t need upkeep of roads. And I don’t recall any episodes where transporter accidents involve innocent pedestrians or cyclists. Not to say transporter accidents are unheard of. I just maintain someone, somewhere, has a predilection for cars and that is driving, pun intended, the research into modes of transport in the wrong direction.

I love AI, and find its twists and turns through history fascinating. Trying to predict where it will move in the future is very difficult. As with much research, it is partially driven by those who fund the work, which in turn might be biased towards return on investment rather than usefulness or some kind of inherent value, whatever that means. Other technological innovations are, frankly, more disquieting. Recent stories of facial recognition in use at King’s Cross station in London have raised questions and possible GDPR-related issues. It seems the company responsible, Argent, claims the system will ‘ensure public safety’ [BBC]. I have questions. If it recognises faces, does it have a database of faces of people who should be arrested on sight? I would imagine if you tracked individuals’ paths through the station, you may spot bottlenecks and bad signage and be able to improve the situation; however, this would not require saving people’s faces. Indeed, this immediately made me think of a variety of sci-fi stories, including Face Off and Minority Report. This possibly tells you more about my background and point of view than the event itself. Here’s a conjecture:

Many physics models are based on odd theological or Weltanschauung (world-view) assumptions. How many colours are there in a rainbow? An English rainbow has seven, apparently because Newton regarded seven as a mystically significant number. Other cultures have different counts of the colours. Pythagoras refused to accept irrational numbers. He also felt circles were significant, so planets had to be spheres, and orbits had to be circular. The starting assumptions colour the final models. Climate-change deniers are also working on a set of assumptions and biases. How to notice a bias, or predilection, underpinning a model or prediction is a hard question. Sometimes the predictions are close enough or the model seems to work, which might allow incorrect, or at least suspect, starting points to slip through. Question your own assumptions when you next make a prediction. What point of view are you operating from? What does the world look like through someone else’s eyes?

One final question. Why are you trying to make a decision anyway? Frequently, predictions are made in order to aid decision making. For example, guessing if it will rain will help me decide if I will need an umbrella. Figuring out what could possibly go wrong can help prepare for the worst. However, an impending sense of doom can lead to self-fulfilling prophecies. Can you predict the future without influencing it? Thinking through what might happen can be useful, though. Being accurate isn’t the most important thing. Don’t forget:

A completely predictable future is already the past.

~ Alan Watts

What does matter is being aware of possible outcomes, probable contributing factors, and recognizing your assumptions. Bias in, bias out. A sense of wonder and enquiry in, endless possibilities and hope out.

## References

[Agile] Agile 2019 conference: https://www.agilealliance.org/agile2019/

[BBC] ‘Data regulator probes King’s Cross facial recognition tech’, posted 15 August 2019 at https://www.bbc.co.uk/news/technology-49357759

[Buontempo19a] Frances Buontempo (2019) ‘Decision trees for feature selection’, posted on http://buontempoconsulting.blogspot.com/2019/07/decision-trees-for-feature-selection.html

[Buontempo19b] Frances Buontempo (2019) ‘Genetic Algorithms and Machine Learning for Programmers’, https://pragprog.com/book/fbmach/genetic-algorithms-and-machine-learning-for-programmers

[Gelman98] Andrew Gelman, Gary King, and John Boscardin (1998) ‘Estimating the Probability of Events that Have Never Occurred: When Is Your Vote Decisive?’ Journal of the American Statistical Association, 93 pp1–9. Accessed via https://gking.harvard.edu/files/gking/files/estimatprob.pdf

[Spiegelhalter09] David Spiegelhalter and Kenneth Rice (2009) ‘Bayesian Statistics’, published on Scholarpedia, available from: http://www.scholarpedia.org/article/Bayesian_statistics

[Wikipedia] Three-body problem: https://en.wikipedia.org/wiki/Three-body_problem

Frances Buontempo has a BA in Maths + Philosophy, an MSc in Pure Maths and a PhD technically in Chemical Engineering, but mainly programming and learning about AI and data mining. She has been a programmer since the 90s, and learnt to program by reading the manual for her Dad’s BBC model B machine.