Why My Teenage Code Was Terrible: Sorting Algorithms and Big O Notation

When I was about 15,
I had a week of ‘work experience’. Americans would call that an ‘internship’,
but that’s not really right: work experience is a thing in British schools that basically just gets the older kids
out of the way for a few days and palms them off on local businesses so
the teachers can have a break. I can’t actually remember the name of the
company now, but anyway, they had a problem: they had recently scanned in
a load of photos, all the history of the place,
but those photos weren’t categorised: they just had hundreds, maybe a
couple of thousands of pictures. all in one folder with
arbitrary filenames. And the work experience kid got assigned the
job of going through them all and writing descriptions. “Well, that’s boring”, I thought, because of course I did.
It was really dull. It’s the job you give the
work experience kid, because they’re not disenchanted enough
with the world yet. So in between typing the descriptions, I figured I’d show off and write a system
to search through them. It’d be like Yahoo, or that new Google thing
that was starting to become popular. It’d look great.
I’d look really smart. I didn’t know anything about
how to build that, but with utterly unjustified confidence, I figured it couldn’t be that big a problem. And it wasn’t. Take the search word,
loop through all the descriptions, count how many words match, then return all the matching results,
sorted by the number of matches. And yeah. It worked fine. Until I got over about fifty photographs. At which point it started to
get a little bit slow. And after a hundred,
it started to get really slow. And I started to spend more time trying to
fix it and less time actually doing my job. Actually, I mean, I wasn’t getting paid, so… Anyway, to explain why my code sucked: let’s talk about Big O Notation. Whatever you’re coding, there are almost
certainly many different ways to approach the problem. But even though all those approaches
can end up with the same result, they may take vastly different amounts of
time or memory to get there. And the best example of that is sorting. An animation like this is
a classic of computer science. I remember seeing demonstrations of
things like this on a DOS computer, complete with bleep-bloop sound effects,
when I was tiny. So we’ve got a bunch of different-sized
blocks, we’ve got a list. What algorithms can we use
to put our list in order? Let’s start with something called
bubble sort. Take the first two blocks and compare them. If the first one is longer than the second,
swap them. If not, leave them where they are. Now compare the next two blocks.
Same again. If they’re in the wrong order,
swap them. If they’re not, leave them. And so on, and so on, until we get
all the way to the end of the list. And at that point, we have moved the
biggest block in the list to the very end. And that’s all. And it’s not very efficient,
but it has now been done. So now we can go again from the start: we can compare, swap if we need to, compare, swap if we need to, compare, swap if we need to,
and so on, and so on, and so on, and if we get all the way start-to-finish
without moving a thing, then we know everything is sorted
and we can stop. And if not: we go back to the start. And eventually, after
many, many, many passes, congratulations, we have a sorted list! Bubble Sort, with each block “bubbling up”,
is terrible. It has to check every block in the list on
every pass, and in most cases it’ll have to pass through
the list as many times as there are blocks on it. Ten blocks means ten operations,
performed ten times. A hundred blocks means a hundred operations,
performed a hundred times. The number of operations it has to perform
goes up with the square of the number of blocks. Which means that it has a ‘Big O’ of n². The foundations of Big O notation, plus that
O symbol, were laid down towards the end of the 19th
century by German mathematicians working in
analytic number theory. But it was popularised in computer science by
the great Donald Knuth in the 1970s. “Ka-nooth”?
I should have checked that. Anyway. Big O distills an algorithm down to
a single expression, which indicates how it performs
as you add more blocks, more inputs of any kind. It lets you compare, roughly, approximately, in a big-picture sense, how slow an algorithm will get as you add
more and more complexity. n² means that the processing time goes up
with the square of the number of inputs: double the number of inputs,
and the run-time quadruples. The lowest Big O notation an algorithm can
reasonably have is ‘constant’, O(1). That means that no matter
how much stuff you throw at it, it’s going to return its answer in the
same amount of time. And O(1) is usually something
really simple, like “return the first item in the list”. Some common algorithms have a Big O of n,
or ‘linear’. This means that processing time goes up at
a steady rate with the number of items. So if your code needs to go through a list
and just do one thing to each item in it: that would be linear. Double the items, double the processing time. So let’s try some other sorting algorithms. Here is ‘insertion sort’: start with the
second item, compare it to the item before. If they’re the wrong way round,
swap them. Now take the third item,
and compare it to the one before: if they’re the wrong way round, swap them and then check again
with the one before that. Basically, you keep moving each item in turn
towards the start, checking as you go, until either you find something smaller
and you know it’s in the right place, or you reach the start of the list
and you know it’s the smallest. If you’ve got a hand of cards and
you are putting them in numerical order, this is the algorithm you’re probably using
without even thinking about it. And yes, that is usually faster
than bubble sort. But crucially,
its Big O notation is also n². And this is a really important distinction
to make: if two algorithms have the same Big O, it doesn’t mean they perform the same. It just means that if you draw the graphs
of how they get worse with more inputs, those graphs have the same shape. For both bubble sort and insertion sort, if you double the number of items,
it’ll quadruple the run time: but with insertion sort, that run time is usually
shorter to begin with. Now I’m not going to go into the details of
every sorting algorithm, there are a lot of them, but there are two I want to flag up. The first is Quick Sort. Choose an item on the list
and call it the ‘pivot’. Then go through the list, putting anything smaller than the pivot
somewhere to its left, and anything larger somewhere to its right. Once that’s done, look at just the ones to the left of the pivot
and you do the same, you choose a pivot and you split those. Repeat that divide-and-conquer
all the way down. And then do the same to the other side of each
of those pivot points, and work your way back up. And the worst-case Big O notation for this is
still n². But the average performance on a random list
is n log n. I’m not going into the mathematics, but
suffice it to say that is a lot, lot better. And then, on the other end of the scale, there is bogosort, which is designed
as a joke. Bogosort is really simple: randomise the list. Is it sorted? Great! If it’s not, randomise again
and keep going until it’s sorted. The average performance there is O((n+1)!). Factorial. You can draw a graph of all the common
Big O notations to compare them. But the important thing is:
the Big O of whatever you’re coding is going to be mostly affected by the
worst one of those that you have anywhere in it. If your code does ten operations that are
linear and one that’s got a Big O of n²: well, your program is n² now. Remember it’s just an approximation, it’s a guide for humans to work out which
algorithm is best to use. Despite being about computer science,
it’s a little bit fuzzy. So what should I have done with my search
system, back when I was a kid? Well, I could have used code that was
already out there, realised that other people had done the work
for me. That’s true. But ultimately what I should have done is
what the client asked for: typed the descriptions into a word processor
and saved it as a document. Because that was a better solution: they could have just searched it by pressing
Control-F. Text doesn’t break down or need updating
when an operating system gets upgraded. If they had new photos to add,
anyone who knew how to type could update it. They could still open that document today. The big problem wasn’t that I used
a bad algorithm: the big problem was that I was ignoring what
my users actually needed because I wanted to show off
how clever I thought I was. It’s important to think about
how fast things will work, sure, but the best solution isn’t always the fastest,
or the smartest. It’s the one that works for everyone, long-term. This series of The Basics is sponsored by
Dashlane, the password manager. And there’s one aspect of password managers
that I haven’t talked about in these sponsored sections yet: protecting you against phishing attacks. This is a slightly embarrassing story, but
let’s say you’ve signed up to Dashlane, which, incidentally, you can do by going to
dashlane.com/tomscott for a free 30-day trial of Dashlane Premium. And let’s say that at some point later on, you’re exhausted and tired and you click
a link to view a Google Doc that someone’s shared with you,
and you get asked to sign into Google, and you just assume that your session has
timed out for security as it does sometimes, so you type in your email address. I’m saying “you” here, but this literally
happened to me because I was exhausted and
not thinking straight. So, let’s say I did that. And then my password didn’t autofill. Because the domain name wasn’t right. It wasn’t a Google page. And autofill checks the domain that you’re
logging into. If the domain doesn’t match, it won’t
type your password in for you. And that was the clue I needed to wake up
and realise that, actually, that wasn’t a real login link and that was
a really close call. The people trying to pull off a phishing attack
only have to fool you once. Now, password managers aren’t foolproof,
there’s always someone more foolish, I could have assumed something was broken
in the software and tried to copy-and-paste my password instead, but it was enough of a red flag, enough of
a warning, that I saw what was happening. So: dashlane.com/tomscott for a 30-day free
trial of Dashlane Premium, which includes unlimited password storage
and sync, plus a load of other features. And if you like it, you can use the code “tomscott”
for 10% off.