31 August 2018

Do 40 points mean your football team is safe from relegation?

This post is about football (soccer) and the way data varies over time.  I hope you're a fan of both!  If you're not, you're still likely to find at least some of what follows useful in the workplace.

In the English Premier League, it's a commonly held assumption that when a team reaches 40 points they will be safe from getting relegated to the next league down.

I recently read an article from the Guardian newspaper that challenged this belief.  It suggested that around 36 points would be enough for safety.  If you're not familiar with this 'rule of thumb', I recommend reading the article before continuing.  You'll find it here.

Just like at work, when people start throwing single numbers around, I started to wonder: "how have they come up with that number?", "have they understood the variation?" and "what would it tell us if we put it in a control chart?"  So I did just that, and here's what it looks like:

[Control chart: points needed to avoid Premier League relegation, plotted by season]

As you can see, it's a type of line chart, and I've plotted it as a time series.  Each dot shows the points that would have been needed to escape relegation (one point more than the best-placed relegated team) in each season.

As with just about any data, there is variation.  In some seasons more points would have been needed than in others.  The three coloured lines on the chart help us make sense of that variation.

Average line


The red line in the middle is the mean average.  Although it's useful, there are problems when people only report on the average.  In my experience, the average becomes the number.  This is how they came up with 36 points in the Guardian article.  But the average doesn't take variation into account.

For example, what's the average of 1 and 19?  And what's the average of 9 and 11?  The answer to both is 10, but the average doesn't tell us how much variation there is.  9 and 11 are much closer together than 1 and 19, but the average alone hides this information.   The same point is nicely made with this picture.
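If you prefer to see that as code, here's a trivial Python sketch using the same numbers as the example above (nothing here comes from the chart itself):

```python
# Two pairs of numbers with the same average but very different spread.
pair_a = [1, 19]
pair_b = [9, 11]

def mean(values):
    return sum(values) / len(values)

def spread(values):
    # The range: the gap between the largest and smallest value.
    return max(values) - min(values)

print(mean(pair_a), spread(pair_a))  # 10.0 and 18
print(mean(pair_b), spread(pair_b))  # 10.0 and 2
```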

An average of 36 points (or 37 by my calculation) tells us very little on its own about how many points will be needed to avoid relegation in any given season.  Roughly half the time more than 36 points will be needed, and the other half of the time fewer will be.

Upper limit and lower limit lines


These are sometimes called the UCL (upper control limit) and LCL (lower control limit).

Data points going up and down between these lines are 'common cause' variation: the normal season-to-season changes in the points needed for safety.  You shouldn't pay too much attention to the differences between these points.  They represent the 'noise' in the data.  The 43 points needed in 2002-03 were just as likely as the 31 points needed in 2009-10.

A mistake I often see people make is to pay attention to common cause variation, and act as if something out of the ordinary has happened when it hasn't.

These lines also help with prediction.  If you want to be confident your Premier League team avoids relegation this coming season, 44 points should be enough.  And as long as you've got 29 points you've still got a chance.  Anything less than that and you're most likely going down to the next division.
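If you want to try this on your own data: a common way to calculate limits like these for individual values is the XmR (individuals) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range.  Here's a minimal Python sketch of that approach.  The values below are made up purely for illustration, not the real season data, and this may not be exactly the calculation behind the chart above:

```python
def xmr_limits(values):
    """Centre line and natural process limits for an XmR (individuals) chart:
    mean +/- 2.66 * average moving range."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * average_mr, mean, mean + 2.66 * average_mr

def outside_limits(values, lcl, ucl):
    """Flag single points falling outside the limits (potential special causes)."""
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]

# Made-up values for illustration only -- not the real points-needed data.
points_needed = [38, 35, 43, 31, 36, 40, 34, 37, 39, 33]
lcl, centre, ucl = xmr_limits(points_needed)
print(f"LCL = {lcl:.1f}, mean = {centre:.1f}, UCL = {ucl:.1f}")
print(outside_limits(points_needed, lcl, ucl))
```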

You might have spotted a couple of data points, 1992-93 and 1994-95, above these lines.  That's what's called 'special cause' or 'assignable cause' variation.  It's a signal that something is different.  How you react to this would be different to how you'd react to common cause variation.  With special causes you'd ask "what's different?" or "why the change?"

In this case, there was a change.  For the first three seasons on the chart, the league was made up of 22 teams.  After that, it was reduced to 20.  From 1995-96 onwards, teams were playing fewer games and therefore accumulating fewer points.
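One way to reflect that change is to calculate the average and limits separately for each era.  A rough sketch, reusing the xmr_limits() function from the earlier snippet and, again, made-up values rather than the real data:

```python
# Split the seasons where the league shrank from 22 to 20 teams and work out
# the average and limits for each era separately.
era_22_teams = [46, 49, 44]            # illustrative values, 1992-93 to 1994-95
era_20_teams = [38, 35, 40, 36, 39]    # illustrative values, 1995-96 onwards

for label, era in [("22-team era", era_22_teams), ("20-team era", era_20_teams)]:
    lcl, centre, ucl = xmr_limits(era)  # xmr_limits() as defined in the earlier sketch
    print(f"{label}: LCL = {lcl:.1f}, mean = {centre:.1f}, UCL = {ucl:.1f}")
```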

To show the change, the chart should really look like this:

[Control chart with the average and limits recalculated separately for the 22-team and 20-team eras]

Here are some handy 'rules' for spotting other signals in control charts.  This is where the average line becomes particularly useful.
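Those rules aren't reproduced here, but to give a flavour, one commonly used run rule treats eight or more consecutive points on the same side of the average as a signal.  Here's a rough Python sketch of that single rule (the rules linked above may differ in the details):

```python
def run_of_eight(values, centre):
    """Return the index where eight consecutive points have fallen on the
    same side of the centre (average) line, or None if that never happens."""
    run, previous_side = 0, None
    for i, value in enumerate(values):
        side = "above" if value > centre else "below" if value < centre else None
        run = run + 1 if side is not None and side == previous_side else (1 if side else 0)
        previous_side = side
        if run >= 8:
            return i
    return None

# Prints 10: the eighth consecutive point above a centre line of 37.
print(run_of_eight([36, 38, 35, 39, 38, 40, 41, 38, 39, 42, 40, 43], 37))
```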


Should football clubs set themselves a points target?


Probably not!  When you set a target, making the number becomes the focus, rather than doing the right thing: in this case, the right thing for the football club and its fans.  It can also encourage a behaviour where people ease off when the target is met, or looks like being met.

In the 2012-13 season, Barnet Football Club were relegated from their division.  Their then manager, Edgar Davids, was quoted as saying:

"It's even more disappointing because we have reached all the objectives that the chairman set and reached the 51 points target but we've still gone down."

They would probably have fared better if they'd focused on winning as many games as possible, rather than on achieving an arbitrary number.


How does this relate to work?


This is all well and good when looking at sports league tables, but this blog is supposed to be about work and improving services.  With that in mind, there are some lessons we can take away from this post.

1. Be suspicious when people quote a single number, often the average.  For example, I've seen the average time it takes to process benefit claims become the only figure used.  It was about 17 days.  Managers thought this was good performance.  Customers and stakeholders were given this figure, and they came to expect that was how long they'd be waiting.  But a control chart revealed the predictable variation to be anywhere between 0 and over 100 days.  The average alone tells you almost nothing about performance.

2. Be careful not to confuse common cause with special cause variation.  I was once at a meeting where a department's figures were all 'red' because they were worse than the previous month.  The manager was asked to go away, investigate what had happened, and write a report to bring to next month's meeting.  This was a complete waste of time.  I put the data into a control chart, and it was just normal common cause variation.  The senior managers had unwittingly reacted to it as though it was a special cause.  The next month they were back to 'green', possibly because of regression toward the mean.

3. Comparing just two data points tells you almost nothing, and certainly nothing about variation.  Although not covered in the above example, we see performance reports that compare this month to last month, now to this time last year, etc.  They might have arrows or colours applied to indicate whether performance is 'good' or 'bad' in relation to a target.  People are supposed to make judgements or decisions based on this information, with absolutely no context.  Displaying the same amount of information in a control chart would look like this:

[Control chart showing just two data points, with the average and limit lines]

If you took this chart to a meeting, people would probably laugh at you or tell you to leave.  Yet it's seen as perfectly acceptable to present the same inadequate amount of information in a 'scorecard' or 'dashboard' report.

4.  If in doubt, plot the data in a control chart.  Or at the very least plot it over time.  This post has hopefully made it clear that data has no meaning without context, and that you need a way to separate signals (special causes) from noise (common cause variation).  That's why control charts were invented!  (There's a small plotting sketch after this list.)

5. Don't use numerical targets.  They're arbitrary and make performance worse.  If you want to know why, have a read of my previous post on the subject.
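To close the loop on point 4, here's a small, self-contained plotting sketch.  It assumes matplotlib is installed, uses the same "mean plus or minus 2.66 times the average moving range" calculation as the earlier snippet, and the monthly values are invented purely for illustration:

```python
import matplotlib.pyplot as plt

# Invented monthly values -- swap in your own data.
values = [17, 19, 14, 22, 16, 18, 21, 15, 20, 17, 23, 16]

# XmR (individuals) chart: mean +/- 2.66 * average moving range.
mean = sum(values) / len(values)
average_moving_range = sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
ucl = mean + 2.66 * average_moving_range
lcl = mean - 2.66 * average_moving_range

plt.plot(values, marker="o")
plt.axhline(mean, color="red", label="mean")
plt.axhline(ucl, color="green", linestyle="--", label="upper limit")
plt.axhline(lcl, color="green", linestyle="--", label="lower limit")
plt.xlabel("Time period")
plt.ylabel("Value")
plt.legend()
plt.show()
```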


Further reading