Last year, before the Brexit referendum, we offered the first unbiased ranking of UK pollsters. This was our effort to vindicate some in the polling business during a time of widespread public anger over their failure to correctly predict the outcome of the 2015 UK general election (despite the fact that making predictions is not really the job of pollsters). The Brexit referendum was another big miss, but the public outcry was weaker this time, most likely because on the one hand people had already lost faith in the polls and were expecting them to go wrong, while on the other they were too busy either gloating or sobbing over the referendum result.
The situation in the US didn’t help. When virtually everyone was predicting a clear Hillary victory (everyone except us, of course!), Trump’s win undermined faith even in the best in the business. Something certainly went wrong in the pollsters’ sampling methods, and the same error carried over into the various prediction models built on their numbers, but we will not go into too much detail as to what that was.
Instead we offer our own methodology to track how good (or bad) the polls can be, and to produce a score for each pollster that can be used to adjust their current numbers before the 2017 general election. We will use these scores to construct our adjusted polling average, the benchmark we usually compare ourselves against (e.g. see the benchmark for Brexit, or the same benchmark for the US). As you can see, the adjusted polling average was wrong in both cases (unlike our predictions), which goes to show that even after adjusting for the usual polling bias, the cases of Brexit and Trump were very specific. The polls systematically underestimated both Brexit and Trump. This is why all the poll-based forecasts went wrong.
Anyway, here is the ranking list of 15 selected UK pollsters. Before making any judgements, please do read the text below explaining our methodology. We accept all suggestions and criticism.
How do we produce our rankings?
The rankings are based on the past performance of pollsters in four earlier votes: the 2016 Brexit referendum, the 2015 general election, the 2014 Scottish referendum, and the 2010 general election. In total we observed over 500 polls from 15 pollsters (not all of which participated in all four votes). We realize the sample could have been bigger had we included local and earlier general elections. However, given that many pollsters from 10 years ago don’t produce polls anymore, and given that local elections are quite specific, we focus only on these four national votes for now. We admit that the sample should be bigger and will think about including local polling outcomes, adjusted for their type. There is also the issue of the methodological standards of each pollster, which we don’t take into account, as we are only interested in the relative performance of each pollster in the previous elections.
Given that almost all the pollsters failed to predict the outcome of the 2015 general election and Brexit, we look at the performance between pollsters as well, in order to avoid penalizing them too much for these failures. If no one saw it coming, they are all equally excused, to a certain extent. If however a few did predict correctly, the penalization against all others is more significant. We therefore jointly adjust the within accuracy (the accuracy of an individual pollster with respect to the final outcome) and the between accuracy (the accuracy of an individual pollster with respect to the accuracy of the group).
- Within accuracy
To calculate the precision of pollsters in earlier elections we assign each poll a weight for timing and a weight for sample size (older polls are less important, polls with a greater sample size are more important). These two weights are summed into the total weight of a given poll. We then take each individual pollster and calculate its weighted average: the sum of the products of each of its polls and that poll’s total weight, divided by the sum of all its weights (see the footnote on the weighted average). From this we can calculate the average error each pollster made in a given election. This is done for all four elections in our sample, giving us each pollster’s within accuracy for each election. We calculate the average error of an individual pollster as the simple deviation between its weighted average polling result and the actual result, measured on the margin between the first two parties in the election (e.g. Conservatives and Labour). Or in plain English, how well they guessed the difference between the winner and the runner-up.
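The weighted average and the resulting error can be sketched in a few lines of Python. This is a minimal illustration, not our production code: the `Poll` class, the function names and the numbers are all made up for the example, and each poll's weight is assumed to be already computed from its timing and sample size.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    margin: float  # polled lead of the eventual winner over the runner-up, in points
    weight: float  # timing weight + sample-size weight (assumed precomputed)


def weighted_average_margin(polls: list[Poll]) -> float:
    """Sum of margin * weight over a pollster's polls, divided by the sum of weights."""
    total_weight = sum(p.weight for p in polls)
    if total_weight == 0:
        raise ValueError("polls must carry a non-zero total weight")
    return sum(p.margin * p.weight for p in polls) / total_weight


def average_error(polls: list[Poll], actual_margin: float) -> float:
    """Absolute deviation of the weighted average margin from the actual margin."""
    return abs(weighted_average_margin(polls) - actual_margin)


# Made-up numbers for illustration only: an older, lightly weighted poll
# and a recent, heavily weighted one.
example_polls = [Poll(margin=1.0, weight=1.0), Poll(margin=3.0, weight=3.0)]
example_error = average_error(example_polls, actual_margin=6.5)
```

In this toy case the heavier poll pulls the weighted average margin to 2.5 points, so against an actual margin of 6.5 the pollster's error is 4 points.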
- Between accuracy
After determining our within index, we estimate the accuracy between pollsters (by how much they beat each other) and sum the two into a single accuracy index. To do this we first calculate the average error of all pollsters in a single election. We then take the difference between this joint error and each individual error, so that a pollster who missed by less than the group gets a positive value. This represents our between index: the greater the value, the better the pollster did against all others (note: the value can be negative).
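As a rough sketch of the between index for a single election, assuming each pollster's error has already been computed: the function name, the pollster labels and the errors below are hypothetical, and the sign is chosen so that a greater value means a better pollster, as stated above (group error minus individual error).

```python
def between_indices(errors: dict[str, float]) -> dict[str, float]:
    """Group average error minus each pollster's own error, for one election.

    Positive values mean the pollster missed by less than the group average;
    negative values mean it missed by more.
    """
    if not errors:
        raise ValueError("need at least one pollster")
    group_error = sum(errors.values()) / len(errors)
    return {name: group_error - err for name, err in errors.items()}


# Hypothetical pollsters and errors (in percentage points), for illustration only.
example_errors = {"Pollster A": 2.0, "Pollster B": 5.0, "Pollster C": 8.0}
example_between = between_indices(example_errors)
```

Here the group error is 5 points, so Pollster A scores 3, Pollster B scores 0 and Pollster C scores -3.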
- Joint within-between ranking
To get our joint within-between index we simply sum up the two, thereby lowering the penalization across all pollsters if and when all of them missed. In this case those who missed less than others get a higher value improving their overall performance and ranking them higher on the scale.
We repeat the same procedure across all four elections and produce two final measures of accuracy. The first is the final weighting index (which we use for the ranking itself), and the second is the precision index. The difference between the two is that the precision index does not factor in the number of elections, whereas the final index does. The precision index is thus the simple average of a pollster’s within-between indices, while the final index is the sum of those indices divided by the total number of elections we observed, regardless of how many of them the pollster participated in. The two are the same if a pollster participated in all four elections, but they differ if the pollster participated in fewer than four.
For example, consider Lord Ashcroft’s polls. We only have data on his predictions for the 2015 general election, where he had a relatively high precision score compared to the rest of the group (6.91; see the footnote on the within index to understand what this number means), but given that this was the only election we have data for, his overall score is rather low (though note that his precision is higher). Pollsters that operated across all four elections give us the possibility to measure their consistency, a luxury we do not have for single-election players.
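The difference between the two measures is easiest to see in code. The sketch below assumes the within and between indices are already available per election; the function names and the example values are invented for illustration.

```python
TOTAL_ELECTIONS = 4  # the four national votes covered in this ranking


def joint_index(within: float, between: float) -> float:
    """Joint within-between index for one pollster in one election."""
    return within + between


def precision_index(joint_indices: list[float]) -> float:
    """Simple average over the elections the pollster actually polled."""
    if not joint_indices:
        raise ValueError("pollster has no elections on record")
    return sum(joint_indices) / len(joint_indices)


def final_index(joint_indices: list[float],
                total_elections: int = TOTAL_ELECTIONS) -> float:
    """Sum of the joint indices divided by all observed elections,
    whether or not the pollster took part in each of them."""
    return sum(joint_indices) / total_elections


# Made-up scores: one pollster seen in a single election, one seen in all four.
single_election = [6.0]
all_elections = [6.0, 4.0, 5.0, 5.0]
```

With these made-up scores the single-election pollster keeps a precision index of 6.0 but gets a final index of only 1.5, while the pollster present in all four elections scores 5.0 on both.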
To conclude, the numbers reported under the final weighting index column represent the ranking weight that can be used to adjust the polling figures of each pollster in order to make viable predictions from them. Combined with the timing and sample size weights, it helps us calculate the final weighted average of all polls. When we make predictions, this is how we calculate our strongest benchmark, the adjusted polling average.
 We formulate a joint weight of timing (the more recent the poll, the greater the weight), sample size (the greater the sample size, the greater the weight), whether the poll was done online or via telephone, and the ranking for each poll, allowing us to calculate the final weighted average across all polls in a given time frame.
 Calculated as $\sum_i x_i w_i / \sum_i w_i$, where $x_i$ is an individual poll and $w_i$ the corresponding weight. $w_i$ is calculated as the sum of three weights: for timing (using an adjusted exponential decay formula, decreasing from 4 to 0, where the half-life is defined by $t_{1/2} = \tau \ln 2$), for sample size ($N/1000$), and the ranking weight (described in the text).
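 A minimal sketch of how the three weights could be combined for one poll. Several things here are assumptions made for illustration: a plain exponential decay stands in for the adjusted formula, the value of the decay constant `tau` is hypothetical, and the function and parameter names are invented.

```python
import math


def timing_weight(days_before_vote: float, tau: float = 10.0,
                  max_weight: float = 4.0) -> float:
    """Plain exponential decay from max_weight toward 0 as the poll gets older.

    Stand-in for the adjusted formula; tau = 10 days is a hypothetical value.
    """
    return max_weight * math.exp(-days_before_vote / tau)


def half_life(tau: float) -> float:
    """Age at which the timing weight has halved: t_half = tau * ln(2)."""
    return tau * math.log(2)


def sample_size_weight(sample_size: int) -> float:
    """Sample-size weight, N / 1000."""
    return sample_size / 1000


def poll_weight(days_before_vote: float, sample_size: int,
                ranking_weight: float, tau: float = 10.0) -> float:
    """Total weight of one poll: timing + sample size + pollster ranking weight."""
    return (timing_weight(days_before_vote, tau)
            + sample_size_weight(sample_size)
            + ranking_weight)
```

 Under these assumptions a poll taken on the day of the vote, with 1,500 respondents and a ranking weight of 2.5, would carry a total weight of 4 + 1.5 + 2.5 = 8.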
 Define $x_i$ as the difference between the total predicted vote share of party A ($v_A$) and party B ($v_B$) for pollster $i$, and $y$ as the difference between the actual vote shares of the two parties. Assume A was the winner, and B was the runner-up. The within accuracy of pollster $i$ ($z_i$) is then defined simply as $z_i = |x_i - y|$. The closer the value of $z_i$ is to 0, the more accurate the pollster. From this we calculate the within index as $I_w = 10 - z_i$.
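 The definition above translates directly into code. The function name and the vote shares in the example are made up for illustration.

```python
def within_index(predicted_a: float, predicted_b: float,
                 actual_a: float, actual_b: float) -> float:
    """Within index I_w = 10 - |x - y|, where x is the predicted margin
    between parties A and B and y is the actual margin."""
    x = predicted_a - predicted_b  # predicted margin, v_A - v_B
    y = actual_a - actual_b        # actual margin
    z = abs(x - y)                 # within accuracy: 0 means a perfect call
    return 10 - z


# Made-up example: a pollster shows A ahead 34 to 33, the actual result is 37 to 31.
example_within = within_index(34.0, 33.0, 37.0, 31.0)
```

 The predicted margin is 1 point and the actual margin is 6, so the error is 5 and the within index is 5. A miss of more than 10 points on the margin would push the index below zero.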