Oraclum Intelligence Systems Ltd is a non-partisan start-up interested in experimental testing of forecasting models on real-life electoral data. We aim to use a Facebook survey of UK voters, along with our unique set of Bayesian forecasting methods, to identify the best and most precise prediction method ahead of the upcoming Brexit referendum. We wish to uncover a successful prediction method using the power of social networks. After the Brexit referendum, we will apply the same methods to the forthcoming US presidential election in November 2016.
Opinion pollsters in the UK came under fierce attack from the public and the media following their collective failure to accurately predict the results of the 2015 UK general election. In the weeks and months before the May election the polls were predicting a hung parliament and a virtual tie between Conservatives and Labour, where the outcome would have been another coalition government or even a minority government (a number of combinations were discussed, even a grand coalition between Labour and Conservatives).
The results showed that the pollsters, on average, missed the difference between the two parties by 6.8 percentage points, which translated into about 100 seats. What was supposed to be one of the closest elections in British history turned out to be a landslide victory for the Conservatives.
Naturally, inquiries were made, accusing pollsters of complacency, herding, and deliberate manipulation of their samples. And while there was certainly something that went wrong in the sampling methods of the pollsters, we will not go into too much detail as to what that was.
In fact we wish to vindicate some pollsters by offering, for the first time in the UK, an unbiased ranking of UK pollsters.
And here it is:
Our rankings are based on a somewhat technical but still easy-to-understand methodological approach described in detail in the text below. It has its drawbacks, which is why we welcome all comments, suggestions and criticism. We will periodically update our ranking and our method, and hopefully include even more data (local and national), all with the goal of producing a standardized, unbiased overview of the performance of opinion pollsters in the UK.
We hope that our rankings stir a positive discussion on the quality of opinion pollsters in the UK, and we welcome and encourage the use of our rankings data by other scientists, journalists, forecasters, and forecasting enthusiasts.
Note also that the ranking list omits the British Election Study (BES), which uses a far better methodology than other pollsters: a face-to-face random sample survey (the gist of it is that they randomly select eligible voters to get a representative sample of the UK population, and then repeatedly contact those people to do the survey; you can read more about it here). This enabled them to produce one of the most precise estimates of the 2015 general election result (they gave the Conservatives an 8% margin of victory). However, there is a problem: the survey has been (and usually is) done after the election, meaning that it cannot be used as a prediction tool. Because of this, instead of grouping it with the others, we use the BES only as a post-election benchmark.
Methodology and motivation
Our main forecasting method to be applied during the Brexit referendum campaign, the Bayesian Adjusted Facebook Survey (BAFS), will additionally be tested against a series of benchmarks. The most precise benchmark that we attempt to use is the adjusted polling average (APA) method. In fact, the main motivation for our own ranking of UK pollsters is to complement this particular method. As emphasized in the previous post, our APA benchmark adjusts all current Brexit referendum polls not only with respect to timing and sample size, but also with respect to each pollster's relative performance and past accuracy. We formulate a joint weight from four components: timing (the more recent the poll, the greater the weight), sample size (the greater the sample size, the greater the weight), whether the poll was done online or via telephone, and the ranking of the pollster that produced it. This allows us to calculate the final weighted average across all polls in a given time frame (in this case, since the beginning of 2016).
The weighted average calculation gives us the percentages for Remain (currently around 43%), Leave (currently around 41%), and undecided (around 15%). To get the final numbers which we report in our APA benchmark, we factor in the undecided votes as well.
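The weighted average behind the APA can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the decay constant `tau`, the example polls and the function names are hypothetical, the plain exponential stands in for the "adjusted" decay formula whose exact form is not given here, and the online/telephone component and the treatment of undecided voters are left out because the text does not specify them.

```python
import math


def timing_weight(days_old, tau=30.0, start=4.0):
    """Assumed timing weight: decays exponentially from `start` towards 0.

    With this form the half-life is tau * ln(2). The value of tau is hypothetical.
    """
    return start * math.exp(-days_old / tau)


def poll_weight(days_old, sample_size, ranking_weight, tau=30.0):
    """Sum of the timing, sample size (N/1000) and ranking weights."""
    return timing_weight(days_old, tau) + sample_size / 1000.0 + ranking_weight


def weighted_average(values, weights):
    """sum(x_i * w_i) / sum(w_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical polls: (days old, sample size, ranking weight, Remain %, Leave %)
polls = [
    (2, 2000, 5.0, 44.0, 40.0),
    (10, 1000, 3.0, 42.0, 43.0),
    (30, 1500, 4.0, 41.0, 41.0),
]

weights = [poll_weight(d, n, r) for d, n, r, _, _ in polls]
remain = weighted_average([p[3] for p in polls], weights)
leave = weighted_average([p[4] for p in polls], weights)
print(f"Remain {remain:.1f}%, Leave {leave:.1f}%")
```

Because the three components are added rather than multiplied, a recent poll from a low-ranked pollster still carries some weight, which seems to match the additive description given in the footnote.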
How do we produce our rankings?
The rankings are based on the past performance of pollsters in three earlier elections: the 2015 general election, the 2014 Scottish referendum, and the 2010 general election. In total we observed 480 polls from 15 pollsters (not all of which participated in all three elections). We realize the sample could have been bigger had we included local and earlier general elections. However, given that many pollsters from 10 years ago no longer produce polls (while the majority of those operating in 2015 still produce them now for the Brexit referendum), and given that local elections are quite specific, we focus only on these three national elections for now, and will think about including local polling outcomes, adjusted for their type. There is also the issue of the methodological standard of each pollster, which we don’t take into account, as we are only interested in the relative performance of each pollster in previous elections.
Given that almost all the pollsters failed to predict the outcome of the 2015 general election, we look at the performance between pollsters as well, in order to avoid penalizing them too much for this failure. If no one saw it coming, they are all equally excused, to a certain extent. If however a few did predict correctly, the penalization against all others is more significant. We therefore jointly adjust the within accuracy (the accuracy of an individual pollster with respect to the final outcome) and the between accuracy (the accuracy of an individual pollster with respect to the accuracy of the group).
- Within accuracy
To calculate the precision of pollsters in earlier elections we again assign weights for timing and sample size, in the same way as described earlier (older polls matter less, larger samples matter more). These two weights are summed into the total weight for each poll. We then take each individual pollster and calculate its weighted average (as before, this is the sum of the products of each of its polls and that poll's weight, divided by the sum of all its weights; see the footnote on the weighted average). This gives us the average error each pollster made in a given election, and repeating it for all three elections in our sample gives each pollster's within accuracy for each election. We calculate the average error for an individual pollster as the absolute deviation between its weighted average polling result and the actual result for the margin between the first two parties in the election (e.g. Conservatives and Labour). Or in plain English, how well they guessed the difference between the winner and the runner-up.
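As a rough sketch of the within accuracy calculation for one pollster in one election, assuming each poll has already been reduced to a predicted margin and a total weight. The function names and all numbers below are illustrative, not taken from our data; the index of 10 minus the error follows the definition given in the footnotes.

```python
def weighted_margin(polls):
    """Weighted average margin for one pollster.

    `polls` is a list of (predicted_margin, weight) pairs, where the margin is
    the predicted lead of the eventual winner over the runner-up.
    """
    total_weight = sum(w for _, w in polls)
    return sum(m * w for m, w in polls) / total_weight


def within_error(polls, actual_margin):
    """Absolute deviation between the weighted margin and the actual margin."""
    return abs(weighted_margin(polls) - actual_margin)


def within_index(polls, actual_margin):
    """Within index: 10 minus the error, so more accurate pollsters score higher."""
    return 10.0 - within_error(polls, actual_margin)


# Illustrative polls for a single pollster: (predicted margin, weight)
example_polls = [(1.0, 2.0), (0.0, 3.0), (2.0, 5.0)]
example_actual_margin = 6.5  # illustrative, not an official result

print(within_error(example_polls, example_actual_margin))
print(within_index(example_polls, example_actual_margin))
```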
- Between accuracy
After determining our within index, we estimate the accuracy between pollsters (by how much they beat each other) so that both can be combined into a single accuracy index. To do this we first calculate the average error of all pollsters in a single election (the joint error). We then subtract each pollster's individual error from this joint error. The result is our between index: the greater the value, the better the pollster did against all others (note: the value can be negative, which happens when a pollster's error is larger than the group average).
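A minimal sketch of the between index for a single election. It assumes the sign convention in which a larger value is better, i.e. the group's average error minus the pollster's own error, and it assumes the joint error is a simple unweighted mean across pollsters. The pollster names and errors are hypothetical.

```python
def between_index(errors):
    """Between index for every pollster in one election.

    `errors` maps a pollster name to its within error. The result is the
    average error of the group minus the pollster's own error, so pollsters
    that beat the group get positive values.
    """
    joint_error = sum(errors.values()) / len(errors)
    return {name: joint_error - err for name, err in errors.items()}


# Hypothetical within errors for three pollsters in one election
example_errors = {"A": 7.0, "B": 6.0, "C": 2.0}
print(between_index(example_errors))
# {'A': -2.0, 'B': -1.0, 'C': 3.0}
```

In this example almost everyone missed badly, so pollster C's comparatively small error earns it a clearly positive score while A and B are only mildly penalized.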
- Joint within-between ranking
To get our joint within-between index we simply sum up the two, thereby lowering the penalization across all pollsters if and when all of them missed. In this case those who missed by less than the others get a higher value, improving their overall performance and ranking them higher on the scale.
We repeat the same procedure across all three elections and produce two final measures of accuracy. The first is the final weighting index (which we use for the ranking itself and whose values we use as an input in the Brexit polls), and the second is the precision index. The difference between the two is that the precision index does not factor in the number of elections, whereas the final index does. The precision index is thus the simple average of a pollster's within-between indices over the elections it took part in, while the final index is the sum of those indices divided by the total number of elections we observed, regardless of how many of them the pollster participated in. The two are the same if a pollster participated in all three elections, but they differ if the pollster participated in fewer than three.
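The difference between the two measures can be illustrated with a short sketch. The function names and scores are hypothetical; only the division rule follows the description above.

```python
N_ELECTIONS = 3  # total number of elections observed


def joint_index(within, between):
    """Joint within-between index for one pollster in one election."""
    return within + between


def precision_index(joint_scores):
    """Simple average over the elections the pollster actually took part in."""
    return sum(joint_scores) / len(joint_scores)


def final_index(joint_scores, n_elections=N_ELECTIONS):
    """Sum of the joint scores divided by all observed elections."""
    return sum(joint_scores) / n_elections


# Hypothetical joint scores: one pollster with a single very accurate election,
# another with three elections of mixed accuracy.
single_election = [12.0]
three_elections = [6.0, 9.0, 3.0]

print(precision_index(single_election), final_index(single_election))
print(precision_index(three_elections), final_index(three_elections))
```

A pollster with one excellent election keeps a high precision index but receives only a third of that value as its final weight, while the two measures coincide for a pollster present in all three elections.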
For example, consider the fourth-ranked SurveyMonkey. They have the highest precision grade because they were the only ones in the 2015 election to predict the result almost perfectly (a Conservative victory by a 6% margin). However, since they only participated in a single election, they do not come out on top in the final weighting index. Pollsters that operated across all three elections give us the possibility to measure their consistency, a luxury we do not have for single-election players.
In other words, perhaps SurveyMonkey was just lucky, particularly since they only conducted a single survey prior to that election. However, given that the survey was done in the week prior to election day (from April 30th to May 6th; election day was May 7th) and that it had over 18,000 respondents, our guess is that it was not all down to luck. Either way, given that their entry to the race was late and a one-off shot (similar to our BAFS effort, actually), if they do produce an estimate for Brexit one day prior to the referendum, we will surely factor them in and give them a high weight. Not as high as their precision index suggests, but high enough. The same applies to several other pollsters that were operational over the course of a single election, meaning that they got a lower weight overall, regardless of their single-election accuracy.
To conclude, the numbers reported under the final weighting index column represent the ranking weight that we talked about at the beginning of this text. Combined with the timing and sample size weights, it helps us calculate the final weighted average of all polls, thereby helping us configure our strongest benchmark, the adjusted polling average.
 The rankings that we report here will not be a part of our BAFS method.
Calculated as $\sum_i x_i w_i / \sum_i w_i$, where $x_i$ is an individual poll and $w_i$ the corresponding weight. $w_i$ is calculated as the sum of three weights: for timing (using an adjusted exponential decay formula, decreasing from 4 to 0, where the half-life is defined by $t_{1/2} = \tau \ln(2)$), for sample size ($N/1000$), and the ranking weight (described in the text).
Define $x_i$ as the difference between the total predicted vote share of party A ($v_A$) and party B ($v_B$) for pollster $i$, and $y$ as the difference between the actual vote shares of the two parties. Assume A was the winner, and B was the runner-up. The within accuracy of pollster $i$ ($z_i$) is then defined simply as $z_i = |x_i - y|$. The closer the value of $z_i$ is to 0, the more accurate the pollster. From this we calculate the within index as $I_w = 10 - z_i$.