By Mile Šikić, Dejan Vinković, and Vuk Vuković:
As part of an election coverage project initiated by Jutarnji list, the largest daily newspaper in Croatia, we had the opportunity to introduce, for the first time in this part of Europe, a general-election prediction model that simultaneously uses election polls, previous election results, and a range of socio-economic data for each electoral district.
Note: Croatia uses a proportional representation (PR) electoral system with a total of 10 electoral districts, each electing 14 members of parliament. Within each district, votes are converted into seats with the D’Hondt method, among the parties that pass the 5% threshold in that district.
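To make the seat rule concrete, here is a minimal Python sketch of D’Hondt allocation with a 5% threshold for one 14-seat district. The function name `dhondt`, the party labels, and the vote counts are hypothetical. The threshold is taken on the sum of the votes passed to the function, which is a simplifying assumption and may differ from the exact base defined in electoral law.

```python
def dhondt(votes: dict[str, float], seats: int = 14, threshold: float = 0.05) -> dict[str, int]:
    """Allocate `seats` among parties with the D'Hondt method.

    Parties below `threshold` of the summed votes get no seats (the threshold
    base is a simplifying assumption).
    """
    total = sum(votes.values())
    eligible = [p for p, v in votes.items() if v >= threshold * total]
    allocation = {p: 0 for p in votes}
    for _ in range(seats):
        # The next seat goes to the party with the largest quotient v / (s + 1),
        # where s is the number of seats it already holds. Ties go to the party
        # listed first, because max() keeps the first maximum it finds.
        winner = max(eligible, key=lambda p: votes[p] / (allocation[p] + 1))
        allocation[winner] += 1
    return allocation


# Hypothetical district: party E (4% of the vote) falls below the 5% threshold.
print(dhondt({"A": 42000, "B": 31000, "C": 15000, "D": 8000, "E": 4000}))
# {'A': 6, 'B': 5, 'C': 2, 'D': 1, 'E': 0}
```

Because seats are handed out one at a time by largest quotient, D’Hondt tends to favour larger parties, which is why the split of the small-party vote matters so much near the threshold.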
We built our forecasting model in several phases. First, we separated the predictions for the two main coalitions (one led by the conservative HDZ, the other by the social-democratic SDP) from the predictions for the smaller parties. We did this mainly because of the volatility of the smaller parties’ vote shares, and because a total of ten new parties were competing in these elections, each with a realistic chance of entering Parliament.
The most important part of the analysis was capturing the swing between the two main coalitions since the 2011 parliamentary elections, which the SDP won in a landslide, and the 2014 European Parliament elections, in which the trend turned in favor of the HDZ. We gave greater weight to the more recent election. We built a distribution of votes for the HDZ and the SDP at the polling-station level, which we then shifted toward a smaller or larger share of the total vote according to socio-economic trends. In particular, we used data on local unemployment, the community’s exposure to the 1991-1995 war of independence, and the educational structure of voters in each electoral district (these three factors carry the greatest weight in predicting the voting patterns of domestic voters). Finally, we included all relevant recent polls, adjusted for their partisan bias. Once the main parameters of the model were defined, we ran a thousand Monte Carlo simulations for each party in each electoral district (see Figure 1). In each scenario the parameters deviated randomly from their predetermined values, which allowed us to calculate the standard deviation of each party’s result.
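The Monte Carlo step for one party in one district can be sketched as follows. This is not the authors’ calibrated model: the recency weight `w_recent`, the per-station `adjust` term (standing in for the unemployment, war-exposure, and education corrections), and the noise levels `swing_sd` and `station_sd` are all hypothetical. The sketch only shows how a recency-weighted polling-station baseline, plus random district-wide and station-level deviations, yields a distribution of district vote shares and its standard deviation.

```python
import numpy as np


def simulate_district(share_2011, share_2014, voters, adjust, n_sims=1000,
                      w_recent=0.7, swing_sd=0.03, station_sd=0.02, seed=0):
    """Return n_sims simulated district-level vote shares for one party.

    share_2011, share_2014, voters and adjust are per-polling-station arrays;
    w_recent, swing_sd and station_sd are hypothetical tuning constants.
    """
    rng = np.random.default_rng(seed)
    # Recency-weighted baseline (2014 counts more than 2011), shifted by a
    # hypothetical socio-economic adjustment for each polling station.
    base = w_recent * share_2014 + (1 - w_recent) * share_2011 + adjust
    # One district-wide swing per scenario plus independent station-level noise.
    swing = rng.normal(0.0, swing_sd, size=(n_sims, 1))
    noise = rng.normal(0.0, station_sd, size=(n_sims, base.size))
    stations = np.clip(base + swing + noise, 0.0, 1.0)
    # Aggregate stations to the district, weighting by the number of voters.
    return stations @ voters / voters.sum()


sims = simulate_district(
    share_2011=np.array([0.30, 0.40, 0.35]),
    share_2014=np.array([0.40, 0.45, 0.38]),
    voters=np.array([1200, 800, 1500]),
    adjust=np.array([0.01, -0.02, 0.0]),
)
print(f"mean {sims.mean():.3f}, std {sims.std():.3f}")
```

Separating a shared district-wide swing from independent station noise matters: station noise largely averages out across many polling stations, so most of the spread in the district result comes from the shared swing.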
After estimating the vote shares of the HDZ- and SDP-led coalitions, the next step was to do the same for each smaller party. This was considerably harder, since in each district at least five parties had a realistic chance (according to various pollsters) of passing the 5% threshold. We therefore applied an estimation method based mainly on opinion polls and previous voting trends for the so-called third options. In every Croatian election a number of new “third options” appear, aiming to challenge the status quo of the two dominant parties. We found that in each election the distribution of votes for each new “third option” is quite similar; in other words, the smaller parties draw their votes from roughly the same geographical areas. It was therefore easy to predict where they might do well in these elections, but not which party would rise above the rest or how the votes would finally be divided among the smaller parties. For that, we used all the bias-adjusted polls plus our own Facebook poll, in which a meta-question measured how good our participants were at estimating the strength of their preferred party. We used simple weighting between our Facebook poll and the other polls to estimate the relative strength of the smaller parties, and hence their number of seats.
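A minimal sketch of the weighting step, under several assumptions: partisan bias is modelled as a per-pollster “house effect” subtracted from each poll, the Facebook poll is taken as already filtered by the meta-question (that filtering is not reproduced here), and the Facebook weight `w_fb` is a hypothetical value. All function names, pollster labels, and numbers are illustrative.

```python
def bias_adjusted_average(polls, house_effects):
    """Average poll shares after subtracting each pollster's estimated bias.

    polls: list of (pollster, {party: share});
    house_effects: {pollster: {party: bias}} (hypothetical estimates).
    """
    sums, counts = {}, {}
    for pollster, shares in polls:
        bias = house_effects.get(pollster, {})
        for party, share in shares.items():
            sums[party] = sums.get(party, 0.0) + share - bias.get(party, 0.0)
            counts[party] = counts.get(party, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def relative_strength(poll_avg, facebook, w_fb=0.3):
    """Blend the poll average with the Facebook poll (weight w_fb) and
    normalise so the small parties' relative strengths sum to 1."""
    parties = sorted(set(poll_avg) | set(facebook))
    blended = {p: (1 - w_fb) * poll_avg.get(p, 0.0) + w_fb * facebook.get(p, 0.0)
               for p in parties}
    total = sum(blended.values())
    return {p: v / total for p, v in blended.items()}


polls = [("P1", {"X": 0.12, "Y": 0.06}), ("P2", {"X": 0.10, "Y": 0.08})]
house_effects = {"P1": {"X": 0.02}, "P2": {"Y": 0.01}}
avg = bias_adjusted_average(polls, house_effects)  # X roughly 0.10, Y roughly 0.065
rel = relative_strength(avg, {"X": 0.14, "Y": 0.05})
print({p: round(s, 3) for p, s in rel.items()})
# {'X': 0.649, 'Y': 0.351}
```

Multiplying these relative strengths by the small-party share expected in a given district gives each party’s district vote share, which can then be passed through D’Hondt.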
Figure 1. An example of 1000 voting scenarios for one party within one electoral district. The graph shows the cumulative distribution of voting percentages at the level of polling stations.
Finally, having used Monte Carlo simulations to see how the votes might be distributed within each electoral district, we used these simulations to calculate the probability of each party winning a given number of seats. This means we were looking not only at different scenarios for the two main parties, but at a large number of combinations in which the distribution of votes among the smaller parties was also taken into account through the D’Hondt method.
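The step from simulated votes to seat probabilities can be sketched by running D’Hondt on every scenario and counting how often each party ends up with each seat total. As a swapped-in simplification, the scenarios here are drawn jointly from a Dirichlet distribution around hypothetical mean shares (so each scenario’s shares sum to 1); the authors instead simulated parties at the polling-station level. The `concentration` parameter, the function names, and the party shares are assumptions, and `dhondt` is repeated so the block runs on its own.

```python
from collections import Counter

import numpy as np


def dhondt(votes, seats=14, threshold=0.05):
    """D'Hondt allocation with a threshold on the summed votes (simplified)."""
    total = sum(votes.values())
    eligible = [p for p, v in votes.items() if v >= threshold * total]
    alloc = {p: 0 for p in votes}
    for _ in range(seats):
        winner = max(eligible, key=lambda p: votes[p] / (alloc[p] + 1))
        alloc[winner] += 1
    return alloc


def seat_probabilities(mean_shares, concentration=400.0, n_sims=1000, seats=14, seed=1):
    """Estimate P(party wins k seats) in one district from simulated scenarios."""
    rng = np.random.default_rng(seed)
    parties = list(mean_shares)
    # Larger concentration means less variation between scenarios (an assumption).
    alpha = concentration * np.array([mean_shares[p] for p in parties])
    counts = {p: Counter() for p in parties}
    for draw in rng.dirichlet(alpha, size=n_sims):
        alloc = dhondt(dict(zip(parties, draw)), seats)
        for p in parties:
            counts[p][alloc[p]] += 1
    # Convert counts into probabilities, ordered by seat count.
    return {p: {k: c / n_sims for k, c in sorted(counts[p].items())} for p in parties}


probs = seat_probabilities({"A": 0.38, "B": 0.33, "C": 0.12, "D": 0.07, "E": 0.06, "F": 0.04})
print(probs["E"])  # seat distribution for a party hovering near the 5% threshold
```

Simulating all parties jointly matters because a party’s seat count depends not only on its own share but on which rivals clear the threshold in the same scenario.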
How good was the model?
The tables below show how precise we were in each electoral district. The first table shows the probability the model assigned to the actual outcome. For example, the model gave HDZ’s result in the first district (I), where it won only 4 seats, a probability of a mere 0.7%, so the scope of its failure there was hard to predict. For the SDP, on the other hand, the actual result was the model’s most probable outcome in every district except the last two. In general, the prediction for the SDP was very precise (within two seats), while the prediction for the HDZ overshot in most districts. The reason was the abrupt and unexpected rise of a third party, Most, founded only a few months before the elections, which emerged as a complete dark horse and took 19 of the 140 seats. None of the polls predicted the rise of Most, which made it a true fat-tail event.
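The three quantities used in this evaluation (the probability assigned to the actual result, the most probable seat count, and the seat error of that mode) can be read off a party’s predicted seat distribution as in the minimal sketch below. The function name `evaluate` and the example distribution are hypothetical and are not the published figures.

```python
def evaluate(dist, actual):
    """Compare one party's predicted seat distribution with the seats won.

    dist maps seat counts to probabilities; actual is the number of seats won.
    """
    modal = max(dist, key=dist.get)  # the model's most probable seat count
    return {
        "p_actual": dist.get(actual, 0.0),  # probability given to the real result
        "modal": modal,
        "seat_error": abs(modal - actual),  # how far the mode missed, in seats
    }


# Hypothetical distribution for illustration only.
dist = {3: 0.01, 4: 0.05, 5: 0.24, 6: 0.40, 7: 0.25, 8: 0.05}
print(evaluate(dist, actual=4))
# {'p_actual': 0.05, 'modal': 6, 'seat_error': 2}
```

A low `p_actual` together with a positive error, as for HDZ in district I, signals an outcome the model treated as a tail event rather than a near miss.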
Table 1. Probabilities of the actual event occurring for each party across all districts.
The set of tables below shows the probability distribution for each party in every district. The red box marks the actual electoral result (in seats) for each party and its corresponding probability; dark grey marks the highest probability the model predicted for that party, and light grey the lowest. Some parties do not appear in every district because they ran only at the local level (for example IDS, HDSSB, or REFORM).
We also found that our Facebook poll, once we used the meta-question to mathematically filter out internal biases, was particularly good at predicting the actual voting outcome (see Figure 2 below), coming within 4% of the actual results. The reason was our carefully designed meta-question, which we used to uncover the predictive power of our participants. We did not give the Facebook poll enough weight in the model, but we can now correct this to make future forecasts more precise.
Table 2. Probability distribution for each party in every electoral district.
Figure 2. Comparison of our Facebook poll results and the actual election results for the first three parties.