With a disappointing end to the tournament (for us England fans anyway!), here are the final conclusions on how the model performed.
Unfortunately, due to a mixture of work commitments and games coming thick and fast, I was unable to keep this website up to date with my predictions for each fixture, but I did carry on making them offline.
(As a reminder: the model description is here and the other models we compare against are here.)
The Round of 16 gave us some truly surprising results, with every single model (including the bookies!) doing (a lot) worse than our Lazy prediction of 33% each for Home, Draw and Away. This seemed to be a sign of what was to come: the semi-finals and the final were also worse than Lazy for most of the tested models. It made for an exciting tournament but a disaster for predictions!
The final performance analysis of the models was:
| Model    | Brier Score |
|----------|-------------|
| HAL 9000 | 0.586       |
| Bookies  | 0.588       |
| Average  | 0.609       |
| BDC      | 0.655       |
| Lazy     | 0.667       |
| SSC      | 0.728       |
The immediate thing to note is that HAL won (just)! Clearly that is a great result. It is interesting, though, that SSC (the small "smart" crowd) ended up doing much worse than Lazy, which is terrible! I would have expected it to do much better.
The poor performance likely comes from the fact that there were a number of "perfect" wrong answers. By that I mean that the model (my friends) all predicted something would definitely happen (e.g. England to beat Denmark after 90 minutes). That gave the outcome a probability of 100%, and when it didn't happen, the maximum Brier penalty of 2.0 was assigned.
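To make the scoring concrete, here is a minimal sketch of the multi-category Brier score in the formulation I'm assuming throughout: the sum of squared differences between the predicted probabilities and the actual outcome (1 for what happened, 0 otherwise). The function name and outcome labels are just illustrative.

```python
def brier_score(predicted: dict, actual: str) -> float:
    """predicted maps each outcome ('home', 'draw', 'away') to a probability;
    actual is the outcome that happened."""
    return sum(
        (p - (1.0 if outcome == actual else 0.0)) ** 2
        for outcome, p in predicted.items()
    )

# A "perfect" wrong answer: 100% on England beating Denmark inside 90 minutes.
# The match was level after 90 minutes, so the prediction takes the maximum
# penalty of 2.0.
print(brier_score({"home": 1.0, "draw": 0.0, "away": 0.0}, "draw"))  # 2.0

# The Lazy model's one-third-each prediction scores ~0.667 whatever the
# result, which is exactly its figure in the table above.
print(brier_score({"home": 1/3, "draw": 1/3, "away": 1/3}, "home"))  # ~0.667
```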
Nothing in football is ever going to be 100% certain (the most one-sided result the bookies predicted was a Germany win against Hungary in the group stage [80.7%], and it ended up as a draw!). If we set a rule that SSC can never predict more than 80% for any result, then its Brier Score drops to 0.627 and it ends up beating BDC and Lazy, which is more like I would have expected! Lowering SSC's maximum prediction to 60% further drops the Brier Score to 0.577, making it the winning model! However, this is largely because there were a large number of surprising results in the tournament, so this is really only a post-tournament model and not a fair assessment. Setting the maximum to 80% (matching the Bookies' maximum value) seems like a fair compromise, though!
| Model    | Brier Score |
|----------|-------------|
| HAL 9000 | 0.586       |
| Bookies  | 0.588       |
| Average  | 0.609       |
| SSC      | 0.627       |
| BDC      | 0.655       |
| Lazy     | 0.667       |
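For completeness, here is one simple way such a cap could be applied. I'm assuming the excess probability above the cap is shared evenly across the other outcomes (other redistributions are possible); the helper names are just illustrative.

```python
def brier_score(predicted: dict, actual: str) -> float:
    # Same multi-category Brier score as in the earlier sketch.
    return sum(
        (p - (1.0 if outcome == actual else 0.0)) ** 2
        for outcome, p in predicted.items()
    )

def cap_prediction(predicted: dict, cap: float = 0.8) -> dict:
    """Limit any single outcome to `cap`, sharing the excess probability
    equally among the outcomes that are under the cap."""
    capped = dict(predicted)
    excess = 0.0
    for outcome, p in capped.items():
        if p > cap:
            excess += p - cap
            capped[outcome] = cap
    under = [o for o, p in predicted.items() if p <= cap]
    if under and excess > 0:
        share = excess / len(under)
        for o in under:
            capped[o] += share
    return capped

# The "perfect" England-to-win prediction becomes 80/10/10, so its penalty
# drops from 2.0 to 0.8^2 + 0.9^2 + 0.1^2 = 1.46.
capped = cap_prediction({"home": 1.0, "draw": 0.0, "away": 0.0}, cap=0.8)
print(capped)                       # {'home': 0.8, 'draw': 0.1, 'away': 0.1}
print(brier_score(capped, "draw"))  # ~1.46
```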
Anyway, a great result for HAL, and I'll look forward to getting the model back out for the next unprecedented international tournament, a December World Cup!