Who would be the most likely champion if leagues had not been interrupted?
Will 2020 be the year that Lazio breaks Juventus’s hegemony? Can Liverpool end a 30-year wait for a league title? Will Barcelona’s two-point advantage be enough to take the La Liga trophy to Camp Nou, or will this be the year the Madridistas win their first league title of the post-Cristiano Ronaldo era?
2020 has been an atypical year because of the Covid-19 pandemic, which had infected more than 6 million people worldwide as of June 3rd, 2020. Activities of every kind were affected by the lockdowns, and soccer leagues were no different: every country in Europe but one (Belarus) interrupted its national league to comply with World Health Organization (WHO) recommendations.
While several leagues are trying to adapt to the new reality by completing the season behind closed doors, others have already been ended early by national governments (e.g., France and the Netherlands).
In any case, several questions emerge for leagues that are already cancelled or that may not be able to complete all matches:
Who should be crowned league champion?
Which teams should qualify for the Champions League next season?
Which teams should be relegated?
Methods for setting the final standings come in all shapes and sizes, such as (1) adopting the standings at the moment the league was interrupted or (2) adopting the standings at the end of the first round of fixtures. Although these methods have their own advantages, they are far from universally accepted, as they do not account for the difficulty of the matches each team still had to play.
We first tested five simple methods that follow two different approaches: (1) defining the final standings as a copy of the standings at an earlier point in the season (e.g., the moment of interruption, the end of the first round) or (2) predicting the outcome of each remaining game using basic rules (e.g., home team always wins, leading team always wins). To analyze the accuracy of these methods, we applied them to the past three seasons (mimicking this season’s interruption) and then compared the outcomes to the real final standings, considering the six top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1 and Primeira Liga).
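To make the two approaches concrete, here is a minimal Python sketch of three such rules (freeze the current standings, home team always wins, leading team always wins) and of one possible way to score them against the real final table. All function names, the 1-0 scoreline given to simulated wins, and the tie-break (points, then goal difference, then name) are illustrative assumptions; real leagues use their own tie-break rules, and the exact scoring used in our comparison is not reproduced here.

```python
from collections import defaultdict

def standings_from_results(results):
    """Rank teams (best first) from (home, away, home_goals, away_goals) tuples.
    Assumed tie-break: points, then goal difference, then team name."""
    points = defaultdict(int)
    goal_diff = defaultdict(int)
    for home, away, hg, ag in results:
        points[home] += 0  # make sure both teams appear in the table
        points[away] += 0
        goal_diff[home] += hg - ag
        goal_diff[away] += ag - hg
        if hg > ag:
            points[home] += 3
        elif hg < ag:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return sorted(points, key=lambda t: (-points[t], -goal_diff[t], t))

def freeze_current(played, remaining):
    """Approach 1: final standings = standings at the interruption."""
    return standings_from_results(played)

def home_always_wins(played, remaining):
    """Approach 2: every remaining fixture is a 1-0 home win (assumed score)."""
    simulated = [(home, away, 1, 0) for home, away in remaining]
    return standings_from_results(played + simulated)

def leader_always_wins(played, remaining):
    """Approach 2: the team ranked higher at the interruption wins 1-0.
    Assumes every team has played at least one match."""
    rank = {team: pos for pos, team in enumerate(standings_from_results(played))}
    simulated = [
        (home, away, 1, 0) if rank[home] < rank[away] else (home, away, 0, 1)
        for home, away in remaining
    ]
    return standings_from_results(played + simulated)

def exact_position_accuracy(predicted, actual):
    """Share of table positions where the predicted team matches the real one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

Each rule takes the matches already played plus the list of remaining fixtures as (home, away) pairs and returns a predicted table, so the same scoring function can be applied to all of them for a past season whose real outcome is known.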
Historically, one of the simple methods fares better than the rest: adopting the standings immediately before the interruption. But is there a more sophisticated and ultimately better way to anticipate the league’s final standings? Or even to get more detailed predictions, such as which teams would qualify for the following year’s Champions League?
Advanced analytical techniques, especially predictive models, can help solve this conundrum.
While techniques of this kind are already part of the backbone of sports like baseball and basketball, their application in soccer has been less frequent.
To break this trend, we developed a state-of-the-art Machine Learning (ML) model that recognizes patterns in past data to predict future outcomes. Our model learned from real match data from the 2007/2008 season to the 2015/2016 season. We considered around 100 variables as match predictors, such as the previous ranking of both teams, their relative momentum (winning or losing streak), and historical head-to-head results. The model combines all the information gathered from matches already played to predict the outcome of a given future match, thus enabling a simulation loop that mimics the final stages of the season to predict the league’s final standings.
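The simulation loop can be sketched as follows. This is not our actual model or simulator: `predict_proba` is a hypothetical stand-in for any match predictor that returns (home win, draw, away win) probabilities, and repeating the simulated run-in many times (a Monte Carlo approach) is one common way to turn match-level predictions into title odds. The sketch also ignores goal difference and does not update features such as momentum after each simulated match, which a fuller simulator would need to do.

```python
import random
from collections import Counter

def simulate_title_odds(points, remaining, predict_proba, n_runs=1000, seed=0):
    """Estimate each team's chance of finishing first.

    points        -- dict team -> points at the interruption
    remaining     -- list of (home, away) fixtures still to be played
    predict_proba -- hypothetical callable (home, away) -> (p_home, p_draw, p_away)
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    titles = Counter()
    for _ in range(n_runs):
        table = dict(points)  # fresh copy of the table for every simulated run-in
        for home, away in remaining:
            p_home, p_draw, _p_away = predict_proba(home, away)
            u = rng.random()
            if u < p_home:
                table[home] += 3
            elif u < p_home + p_draw:
                table[home] += 1
                table[away] += 1
            else:
                table[away] += 3
        # Placeholder tie-break: most points, then alphabetical order.
        champion = min(table, key=lambda t: (-table[t], t))
        titles[champion] += 1
    return {team: titles[team] / n_runs for team in points}
```

For example, calling `simulate_title_odds({"A": 10, "B": 9}, [("B", "A")], predictor)` with a predictor that slightly favors the home side would return the share of simulated seasons won by each of the two teams.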
When we looked at the past three seasons to compare the ML model’s accuracy with that of the best-performing simple method, we were able to confirm that predictive analytics can be a real help in answering intricate forecasting problems.
In this case, simpler techniques can provide a transparent and effective way to anticipate aggregated results, such as Champions League qualification (in the top leagues, it makes no difference to qualification whether a team finishes first or second).
Nevertheless, when trying to obtain more granular answers, such as which team will be the champion, the combination of an ML algorithm with a simulator is able to provide more accurate answers.
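The difference between aggregated and granular results can be illustrated with two small scoring functions. This is only a sketch under stated assumptions: the function names are hypothetical, and the default of four qualifying places is illustrative, since the number of Champions League places varies by league.

```python
def champion_hit(predicted, actual):
    """Granular check: did the prediction name the real champion?"""
    return predicted[0] == actual[0]

def top_n_overlap(predicted, actual, n=4):
    """Aggregated check: share of the real top n that the prediction also
    places in the top n, regardless of the order within that group."""
    return len(set(predicted[:n]) & set(actual[:n])) / n
```

A prediction that swaps first and second place scores a perfect `top_n_overlap` but fails `champion_hit`, which is why a method can look strong on Champions League qualification and still miss the title winner.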
If the Covid-19 pandemic had not disrupted the current season, the ML model anticipates that Barcelona, Bayern and Juventus would be crowned champions, alongside clear victories for Liverpool and Paris SG. Of the six top European leagues, competition was stiffest in the Portuguese top flight, where both Porto and Benfica had clear chances of winning the title. Will the pandemic, by breaking the season’s momentum and reducing home-field advantage, distort the expected final standings?
Considering the predictive power of advanced analytics, one can now wonder if Gary Lineker would change his famous saying to: “Football is a simple game. Twenty-two men chase a ball for 90 minutes and a machine learning model can anticipate which team will win.”