This weekend marks the start of the English Premier League, Ligue 1 in France and the Dutch top division, the Eredivisie. Just like last year, Simon Gleave (@SimonGleave), head of analysis at @InfostradaLive, is collecting league predictions from fans, media and analytics nerds like myself for the Premier League and Eredivisie. I've never entered before as I'm relatively new to the analytics scene, but this year I thought it'd be fun to submit a prediction for both leagues.
I've never really done projections before, and my formal statistical education hasn't really started yet, so I basically just threw a simple model together in literally 10 minutes to have something to enter into the "competition" that wasn't just subjective opinion. I'm sure there are some giant flaws in the way I went about things, but I'm not a statistician (yet), so I figured only good things can come out of being public about the model and having people criticize and provide input to my process.
For my Premier League projection, I started out by using the Expected Goals model created by myself and @stats4footy, very much inspired by the work of Michael Caley (@MC_of_A). I ran a linear regression of the points a team earned over a full season on its expected goal difference that same season, using two seasons' worth of data (as I only have Expected Goals numbers for the past two seasons). I then used the resulting formula to create a projection based solely on last year's expected goal difference for all 17 teams that survived last year's Premier League.
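For anyone who wants to see what a regression like this boils down to, here is a minimal sketch in plain Python. The data points are made up purely for illustration (they are not the numbers used for this projection), and the function names `fit_line` and `project_points` are hypothetical helpers, not part of any model mentioned above.

```python
# Ordinary least squares with a single predictor:
#   points = intercept + slope * expected_goal_difference
# All numbers below are invented for illustration; they are NOT real team data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project_points(xgd, slope, intercept):
    """Project a season points total from an expected goal difference."""
    return intercept + slope * xgd


# Hypothetical (expected goal difference, points) pairs from past seasons.
history = [(-20.0, 38), (-10.0, 45), (0.0, 52), (10.0, 59), (20.0, 66)]
slope, intercept = fit_line([x for x, _ in history], [p for _, p in history])

# Projection for a hypothetical team with an expected goal difference of +30.
projection = project_points(30.0, slope, intercept)
```

The same two helpers would work unchanged for any other single-variable regression on points, as long as the input pairs are swapped out.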
My next step was to do another regression, this time obtaining a formula for the relationship between "Shots on target ratio" (SoTR) and points earned over a season, using data from multiple seasons. I used this formula to create a projection based solely on last year's shots on target ratios.
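A quick sketch of the ratio itself, assuming the definition commonly used in football analytics: a team's shots on target divided by all shots on target in its games (its own plus its opponents'). The function name `shots_on_target_ratio` and the example numbers are hypothetical.

```python
# Shots on target ratio (SoTR), assuming the common definition:
#   SoTR = shots on target for / (shots on target for + shots on target against)
# The example numbers are invented and do not describe any real team.

def shots_on_target_ratio(sot_for, sot_against):
    """Return the share of all shots on target that belonged to the team."""
    total = sot_for + sot_against
    if total == 0:
        raise ValueError("no shots on target recorded")
    return sot_for / total


# A hypothetical team with 190 shots on target for and 150 against.
example_sotr = shots_on_target_ratio(190, 150)
```

A value above 0.5 means the team put more shots on target than it allowed, and that ratio is the single input to the regression described above.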
Moving on, I decided to include actual goal difference in the model as well, my reasoning being that some teams seem to consistently over- or underperform the measures previously used in the model. I ran a regression on the relationship between goal difference and points (obviously a very strong relationship) and then made another projection based solely on the goal difference each team ended up with last season.
Since I had no expected goals data for the Championship, I only used last year's goal difference and "Shots on target ratio" to project the points for the promoted teams. I also had to adjust the goal difference, since a Championship season consists of 46 games rather than the 38 games played by the Premier League teams.
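One simple way to make that kind of adjustment is to scale the goal difference in proportion to the number of games. This is only a sketch under that assumption; the exact adjustment used for this projection isn't spelled out here, and the function name `scale_goal_difference` and the example value are hypothetical.

```python
# Put a 46-game goal difference on a 38-game footing by scaling it
# proportionally. This proportional scaling is an ASSUMPTION for illustration.

CHAMPIONSHIP_GAMES = 46
PREMIER_LEAGUE_GAMES = 38


def scale_goal_difference(goal_difference,
                          source_games=CHAMPIONSHIP_GAMES,
                          target_games=PREMIER_LEAGUE_GAMES):
    """Rescale a season goal difference to a season of a different length."""
    return goal_difference * target_games / source_games


# A hypothetical promoted team with a +46 goal difference over 46 games.
adjusted = scale_goal_difference(46)
```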
To wrap it all up, I decided to give different weights to the three projections that I had so far. Since actual goal difference probably contains the most variance, I gave the least weight to this projection, only 20%, whereas the other two projections were given an equal weighting of 40% each. For the former Championship teams I used a 70/30 weighting in favour of SoTR over goal difference. I also used a coefficient to translate their projected points tally to Premier League level, again subjectively chosen because it seemed reasonable. Clearly these weightings aren't scientific by any means, as I've just picked them subjectively. Had I run a multiple regression, I'm guessing that actual goal difference would've been given too much weight, since its relationship to points earned is so strong in hindsight. But since goal difference is usually worse at predicting future goal difference than both "Shots on target ratios" and ExpG models are, I just weighted them manually.
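The blending step can be sketched as below. The 40/40/20 and 70/30 weights are the ones stated above; everything else is an assumption for illustration: the function names, the example inputs, the value of the league coefficient (the actual value isn't given here), and the choice to apply the coefficient after blending.

```python
# Weighted blend of the separate points projections.
# Weights come from the text; function names, example inputs and the
# coefficient value are HYPOTHETICAL.

def blend_surviving_team(xg_points, sotr_points, gd_points):
    """40% expected goals, 40% SoTR, 20% actual goal difference."""
    return 0.4 * xg_points + 0.4 * sotr_points + 0.2 * gd_points


def blend_promoted_team(sotr_points, gd_points, league_coefficient):
    """70% SoTR, 30% goal difference, then scaled to Premier League level.

    Applying the coefficient after blending is an assumption.
    """
    return (0.7 * sotr_points + 0.3 * gd_points) * league_coefficient


# Hypothetical projections for one surviving team and one promoted team.
surviving = blend_surviving_team(xg_points=70, sotr_points=60, gd_points=50)
promoted = blend_promoted_team(sotr_points=80, gd_points=70,
                               league_coefficient=0.5)  # made-up coefficient
```

Keeping the weights as plain numbers in one place makes it easy to try other splits, such as dropping actual goal difference entirely.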
Clearly I could've done a lot of things differently. I could have split last season's numbers into smaller parts of the season and given more weight to the more recent games. I could have ignored actual goal difference completely and just focused on using underlying numbers instead, or I could have given more weight to the actual goal difference and less to the underlying numbers. I could've looked way deeper into how teams promoted from the Championship fare in the Premier League and whether a strong SoTR or goal difference in the Championship translates to a better performance in the Premier League as well. I could've run shedloads of regressions trying to find a much better and much more scientific formula for predicting points, and I might do that in the future once my statistical skills have improved further, but for now I was just looking to get a quick prediction together using numbers that I have a decent understanding of. Also, these projections completely ignore the transfers made this summer, as I couldn't figure out a way of implementing them in the model. Anyway, this is the projection I ended up with for the Premier League:
Comparing this projection to my subjective views on the forthcoming season, there are some things that I'm quite happy about and some things that I kind of disagree with, although not super strongly.
I do think that the fight for the title will be a three-horse race between the teams that my projection actually gives the same number of points (Arsenal won it on a decimal level). Arsenal had some great underlying numbers last season and have bought a very highly rated goalkeeper in Petr Cech. They have a core of young players who are only getting better by the year, combined with some world-class players and some vital experience at the back. I really do think they can challenge for the title this season.
Chelsea were great last year, but haven't really improved much thus far. Still, I think they'll be able to compete for the title once again as they have a very strong core and some room for development in Oscar, Willian and Kurt Zouma.
Man City have been getting a lot of stick in other people's projections (mostly from traditional journalists), for some reason that is beyond me. Yes, the squad is aging, and yes, Yaya Toure didn't have the best of seasons last year, but it wasn't horrible by any means either. It just keeps getting compared to the title-winning 13/14 season, when Toure's finishing reached completely unsustainable levels. Man City had great underlying numbers in 14/15, and now they've added some amazing talent in Raheem Sterling, who also addresses the age issue as well as the lack of home-grown players. Fabian Delph is another home-grown talent in or around his prime who will add depth if nothing else. I'd be very surprised to see City fall out of the top 3, and shocked if they somehow managed to finish outside the top 4 like some experts have suggested.
Another noteworthy projection is Southampton to finish fourth. I'd like to think that Man United will finish ahead of them, and (unfortunately) I do think that Liverpool will as well. However, I might be a little biased here, both because of my Man Utd fandom (even though I try to stay objective about them as well) and because of the big names that are United and Liverpool, and how they are "supposed" to finish ahead of tiny Southampton, who are on a shoestring budget in comparison, especially to United. You can argue that the loss of Schneiderlin, Clyne and Alderweireld will be hard to make up for (heard that one before? Like one year ago exactly?), but the fact remains: Southampton had some truly impressive underlying numbers last year, and if they can repeat them, they could very well be able to fight for the Champions League spots again. Subjectively, I have them finishing sixth this season.
The projection has Tottenham falling off a cliff and finishing 10th. This is solely due to them having some very mediocre underlying numbers last season, which have been written about by many. I wrote about it in Swedish a few months back, and James Yorke (@jair1970) wrote about it on Statsbomb around the same time. Maybe they won't fall all the way down to 10th, but I really do think that they'll have to improve A LOT to remain in the top 6.
I also have Swansea projected to finish 14th, six places below their 8th place finish from last season. This is also due to them posting some very weak (like relegation-candidate weak) underlying numbers, regardless of whether you are looking at shot ratios or expected goal difference. Bobby Gardiner (@BobbyGardiner) has written some great stuff on Statsbomb, looking more in depth at Swansea specifically. Read his articles by clicking here and here. My subjective opinion: not relegated, but not in the top 10 either.
And finally: West Bromwich. Numbers really dislike Tony Pulis. I remain puzzled.
So there you have it, my Premier League projection. As you might have noticed, the projected winners sit at 78 points, 9 points fewer than Chelsea managed to scrape together during their title-winning campaign in 14/15. Watford, in 20th place, are predicted to score 37 points, 7 points more than last year's last-place finishers Queens Park Rangers, meaning that the range from first to last in the projection is 41 points, whereas the difference between first and last during the 2014/2015 season was 57 points. This is natural in a model, as a team that does really well or really badly usually does so as a consequence not only of skill (or lack thereof), but also of a fair bit of variance, which can't be predicted beforehand.
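The arithmetic behind that comparison, as a small sketch. The four point totals are the ones quoted above; the variable names are just for illustration.

```python
# Spread between first and last place: projection vs. the 2014/15 table.
# The point totals are the ones quoted in the text.

projected_top, projected_bottom = 78, 37
actual_top, actual_bottom = 78 + 9, 37 - 7  # Chelsea and QPR in 14/15

projected_range = projected_top - projected_bottom
actual_range = actual_top - actual_bottom

# Share of last season's spread that the projection reproduces.
compression = projected_range / actual_range
```

In other words, the projected table covers a bit over 70% of the spread seen in the real table last season.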
For the Eredivisie I used the exact same procedure, but using historical Eredivisie data instead of Premier League data when running the regressions that gave me the formulas for projecting points.
I have no subjective opinions whatsoever on the Eredivisie as I saw a total of two games from that league last season, so I'll just leave my projection here for people to make fun of!
That's it! I hope this was readable. If there are some grammatical errors or weird expressions in there: Don't blame me! I'm Swedish.