xG And Analytics Under Amorim

Okay! I think we have enough of a sample size now, and yet... we remain a somewhat confusing side statistically. The hard fixtures we've had explain the bad stats in some areas to an extent. If you want to dig into some genuinely confusing numbers, I encourage you to play around with tools like:

[1] https://markstats.club/
[2] https://theanalyst.com/football/team/scm-1/manchester-united (see the Zones of control)
[3] https://theanalyst.com/football/team/scm-1/manchester-united/stats (see the "team" section that compares premier league clubs)

There's enough to be positive about but so far the stats aren't UCL level across the board.
 
Just want to note that the definition of these statistics, in football at least, still involves enough subjectivity that how meaningful they are is fairly up for debate.

Of course, but their supposed high level of subjectivity has to be countered by research and data. A headed shot isn't "obviously" worth more or less xG just because I think so. And the model isn't "deeply flawed" because I believe we dominated a game. Clubs actually pay good money for people who can read into these stats and interpret them in a beneficial way. That alone shows that the people who make the calls value them enough.
 
Where is the subjectivity?
I'm not articulate enough to explain it. So here's Gemini's explanation, which goes well beyond what I was thinking, to be honest...

While they are presented as objective statistics, there is significant subjectivity "under the hood" in three main areas: how the models are built, what data is used to train them, and how the data is collected.

Here is a breakdown of where that subjectivity lies, moving from the most obvious to the more technical.

1. The Model Itself: What Defines a "Chance"?

The biggest source of subjectivity is that there is no single, universal xG formula. Every data provider (like StatsBomb, Opta, Wyscout, etc.) builds its own proprietary model. The value they assign to a shot depends entirely on what factors they subjectively decide are important.

A basic xG model might only include a few factors:
Shot Location: The distance and angle to the goal.
Body Part: Was it a shot with the foot or a header?
Type of Assist: Was it from a through-ball, a cross, or a set piece?

A more advanced model, like StatsBomb's, introduces many more variables that get closer to defining the "clarity of a chance":
Defender Positions: How many defenders are between the shooter and the goal?
Goalkeeper Position: Is the keeper out of position?
Pressure on Shooter: Was the shot taken under duress?
Ball Height: Was the ball in the air (e.g., a volley)?

The Subjectivity: One company's model might weigh "pressure on shooter" very heavily, while another might not track it at all. This is why the same game can have different xG scores on different websites. The creator of the model decides what defines a "good chance."
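To make that concrete, here is a toy sketch of two logistic-regression-style xG models that disagree only on whether to track "pressure on shooter". All the weights are invented for illustration; no real provider's numbers are used.

```python
import math

# Hypothetical feature weights for two imaginary providers.
# Provider B simply doesn't track pressure (weight 0), so the same
# shot comes out with a different xG value.
PROVIDER_A = {"distance": -0.12, "header": -0.9, "pressure": -0.6, "bias": 0.4}
PROVIDER_B = {"distance": -0.12, "header": -0.9, "pressure": 0.0, "bias": 0.2}

def xg(shot, w):
    """Logistic model: xG = sigmoid(weighted sum of shot features)."""
    z = (w["bias"]
         + w["distance"] * shot["distance_m"]
         + w["header"] * shot["is_header"]
         + w["pressure"] * shot["under_pressure"])
    return 1 / (1 + math.exp(-z))

shot = {"distance_m": 10, "is_header": 0, "under_pressure": 1}
print(round(xg(shot, PROVIDER_A), 3))  # ~0.198
print(round(xg(shot, PROVIDER_B), 3))  # ~0.269 -- same shot, different xG
```

The point isn't the specific numbers; it's that the choice of which features to include, and how heavily to weight them, is made by the model's creator.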

2. The Data Collection: The "Big Chance" Dilemma
This directly addresses your question about "clarity." Some models are (or were) built using data that was manually labeled by a human.
For example, a data analyst might have watched thousands of shots and tagged some as a "Big Chance." This introduces massive human subjectivity.

Hindsight Bias: If a player skies an easy-looking shot, the analyst might subconsciously be less likely to label it a "Big Chance" than if the player had scored. This taints the data before it even gets to the model.
Inconsistent Definition: What one analyst considers a "clear opportunity" another might not.

Most high-end models now try to remove this human-labeling step and rely only on the objective event data (like player coordinates), but the "Big Chance" concept is still a subjective one that plagues older models and public discussion.

3. The Training Data: What is an "Average" Shot?
An xG model works by looking at a massive database of historical shots (often 100,000+) and calculating the probability of a goal. For example, it might find that a shot from the penalty spot with no pressure (a penalty kick) goes in 76% of the time, so its xG value is 0.76.

But the dataset itself is a subjective choice.
Which Leagues? Is the model trained on data from the top 5 European leagues, or does it include 25 different leagues? A shot from a specific position might be more likely to be scored in the Premier League than in a lower-level league. The model's "average" is skewed by the data it's fed.
Which Players? If a model's training data is disproportionately filled with "great finishers" (like Lionel Messi or Erling Haaland), it might over-estimate the xG for a given chance, because its historical "average" is based on elite players.
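As a toy illustration of the training-data point (all counts invented), the "same" shot location produces two different averages depending on which pool of historical shots the model was fed:

```python
# Hypothetical historical shots from one location bin, drawn from two
# different training pools. Counts are made up for illustration.
top5_shots  = {"taken": 4000, "scored": 480}  # elite-league pool
mixed_shots = {"taken": 4000, "scored": 360}  # 25-league pool

xg_top5  = top5_shots["scored"] / top5_shots["taken"]
xg_mixed = mixed_shots["scored"] / mixed_shots["taken"]
print(xg_top5, xg_mixed)  # 0.12 vs 0.09 -- same shot, two "averages"
```

Neither number is "wrong"; they just answer the question "how often does this shot go in?" against different baselines.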


4. What xG Doesn't Measure (A Subjective Limitation)
Finally, a key point of contention is what xG ignores. An xG value is only generated when a shot is taken.
This means a brilliant attacking move—a perfect pass that cuts open the entire defense—results in 0.00 xG if the striker miscontrols the ball and fails to even get a shot off.

Many fans and pundits find this limitation to be the model's biggest flaw. It measures shot quality but not necessarily chance creation quality, which is a subtle but critical difference.
 
And it is factual, as they take a huge number of attempts from the same position to come up with their numbers.

You're ignoring the fact that it's a model based on a large dataset. It's factual insofar as those are the facts based on hundreds or thousands of datapoints, but it's not factual when it comes to quantifying a single datapoint.

Let's say, as a very simple example, you have 5 players who take a shot from the same location.

1 of them scores and the other 4 miss. They quantify that chance as 0.2 xG.

They are all from the edge of the 6 yard box. The one that went in was a tap in from a cross and the keeper was the only player to beat. The ball was beautifully weighted and it was a tap in. The other 4 chances had the box full of defenders, players pressuring the attacker etc.

Each one of those shots goes down as 0.2 xG towards the overall xG for the game.

If you watched those 5 chances with your own eyes, you would say the one that was scored was worth far more than 0.2 xG and the other 4 far less. In fact, if you took the single goal out of the dataset, the xG would disappear completely.

I know this is a very simplified example, and the larger the dataset the more reliable the data becomes, but fundamentally all xG is saying is "in this position, you would expect the attacking player to score x% of the time". Many models don't take anything more specific into account. It's not taking into account whether it's Harry Kane on the end of it or Vinnie Jones. It's not taking into account whether there is a clear path to goal and no one else around, or whether there are 20 players in the box. It's not taking into account whether the player is using their favoured foot. If there are defenders in the way, is the attacker positioned to curl it around them, or are they limited to trying to pinball it through?
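That toy example can be written out directly (the numbers simply mirror the post's hypothetical, nothing more):

```python
# The post's toy dataset: 5 shots from one spot, 1 goal, 4 misses.
shots_from_spot = [1, 0, 0, 0, 0]  # 1 = goal, 0 = miss

xg = sum(shots_from_spot) / len(shots_from_spot)
print(xg)  # 0.2 -- every future shot from this spot gets this value

# Drop the single goal and the "chance" evaporates entirely:
without_goal = shots_from_spot[1:]
print(sum(without_goal) / len(without_goal))  # 0.0
```

This is exactly the fragility being described: with tiny samples, one outcome swings the "objective" number from 0.2 to 0.0.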

Clubs won't be looking at xG and saying "you should have scored this" as they watch a clip of a player shooting inside a congested 6-yard box with no path to goal. They will be using it (along with numerous other stats) to direct them towards what actually can be improved. xG is a data point. It's not an objective measure of the quality of an isolated chance.
 

It's the best thing we've got so far, and it's improving every season. What I take issue with is people trying to downplay its "objectivity" not by bringing their own homework to the table but by inserting their own biased hypotheticals. It deals in averages. It takes all the hypotheticals in your example and tries to come up with an average number. It should be cold as a mathematical model and shouldn't care about who takes the chance. This is the nature of statistics. And, believe it or not, this works. Like the other guy saying that if the pass is good but the player miscontrols the ball, it should count for something. "Factually", that isn't a (taken) chance, but he wants it calculated as one. You can't get more subjective than that. Stats don't work like that. This is for the manager to assess: does it happen frequently? If yes, how much of a hindrance can it be, etc.

I'll give you a recent example from my local league. Last Sunday, we had a derby. The home team were the favourites, and they play a very aggressive style that relies on winning duels and generating transitions. The visiting side's plan is to wait in a mid/low block and hit on the counter. In the first 20 minutes, they get two such chances. One was squandered because the winger couldn't play the switch-ball that would get a teammate running at the keeper. In the other instance, the pass was good, but the forward (who's in very bad form) couldn't even bring the ball down. In the end, the home team won 2-0, while the visitors ended the game with xG 0.17. Nobody claimed that the result could have been different or that the underlying stats don't tell the true story of the game. The manager of the visiting side argued that his team had a bad game, but the decider was the difference in quality in the final third between the two sides. Which is a valid assessment supported by the stats.

The teams have experts who can interpret the stats in the best possible way. And, of course, the eye-test matters. Not so much for us fans, for whom it's often a buzzword to confirm personal biases as much as the underlying stats are, but for the managers who are, first and foremost, visual thinkers.
 

It's objective insofar as it's based on statistics, but as we know, statistics are open to interpretation, and the numerous different xG systems tell us that. Eventually I imagine we truly will be able to assign a very accurate xG to every shot. We will get to the point where we are tracking the limbs of every player on the pitch at all times, can accurately replay goals/shots back from that data, and can essentially say "this player was on their right foot with the ball in a good position 10 yards from goal, 83% of the goal was unblocked, and the keeper can't see the ball until X position, therefore he can usually only cover Y% of the remaining space, therefore the xG is...".

I broadly believe in xG but I tend to take it as a far more valuable metric when you have a number of games to look at or even a season.

When Liverpool battered us a few seasons back, I think their xG at the time was 2.74 or something and they tonked us by 7 goals. Anyone who watched won't have come away thinking that Liverpool should only have put 2-3 past us. I think there was another game we drew with them after that where the xG was very similar to that number and they scored perhaps 2. Again, the eye test would tell you it was far more accurate that time.
 
This sounds as if you don't really understand what the xG values tell us and what they should be used for. If you are arguing with xG numbers from one provider against a person who argues with xG numbers from another provider, then you are right: you've found an issue. But nobody who understands numbers would do that. The objective element comes from applying one model to your dataset, not one model for the shots from the Bundesliga combined with a slightly different model for the shots from La Liga. So even if you are dealing with a "dumb" model that, for example, only takes the position of the shot into account, it is still as objective as it gets, since it won't do anything other than take the position of a given shot, find the x shots taken from the same spot in the dataset, and check how many of them were scored.
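That "dumb" position-only model can be sketched in a few lines. The coordinates, bin size, and outcomes below are invented purely for illustration:

```python
from collections import defaultdict

# Toy event data: (x, y) shot origin in metres from the goal line, plus
# outcome (1 = goal, 0 = miss). All values invented.
events = [
    ((6, 0), 1), ((6, 0), 0), ((6, 0), 0),                  # same close spot
    ((20, 5), 0), ((20, 5), 0), ((20, 5), 1), ((20, 5), 0),  # same distant spot
]

def build_position_model(events, bin_size=2):
    """Position-only xG: bucket shots by location, value = goals / attempts."""
    taken, scored = defaultdict(int), defaultdict(int)
    for (x, y), goal in events:
        key = (x // bin_size, y // bin_size)
        taken[key] += 1
        scored[key] += goal
    return {k: scored[k] / taken[k] for k in taken}

model = build_position_model(events)
print(model[(3, 0)])   # 1 of 3 shots from the close bin went in
print(model[(10, 2)])  # 1 of 4 from the distant bin
```

The procedure is entirely mechanical: once the binning is fixed, the same shot always gets the same value, which is the sense in which the number is objective.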

You can argue as much as you want about what people use the data for or what conclusions they draw from it, but the numbers themselves are as objective as it gets.

The things the AI spat out are easily tackled:
1. Multiple Models: Yes, if you mix providers, you get into trouble. But that's not an xG problem, it's a use-of-xG problem.
2. Big Chance Dilemma: Whoever is daft enough to design a data point called a "big chance" has built the dilemma in himself. But if you define a clear set of conditions for what constitutes a big chance, the issue is minimized. That's a design problem, not an objectivity problem.
3. Training Data: Should be tackled by the provider, so see point 1.
4. What xG Doesn't Measure: An absolutely legitimate point, but once again a user problem. Nobody in their right mind would look at the number of tackles a player made in a single game and then criticize the tackling stat because it doesn't tell you about interceptions. There are definitions written of what xG is, and it is only about shots that have been taken. xG isn't the same as "the exact amount of danger we produced", although it is probably closer to that than most other popular metrics.
Most likely there will be a next evolution of numbers that stands in the same relation to xG as xG stands to raw shot counts these days. Then xG might fade away, because there will be even more useful numbers to look at.

The main issue with xG is the way it is used by some people.

edit: I absolutely agree, a more useful and straightforward way of using those numbers is across a few games and in comparison with other teams' numbers, which have to be generated by the same model/provider though.
 
So even if you are dealing with a "dumb" model that, for example, only takes the position of the shot into account, it is still as objective as it gets, since it won't do anything other than take the position of a given shot, find the x shots taken from the same spot in the dataset, and check how many of them were scored.

No one is arguing that the xG stats aren't objective (I think). We are saying that you can't apply big-dataset objective data to an isolated event and call that an objective measure of it. So when you add up a bunch of these isolated events in a game, you get a number, but regularly that number doesn't bear much relevance to the actual quality of the chances it's measuring. About the only chance that is always reasonably objective is a penalty, because the variables are so limited.
 
No one is arguing that the xG stats aren't objective (I think). We are saying that you can't apply big-dataset objective data to an isolated event and call that an objective measure of it. So when you add up a bunch of these isolated events in a game, you get a number, but regularly that number doesn't bear much relevance to the actual quality of the chances it's measuring. About the only chance that is always reasonably objective is a penalty, because the variables are so limited.
But the "variables" are parameters and depend on the model. If you wanted, you could go crazy with it: add the shooter's quality, shooting ability, form, match flow or whatever into the model, and you'd end up with another number. It could become a nightmare to annotate games to generate the data it is then applied to, though.

Maybe I just don't get your point, but questioning the objectiveness of the numbers seems somewhat futile to me. I mean, wouldn't the subjective part here be you thinking an xG value should be higher than it is because you think the quality of the chance is higher than, say, 0.17 xG? I do think the numbers have relevance: they tell you how often shots comparable to the one in a single game went in in the past. Those numbers added up over a game are definitely an indicator of how productive you were, and a pretty objective one at that, since having defined parameters applied to the entirety of the data reduces bias pretty effectively.

That's where it overcomes any eye test: even if one person could determine the "quality of a chance" better than such a model, that person could never have seen and remembered all the games, so it would require more people, and more people means differences in evaluation.

xG isn't perfect, and it certainly isn't the be-all and end-all answer to the question "how good was our chance creation in a single given game", but it is the closest you can get these days. Or do you have anything else in mind?
 
No one is arguing that the xG stats aren't objective (I think). We are saying that you can't apply big-dataset objective data to an isolated event and call that an objective measure of it.

It doesn't have to be "objective" to be useful. Reality isn't objective; almost everything except the speed of light is defined relative to an established framework.

So when you add up a bunch of these isolated events in a game, you get a number, but regularly that number doesn't bear much relevance to the actual quality of the chances it's measuring.

This is absolutely false though.

About the only chance that is always reasonably objective is a penalty, because the variables are so limited.

I think everyone needs a crash course in statistics and modeling theory.
 
I think everyone needs a crash course in statistics and modeling theory.

It's pretty simple: if your model doesn't explicitly consider what actually contributes to the final statistic, or can't apply those factors to an isolated incident, then it won't be very accurate for single incidents.

If your model doesn't consider the impact of other players' locations, the keeper's location, whether the ball is bouncing, how fast it's coming at the forward, etc., then how on earth can that model represent a single shot with any level of accuracy, unless that shot's conditions sit very much in the middle of the range of outside influences? It can't.
 
It's pretty simple: if your model doesn't explicitly consider what actually contributes to the final statistic, or can't apply those factors to an isolated incident, then it won't be very accurate for single incidents.

If your model doesn't consider the impact of other players' locations, the keeper's location, whether the ball is bouncing, how fast it's coming at the forward, etc., then how on earth can that model represent a single shot with any level of accuracy, unless that shot's conditions sit very much in the middle of the range of outside influences? It can't.
You're spot on.
 
It's pretty simple: if your model doesn't explicitly consider what actually contributes to the final statistic, or can't apply those factors to an isolated incident, then it won't be very accurate for single incidents.

But this isn't true. Very simple models that do not consider second-order effects have proven, in multiple domains, to be very accurate for single incidents.

All models are approximations, and none will ever exactly replicate reality. However, even simple models, when done right, can capture the form and major functions of reality to a degree that the model can be relied on for future decision making.

I know many people question the ability of models to be used to aid understanding of a complex sport like football, and to them I would simply ask them to consider the numerous things in life more complex than 22 men chasing a ball, and how many of these things have been successfully mimicked by a variety of models.

If your model doesn't consider the impact of other players' locations, the keeper's location, whether the ball is bouncing, how fast it's coming at the forward, etc., then how on earth can that model represent a single shot with any level of accuracy, unless that shot's conditions sit very much in the middle of the range of outside influences? It can't.

That's where you are getting it confused.

Outside of a goal being scored or not, there is no absolute measure of the quality of a shot.

No, this cannot be defined by your average match-going fan, or even an expert.

So the model isn't trying to represent a single shot in the absolute sense.

What the model is trying to do, is represent the shot, in relation to other shots, using the baseline criteria of whether the shot is more likely to go in or not.

And that is how we know what shots are good or not in real life; we don't calculate drag coefficients and the strength in the thigh prior to the shot... Our brains imperfectly compare the shot to other shots of that kind and take into account the quality of the keeper and the quality of the shooter and so on... But our brains rely heavily on heuristics and memories impacted by trauma and beer and time...

Models (at least the ones we have access to) don't take all variables into account either. However what they do is establish a uniform framework and that allows for the systemic rating of shots, relative to that framework and relative to each other. And because we see strong correlation between these models and actual team performances, we know that although the models are not perfect, they are good enough to be useful.

Yes. Even for single occurrences.

There is this erroneous assumption among football fans that xG is useless until you can aggregate across 10 games. That is very false.

It's a regression model. Every prediction it makes will have a band of certainty around it (which makes comparing a 0.47 xG chance vs a 0.475 xG chance with only the model pointless), but it can be used on a single chance or even a single game, especially in comparison to the thousands of shots used to train the model.
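A rough sketch of that band of certainty, using the simple binomial standard error as a stand-in (real providers' intervals are computed differently; this is only to show how sample size drives the width of the band):

```python
import math

# A 0.2 xG estimate built from n historical shots carries uncertainty:
# with few shots the 95% band is huge, with many shots it narrows.
def xg_with_band(goals, attempts):
    p = goals / attempts
    se = math.sqrt(p * (1 - p) / attempts)  # binomial standard error
    return p, (max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se))

print(xg_with_band(1, 5))       # tiny sample: 0.2 with an enormous band
print(xg_with_band(200, 1000))  # large sample: 0.2 with a tight band
```

Both estimates are "0.2 xG", but only the second is precise enough to split hairs over, which is why comparing near-identical xG values from the model alone is pointless.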
 
Plus, a model is only as good as the decisions it can be used to make.

If someone with a PhD was arsed enough, they could include the impact of other variables like wind speed and how fast the ball is coming in an xG model to push the AUC (a measure of accuracy) up a few points. But what purpose would that serve? Fans would still say "xG the game is gone". And coaches can't calibrate players to adapt their shots for wind speed and incoming ball speed. There is no xG model that will be good enough to satisfy skeptics who simply don't believe football can be quantified at all.

One of the biggest variables they can control is where the player is when taking the shot. And whatever else they can control will, I'm sure, be in models we can't see on the public end, because based on those models they will instruct their players on where to be and where the biggest opportunities in the data are.
 
It's obviously a very good model. The teams who generally top xG and xPTS over a 38 game season are the sides who top the league. Obviously, you'll get the odd anomaly here and there, but that would be the case with any study. You can question the validity of the model during a randomly selected 90 minute game, but over a longer duration of time, the results tend to back up the data.
 
People need to stop saying this. If a model can be used across 10 games, it can also be used for 1 game.
It's true though. It's possible that one game may feel a little off. A statistical anomaly. With further data (more games), there's less and less of a hiding place and the data generally reflects what we are seeing on the pitch.
 
I think this thread should just be called the analytics thread to stop people from complaining about xG.
 
It's true though. It's possible that one game may feel a little off. A statistical anomaly. With further data (more games), there's less and less of a hiding place and the data generally reflects what we are seeing on the pitch.

Yes, there is more variance around a single point prediction than around an aggregate of point predictions.

However, this does not mean the model's validity can be questioned when it comes to one game. Either the model is good or it isn't. There is no such thing as "it's bad for a single game and great for 38 games".

Now, what you shouldn't do is take the results of 1 game and use the model to extrapolate to 38 games. But insofar as the model is used to understand what happened in a game and compare it with other games, it can be used in this form.
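A quick simulation makes the single-game vs 38-game point concrete. The shot values are invented, not any provider's data; the idea is just that per-game goal counts swing around the per-game xG, while the season total concentrates near the expected value (12 shots x 0.1 xG x 38 games = 45.6):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulate a team taking 12 shots worth 0.1 xG each, every game.
def simulate_game(shot_xgs):
    """Each shot goes in with probability equal to its xG value."""
    return sum(random.random() < p for p in shot_xgs)

shots = [0.1] * 12          # per-game expected goals: 1.2
season = [simulate_game(shots) for _ in range(38)]

print(min(season), max(season))  # single games vary a lot around 1.2
print(sum(season))               # the season total sits near 45.6
```

Neither number invalidates the model; the single-game figure just carries more variance than the aggregate, which is the distinction being argued about here.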
 
But this isn't true. Very simple models that do not consider second order effects are proven in multiple domains to be very accurate for single incidents.

Have you got some examples of these? Because I don't believe that events with very complex external factors are accurately predicted by very simple models.

That's where you are getting it confused.

Outside of a goal being scored or not, there is no absolute measure of the quality of a shot.

Of course there isn't, but you can look at a shot and say "that xG is completely wrong" when it clearly is. You're working too much in absolutes. There are games when the xG seems broadly correct and others where it's clearly wrong.

So the model isn't trying to represent a single shot in the absolute sense.

What the model is trying to do, is represent the shot, in relation to other shots, using the baseline criteria of whether the shot is more likely to go in or not.

Which is my point: the model is wrong if it doesn't accurately model it. You seem to be arguing that xG is correct most of the time, or at least broadly correct, and yet we have competing models assigning quite different values to the same shot. The fact this happens means it cannot be a purely objective measure. Objective means there is a right and wrong answer.

And that is how we know what shots are good or not in real life; we don't calculate drag coefficients and the strength in the thigh prior to the shot... Our brains imperfectly compare the shot to other kinds of those shots and take into account the quality of the keeper and the quality of the shooter and so on... But our brains rely heavily on heuristics and memories impacted by trauma and beer and time...

Yes, we broadly use that, but we also use nuance and take every shot on its own merits far more than thinking "based on the position of this shot he should be scoring". We see that the keeper was perfectly placed. We see that there were 4 players blocking the shot. We see that the ball was played slightly behind the forward. We see that the ball was bobbling. We see that the player got a nudge in the back. We see that it was a right-footed player trying to dig a shot out with his left foot. All of these things that are relevant to this singular incident tell us how good the chance was.

Models (at least the ones we have access to) don't take all variables into account either. However what they do is establish a uniform framework and that allows for the systemic rating of shots, relative to that framework and relative to each other. And because we see strong correlation between these models and actual team performances, we know that although the models are not perfect, they are good enough to be useful.

Yes, they are good enough to be useful plenty of the time for single games but far more useful over a larger timespan.

Yes. Even for single occurrences.

There is this erroneous assumption among football fans that xG is useless until you can aggregate across 10 games. That is very false.

There isn't this assumption at all. The assumption is that you cannot take it as gospel. Sometimes it will be almost perfect for a single game, and sometimes it will be quite wrong over 5.

As I said earlier, eventually it will be excellent at predicting single events but that will require a massive amount more data to do that. Until that happens it will remain patchy at best.
 
Great talk by the way...

Have you got some examples of these because I don't believe that very simple models for events that have very complex external factors to them are accurately predicted by a simple model.

Newton's laws of motion are a good example. I could pull others from Statistical Mechanics and other technical fields if I dusted off my textbooks.

Modeling of traffic is another example, where you have thousands of independent, (some) stupid drivers on the road... And yet congestion and bottlenecks can be calculated using relatively simple formulae in some cases.

But here I agree with you: sometimes those models miss the mark. But their value is not in getting every prediction right, or matching reality exactly.

Of course there isn't but you can look at a shot and say "that xG is completely wrong" when it clearly is. You're working too much in absolutes. There are games when the xG seems broadly correct and others where its clearly wrong.

In how many games would you say the xG scores are absolutely wrong?

I'd wager that in most situations the xG total adequately represents the quality of chances available to both teams. I don't think it works completely well in the aggregate for a game (taking 30 poor shots worth 0.1 xG doesn't necessarily mean the score should be 3-0) but for the majority of situations it works. And that's what a great model does. If we had a model that was fit to every data point it would be useless for further use.

Which is my point: a model is wrong if it doesn't accurately model the thing. You seem to be arguing that xG is correct most of the time, or at least broadly correct, and yet we have competing models that are assigning quite different values to the same shot. The fact that this happens means it cannot be a purely objective measure. Objective means there is a right and a wrong answer.

Are you saying the xG models are wrong based on your own perspective of watching games?

You cannot invalidate a model by single instances that are off from reality. The essence of a model is that you are trying to distill a very complicated reality into something manageable on paper that can be manipulated. In the process of that distillation, some things will be left out, which will result in occasional slips; but those slips don't mean the model is invalidated.

And just because 2 models approximating the same event don't yield the exact same answer, it doesn't mean they are both wrong. It means the model construction was different. However, as long as the assumptions are clearly stated and the methodology for each model is uniformly applied to the dataset, then depending on my motivations I may choose to select one or the other. Or both!
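To illustrate how two defensible models can disagree on the same shot, here's a toy sketch with two logistic models built over different feature sets. All coefficients are invented for illustration, not taken from any real provider:

```python
from math import exp

def xg_model_a(dist_m: float, angle_deg: float) -> float:
    # Toy model A: distance and angle only (invented coefficients)
    z = 1.2 - 0.18 * dist_m + 0.03 * angle_deg
    return 1 / (1 + exp(-z))

def xg_model_b(dist_m: float, angle_deg: float,
               defenders_in_path: int, header: bool) -> float:
    # Toy model B: same shot, richer feature set, different invented coefficients
    z = 1.0 - 0.15 * dist_m + 0.035 * angle_deg \
        - 0.45 * defenders_in_path - 0.8 * header
    return 1 / (1 + exp(-z))

# The same shot: 12m out, 35-degree angle, one defender in the way, kicked not headed
shot_a = xg_model_a(12, 35)
shot_b = xg_model_b(12, 35, defenders_in_path=1, header=False)
print(f"model A: {shot_a:.2f}, model B: {shot_b:.2f}")  # different values, neither "wrong"
```

Both outputs are valid probabilities; they differ because the models condition on different information, which is exactly the point about construction choices.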

Yes we broadly use that but we also use nuance and we take every shot on its own merit far more than thinking "based on the position of this shot he should be scoring". We see that the keeper was perfectly placed. We see that there were 4 players blocking the shot. We see that the ball was played slightly behind the forward. We see that the ball was bobbling. We see that the player got a nudge in the back. We see that it was a right footed player trying to dig a shot out with his left foot. All of these things that are relevant to this singular incident tell us how good the chance was.

Yes but I promise you that you don't do this consistently.

The reason why we create models is because no human being has watched all the games in the PL this season and evaluated every chance with the same level of intensity and analysis between every shot.

And the stakes in football are too high for managers to rely solely on human memories and judgements in determining how to approach football games. But if we have a tool that can be used to gauge chance quality in advance, based on the systematic processing of all the chance data we have access to... It won't be perfect but it will fit the data well enough.

Ok, fine, no xG model can tell me how good that shot was as well as @mctrials23 will, 2 rounds of beer in. However, can he also break down for me exactly how much the chance quality would change if the player changes, or 4 players blocking becomes 3, or Kepa is in goal, or it is raining? And can he use this model to determine exactly where chances should be created on the pitch? He would rightly tell me to feck off.

Yes, they are good enough to be useful plenty of the time for single games but far more useful over a larger timespan.

We need to remember that these models are trained on singular events: shots. If the model is good enough (from a statistical POV), not only is it good enough to be useful for single games, it is good enough to be used for evaluating single shot quality. You would be hard pressed to find a shot scored at an xG of 0.7 that would not be a clear chance for the average top-flight player.
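What "good enough from a statistical POV" usually cashes out to is calibration: among all shots the model scores around 0.2, roughly 20% should actually go in. A sketch on synthetic shots (the beta distribution and the bucket width are arbitrary choices made for illustration):

```python
import random

random.seed(42)

# Synthetic shots: each has a "true" scoring probability; the model's xG is
# assumed here to equal that truth, so the check should come out well calibrated.
shots = [random.betavariate(2, 10) for _ in range(20_000)]
outcomes = [random.random() < p for p in shots]

# Bucket by predicted xG and compare mean prediction to realized goal rate
buckets: dict[int, list[tuple[float, bool]]] = {}
for xg, goal in zip(shots, outcomes):
    buckets.setdefault(int(xg * 10), []).append((xg, goal))

for b in sorted(buckets):
    preds = [xg for xg, _ in buckets[b]]
    rate = sum(g for _, g in buckets[b]) / len(buckets[b])
    print(f"xG {b/10:.1f}-{(b+1)/10:.1f}: "
          f"predicted {sum(preds)/len(preds):.3f}, realized {rate:.3f}")
```

A real provider would run this same check against actual shot outcomes; a model whose buckets line up like this is trustworthy at the single-shot level, even though any individual shot is still just one coin flip.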

There isn't this assumption at all. The assumption is that you cannot take it as gospel. That sometimes it will be almost perfect for a single game and sometimes it will be quite wrong over 5.

No one is saying take model outputs as gospel. No one should take the eye test as gospel. No one should take my word as gospel.

And I think if the xG models had these serious flaws in accuracy they would have been highlighted in papers and articles and blogs giving more fuel to the "burn math" crowd (not saying you're one of them ha)

As I said earlier, eventually it will be excellent at predicting single events but that will require a massive amount more data to do that. Until that happens it will remain patchy at best.

How much data would you need? You can get excellent models on as few as 20-30 data points... We have shots data that I'm sure goes into the 10s of thousands. If you can't build a good model on that dataset then give up

That said there is always room for improvement. And I'm sure if we saw what clubs were using behind closed doors we would be blown away...
 
For me the question is not "is xG helpful?" (Seems obvious that it's yes, IMO), but rather "Do key United decision makers (Berrada + Wilcox, in particular) utilize xG and possibly even more advanced stats for their decision making?"

Have to admit that signing the #1 and #2 xG overperformers (Mbeumo and Cunha) in the same off-season shook my faith in the skill of our leadership. The hot start has restored a bit; I'm hoping there was even more advanced analytics the public isn't even privy to that showed Cunha/Mbeumo in a good light, and the analysis wasn't as simple as "they scored a bunch of goals last season".
 
For me the question is not "is xG helpful?" (Seems obvious that it's yes, IMO), but rather "Do key United decision makers (Berrada + Wilcox, in particular) utilize xG and possibly even more advanced stats for their decision making?"

Have to admit that signing the #1 and #2 xG overperformers (Mbeumo and Cunha) in the same off-season shook my faith in the skill of our leadership. The hot start has restored a bit; I'm hoping there was even more advanced analytics the public isn't even privy to that showed Cunha/Mbeumo in a good light, and the analysis wasn't as simple as "they scored a bunch of goals last season".
Why would it?

The top players tend to over perform their xG because xG represents expected chance of scoring for the average player.

But top players are not the average player.

What’s a 0.4 xG shot might actually be a 0.7 xG for someone like Kane.
 
How much data would you need? You can get excellent models on as few as 20-30 data points... We have shots data that I'm sure goes into the 10s of thousands. If you can't build a good model on that dataset then give up

That said there is always room for improvement. And I'm sure if we saw what clubs were using behind closed doors we would be blown away...
10s of thousands is not a large amount of data.
 
Why would it?

The top players tend to over perform their xG because xG represents expected chance of scoring for the average player.

But top players are not the average player.

What’s a 0.4 xG shot might actually be a 0.7 xG for someone like Kane.
Don’t have time to find the source right now, but I’m pretty sure I’ve read that very few players consistently overperform their xG, and that accumulating xG itself is the biggest predictor of future goalscoring success. That’s not to say Cunha and Mbeumo aren’t both excellent players who, based on the eye test, I’d expect to be more likely than others to repeat xG overperformance. But it’s not really as simple as good players always overperforming xG.
 
Why would it?

The top players tend to over perform their xG because xG represents expected chance of scoring for the average player.

But top players are not the average player.

What’s a 0.4 xG shot might actually be a 0.7 xG for someone like Kane.

There is an element of truth to this, which is why most clubs include actual goals scored in their models to at least some degree.

But it's only an element of truth, as not only do many top goalscorers not overperform their xG, but those that do often do so to a relatively minor degree.

Both Cunha and Mbeumo overperformed last season to an absolutely unsustainable degree, and nobody with any knowledge in this area would expect that to be representative of what they will do across the rest of their career. If the club thought they would repeat that level and were buying them on that basis, that would be insane.

But I don't think our club thought that, as even in my most pessimistic vision of the quality of our data department, they understand that most basic interpretation of xG.

In reality they were likely just satisfied with what they projected Cunha/Mbeumo would provide, to the extent that (in Mbeumo's case in particular) they didn't mind paying a fee that was probably somewhat inflated off the back of that one-off level of overperformance.
 
Why would it?

The top players tend to over perform their xG because xG represents expected chance of scoring for the average player.

But top players are not the average player.

What’s a 0.4 xG shot might actually be a 0.7 xG for someone like Kane.

For someone like Kane a 0.4 xG shot would have around a 0.45 probability of going in. Kane is a very good finisher, better than almost everyone in the game, but like all good goal scorers he's mostly achieving that by being very good at getting chances to score.
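One common way to sketch that kind of adjustment is a shift on the log-odds scale rather than adding to the raw probability. The +0.2 "finishing boost" below is an invented number, chosen because it happens to take a 0.4 chance to roughly the 0.45 mentioned above:

```python
from math import log, exp

def adjust_xg(xg: float, finishing_boost: float) -> float:
    """Shift a shot's scoring probability on the log-odds scale.
    finishing_boost > 0 models an above-average finisher; 0 is average.
    Boost values are illustrative, not from any real model."""
    logit = log(xg / (1 - xg))
    return 1 / (1 + exp(-(logit + finishing_boost)))

print(f"{adjust_xg(0.4, 0.2):.3f}")  # elite finisher: just under 0.45
print(f"{adjust_xg(0.4, 0.0):.3f}")  # average finisher: 0.400, unchanged
```

Working on the log-odds scale keeps the adjusted value a valid probability for any shot, which naive addition (0.4 + 0.3) would not.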
 
Perhaps others may be able to shed light on Opta's table prediction.

While they have us down in 10th, there are only 3 points separating 10th and 4th place in their predictor.

 
10s of thousands is not a large amount of data.

It is in the context we are speaking of: using data to create models. These are regression models, and 10s of thousands of data points are more than enough to find the underlying signal (if any). The input variables (position, foot of the kick, position of the GK, etc.) can't number more than 100.

Most industry-standard models go through intense validation and peer review, so questions of whether sufficient data is used for training are answered at this stage.
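As a rough sanity check on that claim, here's a from-scratch logistic fit on 10,000 synthetic shots. The features, true weights, and learning rate are all made up; the point is only that a dataset of this size is plenty to recover a low-dimensional signal:

```python
import random
from math import exp

random.seed(1)

def sigmoid(z: float) -> float:
    return 1 / (1 + exp(-z))

# Synthetic shots. Features kept at O(1) scale: distance in tens of metres,
# shot angle in radians. The true weights are the "signal" to recover.
TRUE_B, TRUE_W = 0.5, (-1.5, 1.0)
shots = [(random.uniform(0.3, 3.0), random.uniform(0.1, 1.4)) for _ in range(10_000)]
goals = [random.random() < sigmoid(TRUE_B + TRUE_W[0]*d + TRUE_W[1]*a)
         for d, a in shots]

# Plain batch gradient descent on the logistic log-loss
b, w0, w1, lr, n = 0.0, 0.0, 0.0, 1.0, len(shots)
for _ in range(200):
    gb = g0 = g1 = 0.0
    for (d, a), y in zip(shots, goals):
        err = sigmoid(b + w0*d + w1*a) - y
        gb += err
        g0 += err * d
        g1 += err * a
    b, w0, w1 = b - lr*gb/n, w0 - lr*g0/n, w1 - lr*g1/n

print(f"fitted: b={b:.2f} w=({w0:.2f}, {w1:.2f})  true: b=0.50 w=(-1.50, 1.00)")
```

The fitted weights land close to the true ones, illustrating the point: with two or even a hundred input variables, tens of thousands of shots is a perfectly workable training set.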
 
It's a picture of what we've been seeing thus far. Nothing more.

Arsenal have shown better consistency than anyone else when it comes to getting the three points in the bag. This makes them (early) favourites, although their overall ceiling is a tad lower than what City and Liverpool have achieved in the previous seasons.

Liverpool and City aren't expected to reach the heights of previous campaigns, but their floor level with spurts of quality here and there will possibly keep them ahead of the rest of the pack.

Between positions 4-12, you have the chasers. An open race for the remaining two (possibly three) CL spots and the other European places that will probably come down to form (team and individual), moments of quality when it matters, tactical acumen in critical matches, luck, injuries to key players etc.

Between positions 13-20, you have the teams that will be happy, more than anything else, to avoid relegation and, if that's possible, set the foundations to aim higher next season.

It doesn't mean that either City or Liverpool can't miss out on the CL, or that we can't go on to win 10 games on the trot and lead the race for CL qualification come next spring. It just paints a picture of the league a quarter of the way into the season in terms of "relative strength". Take it that we've already moved a whole block up compared to last season, and we're looking upward.
 
Perhaps others may be able to shed light on Opta's table prediction.

While they have us down in 10th, there are only 3 points separating 10th and 4th place in their predictor.



Order looks about right, once you accept that 4th-10th are a block of fairly interchangeable teams. I wouldn't be surprised at any team in that block finishing above or below us.

All three promoted teams staying up would also be great for the league.
 
Perhaps others may be able to shed light on Opta's table prediction.

While they have us down in 10th, there are only 3 points separating 10th and 4th place in their predictor.


https://www.reddit.com/r/soccer/s/Cvg8N0PHjv

I'd put absolutely no stock in these Opta predicted tables.

I doubt they'd predict Liverpool going on a losing spree, for example.
 
Just want to note that the definition of these statistics, in the case of football here, still involves subjectivity that renders the meaningfulness of the statistic fairly up for debate.
Well, he did say "based on"; the selection of the statistics and the interpretation of their significance and meaning is another matter, of course.

Kind of how scoring more than xG to one person means a lucky side bound to come back around, while to another shows a great finisher making more of a small chance than his neighbor might be able to do with a big one.
 
Why would it?

The top players tend to over perform their xG because xG represents expected chance of scoring for the average player.

But top players are not the average player.

What’s a 0.4 xG shot might actually be a 0.7 xG for someone like Kane.
I mean, one of the primary insights from xG is that you can more accurately identify players/teams who are over (or under) performing their long term goal potential. So I'm a little confused by your question.

The top players don't overperform their xG anywhere near as much as Cunha + Mbeumo did last year, unless their name is Messi or Kane. Many top players are near their xG expectation. Here's Haaland, Salah, Saka, and Ollie Watkins:

Haaland (Bundesliga and Premier League only): 133 non-penalty goals, 116 non-penalty xG
Salah (Premier League only): 152 non-penalty goals, 144 non-penalty xG
Saka: 43 non-penalty goals, 43 non-penalty xG
Watkins (Premier League only): 71 non-penalty goals, 71 non-penalty xG

It seems fairly obvious to me that Cunha/Mbeumo's performance last year was unsustainable. The top players tend to create more shots and higher-quality shots, more so than just finishing better on the shots they do get.

Cunha had 15 goals on only 8.6 xG. That's incredibly hard to sustain long-term. For his career, Cunha's at 47 non-penalty goals on 47 non-penalty xG, so he's been an exactly average finisher over his entire career. This year he's under his xG total so far, of course (1 goal, 1.5 xG).

Mbeumo had 15 non-penalty goals on 7.5 non-penalty xG. For his career he's at 70 non-penalty goals on 62 non-penalty xG. So he has been a bit better than an average finisher, but finishing at 2x his xG will be almost impossible long term. And if you remove last season, he's been essentially an exactly average finisher.
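For a rough feel of how unusual those seasons were, you can treat a season's non-penalty xG total as a Poisson rate and ask how often 15+ goals would come out. This is an approximation (shots aren't identical, and the exact distribution is Poisson-binomial rather than Poisson), but it's indicative:

```python
from math import exp, factorial

def poisson_tail(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

# Cunha: 15 non-penalty goals on 8.6 xG; Mbeumo: 15 on 7.5 xG
print(f"P(>=15 goals | 8.6 xG): {poisson_tail(8.6, 15):.3f}")  # roughly 3%
print(f"P(>=15 goals | 7.5 xG): {poisson_tail(7.5, 15):.3f}")  # roughly 1%
```

So an average finisher getting those chances repeats that goal tally only a few seasons in a hundred, which is the "unsustainable" point in numbers.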

I think these guys are likely scoring 7-9 goals a season with us on average. And my concern has been that we bought them fully believing their goal numbers from the 2024-25 season reflect their ability, rather than being outlier seasons that are probably hard to replicate going forward. Mbeumo in particular has been absolutely fantastic so far, so that's given me some hope that there were even more advanced analytics/scouting behind their acquisitions. But if it truly was just "Hey, the guy banged in 15 goals last year, he's probably really good", I think it bodes really poorly for the kind of acquisitions we can expect from our leadership.
 