Wednesday, October 11, 2023

World Cup 2026 qualification projection - Introduction

 Across past World Cup qualifications, I have run a bit of a "projection" process.  Typically, the various global confederations can take a while to announce their plans - which means by the time you can do it "correctly", we are well into the qualifiers.

However, this time around we already know pretty much the way the whole thing works (minus maybe some seeding plans etc) - so the process can be done right from the start (well, a month in).

How it works

Basically, it's pretty simple and based entirely on:

  • announced structures and seeding plans, and
  • projected results that are based entirely on FIFA ranks (and a bit for home advantage)
The projection (which is effectively just a big Excel workbook with a short macro) runs 10,000 "simulations" of the possible group and match draws, the results of these and the resultant movements to future rounds etc.
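
To give a feel for the "draws" part, here is a minimal Python sketch of a seeded group draw. It is not the workbook's macro: the function name `draw_groups`, the pot layout (pots sized to the number of groups, filled in rank order) and the placeholder team names are all assumptions for illustration, and real draws often carry extra constraints that are ignored here.

```python
import random

def draw_groups(ranked_teams, n_groups, rng):
    """Seeded draw sketch (assumed rules, not any confederation's actual ones).

    ranked_teams is ordered best first. Teams are cut into pots of
    n_groups teams each, and every group receives one team from each pot.
    """
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked_teams), n_groups):
        pot = ranked_teams[start:start + n_groups]  # a copy, safe to shuffle
        rng.shuffle(pot)
        for group, team in zip(groups, pot):
            group.append(team)
    return groups

# Hypothetical placeholder names: T1 is the best ranked, T8 the worst.
teams = [f"T{i}" for i in range(1, 9)]
groups = draw_groups(teams, n_groups=4, rng=random.Random(2026))
```

Each of the four groups ends up with one side from the top pot (T1 to T4) and one from the bottom pot (T5 to T8); which ones depends on the random numbers.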

The "match results" part of the workbook uses a simple randomised model, based on all international football results across the past few years and how those results relate to the FIFA ranks of the teams involved.  It is NOT particularly fancy, and only adds a "home advantage" to one team if applicable (the analysis I did for this part of the process suggests home advantage is equivalent to 0.33 goals).
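
As a rough illustration only, a randomised match model of this general kind could look like the Python sketch below. The only number taken from the post is the 0.33 goal home advantage. The Poisson goal counts, the `BASE_GOALS` average and the `GOALS_PER_RANK_POINT` slope are assumptions made up for the example, not the fitted relationship used in the workbook.

```python
import math
import random

HOME_ADVANTAGE_GOALS = 0.33    # figure quoted in the post
BASE_GOALS = 1.3               # assumed average goals per team per match
GOALS_PER_RANK_POINT = 0.002   # assumed slope, purely illustrative

def poisson(rng, mean):
    """Draw a Poisson-distributed goal count using Knuth's method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_match(rng, rank_a, rank_b, home=None):
    """Return (goals_a, goals_b); home is "a", "b" or None for neutral."""
    # Expected goal difference in favour of team a.
    supremacy = GOALS_PER_RANK_POINT * (rank_a - rank_b)
    if home == "a":
        supremacy += HOME_ADVANTAGE_GOALS
    elif home == "b":
        supremacy -= HOME_ADVANTAGE_GOALS
    # Split the supremacy evenly around the base scoring rate,
    # keeping both means positive for very lopsided matches.
    mean_a = max(0.05, BASE_GOALS + supremacy / 2)
    mean_b = max(0.05, BASE_GOALS - supremacy / 2)
    return poisson(rng, mean_a), poisson(rng, mean_b)

rng = random.Random(2026)
print(simulate_match(rng, 1600, 1450, home="a"))
```

With two equally ranked sides, the home team's expected margin in this sketch is exactly the 0.33 goals; everything else scales with the rank gap.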

After running the 10,000 projections the model ends up with the probability of each country qualifying for the 2026 World Cup.
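
The final step is just counting. A minimal sketch, assuming a hypothetical `simulate_once` function that plays out one complete qualification campaign and returns the set of teams that made it (the `toy_playoff` stand-in and its 70% figure are invented for the example):

```python
import random
from collections import Counter

def qualification_probabilities(simulate_once, n_sims=10_000, seed=2026):
    """Share of simulations in which each team qualifies."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_sims):
        tally.update(simulate_once(rng))  # one count per qualifier
    return {team: count / n_sims for team, count in tally.items()}

def toy_playoff(rng):
    """Hypothetical stand-in for a full campaign: A advances 70% of the time."""
    return {"A"} if rng.random() < 0.7 else {"B"}

probs = qualification_probabilities(toy_playoff)
```

Here `probs["A"]` comes out close to 0.7, with the gap from 0.7 being the sampling noise left after 10,000 runs.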

Caveats

Anyone could point out tons of reasons why this isn't the world's most fantastic piece of "analysis".  Most people would note:
  • The link between FIFA ranks and results - that is, assuming the FIFA ranks are actually accurate - is probably best described as "heroic"
  • There is no allowance for form, injuries etc - it's all just FIFA ranks.  This is partially offset by the fact that 10,000 projections are run - some of them have long running slumps, some bounce back straight away.
  • There isn't an allowance for "need".  The model doesn't care if the last match pits an already eliminated team against one needing a win to secure qualification - it's 100% dispassionate.
  • The ranks themselves don't "adjust" with projected results.  This is more important as the seeding for future draws is based on current ranks - and it is almost certain that some projections would imply changes to seedings (that is, if a low ranked team were to advance to the next round, they would probably have a higher FIFA rank at the time of the next draw - I haven't made any allowance for that).
  • Probably lots of other things.
That said, you can judge the "accuracy" of the projections when I get to some numbers.

Things that are useful

However, the one thing that I like about the model is that "randomness" is highly controlled.  Effectively each time I run the 10,000 simulations, the random numbers that underlie each draw and match result are the same - and the match results will only differ due to changes in the FIFA ranks that might have occurred in between (or because a group stage draw has been held etc).  So, when two sets of projections are compared, the changes in the results are not just due to one projection being "lucky" and the other "unlucky" - they are only due to updated ranks or (as we go along) results that have occurred recently.
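
This trick is often called "common random numbers". A small Python sketch of the idea follows; the logistic `win_probability` link, its `scale` of 400 and the rank values are assumptions for the example, not the workbook's model.

```python
import random

def draw_uniforms(n_sims, seed=12345):
    """Generate the random numbers once; every projection reuses them."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_sims)]

def win_probability(rank_a, rank_b, scale=400.0):
    """Assumed logistic link between rank gap and chance that A wins."""
    return 1.0 / (1.0 + 10 ** ((rank_b - rank_a) / scale))

def share_of_wins(rank_a, rank_b, uniforms):
    """Fraction of simulations won by A, given fixed random numbers."""
    p = win_probability(rank_a, rank_b)
    return sum(u < p for u in uniforms) / len(uniforms)

uniforms = draw_uniforms(10_000)
before = share_of_wins(1500, 1450, uniforms)  # first projection
again = share_of_wins(1500, 1450, uniforms)   # rerun, nothing changed
after = share_of_wins(1520, 1450, uniforms)   # A's rank has improved
```

Rerunning with unchanged ranks reproduces `before` exactly, and because the same random numbers are reused, a better rank can only turn losses into wins in individual simulations, never the reverse. Any movement between `before` and `after` is therefore down to the rank change alone.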

Project outline

That's the intro post (which I had to write because otherwise I'd never do any of the others).  If I ever continue, I'll start with a quick overview of the initial projection, then go through confederation by confederation - starting with CONMEBOL (because they have actual matches), then the AFC (covering the First Round and a bit more on the sort of things that are in the projection), then CAF (because they start next in November but actually have some interesting "pre-draw" results that illustrate some of the things I like about the way these projections work).

The other confederations will follow later as they don't really get going for a while.

Thursday, May 24, 2007

Goal difference vs Head-to-Head - who benefits

This is a question that has bothered me on and off for a while now. Personally, I prefer the goal difference comparison - if only because you can get all the information you need from a simple table of standings (and don't have to go to a list of all the results to see who won).

Still, that's not a really good reason for preferring one method of 'ranking' over another. So, time for some meaningless numbers.

I've got this spreadsheet thing that can randomly generate a set of results for international football matches - with the likelihood of a result based on FIFA ranks and the typical relationship between past international results and FIFA ranks (now, there's a long story). If you run a lot of random trials (say, 50,000) for a group, and compare who would finish where based on GD and HTH rankings, you can get an idea of what the effects are.

So, I've done a number of different scenarios, based on a 4-team group playing a full 6 match schedule. To make it look vaguely realistic, the groups are of 4 African sides, using May 2007 FIFA ranks.
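
For anyone wanting to try the comparison themselves, here is a minimal Python sketch of the two ranking methods applied to one set of group results. The function names, the exact order of the fallback tie-breakers and the toy results are all assumptions for illustration; real competition rules differ in the details (and often re-apply head-to-head in several passes, which this single pass does not).

```python
def standings(results, teams):
    """Points, goal difference and goals for each team, counting only
    matches in which both sides are in `teams`."""
    stats = {t: {"pts": 0, "gd": 0, "gf": 0} for t in teams}
    for home, away, home_goals, away_goals in results:
        if home not in stats or away not in stats:
            continue
        stats[home]["gf"] += home_goals
        stats[away]["gf"] += away_goals
        stats[home]["gd"] += home_goals - away_goals
        stats[away]["gd"] += away_goals - home_goals
        if home_goals > away_goals:
            stats[home]["pts"] += 3
        elif home_goals < away_goals:
            stats[away]["pts"] += 3
        else:
            stats[home]["pts"] += 1
            stats[away]["pts"] += 1
    return stats

def rank_by_gd(results, teams):
    """Points, then overall goal difference, then goals scored."""
    s = standings(results, teams)
    return sorted(teams, key=lambda t: (-s[t]["pts"], -s[t]["gd"], -s[t]["gf"], t))

def rank_by_hth(results, teams):
    """Points, then a mini-table of matches between the tied teams only,
    falling back to overall goal difference and goals scored."""
    overall = standings(results, teams)
    by_points = {}
    for t in teams:
        by_points.setdefault(overall[t]["pts"], []).append(t)
    ranking = []
    for pts in sorted(by_points, reverse=True):
        tied = by_points[pts]
        mini = standings(results, tied)
        ranking += sorted(tied, key=lambda t: (
            -mini[t]["pts"], -mini[t]["gd"], -mini[t]["gf"],
            -overall[t]["gd"], -overall[t]["gf"], t))
    return ranking

# Toy group: A and B both finish on 6 points. A has the better goal
# difference (+6 against +1), but B won the match between them.
teams = ["A", "B", "C", "D"]
results = [("B", "A", 1, 0), ("A", "C", 4, 0), ("A", "D", 3, 0),
           ("B", "C", 1, 0), ("D", "B", 1, 0), ("C", "D", 1, 1)]
print(rank_by_gd(results, teams))   # ['A', 'B', 'D', 'C']
print(rank_by_hth(results, teams))  # ['B', 'A', 'D', 'C']
```

Feed both functions the same randomly generated results many thousands of times, tally who finishes where, and you get numbers of the kind shown in the tables.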

In each case, I've put in a table showing:
  • Team
  • FIFA rank
  • Average points, and average finishing position under each method
  • % times finishing 1st under each method
  • % times finishing 1st or 2nd under each method

The last two give a handle on what we might actually care about as a result of the group.

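TRIAL ONE - A mixed group - no hopeless teams, but a range of abilities.

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Cameroon | 968 | 13.1 | 1.34 | 1.36 | 73.7% | 72.6% | 93.1% | 92.7% |
| Guinea | 593 | 9.3 | 2.28 | 2.29 | 17.2% | 17.8% | 62.8% | 62.4% |
| Togo | 466 | 7.6 | 2.72 | 2.72 | 8.2% | 8.6% | 36.9% | 37.2% |
| Liberia | 224 | 4.0 | 3.65 | 3.63 | 0.9% | 1.0% | 7.2% | 7.7% |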


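TRIAL TWO - One good team - three no hopers.

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Cameroon | 968 | 16.8 | 1.01 | 1.01 | 98.9% | 98.7% | 99.9% | 99.9% |
| Guinea-Bissau | 18 | 7.5 | 2.58 | 2.59 | 0.8% | 0.9% | 56.6% | 56.0% |
| São Tomé e Príncipe | 0 | 5.1 | 3.21 | 3.20 | 0.1% | 0.2% | 21.8% | 22.2% |
| Djibouti | 0 | 5.1 | 3.20 | 3.20 | 0.2% | 0.2% | 21.6% | 21.9% |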


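TRIAL THREE - Three evenly ranked sides - one no-hoper

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Egypt | 694 | 11.0 | 2.00 | 2.01 | 33.8% | 33.8% | 66.7% | 66.8% |
| Senegal | 692 | 11.0 | 2.00 | 2.00 | 33.7% | 33.7% | 67.0% | 66.8% |
| Mali | 685 | 10.9 | 2.03 | 2.03 | 32.4% | 32.5% | 66.0% | 66.0% |
| Somalia | 13 | 1.5 | 3.97 | 3.96 | 0.0% | 0.0% | 0.3% | 0.4% |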


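TRIAL FOUR - One top side - three others around the same

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Cameroon | 968 | 10.8 | 1.76 | 1.77 | 53.1% | 52.3% | 78.5% | 77.9% |
| Egypt | 694 | 7.7 | 2.72 | 2.72 | 16.2% | 16.5% | 41.3% | 41.5% |
| Senegal | 692 | 7.6 | 2.75 | 2.74 | 15.6% | 15.9% | 40.6% | 40.9% |
| Mali | 685 | 7.5 | 2.77 | 2.76 | 15.1% | 15.3% | 39.7% | 39.8% |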


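TRIAL FIVE - Two good - two bad

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Cameroon | 968 | 14.2 | 1.42 | 1.42 | 59.2% | 59.0% | 99.3% | 99.1% |
| Côte d'Ivoire | 850 | 13.4 | 1.61 | 1.61 | 40.7% | 40.9% | 98.8% | 98.5% |
| Guinea-Bissau | 18 | 3.7 | 3.46 | 3.46 | 0.1% | 0.1% | 1.1% | 1.3% |
| Somalia | 13 | 3.4 | 3.52 | 3.52 | 0.0% | 0.1% | 0.8% | 1.0% |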


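TRIAL SIX - One good - two middling - one poor

| Team | Rank | Ave Pts | Ave Pos GD | Ave Pos HTH | %1st GD | %1st HTH | %1-2 GD | %1-2 HTH |
|---|---|---|---|---|---|---|---|---|
| Cameroon | 968 | 14.1 | 1.26 | 1.27 | 79.0% | 77.9% | 95.2% | 94.9% |
| Togo | 466 | 9.3 | 2.40 | 2.39 | 10.5% | 11.1% | 52.4% | 52.5% |
| Zambia | 462 | 9.3 | 2.41 | 2.40 | 10.5% | 11.0% | 51.8% | 51.9% |
| Somalia | 13 | 1.8 | 3.94 | 3.93 | 0.1% | 0.1% | 0.7% | 0.8% |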


Pretty much overall, GD gives an advantage to higher ranked sides. Except where ranks are very close (so randomness comes into play), the GD ranking is more likely to see the best ranked side finish higher - or the best two sides finish 1-2.

Of course, there are millions of things that make this unrealistic, but the hope is (I suppose) that averaging over 50,000 trials smooths most of them away. However, I think the results are clear, and match both what I expected and what happens in reality - namely that lower ranked teams tend to get an advantage from using HTH as the tie-breaker.

So, if you prefer the 'best' teams to make the finals, you're probably going to want GD used to rank, if you prefer 'surprises' (Angola ahead of Nigeria for example) then HTH is more your thing.

Saturday, May 19, 2007

In the beginning...

In the beginning there were deadlines. And the deadlines slowly approached. And the deadlines even more slowly focussed the mind...

Sadly, the mind became focussed on things that - to others (let us call them, for want of a better title, "employers") - were of less than earth shattering import.

Yet focussed the mind became.

Questions emerged. Such as "is there a solid relationship between FIFA world ranks and international match results?", "is head-to-head really as bad a method of tie-breaking as I think it is?", "if you went back to the first FA Cup match and tracked the winners through the thousands of games since - who would you be following now?" and "how much more likely is Australia to qualify for the World Cup now they qualify out of Asia rather than against the last number Sepp Blatter thought of?".

Some of these questions involved numbers.

Some of them involved vast arrays of data and spreadsheets and stuff like that.

Some may even involve answers...