Inference
Handling uncertainty in sample data for wargame statistics.
Uncertainty
It’s easy to misconstrue statistics. Most of us are wargamers first and foremost; we’re just trying to follow the state of the game, not earn a PhD.
The previous section focused heavily on why statistics can provide real value for us. The implication is that wargaming statistics shouldn’t be cynically dismissed.
However, there’s another danger equally pervasive in this space: ignoring Sample Size and Uncertainty. This failure leads to runaway conclusions inconsistent with the available evidence.
Despite being polar opposites, the naïve and cynical views share a major commonality: a flawed understanding of uncertainty. The former completely ignores it (treats noise as signal); the latter is terrified of it (treats noise as bias).
Statistics is built for uncertainty; it’s a discipline designed to quantify it. We call this process Inference: leveraging mathematical principles and assumptions to transform a collected sample into an estimate of the true value we want to know.
In our case, the “true value” of win rate would be the probability a faction has of winning a game in the current meta.
Bayesian Average
If you've been following wargaming statistics for any amount of time, you've probably noticed how low-sample factions (factions with only a few games) can put up more extreme performances than large-sample factions (factions with many games). Consider this table of made-up data:
| Faction | Games | Win Rate |
|---|---|---|
| A | 35 | 69.7% |
| B | 68 | 58.1% |
| C | 403 | 57.4% |
Awesome. So, Faction A must be the best, Faction B is second, and Faction C is third. Our work here is done.
Ehm. Obviously not…
Faction C is clearly better than Faction B, and although it's possible Faction A is better than Faction C, we’re not so sure about that. In fact, we know Faction A’s win rate will likely go down as more games are played; it’s just a question of how much.
How do we handle this? How do we compare faction performances with vastly different sample sizes?
Well, there's always more than one way to skin a xenos in statistics. For our case, a very simple, yet very helpful, method I’ve been using is called a Bayesian Average. I wrote a blog post on it a while ago, which you can read here. This effort was heavily inspired by Data Scientist David Robinson’s excellent introduction to the subject. If you really want a deep dive into the technical side, check out his blog or book.
But for this section, let’s jump into a practical application. We’ll calculate a Bayesian average for the metric that is near and dear to all our hearts: Win Rate.
Fellow board game nerds have likely heard of the website Board Game Geek. Their ranking algorithm, “Geek Rating,” is a Bayesian average.
Pseudo-Observations
The easiest way to understand and calculate a Bayesian average is via Pseudo-Observations.
To calculate a faction’s Bayesian win rate, we begin with a set of pseudo-observations. Pseudo-observations are unobserved, "fake" games included in the calculation before any real observations are added.
For simplicity, let’s start off with 5 wins and 5 losses as our pseudo-observations; a 50% win rate over 10 games. Calculating the Bayesian win rate for a faction would look like this:
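$$\text{Bayesian win rate} = \frac{\text{wins} + 5}{\text{games} + 10}$$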
If you’re an amateur like me, this approach might sound absurd. We’re just adding fake data; how is that not a crime?
Robinson explains that although the execution is simple, the math to prove it is quite complex.
In other words, some smart math nerds found a way to simplify Bayesian methods thanks to the power of what’s called a Conjugate Prior. Conjugate priors provide an easy mathematical shortcut for updating random variables modeled by certain simple distributions. Because win rate is a binomial (or close enough anyway), we can tap into some Bayesian magic with nothing but basic arithmetic.
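If you’d rather see that arithmetic as code, here’s a minimal sketch in Python (the win counts are back-calculated from the made-up table above, so treat the numbers as illustrative):

```python
# Pseudo-observations: 5 wins and 5 losses, i.e. a Beta(5, 5) prior.
PRIOR_WINS, PRIOR_LOSSES = 5, 5

def bayesian_win_rate(wins: float, games: float) -> float:
    """Add the pseudo-observations to the real record, then average."""
    return (wins + PRIOR_WINS) / (games + PRIOR_WINS + PRIOR_LOSSES)

# The made-up factions from the table above: (games, raw win rate).
factions = {"A": (35, 0.697), "B": (68, 0.581), "C": (403, 0.574)}
for name, (games, rate) in factions.items():
    adjusted = bayesian_win_rate(rate * games, games)
    print(f"Faction {name}: raw {rate:.1%} -> Bayesian {adjusted:.1%}")
```

Notice how Faction A shrinks from 69.7% down to roughly 65%, while Faction C, with its 403 games, barely moves off 57.4%.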
Don't be too intimidated by the statistical jargon. I’ll do my best to explain this process as simply as possible; you don’t really need to know what concepts like “random variable” or “conjugate prior” actually mean. I’ll occasionally invoke certain terms just to make it clear there are deeper principles at work.
The Prior
Bayesian statistics begin with what’s called a Prior. For us, the prior is the strength (number of games, i.e. 10) and mean (the average, i.e. 0.5) of the pseudo-observations we’re using.
This prior represents our assumption that low sample factions are noisy and not representative of a faction’s true win rate. It constrains our estimate to a more reasonable range of uncertainty when sample size is small; the smaller the sample size, the larger an effect the prior has on the estimate.
The 5 win and 5 loss prior can be described as saying:
I’m pretty sure no faction’s true win rate is below 10% or above 90%.
When you think of it in those terms, it's suddenly not so absurd after all. A prior is not merely an adjusted average. In truth, it’s a Probability Distribution. That is why we can describe it using terms like range or probability.
*[Chart: the 10-game prior’s probability distribution, a bell curve centered at 50%.]*
Notice how the bell curve is highest at 50%, then goes down as values move towards 10% and 90%. Values in the center are more likely outcomes than values at the tails.
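Don’t take my word for it; the prior is just a Beta(5, 5) distribution, so a quick scipy sketch can check how much probability it puts between 10% and 90%:

```python
from scipy import stats

prior = stats.beta(5, 5)  # 5 pseudo-wins, 5 pseudo-losses

# Probability mass the prior places between a 10% and 90% win rate.
print(prior.cdf(0.90) - prior.cdf(0.10))  # ~0.998
```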
There's nothing special about that 10 game prior I made up. After all, why start with 10 games? Why not 20? Or even 50?
There’s no right way to choose a prior. Rather, the prior we use is justified based on our knowledge and assumptions of the system we are analyzing.
Empirical Prior
There is a way to construct a prior from actual data, present or historical. Such a prior is called an Empirical Prior.
To build a Kill Team empirical prior, I decided to use data from only the KT24 edition. This edition has had far more extreme win rates than the previous edition, so it should naturally produce a wider prior (a greater range of uncertainty than KT21).
We can do this by fitting a beta distribution using maximum likelihood. Yeah, there’s the jargon again… To me, this process was no more difficult than shoveling data into a glorified calculator. You can check out Robinson’s work for the technical explanation.
The main thing to understand is we’re using historical Kill Team data to inform our expectations over an unknown faction's true win rate.
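As a rough sketch of that fitting step (the win rates below are hypothetical placeholders, not the actual KT24 data), scipy can do the maximum likelihood fit for us:

```python
from scipy import stats

# Hypothetical per-faction win rates standing in for the KT24 data.
win_rates = [0.41, 0.44, 0.47, 0.49, 0.50, 0.52, 0.55, 0.58, 0.63]

# Fix the support to [0, 1] so only alpha and beta are estimated.
alpha, beta, _, _ = stats.beta.fit(win_rates, floc=0, fscale=1)
print(alpha, beta, alpha + beta)  # alpha + beta is the prior strength
```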
With all our quarters of KT24 fitted, we get a prior strength of 42.57 games. Rounding down, we unironically get the ever-impressive value of 42.
We round off the decimal for ease of use. When using informed priors, shaving off a decimal has a trivial impact.
After using the empirical process to find the strength (number of games), we can then set the prior mean (average) to 50%. We know the expected value of an unknown faction ought to be 50%: if you pool all wins, ties, and losses together, for all factions, you will always get 50%, because every game produces a sum of 1 success between the two players.
The fitted hyperparameters α and β came to 20.34 and 22.23 respectively (α represents wins and β represents losses). The implied mean isn’t 50% because more factions have a below-average win rate due to poor representation. We want poorly represented factions to inform the variance for our prior, but we do not want them to inform the mean.
This gives us a strong, informed prior, which is expected with empirical priors. The calculation for our average now looks like this:
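$$\text{Bayesian win rate} = \frac{\text{wins} + 21}{\text{games} + 42}$$

That is, 21 pseudo-wins and 21 pseudo-losses: the rounded strength of 42 games, re-centered on a 50% mean.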
This prior’s assumptions can be understood as:
95% of Kill Team factions have a true win rate between 35% and 65%.
Remember, the true win rate is not our sample mean, but a faction’s true population mean. Knowing that, this prior seems perfectly reasonable; it also leaves 5% of wiggle room for the rare factions that are extremely unbalanced (thank you Canoptek and Battleclade).
*[Chart: the empirical prior’s distribution with the central 95% highlighted.]*
Notice the central 95% highlighted in this chart. That’s the range where we say there’s a 95% chance an unknown Kill Team faction’s true win rate falls.
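That 35% to 65% range falls straight out of the Beta(21, 21) prior; a quick sketch to verify:

```python
from scipy import stats

# The empirical prior: 21 pseudo-wins, 21 pseudo-losses.
prior = stats.beta(21, 21)

# Central 95% credible interval of the prior.
print(prior.interval(0.95))  # roughly (0.35, 0.65)
```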
Website Metrics
If all of this sounds horrifying to you, don’t worry. When you land on the Rankings page, the actual values you see for win rate and placing rate are the Raw, unadjusted averages. It’s only the natural language descriptions I use for these values (i.e. weak, average, strong) that are informed by the Bayesian averages.
But there are some cooler ways to view these estimates.
The win rate and placings pages both have an Interval tab. There, you can directly see the Bayesian average of each faction’s win and placing rate.
Additionally, just like the prior, Bayesian estimates are also Probability Distributions (not mere averages!). We call these distributions Posteriors; posteriors represent the full uncertainty within our estimate. In other words, they embody the sample size.
Because of this, we can highlight the central 50% or 95% probability within these distributions (just like I did in the above image). Such ranges are called Credible Intervals. Credible intervals are a helpful representation of uncertainty within an estimate. For a more detailed rundown of these estimates, see my blog post here.
We can choose any probability for a credible interval. 50% and 95% are just conventions.
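Since posteriors keep the same beta form, computing a credible interval is just as easy. Here’s a sketch for a hypothetical faction with 120 wins in 200 games under our empirical prior:

```python
from scipy import stats

# Posterior = Beta(prior_wins + wins, prior_losses + losses).
posterior = stats.beta(21 + 120, 21 + 80)

print(posterior.mean())          # Bayesian win rate, ~0.583
print(posterior.interval(0.50))  # central 50% credible interval
print(posterior.interval(0.95))  # central 95% credible interval
```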
Placing Rate
Every step of the empirical Bayes inference we took for win rate translates directly to placing rate.
Placing rate is also a binomial, which lets us leverage a binomial likelihood with a beta prior; put simply, it can use the same magic of a conjugate prior and pseudo-observations.
Below are the hyperparameters (pseudo-observations) for both win and placing rate; Alpha (α) represents successes (wins or top placings), and Beta (β) represents failures (losses or non-placings):
| Rate | Alpha | Beta |
|---|---|---|
| Win | 21 | 21 |
| Placing | 3 | 21 |
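Plugging either row of that table into the same pseudo-observation arithmetic gives the corresponding Bayesian average. A sketch with a hypothetical faction’s record (200 games, 110 wins, 30 top placings):

```python
def bayesian_rate(successes: float, games: float,
                  alpha: float, beta: float) -> float:
    """Posterior mean under a Beta(alpha, beta) prior."""
    return (successes + alpha) / (games + alpha + beta)

# Hypothetical record: 200 games, 110 wins, 30 top placings.
print(bayesian_rate(110, 200, 21, 21))  # Bayesian win rate, ~0.541
print(bayesian_rate(30, 200, 3, 21))    # Bayesian placing rate, ~0.147
```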
Next up, we'll take these estimates and figure out how we can leverage them in a ranking algorithm.