Pretentious Plastic Ops

Representative Sample

One of the first lessons in statistics is that most data is a Sample of some larger population. Typically, the goal is to infer qualities of the larger population from the smaller sample.

Your classic textbook case would sound something like:

Survey 1,000 40k players and ask them what they think about alternating activations.

Obviously, there are far more than 1,000 40k players in existence (it's no Kill Team after all…). You’re using your study sample to make a broader claim about the community at large.

To do this correctly, statisticians follow principles like random sampling or stratification to ensure they’re collecting an accurate Representative Sample of the population.

Sports Ball

However, there’s another way to view a statistical population.

For the moment, let’s set aside the exhilarating world of wargame statistics and instead enter the boring world of sport statistics.

Imagine we collected data for a complete season of some sports league. Using it, we wanted to see who the most skilled player was. Technically, we would have the full population data at our fingertips; in professional sports, there is no broader, unmeasured population out there. We collected every observation available for the season.

Yet, it’s far too simple to say:

Well, this guy over here scored the most sport balls. He’s the most skilled player in the league.

Sure, you can prove they were the season’s MVP, but it does not prove they are the most skilled. Clearly, there’s Noise in these numbers even when we have access to the entire population of data.

When sport statisticians model player skill, they still treat their data as a sample. However, for them, the population isn’t a real group of unobserved data points. Rather, it’s better understood as an infinite process of all possible data points that those players could have produced.

🧐 Aperçu

Such a population is a hypothetical superpopulation. You can imagine it as a vast multiverse of other, possible universes where the same season played out, but lady luck tipped the scales in different directions.
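To make the multiverse idea concrete, here is a minimal simulation sketch. It is only an illustration, not how any real model is fitted: the skill values, the 30-game season, and the function names are all hypothetical. It replays the same season many times and counts how often each player ends up as top scorer.

```python
import random

def simulate_season(skills, games, rng):
    """Play one hypothetical season and return each player's total score."""
    # Assumption: in each game a player scores with probability equal to their skill.
    return [sum(rng.random() < skill for _ in range(games)) for skill in skills]

def top_scorer_rates(skills, games=30, seasons=2000, seed=0):
    """Share of simulated seasons in which each player finishes as top scorer."""
    rng = random.Random(seed)
    firsts = [0] * len(skills)
    for _ in range(seasons):
        totals = simulate_season(skills, games, rng)
        best = max(totals)
        leaders = [i for i, total in enumerate(totals) if total == best]
        firsts[rng.choice(leaders)] += 1  # break ties at random
    return [count / seasons for count in firsts]

# Hypothetical latent skill values; player 0 is truly the most skilled.
skills = [0.55, 0.50, 0.50, 0.45]
rates = top_scorer_rates(skills)
for player, rate in enumerate(rates):
    print(f"player {player}: top scorer in {rate:.0%} of simulated seasons")
```

With skill gaps this small, the most skilled player should lead more often than anyone else, but typically far from always. Any single observed season is just one draw from this process.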

Wargame statistics work more like sport statistics than that textbook survey case. We're not searching for qualities of a real population; instead, we use a competitive environment to model player skill and Faction Strength. In fact, you can think of faction strength as a kind of "skill", except it’s inherent to the faction rather than any one player. It’s a Latent Variable: a hidden, but stable, frequency constantly influencing game outcomes amidst the noise.

Events

Competitive environments make collecting a representative sample dead simple; competitive environments are the representative sample. Statisticians barely need to lift a finger; the sweats do all the work!

Wargaming may not have an environment as reliable as professional sports, but we do have something in the ballpark: Event Data. We value real tournament data because it has the strongest controlled environment in our space. Tournaments have the following qualities:

A standardized experience with approved missions and maps.
Engaged TOs and judges who provide support for rule questions or disagreements.
Greater transparency and oversight of game outcomes (we’re more confident the games are real).
A Swiss-style pairing structure that helps mitigate skill disparity and provides all players with an equal opportunity to play.

In other words, these environments give us much more confidence that they contain games of Kill Team being played as intended. Because they’re structured, standardized, community-run events, they’re more resilient to self-report bias, which can happen with individually submitted pickup games.

Competitive vs. Casual

I’ve had people ask me:

Event data is just the competitive scene, a minority of all players; why don’t you collect data that represents the casual scene?

For what it’s worth, I genuinely consider RTT data to be casual data. RTTs are local, single-day tournaments involving only three rounds. If you’ve ever attended one, you know they’re not a sweat factory.

I do provide a feature that allows you to view data only from events with a minimum of 16 players and 4 rounds. The community generally considers events with 4 or more rounds to be on the competitive side (which is probably true on average).

I actually like having a variety of skill levels in the data; ultimately, ensuring the data comes from a real event goes a long way in building confidence in its validity.

If people are interested in “casual data” from players who play a couple of times a year in their own basement, they’re better off sending out surveys with qualitative questions; collecting technical, quantitative data like win rates doesn’t make much sense without some controlled guardrails.

Data Cleaning

As anyone who has attended an event knows, they’re far from perfect. Measurement Error still exists in these environments; that can never be fully eliminated. However, if you enjoy doing statistics, you also enjoy Data Cleaning. Data cleaning is always a necessary step to improve our validity. For Kill Team statistics, this means we strive to remove invalid events and invalid games.

Invalid Events

Finding invalid events is rather easy. We remove the following events from the data:

Events with fewer than 8 players or fewer than 3 rounds
Narrative Events
Doubles Events
Leagues
Online Events
Junk Events

Event Size

We always remove events that do not meet the minimum requirement of having at least 8 players and 3 rounds. This size is achievable for even small KT communities, yet big enough to expect we’re receiving some of the benefits of a Swiss event.

Narrative Events

Narrative events run unique missions and rules. They’re fun, but not at all standardized. We remove them.

Doubles Events

Doubles events involve two teams of two players, playing 2v2 (four Kill Teams on the board at once). They’re probably my favorite way to play Kill Team, but alas, not at all standardized (or balanced!). We remove them.

Leagues

Leagues are effectively a collection of lone games, not an event. Some leagues are very well run and could easily be considered valid. However, many are not; it’s easiest to just avoid using them altogether.

Online Events

Online events aren’t bad representations of balance, but they lack the social and physical environment present at live events. You could argue they should be included, but I like encouraging the actual hobby, which involves real human interactions and real miniatures. I’m sure James Workshop would concur. We remove them.

Junk Events

Finally, junk events are duplicate events, failed events, people testing out the software, or some other irregularity that is clearly not real. It goes without saying we remove them.
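The event-level rules above can be sketched as a single filter. This is a rough illustration under stated assumptions, not the actual pipeline: the `Event` record, its `format` labels, and the sample events are all hypothetical, and in practice tagging an event as narrative or junk likely requires manual review.

```python
from dataclasses import dataclass

MIN_PLAYERS = 8  # minimum event size stated in the text
MIN_ROUNDS = 3   # minimum round count stated in the text
# Hypothetical format labels for the event types that get removed.
EXCLUDED_FORMATS = {"narrative", "doubles", "league", "online", "junk"}

@dataclass
class Event:
    name: str
    players: int
    rounds: int
    format: str  # hypothetical label, e.g. "singles", "team", "doubles"

def is_valid_event(event: Event) -> bool:
    """Return True if the event survives the removal rules."""
    if event.players < MIN_PLAYERS or event.rounds < MIN_ROUNDS:
        return False
    return event.format not in EXCLUDED_FORMATS

# Made-up events, purely for illustration.
events = [
    Event("Local RTT", players=12, rounds=3, format="singles"),
    Event("Tiny Meetup", players=6, rounds=3, format="singles"),
    Event("2v2 Brawl", players=16, rounds=3, format="doubles"),
    Event("Team Clash", players=24, rounds=5, format="team"),
]
kept = [event.name for event in events if is_valid_event(event)]
print(kept)
```

Team events pass the filter on purpose, since they are built from ordinary 1v1 games.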

Team Events

You’ll notice team events weren’t on the removal list. Team events still rely on 1v1 games, but each player on a team contributes to their team’s collective score in addition to their personal record. We keep team events because they contain valid games. The main issue with team events is they influence matchups in a way normal events do not; each team has some control over the type of opponent their players might play against.

🧐 Aperçu

Historically, about 5% to 6% of all games come from team events. Because the data skews so heavily towards single events, the influence of team events on overall matchups should be minimal.

However, team events tend to be highly competitive and are valued for the quality of their games; we want to take them seriously, so we keep them.

Invalid Games

Even if an event is valid, there could be games within that event which are not. When TOs use event management software, they must handle no-shows, byes, or disqualifications. TOs don’t all follow the same process; sadly, this means we can’t catch all invalid games. However, there are some general principles we can follow to remove the worst offenders.

We remove a game (that is, the outcomes recorded for both players) if any of the following is true:

A game lacks two real players.
One or both players are missing a game result (win, loss, draw).
One or both players aren’t in the ranked placings (no idea how this even happens, honestly).
One or both players score 0 points in a non-WTC game.

The first three conditions should be uncontroversial; the final one, less so.
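The four conditions above could be checked roughly like this. It is a minimal sketch: the `PlayerResult` fields and the `wtc_scoring` flag are hypothetical stand-ins for whatever the event software actually exports.

```python
from dataclasses import dataclass
from typing import Optional

VALID_RESULTS = {"win", "loss", "draw"}

@dataclass
class PlayerResult:
    is_real: bool            # False for byes or placeholder participants
    result: Optional[str]    # "win", "loss", "draw", or None if missing
    placing: Optional[int]   # rank in the final placings, None if absent
    points: int              # points scored in this game

def is_valid_game(a: Optional[PlayerResult], b: Optional[PlayerResult],
                  wtc_scoring: bool = False) -> bool:
    """Return True only if both sides of the game pass every check."""
    for player in (a, b):
        if player is None or not player.is_real:
            return False  # the game lacks two real players
        if player.result not in VALID_RESULTS:
            return False  # missing game result
        if player.placing is None:
            return False  # not in the ranked placings
        if player.points == 0 and not wtc_scoring:
            return False  # 0-point game outside WTC scoring
    return True

# Made-up results, purely for illustration.
winner = PlayerResult(is_real=True, result="win", placing=3, points=17)
loser = PlayerResult(is_real=True, result="loss", placing=9, points=11)
blanked = PlayerResult(is_real=True, result="loss", placing=12, points=0)

print(is_valid_game(winner, loser))                      # both sides pass
print(is_valid_game(winner, blanked))                    # dropped: 0 points
print(is_valid_game(winner, blanked, wtc_scoring=True))  # kept under WTC scoring
```

Because the check covers both sides, one failing player removes the whole game, including the opponent's otherwise valid result.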

🧐 Aperçu

I follow a general philosophy that any game suspected of being fake should be purged. Innocence proves nothing.

It’s quite common for TOs to host a fun game with any player on a bye. In these cases, the TO, or another player helping them, usually exists in the data but is given a score of 0 for these games, while the real players receive a free win.

Table Top Herald has a great feature where you can label a player a “dummy player.” No, it’s not an insult; it’s nice for Statlords like me who want to know if a player was a real participant or not. If Best Coast Pairings has such a feature, I haven’t found it; many TOs just handle byes using a fake participant (the names are often hilarious) who produces 0-point games.

🧐 Aperçu

Historically, about 1% to 1.5% of BCP games wind up being 0-point games. It’s a small amount, but not completely insignificant.

Throwing these games out does mean we lose games where a player genuinely scored 0 points. Although such games do happen (clearly someone else also administered a purging), I’m convinced most 0-point games are more likely to be fake than real. It’s worth losing some real data if we know we’re throwing out much more fake data.

🧐 Aperçu

Other possible reasons a player produces 0 points in a game include no-shows, leaving early, or disqualification. Ultimately, 0-point games are always questionable. A suspicious mind is a healthy mind.

WTC Events

The big exception is events that use WTC scoring. For these events, we keep the 0-point games because they’re very common and expected. It is possible to separate the WTC score from the true approved ops score, but I don’t bother since most WTC events are just team events (i.e. a minority); it’s easier for me to waive the 0-point game rule for WTC events.

Well, that completes the most exciting section in these docs. Next up, we’re going to level up our statistical jargon and enter the world of Inference.
