

Replay analysis - what does this tell us about the MM?

Let's do science statistics

382 replies to this topic

Baldrickk #1 Posted 08 July 2018 - 02:51 AM

    Field Marshal

  • Player
  • 30307 battles
  • 14,493
  • [-TAH-] -TAH-
  • Member since:
    03-03-2013


World of Tanks

 

It's a big game, with plenty of people playing it, but is it fair?

 

The subject of the matchmaker crops up commonly on the forum, with most of the posts being complaints.

As the people posting accusations that the game is rigged against them always somehow seem to have misplaced their evidence, I thought I would help them out, so I have written a replay analyser for the purpose of analysing the matchmaker.

 

You can get it here: https://github.com/Baldrickk/replay_team_balance

It is a work in progress still, so if you would like to contribute, or just point out errors or suggest new features, please feel free to do so.

 

A fellow forumite, Lord Muffin, who has a larger stash of replays than me, provided some to analyse (he sent me 21,000; that's over 20GB of replays).

What follows are the results of the analysis of those replays.

Ratings are based on WG's rating system.

 

Update, 28th Oct: added a flag to remove platoon battles. I'm updating the graphs etc. below to use battles without platoons only.

 

We start with an overall summary:

 

Total replays: 12704
Green team average rating: 4380.9
Red team average rating: 4381.54
Percentage difference: -0.0146%
Stronger than enemy: 6337 battles
Weaker than enemy: 6367 battles
Percentage Stronger: -0.118%

 

 

So, what does this mean?

Not all the replays were valid (special events, skirmishes etc.), so those were ignored, leaving just under 20,000. To remove bias, platoon games are now ignored too, leaving about 12,700 replays, still a significant number.
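A minimal sketch of that filtering step, assuming each parsed replay has already been reduced to a small record. The `Battle` class, its field names and the `"random"` label are hypothetical, for illustration only; they are not the replay_analyser's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Battle:
    battle_type: str     # hypothetical label, e.g. "random", "skirmish", "event"
    in_platoon: bool     # hypothetical flag: was the recording player in a platoon?
    friendly_avg: float  # average rating of the friendly team
    enemy_avg: float     # average rating of the enemy team


def keep_for_analysis(battles, include_platoons=False):
    """Keep standard random battles, dropping platoon games unless asked not to."""
    kept = []
    for b in battles:
        if b.battle_type != "random":
            continue  # special events, skirmishes etc. are ignored
        if b.in_platoon and not include_platoons:
            continue  # platoon games can skew the friendly team's rating
        kept.append(b)
    return kept
```

Passing `include_platoons=True` gives the "with platoons" variant.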

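Before platoon battles were removed, the mean rating for his team was a little higher than that for the enemy, and the "Percentage Stronger" figure was about 5%. That was most likely due to platooning, and the update without platoons bears this out: the two team averages are now within 0.02% of each other, and the stronger/weaker split is almost exactly even.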
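For anyone wanting to check the summary figures, here is a rough sketch of how they can be computed from per-battle team averages. It assumes "Percentage difference" is the relative gap between the two overall means and "Percentage Stronger" is how far the share of stronger battles sits above 50%. With the numbers in the summary, those two formulas give -0.0146% and -0.118%, but the tool's own code may compute them differently.

```python
def summarise(pairs):
    """pairs: one (friendly_avg, enemy_avg) tuple per battle."""
    n = len(pairs)
    green = sum(f for f, _ in pairs) / n
    red = sum(e for _, e in pairs) / n
    stronger = sum(1 for f, e in pairs if f > e)
    weaker = sum(1 for f, e in pairs if f < e)
    return {
        "total": n,
        "green_avg": green,
        "red_avg": red,
        # relative gap between the two overall means, in percent
        "pct_difference": (green - red) / red * 100,
        "stronger": stronger,
        "weaker": weaker,
        # share of 'stronger' battles minus 50 %, in percentage points
        "pct_stronger": (stronger / n - 0.5) * 100,
    }
```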

 

Posted Image

 

This scatter graph shows one mark for each battle:
  • Position on the X axis is the average rating of the enemy team
  • Position on the Y axis is the average rating of the friendly team (not including LordMuffin himself)
  • Colour shows if it was a win (green) or a loss (red)

 

As can be seen, the points are spread fairly evenly around the x=y line.

This is a very good indication that there is no rigging going on in the matchmaker.

 

Spread along the line could be down to a number of factors; however, from experimentation it appears that the dominant factor is tier.

As players improve, they tend to play higher tier battles.

This leads to each tier forming a roughly circular point cloud on this graph.
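The friendly average plotted in the scatter graph leaves out the player who recorded the replay. A sketch of that per-battle calculation, assuming the replay's player list has been turned into dicts with hypothetical `id`, `team` and `rating` keys; `include_self=True` gives the variant that counts the recorder as well.

```python
from statistics import mean


def team_averages(players, recorder_id, include_self=False):
    """Return (friendly_avg, enemy_avg) for one battle.

    players: list of dicts with hypothetical keys "id", "team" and "rating".
    The recording player is left out of the friendly average unless include_self is True.
    """
    recorder = next(p for p in players if p["id"] == recorder_id)
    friendly = [
        p["rating"]
        for p in players
        if p["team"] == recorder["team"] and (include_self or p["id"] != recorder_id)
    ]
    enemy = [p["rating"] for p in players if p["team"] != recorder["team"]]
    return mean(friendly), mean(enemy)
```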

 

 

Posted Image


Histogram of team rating differences: μ=-0.070767 σ=14.399599

 

The differences are spread quite evenly around the point where the teams are even, forming a nicely shaped normal distribution, as expected for a random MM.

We can clearly see that there are far more balanced battles than there are unbalanced ones, with the mean being almost exactly zero.
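A sketch of how such a histogram and its μ and σ can be produced with the standard library. It assumes the per-battle difference is expressed as a percentage of the enemy team's average and uses the population standard deviation; both are guesses about what the tool does, and the bin width is arbitrary.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def difference_histogram(pairs, bin_width=5):
    """Return (mu, sigma, bins) for the per-battle team rating difference.

    pairs: one (friendly_avg, enemy_avg) tuple per battle.
    The difference is taken as a percentage of the enemy average (an assumption).
    """
    diffs = [(f - e) / e * 100 for f, e in pairs]
    # each value is counted in the bin starting at floor(d / width) * width,
    # e.g. -3.2 falls in the [-5, 0) bin when bin_width is 5
    bins = Counter(math.floor(d / bin_width) * bin_width for d in diffs)
    return mean(diffs), pstdev(diffs), dict(sorted(bins.items()))
```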

 

 

Posted Image


Distribution of results: μ=6.885816 σ=3.336748

Distribution of results: μ=6.832730 σ=3.237254

This didn't change much between the figures with platoons and without.

 

Another common complaint is that there are too many games where the result is too one-sided. This graph shows that the average result is a difference of 7 in tanks left alive (the most common result is 9).

 

This isn't really all that unexpected: the format of the game, with killed tanks removed from the battle permanently, means that whoever gets the early kills gains a big advantage, and that advantage just snowballs from there.

This actually seems like a nicely rounded result.

There is a reason for the mean being less than the mode in this graph, which we will see with the next graph.
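A small sketch of the mean-versus-mode calculation, assuming a "result" here means the absolute difference in tanks left alive at the end of the battle (the signed version runs from -15 to +15). With made-up data, a couple of close results are enough to drag the mean below the mode.

```python
from statistics import mean, mode


def result_summary(alive):
    """alive: one (friendly_tanks_alive, enemy_tanks_alive) tuple per battle.

    Returns (mean, mode) of the margin in tanks, ignoring who won.
    """
    margins = [abs(f - e) for f, e in alive]
    return mean(margins), mode(margins)
```

For example, `result_summary([(9, 0), (0, 9), (10, 1), (2, 0), (1, 3)])` returns `(6.2, 9)`: three 9-tank stomps and two close games.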

 

 

Posted Image

What if we look at the results against the strength of the teams?

 

A common suggestion is that a skill based matchmaker would result in more battles ending with a closer result.

This graph shows that with the team difference near 0, we get the whole range of results, from -15 to +15, and the density of results in the "balanced" set of games shows no clear difference from that of the overall set.

 

As noted with the previous graph, near equal results are clearly quite rare due to the game format.

Two clear regions can be seen in this graph; however, there is some overlap between them. It is this overlap that makes close results as common as they are, which is the reason for the lower mean in the previous graph.
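One way to put numbers on the comparison between balanced battles and the overall set: take the share of close results among "balanced" battles and compare it with the share among all battles. A sketch assuming each battle is reduced to a (rating difference in %, signed result) pair; the 2% "balanced" cut-off and the 3-tank "close" margin are arbitrary illustrative choices, not values used in this thread.

```python
def close_result_share(battles, max_rating_diff=2.0, close_margin=3):
    """Compare how often close results occur in balanced battles vs all battles.

    battles: one (rating_diff_pct, result) tuple per battle, where result is
    friendly tanks alive minus enemy tanks alive (-15..+15).
    A battle counts as 'balanced' if |rating_diff_pct| <= max_rating_diff and
    as a close result if |result| <= close_margin (both thresholds are arbitrary).
    """
    def share_close(rows):
        if not rows:
            return float("nan")  # no battles in this group
        return sum(1 for _, r in rows if abs(r) <= close_margin) / len(rows)

    balanced = [b for b in battles if abs(b[0]) <= max_rating_diff]
    return share_close(balanced), share_close(battles)
```

If the two shares come out about the same on a large sample, that matches what the graph shows.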

 

 

Posted Image

How quickly does the data trend towards the mean?

It's important to have an idea of whether the sample you have is statistically significant or not.

This graph shows the running average across all the battles. It converges quickly, and is within 2% after about 80 battles, 1% after about 300 battles, and only trends towards 0 from there.

 

This trends towards the value of the average team percentage difference shown in the summary.

 

It should be noted that when playing completely solo, this should be expected to trend towards 0, and it does. Playing in platoons will affect this limit.
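The running average described above can be sketched as follows, assuming the input is each battle's percentage difference between the team averages. `battles_to_settle` is a hypothetical helper that reports the battle count after which the running average stays within a given band around 0, which is one reading of "within 2% after about 80 battles".

```python
from itertools import accumulate


def running_average(values):
    """Mean of the first 1, 2, 3, ... values."""
    return [total / count for count, total in enumerate(accumulate(values), start=1)]


def battles_to_settle(values, tolerance):
    """Smallest battle count n such that the running average stays within
    +/- tolerance of 0 from battle n onwards, or None if it never settles."""
    running = running_average(values)
    n = 1
    for count, avg in enumerate(running, start=1):
        if abs(avg) > tolerance:
            n = count + 1  # still outside the band at this point
    return n if n <= len(running) else None
```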

Posted Image

This version of the graph shows the above (green), with platoons (red), and including Muffin himself (blue).

 

 

Posted Image

A graph requested by Muffin himself!

This graph plots each battle against the percentage difference in rating. The idea is to look for any sort of trend. It isn't very clear with this many data points, but no trends are visible, even with a smaller data sample to make it clearer.
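A straight-line fit is one way to test for a trend numerically rather than by eye. A sketch using ordinary least squares, with each battle's position in the sequence as x and its percentage difference as y; a slope close to 0 means no steady drift. Note that a straight line only catches a steady drift, not alternating streaks.

```python
from statistics import mean


def trend_per_1000_battles(diffs):
    """Least-squares slope of the per-battle rating difference (in %) against
    battle number, scaled to 'change per 1000 battles'. Needs at least 2 battles."""
    xs = range(len(diffs))
    x_bar, y_bar = mean(xs), mean(diffs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, diffs))
    var = sum((x - x_bar) ** 2 for x in xs)
    return cov / var * 1000
```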

Posted Image

 

 

Posted Image


All teams rating distribution: μ=4381.216659 σ=799.052643

 

How the teams for both sides stack up

 

 

Posted Image


Histogram of all players: μ=4474.499513 σ=1844.500787

 

How the players seen all stack up against each other. Where do you lie? 

 

Please, discuss and submit your own results.

I'm happy to help people run the program, or, if they want to put them somewhere I can access them, then I'm happy to run the tool on their replays.

 

 


Edited by Baldrickk, 28 October 2018 - 01:57 PM.


Catn1p #2 Posted 08 July 2018 - 04:51 AM

    Private

  • Player
  • 1350 battles
  • 28
  • Member since:
    10-17-2017

It tells that if you get a streak of terrible teams for 100 games and then a streak of great teams for 100 games then you have gotten average teams for 200 games.

 

Friggin’ genius.



LordMuffin #3 Posted 08 July 2018 - 06:48 AM

    Field Marshal

  • Player
  • 48529 battles
  • 11,167
  • [-GLO-] -GLO-
  • Member since:
    06-21-2011

Catn1p, on 08 July 2018 - 04:51 AM, said:

It tells that if you get a streak of terrible teams for 100 games and then a streak of great teams for 100 games then you have gotten average teams for 200 games.

 

Friggin’ genius.

But it also showed that I didn't get any such streaks, one after another, in any way, shape or form.

 

Like.

100 bad teams, 100 good teams, 100 bad teams, 100 good teams etc. Such a pattern would have been recognised in the graph I wanted to see.

Such a pattern would result in a wave-shaped graph that we didn't get.

 

Baldrickk, regarding that graph, I think it would be better if the dots were reduced somehow.

So we could get a line showing the average difference at each point.

Say you take the average of each dot and the 10-20 (maybe more) closest ones on both sides, then draw a line through those averages and remove the dots.

It should result in a wave-shaped line; if no pattern is visible in it, then rigging like the one mentioned above can't be happening, or at least didn't happen.
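What Muffin describes is a centred moving average. A minimal sketch, averaging each battle with up to 15 neighbours on each side (within his 10-20 suggestion), plus a made-up version of the streak example from earlier in the thread, using ±10% as the "bad" and "good" team values. The window size is an assumption; near the start and end of the list the window is simply shorter.

```python
def centred_moving_average(values, half_window=15):
    """Average each point with up to half_window neighbours on each side.

    Near the start and end of the list the window is simply shorter.
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Made-up streaks: 100 "bad" teams (-10 %) then 100 "good" ones (+10 %).
streaky = [-10.0] * 100 + [10.0] * 100
print(sum(streaky) / len(streaky))   # 0.0: the overall average hides the streaks
smoothed = centred_moving_average(streaky)
print(min(smoothed), max(smoothed))  # -10.0 10.0: the smoothed line does not
```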

 

Regarding the difference in dead tanks: many have argued it started with the new MM (me included), and with this sample we could actually check whether that is true. Have we gotten more one-sided results since the introduction of the 3-5-7 MM or not?
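A sketch of that before/after comparison, assuming each battle has been reduced to its date and its absolute difference in tanks alive. The cut-off is a parameter you would need to supply for your own server; the date in the example is made up, not the real 3-5-7 release date. `statistics.mean` raises `StatisticsError` if either side of the cut-off has no battles.

```python
from datetime import date
from statistics import mean


def margins_before_after(battles, cutoff):
    """battles: one (battle_date, margin) tuple per battle, where margin is the
    absolute difference in tanks alive at the end.
    cutoff: the date the 3-5-7 matchmaker went live on your server (you supply it).
    Returns (mean margin before cutoff, mean margin on/after cutoff)."""
    before = [m for d, m in battles if d < cutoff]
    after = [m for d, m in battles if d >= cutoff]
    return mean(before), mean(after)


# Made-up example data and cut-off, just to show the call:
example = [(date(2017, 1, 1), 8), (date(2017, 2, 1), 6), (date(2018, 1, 1), 9), (date(2018, 2, 1), 5)]
print(margins_before_after(example, cutoff=date(2017, 6, 1)))  # (7, 7)
```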



lgfrbcsgo #4 Posted 08 July 2018 - 08:54 AM

    Second Lieutenant

  • Player
  • 30517 battles
  • 1,019
  • [MOTIV] MOTIV
  • Member since:
    04-04-2012
Nice work. Were these replays recorded by the same player and over which time frame were they recorded?

LordMuffin #5 Posted 08 July 2018 - 09:02 AM

    Field Marshal

  • Player
  • 48529 battles
  • 11,167
  • [-GLO-] -GLO-
  • Member since:
    06-21-2011

lgfrbcsgo, on 08 July 2018 - 08:54 AM, said:

Nice work. Were these replays recorded by the same player and over which time frame were they recorded?

Yes, and over a couple of years as well.

Say 2014 and onwards.

 



CmdRatScabies #6 Posted 08 July 2018 - 09:13 AM

    Brigadier

  • Player
  • 37626 battles
  • 4,454
  • [-MM] -MM
  • Member since:
    10-12-2015
Are all 21k replays post 3/5/7 matchmaker?

Bordhaw #7 Posted 08 July 2018 - 09:18 AM

    Major

  • Player
  • 11747 battles
  • 2,649
  • Member since:
    01-29-2017
So basically in random battles, the results are random

r0f #8 Posted 08 July 2018 - 09:22 AM

    Sergeant

  • Player
  • 6385 battles
  • 278
  • [AFUNM] AFUNM
  • Member since:
    10-19-2012

So after 21k games that aren't currently relevant, you finally found a graph that suits the point you always want to make on the forum, a point which currently goes against what people experience in their actual games.

What does this tell us about you? :trollface:

Hey, on the bright side, maybe look into politics / CEO stuff; they love this ****.



TungstenHitman #9 Posted 08 July 2018 - 09:26 AM

    Brigadier

  • Player
  • 23154 battles
  • 4,212
  • Member since:
    08-28-2016
My cat likes cat food but also mice

LordMuffin #10 Posted 08 July 2018 - 09:29 AM

    Field Marshal

  • Player
  • 48529 battles
  • 11,167
  • [-GLO-] -GLO-
  • Member since:
    06-21-2011

r0f, on 08 July 2018 - 09:22 AM, said:

So after 21k games that aren't currently relevant, you finally found a graph that suits the point you always want to make on the forum, a point which currently goes against what people experience in their actual games.

What does this tell us about you? :trollface:

Hey, on the bright side, maybe look into politics / CEO stuff; they love this ****.

What is the experience you talk about?

 

Why are the games irrelevant?



Dosjer007 #11 Posted 08 July 2018 - 09:33 AM

    Staff Sergeant

  • Player
  • 643 battles
  • 411
  • Member since:
    10-02-2016

LordMuffin, on 08 July 2018 - 09:02 AM, said:

Yes, and over a couple of years as well.

Say 2014 and onwards.

Do all you guys keep old replays, that many of them?

I guess you have all the client versions of the game installed on your PC as well.



jabster #12 Posted 08 July 2018 - 09:38 AM

    Field Marshal

  • Beta Tester
  • 12555 battles
  • 23,747
  • [WSAT] WSAT
  • Member since:
    12-30-2010

LordMuffin, on 08 July 2018 - 08:29 AM, said:

What is the experience you talk about?

 

Why are the games irrelevant?

 

They’re irrelevant as real data is nothing compared to the power of the feels.

Baldrickk #13 Posted 08 July 2018 - 09:39 AM

    Field Marshal

  • Player
  • 30307 battles
  • 14,493
  • [-TAH-] -TAH-
  • Member since:
    03-03-2013

LordMuffin, on 08 July 2018 - 06:48 AM, said:

But it also showed that I didn't get any such streaks, one after another, in any way, shape or form.

Like.

100 bad teams, 100 good teams, 100 bad teams, 100 good teams etc. Such a pattern would have been recognised in the graph I wanted to see.

Such a pattern would result in a wave-shaped graph that we didn't get.

Baldrickk, regarding that graph, I think it would be better if the dots were reduced somehow.

So we could get a line showing the average difference at each point.

Say you take the average of each dot and the 10-20 (maybe more) closest ones on both sides, then draw a line through those averages and remove the dots.

It should result in a wave-shaped line; if no pattern is visible in it, then rigging like the one mentioned above can't be happening, or at least didn't happen.

Regarding the difference in dead tanks: many have argued it started with the new MM (me included), and with this sample we could actually check whether that is true. Have we gotten more one-sided results since the introduction of the 3-5-7 MM or not?

 

So, a moving average instead?

 

Regarding more one-sided results, I did that test with my replays a while back when I first did that graph, and there was virtually no difference.  It's buried in the MM thread somewhere... 
Finally found it, post 3731...

This is with about 6k battles either side iirc



unhappy_bunny #14 Posted 08 July 2018 - 09:39 AM

    Major

  • Player
  • 18405 battles
  • 2,781
  • [-OC-] -OC-
  • Member since:
    08-01-2012

r0f, on 08 July 2018 - 08:22 AM, said:

So after 21k games that aren't currently relevant, you finally found a graph that suits the point you always want to make on the forum, a point which currently goes against what people experience in their actual games.

What does this tell us about you? :trollface:

Hey, on the bright side, maybe look into politics / CEO stuff; they love this ****.

He is offering this tool to anyone who wants to test it on their own replays. Why not run your replays through it and see if the result matches your perception of your actual plays? 

At least these guys are trying to find evidence to support or deny the various theories that preoccupy so many on the forum.



LordMuffin #15 Posted 08 July 2018 - 09:43 AM

    Field Marshal

  • Player
  • 48529 battles
  • 11,167
  • [-GLO-] -GLO-
  • Member since:
    06-21-2011

Dosjer007, on 08 July 2018 - 09:33 AM, said:

Do all you guys keep old replays, that many of them?

I guess you have all the client versions of the game installed on your PC as well.

I set it to save all replays so I could rewatch a replay I had played earlier the same day or the day before, or send it to anyone who asked, without having to manually save the ones likely to be asked for.

 

HDD space wasn't an issue.

 

Baldrickk, on 08 July 2018 - 09:39 AM, said:

So, a moving average instead?

 

Regarding more one-sided results, I did that test with my replays a while back when I first did that graph, and there was virtually no difference.  It's buried in the MM thread somewhere... 
Finally found it, post 3731...

This is with about 6k battles either side iirc

OK. 

 

Yes, some kind of moving average, though I don't know how that works.


 

_b_ #16 Posted 08 July 2018 - 09:49 AM

    Brigadier

  • Player
  • 55235 battles
  • 4,037
  • Member since:
    04-06-2011

Good job. Ofc the people who believe they're being rigged against will never believe it.

 

But I would like to see a dataset for just the builds after WG brainfarted up the 3:5:7?

 

 



CoDiGGo #17 Posted 08 July 2018 - 09:50 AM

    Warrant Officer

  • Player
  • 15024 battles
  • 570
  • [NEUR0] NEUR0
  • Member since:
    05-10-2015

1. I want to run the program.

2. I don't understand the usage guide; it's not clear which steps are required and which are optional.

3. Does the program automatically ignore replays that aren't random battles?



Baldrickk #18 Posted 08 July 2018 - 09:50 AM

    Field Marshal

  • Player
  • 30307 battles
  • 14,493
  • [-TAH-] -TAH-
  • Member since:
    03-03-2013

CmdRatScabies, on 08 July 2018 - 09:13 AM, said:

Are all 21k replays post 3/5/7 matchmaker?

Afraid not.

 

Bordhaw, on 08 July 2018 - 09:18 AM, said:

So basically in random battles, the results are random

If you want to look at it from "can I predict the result from when I hit the Battle button" or "Is WG fixing the game against me?" then yes.

One thing this data says to me is that in-game performance (of the whole team) is more important than their stats.

 

 

Dosjer007, on 08 July 2018 - 09:33 AM, said:

Do all you guys keep old replays, that many of them?

I guess you have all the client versions of the game installed on your PC as well.

My replay folder goes back to 5/1/2017, when I last re-installed Windows. I do have a stash from older games dating back to (almost) when I started playing, but that's more of a WoTReplays proxy: the site didn't exist back then, so this is just a stash of replays that I thought I did well in.

I did start a YouTube series going back to look at some old replays, but life reared its ugly head. I'll go back and actually release more than one video at some point, but yes, I do have some old versions of WoT installed for that :P

 

 

jabster, on 08 July 2018 - 09:38 AM, said:

LordMuffin, on 08 July 2018 - 09:29 AM, said:

r0f, on 08 July 2018 - 09:22 AM, said:

So after 21k games that aren't currently relevant, you finally found a graph that suits the point you always want to make on the forum, a point which currently goes against what people experience in their actual games.

What does this tell us about you? :trollface:

Hey, on the bright side, maybe look into politics / CEO stuff; they love this ****.

What is the experience you talk about?

Why are the games irrelevant?

They’re irrelevant as real data is nothing compared to the power of the feels.

I love how I "finally" found a graph, when I've been posting with this in the MM thread since I started working on this, and every graph has shown the same thing...

And we could take any slice of consecutive games from Muffin's stash, and end up with results that look the same.

 

 

unhappy_bunny, on 08 July 2018 - 09:39 AM, said:

He is offering this tool to anyone who wants to test it on their own replays. Why not run your replays through it and see if the result matches your perception of your actual plays?

At least these guys are trying to find evidence to support or deny the various theories that preoccupy so many on the forum.

 

:great:
 

CoDiGGo, on 08 July 2018 - 09:50 AM, said:

1. I want to run the program.

2. I don't understand the usage guide; it's not clear which steps are required and which are optional.

3. Does the program automatically ignore replays that aren't random battles?

 

I made a few changes last night that fixed a couple of things, give me a second to upload a new release.

 

 

edit: Done: https://github.com/B...ases/tag/v0.1.2 (there is now a .exe that is up to date with the code).

 

1. :great:

 

2. The optional step is to get your own API key from Wargaming. It takes seconds to do, but can make getting the stats from them a lot quicker.

There is a "mobile" key included, which will work, but it has some level of rate limiting.

Step by step:

  • Download the .exe
  • Go to the download folder in Windows Explorer
  • Hold SHIFT and right-click on some blank space in the Explorer window (not on a file, and with no file selected)
  • In the context menu that pops up, one of the options will be "Open PowerShell window here" or "Open command window here". Click that. A black or blue console window should appear.
  • Get your API key.
  • Back in the console window (the blue/black one), type the following:
    • .\replay_analyser -k KEY C:\Games\World_of_Tanks\replays
    • Of course, put your key in place of KEY (you can paste with a right-click)
    • Hit Enter

3. Yes, it only looks at standard battles and filters out other types.

 

 


 

CmdRatScabies #19 Posted 08 July 2018 - 09:55 AM

    Brigadier

  • Player
  • 37626 battles
  • 4,454
  • [-MM] -MM
  • Member since:
    10-12-2015

Dosjer007, on 08 July 2018 - 09:33 AM, said:

Do all you guys keep old replays, that many of them?

I guess you have all the client versions of the game installed on your PC as well.

I have 32k replays saved but I don't have old client versions to view them.

Dorander #20 Posted 08 July 2018 - 09:57 AM

    Lieutenant Сolonel

  • Player
  • 18584 battles
  • 3,043
  • Member since:
    05-07-2012

Catn1p, on 08 July 2018 - 03:51 AM, said:

It tells that if you get a streak of terrible teams for 100 games and then a streak of great teams for 100 games then you have gotten average teams for 200 games.

 

Friggin’ genius.

 

Bordhaw, on 08 July 2018 - 08:18 AM, said:

So basically in random battles, the results are random

 

It might seem like kicking in an open door and for any reasonable person it is, but given the prevalence of conspiracy threads by people who are convinced the game is rigged against them, the door isn't as open as people would like to think. Even solid and reliable assumptions benefit from testing and the resulting evidence.

 

Just check the recent "mm is rigged" thread that's been going on these past few weeks: it started with 3 pages of "MM is rigged" arguments, then another 20 of "you have no evidence" arguments.





