Thread: Statistics help

20190404, 07:16 PM (ISO 8601)
 Join Date
 Aug 2005
 Location
 Mountain View, CA
 Gender
Statistics help
Any expert statisticians in the Playground? I'm trying to do something that goes a bit beyond what my high school statistics class covered, and I want a second opinion on whether I got it right. Details here.
Like 4X (aka Civilization-like) gaming? Know programming? Interested in game development? Take a look.
Avatar by Ceika.
Archives:
Saberhagen's Twelve Swords, some homebrew artifacts for 3.5 (please comment)
Isstinen Tonche for ECL 74 playtesting.
Team Solars: Powergaming beyond your wildest imagining, without infinite loops or epic. Yes, the DM asked for it.
Arcane Swordsage: Making it actually work (homebrew)

20190405, 06:27 AM (ISO 8601)
 Join Date
 Nov 2007
 Location
 Cippa's River Meadow
 Gender
Re: Statistics help
warty goblin is the main professional statistician (and expert woodworker) I know of on this forum. Maybe you can use your mod powers to summon him?

20190405, 06:50 AM (ISO 8601)
 Join Date
 Apr 2008
 Location
 Germany
 Gender
Re: Statistics help
My statistics is not good enough (anymore) to reliably make a comment on the procedure (though I see no obvious problem).
But I wonder what you mean by 'first' and 'last' cards in the deck. The order in which you constructed your deck? Or the order Arena displays the cards when looking at it? I feel unless it was a weird intended bug it's unlikely the client remembers in what order you constructed the deck.

20190405, 07:34 AM (ISO 8601)
 Join Date
 May 2007
 Location
 Tail of the Bellcurve
 Gender
Re: Statistics help
I'm not sure, just glancing at it, whether your method will work or not. I'll have a harder look this weekend; right now I have to finish my cornflakes so I can head back to the statistics mines...
Thanks, but I'm hardly an expert woodworker. If I was, I'd have figured out how to carve faces by now.
Blood-red were his spurs i' the golden noon; wine-red was his velvet coat,
When they shot him down on the highway,
Down like a dog on the highway,
And he lay in his blood on the highway, with the bunch of lace at his throat.
Alfred Noyes, The Highwayman, 1906.

20190405, 07:36 AM (ISO 8601)
 Join Date
 Apr 2010
 Location
 Night Vale
 Gender
Re: Statistics help
My gut says you should first test whether you can reject the null hypothesis that the shuffling is fair and random.
I may be mistaken, but I think a multivariate binomial distribution is more appropriate for what you're doing, or the small population expansion.
Avatar by TheGiant
Longform Sig

20190405, 08:01 AM (ISO 8601)
 Join Date
 Jan 2009
Re: Statistics help
This sounds awesome.
I'm doing a Master's in statistics currently, so I'll try commenting on a couple things. I definitely defer to the more experienced, though.
I think you are going about it in the right way, if I'm reading you right.
Originally Posted by from your link
You use these simulations to make up two distributions: for the first 24 cards, and for the last 24 cards.
In step #2, you run two tests to see if "first 24 real" is similar to "first 24 simulated", and same for last 24. I'd defer to warty goblin about the correctness of steps 2-4, but that sounds reasonable.
Note that, instead of using a p-value of 0.05 as the straight cutoff, you could take the attitude of degrees of evidence: < 0.05 is "strong evidence"; < 0.01 is "very strong evidence".
If you plan on presenting this to Wizards of the Coast, they might find it intriguing even if you have a p-value < 0.1.
As a side note, if you wanted to simply prove nonrandomness, I'd recommend simulating truly random (well, computer pseudorandom RNG) shuffling and comparing its distribution to your real data, to see if there's a difference.
But here you're trying to prove a specific sort of nonrandomness.
On the other hand, it might be worthwhile for you to test just if the shuffling is nonrandom in general as well as for what you expect is the particular bug.
I have 8 bins and unequal sample sizes, so 8 degrees of freedom.
I'd agree with his gut response. At least, that would make folk take the "now I'm testing for this specific sort of nonrandomness" more seriously, since they'd already be fairly convinced something is off.
Hypergeometric (or some sort of multivariate hypergeometric?) might also be more appropriate, although for large samples the difference between it and binomial mostly disappears.
Last edited by JeenLeen; 20190405 at 08:03 AM.

20190405, 12:45 PM (ISO 8601)
 Join Date
 Aug 2005
 Location
 Mountain View, CA
 Gender
Re: Statistics help
You can export a deck from Arena, which works by putting a list of the cards in it into your clipboard so you can paste it somewhere. The order of that list is always the same, and is determined by the order you added cards to the deck when you built it. When a deck is written to the game logs, the same order is used. I'm pretty sure this is the order that the game uses internally to store the deck, and most likely is also the order that gets input to the shuffler.
Already did that.
On looking it up, it looks like that is indeed the type of distribution I'm working with, but I didn't find anything on how to test whether two samples are from the same such distribution.
A low p-value would actually be an indication that I'm wrong, not that I'm right.
Already did that, as linked above. The distribution a correct shuffler is supposed to have is easy to derive from pure theory, and that's the first thing I tested. My choice of hypothesis for the specific bug is based on the patterns I observed in that first test.
According to the source I'm using for how to do the two-sample chi-squared test, if the sample sizes are different then it's just the number of bins. If you have a more authoritative source that says otherwise, please tell me where I can find it. When I try searching for such things, the overwhelming majority of results are about a regular Pearson's chi-squared test, which has an assumption that doesn't match my situation: that the predicted distribution is a theoretical one, known exactly with no variance.
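A minimal Java sketch of the binned two-sample chi-squared statistic for unequal sample sizes, assuming the form found in common references such as Numerical Recipes: the sum over bins of (sqrt(S/R)·R_i − sqrt(R/S)·S_i)² / (R_i + S_i), where R and S are the two sample totals. The class name, method name, and the counts in `main` are made up for illustration and are not data from the study. The sketch computes only the statistic and leaves the degrees-of-freedom question where the thread leaves it.

```java
class TwoSampleChiSquare {
    // Binned two-sample chi-squared statistic for two arrays of bin counts.
    // r[i] and s[i] are the counts in bin i for each sample; totals may differ.
    static double statistic(long[] r, long[] s) {
        if (r.length != s.length) {
            throw new IllegalArgumentException("bin counts must have the same length");
        }
        double totalR = 0, totalS = 0;
        for (int i = 0; i < r.length; i++) {
            totalR += r[i];
            totalS += s[i];
        }
        if (totalR == 0 || totalS == 0) {
            throw new IllegalArgumentException("each sample needs at least one observation");
        }
        // Scaling constants that put the two samples on a common footing.
        double k1 = Math.sqrt(totalS / totalR);
        double k2 = Math.sqrt(totalR / totalS);
        double chi2 = 0;
        for (int i = 0; i < r.length; i++) {
            if (r[i] + s[i] == 0) {
                continue; // an empty bin contributes nothing
            }
            double diff = k1 * r[i] - k2 * s[i];
            chi2 += diff * diff / (r[i] + s[i]);
        }
        return chi2;
    }

    public static void main(String[] args) {
        // Hypothetical counts for 8 bins (0 to 7 cards in hand); illustration only.
        long[] observed = {3, 20, 61, 92, 78, 37, 8, 1};
        long[] simulated = {9336, 68686, 201692, 306143, 259227, 122308, 29739, 2869};
        System.out.println("chi-squared statistic: " + statistic(observed, simulated));
    }
}
```

When both samples have the same total, the scaling constants are 1 and the sum reduces to (R_i − S_i)² / (R_i + S_i) over the bins.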
Hypergeometric is what a correct shuffle is supposed to have. This bug results in something different.
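For reference, a short Java sketch of that fair-shuffle baseline: the hypergeometric probability of seeing k of 24 tracked cards in a 7-card hand from a 60-card deck. The 60-card deck size and the class and method names are assumptions made for illustration; the 24 and 7 come from the thread.

```java
import java.math.BigInteger;

class FairHandOdds {
    // Exact binomial coefficient C(n, k).
    static BigInteger choose(int n, int k) {
        if (k < 0 || k > n) {
            return BigInteger.ZERO;
        }
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After this step result equals C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // P(exactly `hits` of the `tracked` cards appear in a `hand`-card draw
    // from a `deck`-card deck), assuming every ordering is equally likely.
    static double pmf(int deck, int tracked, int hand, int hits) {
        BigInteger ways = choose(tracked, hits).multiply(choose(deck - tracked, hand - hits));
        return ways.doubleValue() / choose(deck, hand).doubleValue();
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 7; k++) {
            System.out.printf("%d in hand: %.6f%n", k, pmf(60, 24, 7, k));
        }
    }
}
```

Under a fair shuffle the first 24 and the last 24 cards of the decklist have the same distribution, so one call covers both groups.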
Thanks for the comments! I'm hoping to post my detailed study plan on reddit today, but I want to be reasonably sure I'm doing the analysis right first.
Incidentally, the numbers for the simulation row in my example are real. Running the billion shuffles and tabulating the results took somewhere around an hour, I think.
Last edited by Douglas; 20190405 at 01:31 PM.

20190407, 08:44 AM (ISO 8601)
 Join Date
 May 2007
 Location
 Tail of the Bellcurve
 Gender
Re: Statistics help
You're using the two-sample chi-square correctly so far as I can tell. However, combining the two p-values via Fisher's Method isn't quite right here, because the tests aren't independent. Since all the cards have to end up somewhere in the deck, the location of the first card in the shuffled deck is not independent of the location of the second.
The most obvious solution is to do one test from the beginning by working directly with the simulated joint distribution. So when you simulate/tabulate the data, record the probabilities for all 60 cards ending up in your hand, then calculate the marginal distribution of cards from the first 24 and last 24 by summing over that. This directly gets you a Monte Carlo approximation to the appropriate distribution, so you can calculate a single p-value via the chi-square test.
Because the deck is fairly large, however, the dependence between the cards will be fairly weak, so this won't change your answer very much. Further, because the dependence is by necessity negative, your current method is very slightly conservative, which, if you're going to be wrong, is the direction to be wrong in.

20190407, 01:40 PM (ISO 8601)
 Join Date
 Aug 2005
 Location
 Mountain View, CA
 Gender
Re: Statistics help
How is the non-independence of positions of different cards relevant to my use of Fisher's method? That non-independence affects the results for how many early/late cards are in the opening hand, which is what goes into the two-sample chi-square test. Fisher's method doesn't come in until that's already done.

20190407, 01:46 PM (ISO 8601)
 Join Date
 Apr 2008
 Location
 Germany
 Gender
Re: Statistics help
Huh, I never felt like MGA's shuffling was askew... but then again I'm not too attentive or whatever counts here. If it was really bad I guess I would have noticed.
So, I know I cannot really be of much help, but if I may ask anyway (assuming this isn't just an exercise to improve your statistics background):
What do you think the bug is, and how much does it diverge from the 'proper' distribution (going by your simulation)?

20190407, 02:17 PM (ISO 8601)
 Join Date
 May 2007
 Location
 Tail of the Bellcurve
 Gender
Re: Statistics help
Fisher's method combines p-values from independent tests. Your tests aren't independent because your p-values are derived from tests of dependent random variables. If A is the number of first 24 cards in your hand, and B is the number of the last 24, then you necessarily have that A + B <= 7.
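To make the independence assumption concrete, here is a minimal Java sketch of Fisher's method for exactly two p-values. Under independence, X = -2(ln p1 + ln p2) follows a chi-square distribution with 4 degrees of freedom, whose upper tail has the closed form exp(-X/2)(1 + X/2). The class and method names are made up for illustration, and the result is only an exact p-value when the two tests really are independent, which is the point in dispute here.

```java
class FisherCombine {
    // Combined p-value for two p-values via Fisher's method.
    // Valid as an exact p-value only if the two underlying tests are independent.
    static double combineTwo(double p1, double p2) {
        if (!(p1 > 0 && p1 <= 1) || !(p2 > 0 && p2 <= 1)) {
            throw new IllegalArgumentException("p-values must be in (0, 1]");
        }
        // Fisher's statistic: chi-square with 2 * 2 = 4 degrees of freedom under independence.
        double x = -2.0 * (Math.log(p1) + Math.log(p2));
        // Upper-tail probability of a chi-square variable with 4 degrees of freedom.
        return Math.exp(-x / 2.0) * (1.0 + x / 2.0);
    }

    public static void main(String[] args) {
        // Hypothetical inputs, not results from the study.
        System.out.println(combineTwo(0.30, 0.45));
    }
}
```

Algebraically this is p1·p2·(1 − ln(p1·p2)), so the combined value is never smaller than the plain product of the two p-values.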

20190407, 03:57 PM (ISO 8601)
 Join Date
 Aug 2005
 Location
 Mountain View, CA
 Gender
Re: Statistics help
The correct way to do a Fisher-Yates shuffle (which is what lead developer Chris Clay has said they're using) goes like this:
Code:
for (int i = 0; i < deck.length; i++) {
    // swap position i with a random position in [i, deck.length)
    int swapIndex = random.nextInt(deck.length - i) + i;
    int temp = deck[i];
    deck[i] = deck[swapIndex];
    deck[swapIndex] = temp;
}
Code:
for (int i = 0; i < deck.length; i++) {
    // swap position i with a random position anywhere in the deck
    int swapIndex = random.nextInt(deck.length);
    int temp = deck[i];
    deck[i] = deck[swapIndex];
    deck[swapIndex] = temp;
}
           0 in hand  1 in hand  2 in hand  3 in hand  4 in hand  5 in hand  6 in hand  7 in hand
first 24   0.009336   0.068686   0.201692   0.306143   0.259227   0.122308   0.029739   0.002869
last 24    0.046986   0.194165   0.319792   0.271807   0.128615   0.033814   0.004575   0.000245
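A rough Java sketch for comparing the two loops above by simulation. It assumes a 60-card deck, that the opening hand is the first 7 positions of the shuffled array, and that the deck enters the shuffler in decklist order; none of these is confirmed for Arena. The class name, method names, trial count, and seed are all made up for illustration.

```java
import java.util.Random;

class ShuffleComparison {
    static final int DECK_SIZE = 60;
    static final int HAND_SIZE = 7;
    static final int TRACKED = 24; // the first 24 cards of the decklist

    // Fisher-Yates: position i swaps with a random position in [i, deck.length).
    static void fisherYates(int[] deck, Random random) {
        for (int i = 0; i < deck.length; i++) {
            int swapIndex = random.nextInt(deck.length - i) + i;
            int temp = deck[i];
            deck[i] = deck[swapIndex];
            deck[swapIndex] = temp;
        }
    }

    // Variant: position i swaps with a random position anywhere in the deck.
    static void swapWithAny(int[] deck, Random random) {
        for (int i = 0; i < deck.length; i++) {
            int swapIndex = random.nextInt(deck.length);
            int temp = deck[i];
            deck[i] = deck[swapIndex];
            deck[swapIndex] = temp;
        }
    }

    // Relative frequency of seeing k tracked cards in the opening hand, for k = 0..7.
    static double[] firstCardsInHand(boolean useVariant, int trials, long seed) {
        Random random = new Random(seed);
        long[] counts = new long[HAND_SIZE + 1];
        int[] deck = new int[DECK_SIZE];
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < DECK_SIZE; i++) {
                deck[i] = i; // reset to decklist order before every shuffle
            }
            if (useVariant) {
                swapWithAny(deck, random);
            } else {
                fisherYates(deck, random);
            }
            int hits = 0;
            for (int i = 0; i < HAND_SIZE; i++) {
                if (deck[i] < TRACKED) {
                    hits++;
                }
            }
            counts[hits]++;
        }
        double[] frequencies = new double[HAND_SIZE + 1];
        for (int k = 0; k <= HAND_SIZE; k++) {
            frequencies[k] = (double) counts[k] / trials;
        }
        return frequencies;
    }

    public static void main(String[] args) {
        double[] fair = firstCardsInHand(false, 1_000_000, 42L);
        double[] variant = firstCardsInHand(true, 1_000_000, 42L);
        for (int k = 0; k <= HAND_SIZE; k++) {
            System.out.printf("%d in hand: fair %.6f, variant %.6f%n", k, fair[k], variant[k]);
        }
    }
}
```

Tracking the last 24 cards instead only requires changing the hit test to `deck[i] >= DECK_SIZE - TRACKED`.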
I see. I expect the effect of that on the aggregate statistics is really tiny, especially over a large sample size, and weakened even further by the fact that some games will only be counted in one or the other because, for example, the 24th and 25th cards are the same (and I can't distinguish which copy got drawn).
If I really want to be rigorous about this detail, it would be far simpler to separate games into two groups, and check only the first 24 cards for one group and the last 24 for the other. I actually did that for the simulation results, doing a separate set of 1 billion shuffles for each distribution.
Sounds like you're saying this is ok to ignore because of how small it is and what direction it's in?

20190408, 10:27 AM (ISO 8601)
 Join Date
 Apr 2008
 Location
 Germany
 Gender
Re: Statistics help
Wow, it took me way too long to code this (mostly because I made stupid mistakes, not because I couldn't figure it out, but still... and I used Octave). Also, not sure if my implementation is the most efficient, but it works.
Okay, the difference seems really obvious now, so if you have the raw data it should be clear whether they use the wrong algorithm, which would be a bit embarrassing... Also, it seems weird that MGA doesn't sort cards alphabetically or something. But apparently not. So let me know if I should abuse this bug in the future