A Spot of Mathematical (or Statistical) Assistance

Fortuna
2014-02-12, 05:42 PM
Hi. Maths student and extremely amateur statistician here, playing with things he doesn't really understand.

Now, if you have a bunch of numbers, and you know that they were generated by rolling a single 'die' with a given number of sides (1dx) and adding a constant term (c), it's reasonably straightforward to derive best guesses for c and x from the mean and variance of the sample. The variance of 1dx is (x^2 - 1)/12, so you can calculate x from the variance, and the mean is simply c + (1+x)/2. All reasonably straightforward.
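As a minimal sketch of that single-die case (an illustration, not something from the thread), the two equations can be inverted directly. It assumes a fair die with faces 1..x, for which the variance is (x^2 - 1)/12; the function name estimate_single_die and the 1d8+3 test data are made up for the example.

```python
import math
import random
import statistics

def estimate_single_die(samples):
    """Method-of-moments guess for (x, c), assuming samples come from 1dx + c."""
    mean = statistics.fmean(samples)
    var = statistics.pvariance(samples)  # divide-by-N variance of the sample
    x = math.sqrt(12 * var + 1)          # invert var = (x^2 - 1) / 12
    c = mean - (1 + x) / 2               # invert mean = c + (1 + x) / 2
    return x, c

# Hypothetical data: 5000 rolls of 1d8+3 with a fixed seed.
rng = random.Random(0)
rolls = [rng.randint(1, 8) + 3 for _ in range(5000)]
print(estimate_single_die(rolls))
```

The estimates will generally not be whole numbers, so in practice you would round them and check the result against the data.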

But what if you have a bunch of numbers and only know that they were generated by ndx+c? The variance is quite straightforwardly n(x^2 - 1)/12, and the mean is c + n(1+x)/2, but what third statistic should be used to permit solving for all three?

Razanir
2014-02-12, 08:29 PM
The next non-zero moment of the uniform discrete is the fourth. And I think the fourth moment about the mean of ndx+c is
n^2(x^2 - 1)^2/48 - n(x^4 - 1)/120
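A sketch of how the fourth moment about the mean of ndx+c can be worked out, assuming fair dice with faces 1..x. It relies on the fact that cumulants add across independent dice and are unaffected by the constant c (apart from the first); the symbols D, S, mu_4 and kappa_4 are notation introduced here, not the poster's.

```latex
% D = one fair x-sided die (faces 1..x),  S = D_1 + ... + D_n + c
\begin{align*}
\operatorname{Var}(D) &= \frac{x^2-1}{12}, \qquad
\mu_4(D) = \frac{(x^2-1)(3x^2-7)}{240},\\
\kappa_4(D) &= \mu_4(D) - 3\operatorname{Var}(D)^2 = -\frac{x^4-1}{120},\\
\kappa_4(S) &= n\,\kappa_4(D) = -\frac{n(x^4-1)}{120},\\
\mu_4(S) &= \kappa_4(S) + 3\operatorname{Var}(S)^2
          = \frac{n^2(x^2-1)^2}{48} - \frac{n(x^4-1)}{120}.
\end{align*}
```

As a sanity check, n = 2 and x = 2 gives 36/48 - 30/120 = 0.5, which matches a direct calculation for the sum of two coins labelled 1 and 2.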

Fortuna
2014-02-12, 09:43 PM
I... think I understand that, but could you please unpack it in dummy mathematician-speak instead of statistics? What, precisely, is a 'non-zero moment of the uniform discrete', and how would one go about calculating the fourth moment of a population?

warty goblin
2014-02-12, 11:16 PM
A die is a special case of a discrete uniform distribution, which gives equal probability to each of a finite number of outcomes.

Moments are (for distributions that only give positive probability to a countable number of values) probability-weighted sums of outcomes raised to powers. Thus the first moment is the mean, and is simply the sum of the outcomes, each multiplied by the probability of that outcome.

Things get a little more complicated for the second moment and higher. The k-th central moment is the probability-weighted sum of (x - mu)^k, where mu is the mean calculated above. The noncentral moment is just the probability-weighted sum of the x^k, where x here denotes a possible value (not the number of sides). If the mean is zero, the central and noncentral moments are identical.

The first moment measures central tendency, the second central moment is the variance and measures dispersion, the third central moment measures asymmetry, and the fourth measures how 'heavy' the tails of the distribution are. Nobody cares about higher moments. Since the discrete uniform is symmetric about the mean, it has a third central moment of zero.

What you are attempting to do is a very old form of parameter estimation known as the Method of Moments. You simply equate sample and population moments, and solve for the parameters. Usually this is done with the noncentral moments.

Wolfram MathWorld (http://mathworld.wolfram.com/DiscreteUniformDistribution.html) has a list of the moments of a discrete uniform. You calculate the sample moments by averaging your observations raised to the relevant power.
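To make "averaging your observations raised to the relevant power" concrete, here is a small sketch of sample moments in plain Python. The function names raw_moment and central_moment are just illustrative, and the divide-by-N convention is an assumption (no bias correction).

```python
def raw_moment(samples, k):
    """k-th noncentral sample moment: the average of v**k."""
    return sum(v ** k for v in samples) / len(samples)

def central_moment(samples, k):
    """k-th central sample moment: the average of (v - mean)**k."""
    mu = raw_moment(samples, 1)
    return sum((v - mu) ** k for v in samples) / len(samples)

# Treating the six faces of a d6 as a "sample" reproduces the population moments.
d6 = [1, 2, 3, 4, 5, 6]
print(raw_moment(d6, 1))       # mean
print(central_moment(d6, 2))   # variance, (6^2 - 1)/12
print(central_moment(d6, 3))   # zero, by symmetry
print(central_moment(d6, 4))   # fourth central moment
```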

I should note that a method of moments estimator is unlikely to perform particularly well, especially for a problem like this. Unknown-n problems are usually very difficult, and the estimates tend to change drastically for only small changes in the observed values.
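For the ndx+c problem specifically, one possible way to set up the moment equations is sketched below. It is an assumption-laden illustration, not a recommended estimator: it uses central moments rather than noncentral ones, assumes fair dice with faces 1..x, and relies on the standard results Var = n(x^2 - 1)/12 and fourth cumulant k4 = -n(x^4 - 1)/120, whose ratio k4/Var = -(x^2 + 1)/10 does not involve n. The name estimate_ndx_plus_c is hypothetical.

```python
import math
import random
from itertools import product

def central_moment(samples, k):
    mu = sum(samples) / len(samples)
    return sum((v - mu) ** k for v in samples) / len(samples)

def estimate_ndx_plus_c(samples):
    """Method-of-moments guess for (n, x, c); returns None if the moments are inconsistent."""
    mean = sum(samples) / len(samples)
    var = central_moment(samples, 2)
    if var == 0:
        return None
    k4 = central_moment(samples, 4) - 3 * var ** 2  # sample fourth cumulant
    x_sq = -10 * k4 / var - 1      # from k4 / var = -(x^2 + 1) / 10
    if x_sq <= 1:
        return None                # noise pushed the estimate out of range
    x = math.sqrt(x_sq)
    n = 12 * var / (x_sq - 1)      # from var = n (x^2 - 1) / 12
    c = mean - n * (1 + x) / 2     # from mean = c + n (1 + x) / 2
    return n, x, c

# All 36 equally likely outcomes of 2d6+3: the moments are exact here.
exact = [a + b + 3 for a, b in product(range(1, 7), repeat=2)]
print(estimate_ndx_plus_c(exact))

# A random sample of the same distribution gives a noisier answer.
rng = random.Random(1)
noisy = [rng.randint(1, 6) + rng.randint(1, 6) + 3 for _ in range(5000)]
print(estimate_ndx_plus_c(noisy))
```

Because x is recovered from a ratio involving the fourth moment, small changes in the sample can move all three estimates a long way, which is the instability described above.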

Fortuna
2014-02-13, 01:20 AM
Aha. Cheers!

Jay R
2014-02-13, 01:21 PM
Don't forget to use the boundary conditions. The lowest number you get is greater than or equal to n+c, and the highest number you get is less than or equal to nx+c.

Also, when you are done with the Method of Moments, you will almost certainly get an incorrect and impossible answer. "Roll 6.12 d4.38s, and add 3.405," or some such. I would then compare all the nearby integer solutions with a Chi-squared or Kolmogorov-Smirnov test.
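A rough sketch of that last step, combining the boundary conditions with a Pearson chi-squared statistic. Several things here are assumptions for illustration: the samples and c are taken to be integers, the function names are made up, and ranking candidates by the raw statistic is cruder than a proper test (which would account for degrees of freedom and merge cells with small expected counts).

```python
import random
from collections import Counter
from itertools import product

def ndx_pmf(n, x, c):
    """Exact pmf of ndx + c as {value: probability}, built by repeated convolution."""
    pmf = {c: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, x + 1):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / x
        pmf = nxt
    return pmf

def chi_squared(samples, n, x, c):
    """Pearson statistic, or None if some sample is impossible under ndx + c."""
    if min(samples) < n + c or max(samples) > n * x + c:
        return None  # violates the boundary conditions
    counts = Counter(samples)
    total = len(samples)
    return sum((counts.get(v, 0) - total * p) ** 2 / (total * p)
               for v, p in ndx_pmf(n, x, c).items())

def rank_candidates(samples, n_est, x_est, c_est, radius=1):
    """Score integer triples near the estimates; smaller statistic first."""
    results = []
    for dn, dx, dc in product(range(-radius, radius + 1), repeat=3):
        n, x, c = round(n_est) + dn, round(x_est) + dx, round(c_est) + dc
        if n < 1 or x < 2:
            continue
        stat = chi_squared(samples, n, x, c)
        if stat is not None:
            results.append((stat, n, x, c))
    return sorted(results)

# Hypothetical data: 2000 rolls of 2d6+3, with made-up non-integer estimates.
rng = random.Random(2)
rolls = [rng.randint(1, 6) + rng.randint(1, 6) + 3 for _ in range(2000)]
for stat, n, x, c in rank_candidates(rolls, 2.3, 5.6, 3.4)[:3]:
    print(f"{n}d{x}+{c}: chi-squared = {stat:.1f}")
```

The boundary check does a lot of the work: many nearby triples are ruled out immediately because the smallest or largest observed value could not have been rolled.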