Help me with math please. [Archive] - Giant in the Playground Forums

View Full Version : Help me with math please.

Maryring

2014-01-20, 10:06 PM

It's been way too long since I did stuff with percentages, so I'm quite rusty on how to do it properly but... assume you have a luck based game where you toss a ball into a labyrinth, and it will randomly end up in one of 40 locations. Location 10, 20, 30 and 40 each have a 1.25% chance of showing up. All the other locations have a 95/36 = 2.64% chance of showing up.

Now, here is the question. What are the chances for number 1-39 to show up twice before number 40 shows up once? I have a feeling that there should be a formula for this that I'm missing, but so far all I've come up with is "calculate each individual probability after one another", so 0.9875*0.987(...)*0.0125. But it's such an inelegant solution, and I'm not even sure it's correct considering the sheer variance of potential outcomes in the middle.

Anyone stumbled across this problem before?

To clarify, the game requires each number between 1-39 to show up at least twice before the game is over. So the minimum amount of rounds the game has to be played to reach the "victory" state, is 78. If you get a 40, the game ends, but if you get say a third 37 then the game continues. You just don't make any progress towards the goal.

Zrak

2014-01-20, 10:29 PM

I'm not sure I understand the question. Do you mean the odds of every number from 1 through 39 each showing up twice before location 40 shows up once? Also, exactly twice, or at least twice?

Knaight

2014-01-20, 11:32 PM

Now, here is the question. What are the chances for number 1-39 to show up twice before number 40 shows up once? I have a feeling that there should be a formula for this that I'm missing, but so far all I've come up with is "calculate each individual probability after one another", so 0.9875*0.987(...)*0.0125. But it's such an inelegant solution, and I'm not even sure it's correct considering the sheer variance of potential outcomes in the middle.

Does 1 need to show up twice, 2 show up twice, etc.? Or does 40 showing up third or later suffice? There are two interpretations here, and they produce very different probabilities.

Grozomah

2014-01-21, 03:08 AM

I'm assuming that you mean that the number e.g. 11 must show up twice before 40 appears once?

In that case it's a bit more complicated than you imagine, because a possible solution would be e.g. 11, 22, 34, 1, 44, 11 - or any combination that includes two of a number.

This is where it gets complicated, because you can make up to 39 tosses without reaching an enging situation (you toss each number from 1-39 once).

Now i'm a bit short on time, so i'll give you the solution for a simplified version, where all the probabilities for each number are the same and I can expand the solution in the afternoon, if someone else doesn't do it first.

The probability for throwing each number now is p0=1/40.

Now take a paper and pen and sketch the following. In your first throw you have 39/40 chance of not throwing 40. Then one of three things will happen: either you will throw 40 (p=1/40), you will throw the same number again (p=1/40), or you will throw another number (p=38/40). If either of the first two happen the game is over but it continues in the last case. If the last case happens there is again 1/40 chance of throwing 40, 2/40 of throwing an already existing number, and 37/40 of throwing another number. And so on and so forth.

Now we can look at our scheme and try to figure out what are the odds of throwing 40 in any of the possible outcomes. We can throw it in the fist attempt (p=1/40) or in the second (p=39/40 * 1/40) or on the third (p=39/40 *38/40 * 1/40) on on the N-th (p= 39/40*38/40*...*(41-N)/40 * 1/40 ).

So now we want the sum of all these probabilities from N=1 to N=40. I'm doing this by heart, so check my calculations, but it should look something like:

P= 1/40 * 39/40 * (1 + 38/40*(1+ 37/40*( ... *(1+1/40)) ... )))

Which is calculable. :smalltongue:

In short and according to the text of your problem, not only is your solution inelegant, it's also wrong. :smalltongue:

For your problem i'd suggest that you again draw a chart and tr to figure out the changes in the upper equation if there are two seperate probabilities for various throws, but i think the main chain of thought should be fairly similair.

Kato

2014-01-21, 04:38 AM

I'm not sure about the rules either... maybe you could clarify before we try to solve a wrong problem. This can get kind of tricky.

I'm not entirely sure about Grozomah's last step but otherwise it seems valid. (Could be right on everything... then again, I'd likely use a sum for it just to be sure)
Of course, with the differing chances the term gets quite a bit uglier since you likely want to compute the chance of getting a multiple of ten and another number separately and then add the chances up.

Maryring

2014-01-21, 05:31 AM

I'm not sure I understand the question. Do you mean the odds of every number from 1 through 39 each showing up twice before location 40 shows up once? Also, exactly twice, or at least twice?

At least twice. So through six throws you could get 6, 34, 20, 20, 20, 4, and it would still be within the parameter of success. But at the end, the results you need will have to be 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6 (...) 38, 38, 39, 39, without a single 40 having been achieved. So the shortest duration that the game can have without "losing" by getting a 40 is 78 rounds.

And thank you for the solution Grozomah, but that solves the equationfor a situation where getting repeats will end the game, so it's not quite the solution I'm looking for. It has sparked the old grey matter a bit though. I added a little clarification to the first post in the thread. Hope it makes the problem clearer.

Heliomance

2014-01-21, 05:43 AM

What's this for? I have to say, just eyeballing it, that the odds seem incredibly low.

Kato

2014-01-21, 09:36 AM

Gosh... so the chances are lower than (79/80)^78... That's the first obvious answer I can give :smalltongue: (Though, as this is still nearly 40% that's not saying much)
I'd still say the most luck you'll have (as is often* the case in statistics) is to target the problem from the other direction, as in, "what are the odds I don't screw up/get a 40"...
Maybe use the above chance for not getting a 40 in n throws and then target the chances of creating a sequence? As in, how likely is it in 78 throws to have sufficient right numbers, in 79, in 80... That should be solvable by calculating the distribution Grozomah described above, I think... and then multiply each term with the respective odds of not having a 40 during the required number of trials?

Grozomah

2014-01-21, 10:33 AM

At least twice. So through six throws you could get 6, 34, 20, 20, 20, 4, and it would still be within the parameter of success. But at the end, the results you need will have to be 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6 (...) 38, 38, 39, 39, without a single 40 having been achieved. So the shortest duration that the game can have without "losing" by getting a 40 is 78 rounds.

And thank you for the solution Grozomah, but that solves the equationfor a situation where getting repeats will end the game, so it's not quite the solution I'm looking for. It has sparked the old grey matter a bit though. I added a little clarification to the first post in the thread. Hope it makes the problem clearer.

Ah, so it is a bit different than I understood it. And it's no problem, I like doing such problems and I just plain love probability. :smallbiggrin:

So you *must* have each number twice, but having it more than twice is also ok? Again i'll assume even probability for all the numbers since having them different won't do much in terms of understanding the problem but it would make the following solution considerably more complicated.

So what we want now, as a "must have" is each number from 1 to 39 to appear twice, yes? So we start with (1/40)^78, since we need specific numbers, but they can be in any possible permutation. So we also add a multinomial coefficient (http://en.wikipedia.org/wiki/Binomial_coefficient#Generalization_to_multinomial s) which equals to something like 78!/(2^39)

Multiplying both we get the probability of any possible 78-number combination where each number (except for 40) appears twice to be P(78)=2.25508*10^-22. Which is very little, but then again we're fairly picky on which numbers we want.

Now we said that we're of course also perfectly happy with having more than two numbers of each, so let's say we add another attempt that is still OK.
This attempt can give us back any number, since we've already got the bases covered. Since we also have P(78) it's fairly easy to expand it from there.

The throw can be anthing but 40, and the probability for something like that is 39/40. The multinominal also changes to 79!/(3*2^77), since one number is now repeated 3 times. Calculating all this we get
P(79) = 1/40^78 * 39/40 * 79!/(3*2^77) = 1.15798*10^-20

The real fun begins at 80. Compared to P(79) we add another 39/40, change the 79! -> 80!, but the last part can now be either (3^2*2^76) or (4*2^77), depending on whether the added number is the same as in 79 throws or different (a much more likely outcome). Nedless to say it gets progressively more difficult for higher number of required throws. There may be a better solution that i'm toying with, but before i delve into that i'd like a confirmation that i understood the problem right this time. :smalltongue:

TLDR calculated it for 78 and 79 throws, gets insanely complicated later.

Maryring

2014-01-23, 06:59 PM

What's this for? I have to say, just eyeballing it, that the odds seem incredibly low.

Mostly me trying to figure out how bad luck I'm having to thus far not have received the one prize I want from a game. :smallbiggrin:

Ah, so it is a bit different than I understood it. And it's no problem, I like doing such problems and I just plain love probability. :smallbiggrin:

So you *must* have each number twice, but having it more than twice is also ok? Again i'll assume even probability for all the numbers since having them different won't do much in terms of understanding the problem but it would make the following solution considerably more complicated.

So what we want now, as a "must have" is each number from 1 to 39 to appear twice, yes? So we start with (1/40)^78, since we need specific numbers, but they can be in any possible permutation. So we also add a multinomial coefficient (http://en.wikipedia.org/wiki/Binomial_coefficient#Generalization_to_multinomial s) which equals to something like 78!/(2^39)

Multiplying both we get the probability of any possible 78-number combination where each number (except for 40) appears twice to be P(78)=2.25508*10^-22. Which is very little, but then again we're fairly picky on which numbers we want.

Now we said that we're of course also perfectly happy with having more than two numbers of each, so let's say we add another attempt that is still OK.
This attempt can give us back any number, since we've already got the bases covered. Since we also have P(78) it's fairly easy to expand it from there.

The throw can be anthing but 40, and the probability for something like that is 39/40. The multinominal also changes to 79!/(3*2^77), since one number is now repeated 3 times. Calculating all this we get
P(79) = 1/40^78 * 39/40 * 79!/(3*2^77) = 1.15798*10^-20

The real fun begins at 80. Compared to P(79) we add another 39/40, change the 79! -> 80!, but the last part can now be either (3^2*2^76) or (4*2^77), depending on whether the added number is the same as in 79 throws or different (a much more likely outcome). Nedless to say it gets progressively more difficult for higher number of required throws. There may be a better solution that i'm toying with, but before i delve into that i'd like a confirmation that i understood the problem right this time. :smalltongue:

TLDR calculated it for 78 and 79 throws, gets insanely complicated later.

I... think I actually understood that. I'll have to poke it a bit closer when I don't have such a headache, but at the very least I might have ended up with a firmer grasp on multinominal coefficients. Thank you.

ChristianSt

2014-01-23, 08:45 PM

I see two (maybe) practical solution to your problem:

1) Simulation: Just writing a program with the rules you proposed and wiring a (good!) random number generator to it would give an accurate solution. Of course this is not exact, and you might get into numerical problems (or problems with your random numbers), and I don't know how fast it would converge. But it is imo the simplest solution.

2) Using a Markov Chain (http://en.wikipedia.org/wiki/Markov_chain): Basically you can view your setup as a transition from one state to another. You start in the state "no number drawn" and you have the two final states "drawn all numbers beside 40 at least twice" and "drawn 40 before that". I'm not exactly sure how your odds are (In the first post you have wighted the multiples of 10 differently, but in the later posts it seems you are fine with 1/40 for all numbers).

If you don't summarize the states the number of states would be 3^39+1 which is around 4*10^18 (so quite large) [you can represent each state as a 39-digit number consisting of 0,1,2. Each digit represents how often that number is drawn, if you draw a number you have already drawn two times, you simply stay the same state, since you don't care about multiples. You need one state more for the second end state]. For the probabilities for the first case I probably would try to use following states: "A*(# 10/20/30 drawn twice) + B*(# 10/20/30 drawn once) + X*(# Numbers drawn twice outside 10/20/30) + Y*(# Numbers drawn once outside 10/20/30)", which A+B<=3 [giving 10 options], while X+Y<=36 [giving 703 options], so only have 703*10+1 states. [With equal probabilities you even get down to 820+1 states].

If you have all states you than need a transition matrix P, containing the probabilities to get from each state to each other state (which should be rather sparse, since if you combine states, each state should have only a rather small amount of possible follow ups). That should be rather easy to compute, if there are problems here you can try to use other states that make this step easier.

Now comes the really fun part: If you have the state transition matrix you can than multiply it infinitely with itself (so the limes of n to infinity of P^n). If it possible to solve that analytically you would get an exact solution. (Otherwise you need to approximate it numerically)

I'm not sure that the Markov Chain is the easiest way to get an exact solution, but that is the way I would try if I would need an exact solution.
If I would not need an exact solution I would certainly do a simulation.

reaverb

2014-01-23, 09:04 PM

Wow another math question already? (http://www.giantitp.com/forums/showthread.php?t=323156)

It's even about the probability of certain outcomes when of repeated trials when certain combination of the outcomes end the trials in a success or fail.

OP: Please elaborate on the actual problem. If you look through that thread, you'll see that once the OP gave a complete description of the problem, it took a single paragraph and middle school algebra to give a comprehensive answer.

Everybody else trying to solve this: ChristainSt's mention of Markov Chains is the only thing I can see that I think could work, although I won't have phrased it quite that way since I'm unfamiliar with Markov Chains or transition matrixes (and you don't need math that complicated to give an exact solution to this).

Kato

2014-01-24, 05:24 AM

ChristianSt

2014-01-24, 06:58 AM

Everybody else trying to solve this: ChristainSt's mention of Markov Chains is the only thing I can see that I think could work, although I won't have phrased it quite that way since I'm unfamiliar with Markov Chains or transition matrixes (and you don't need math that complicated to give an exact solution to this).

As for Christian's answer... A Markov chain might be a good idea but just from a random estimate I don't think there will be an analytical solution to it.
I'm not quite sure I can follow your idea to simplify the matrix, though and without it... Also, maybe I'm totally missing something but how is 39^3 even close to 10^18? Yeah, about 60.000 states is quite bit as well, but far from 10^18... And I guess still to the point where it would be quite a bother if possible to solve.

So yeah... I'm sure there is a solution and I guess Grozomah was on the right track but it's not a nice one. I think I've said it before and Christian also mentioned it: If you are happy with just the chance, go for a simulation.

:smallredface: ups, I made a typo there: that is not 39^3 but 3^39 (since each of the 39 numbers has 3 possibilities: 3 times 3 times 3 .... ), and yeah, while 39^3 is rather small, 3^39 is rather big. That is also the reason why imo you need (or should) reduce the number of states (hence combining multiple of those states into new states). For example it makes zero difference which numbers are drawn exactly (if they have the same probability to be drawn), but only how many are drawn. For example you have the state "22 drawn two times and 37 drawn one time". But from a mathematical standpoint it behaves exactly like the state "1 drawn two times and 39 drawn one time". So you can view both (and many others) as one single state - each additional state makes further steps more complicated.

ETA:
Why it would simplify the matrix:

the matrix is NxN where N is the number of states. And it is much easier to work with a 800x800 matrix than an 200000x200000 matrix :smallwink:
From the state "One number drawn once" you can only go into a small number of other states: "One number drawn once" or "Two numbers drawn once" or "One number drawn twice" or "final state: drawn 40 before each number drawn twice". The probability of all other states are 0. [Of course if you don't sum up states, it is rather easy to state the probability of that transit to happen, but you have more possibilities]. So maybe that doesn't make it easier, but shift problems a bit (but still a smaller matrix is a smaller matrix).

EDIT: "One number drawn once" can't go into "One number drawn once", since there isn't any number you can roll to not change the state. But you have a non-zero probability of going from "One number drawn twice" to "One number drawn twice" again for example. :smallsigh: those errors ...

And reaverb is probably right that it is not needed to use a transition matrix: Since you can basically reduce* the state graph to a directed acyclic graph (DAG; basically a graph without cycles) it should be rather easy to sum up the probabilities somehow (e.g. sum up all paths, there is a finite number of those, through the graph). So yes, I think it is pretty likely to get an exact solution out of the process (If you don't make rounding errors while summing up).

*You can't go back (which would be to undraw numbers) and since you can indefinitely reroll, you can change the transitions from e.g. "A" goes with 20% to "A", 20% to "B" and 60% to "C" -> "A" goes with 25% to "B" and 75% to "C".

But I have pondered some more and maybe there is another way to go, using conditional probabilities:
A conditional probability P("A"|"B") is the probability of "A" happening if "B" is happening. (Probability of "A" given "B")

You want

P("drawing 40 after drawing 1-39 each at least twice")
You can rewrite this as:
SUM[over all n] P("drawing 1-39 each at least twice among the first n numbers, which have no 40 among them and the n+1 drawing being 40")
This can be further rewritten as:
SUM[over all n] P("drawing 1-39 each at least twice among n numbers" | "no 40 among those n numbers") * P("(first) 40 is number n+1")
The second probability is rather trivial: it is (1-P("40"))^n*P("40")
The first is much more complicated, but should be much easier than the original problem (since you only draw numbers among 1...39; so you can't fail through drawing a 40 anymore, you can only fail because you haven't drawn each twice). It is 0 for n<=77, and for n=78 it should be p=P("1"|"no 40")^2*P("2"|"no 40")^2*...*P("39"|"no 40")^2*(78!/(2!^39))#. After that it gets more ugly, and I don't want to think about it right now :smalltongue:. (But maybe there is a formula out there that can compute this - and as always it might be useful to look at the complementary event(s))

#the P("X"|"no 40") basically means you need to "re-normalize" your probabilities. So if you have originally P("1")=0.4; P("37")=0.2 and P("40")=0.4; you would have P("1"|"no 40")=2/3 and P("37"|"no 40")=1/3. If all numbers are equally likely it goes from 1/40 to 1/39.

After that you have "only" to sum them up (depending on how the first term turns out to be, it might quite hard to do it analytically).

Kato

2014-01-24, 07:36 AM

ChristianSt

2014-01-24, 08:06 AM

Geez, my bad, I should have noticed it was just a typo. Of course it's 3^39 :smallredface:

I think I get what you are going at with the simplification method now, too. I thought it would be more difficult but I guess it's not that hard... still, pretty large matrix to find a analytic solution.

You noticed that there was an error at least :smallwink: (and also noted correctly that 39^3 << 10^18)

Yeah, I didn't say that it is guaranteed to yield an exact solution. I only did say that if I needed an exact solution, I would try it that way (since I think it is rather easy compared to other methods mentions here). And as I stated, since the state graph is a DAG, it should be at least possibly to use that directly to compute the exact probability if it infeasible to compute the limes of the transition matrix.

(And it certainly might be possible that there is a more feasible option to yield an exact solution)