
Statistics - confidence intervals



Grozomah
2014-04-30, 03:41 AM
Howdy playground!

Recently I've been taking a look at some statistics, and I've been struggling with confidence intervals during parameter estimation. Namely, I've been trying to solve the following problem:


When initiated, the process X returns a value that is distributed according to the Poisson distribution with mean M. We perform a single measurement and get the value 7. What are the lower, upper, and central confidence intervals for the mean M, if we want a 95% confidence level?

I'm guessing that what I need to do, e.g. for the upper limit, is to take the sum from 7 to infinity of the distribution while varying the mean, until I get the desired confidence level? And the sum from 0 to 7 for the lower bound?

If anyone would be able to shine some light on this subject it would be fantastic. :)

shadow_archmagi
2014-04-30, 07:45 AM
It might be easier to understand the problem if we had the full text. (At least, I'm assuming that you didn't post the full text, because you're not going to get a confidence interval from JUST the mean.)

Grozomah
2014-04-30, 08:46 AM
Well, actually, that is the whole problem.
And we don't know the mean. The only thing we know is that whatever unspecified task we're doing, the result is Poisson distributed and we got back a 7. We want to know how broadly to specify the intervals in order to be 95% certain that the mean is in our interval.

So far I've done the upper [and lower] confidence interval by searching for the mean for which the sum from 7 to infinity [and from 0 to 7] turned out to be 0.95. And while I'm not 100% certain it's correct, I think it should be fine. I'm still struggling with the central interval.

warty goblin
2014-04-30, 09:19 AM
The lower confidence interval is an interval that gives a lower bound only, i.e. it is of the form (L, infinity), and basically says we have 95 (or whatever) % confidence that the parameter (here the mean) is no less than L.

The upper is similar, except it is of the form (-infinity, U).

The central is the usual (L, U) form. Usually one takes L and U so that there is equal mass in each tail. Note that L and U for the central interval will not be equal to the L and U from the lower and upper intervals if the upper and lower intervals have the same coverage probability.

Now there are multiple ways to get a confidence interval for a parameter. Many rely on asymptotic properties of certain types of estimators; basically what their sampling distribution tends to as the sample size gets very large. Clearly those are not valid in this context, because we only have one observation. Another method relies on knowing the distribution function of the estimator exactly, which does work here, because we have a single Poisson observation. This however demands the solution of some rather hairy equations - in fact I suspect they're transcendental and cannot be solved algebraically. It's not hard to do numerically though.

[edit: Poisson is not a scale family. Duh.]
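A minimal numerical sketch of that "do it numerically" step (not from the thread; it assumes Python with SciPy, and it uses one common exact construction based on tail probabilities of the Poisson CDF):

# Exact confidence bounds for a Poisson mean from the single observation k = 7,
# found by root-finding on the Poisson CDF. A sketch only.
from scipy.stats import poisson
from scipy.optimize import brentq

k = 7          # the single observed value
alpha = 0.05   # 1 - confidence level

# One-sided 95% upper bound U: the mean for which P(X <= k; U) = alpha.
U = brentq(lambda mu: poisson.cdf(k, mu) - alpha, 1e-9, 100.0)

# One-sided 95% lower bound L: the mean for which P(X >= k; L) = alpha,
# i.e. P(X <= k - 1; L) = 1 - alpha.
L = brentq(lambda mu: poisson.cdf(k - 1, mu) - (1 - alpha), 1e-9, 100.0)

# Central (equal-tailed) 95% interval: put alpha/2 in each tail instead.
U_c = brentq(lambda mu: poisson.cdf(k, mu) - alpha / 2, 1e-9, 100.0)
L_c = brentq(lambda mu: poisson.cdf(k - 1, mu) - (1 - alpha / 2), 1e-9, 100.0)

print(L, U)      # roughly 3.3 and 13.1
print(L_c, U_c)  # roughly 2.8 and 14.4

Note how the central interval's endpoints differ from the one-sided bounds, because each tail gets alpha/2 instead of alpha.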

The Random NPC
2014-04-30, 04:14 PM
Me: Hey, I'm taking a statistics class right now! Maybe this thread will be able to help me!
Forum: MATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATH.
Me: :smalleek: Maybe I'll just ask my professor...

I don't actually have a need for help at the moment, I just figured out what I was doing wrong.

Waar
2014-05-02, 06:06 AM
Howdy playground!

Recently I've been taking a look at some statistics, and I've been struggling with confidence intervals during parameter estimation. Namely, I've been trying to solve the following problem:



I'm guessing that what I need to do, e.g. for the upper limit, is to take the sum from 7 to infinity of the distribution while varying the mean, until I get the desired confidence level? And the sum from 0 to 7 for the lower bound?

If anyone would be able to shine some light on this subject it would be fantastic. :)

I think that what you're supposed to do is: assume 7 is the mean, then make a 95% confidence interval (for the Poisson distribution), which should give you a 95% confidence interval for the actual mean.

Brother Oni
2014-05-02, 07:56 AM
Me: Hey, I'm taking a statistics class right now! Maybe this thread will be able to help me!
Forum: MATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATHMATH.
Me: :smalleek: Maybe I'll just ask my professor...

I don't actually have a need for help at the moment, I just figured out what I was doing wrong.

Well, there's a guy asking for help with his Chemistry classes (Gen Chem II, I think) whom we're helping out. You could always ask your question when you get stuck and ask us to dumb it down, er, simplify our answers. :smalltongue:

The Random NPC
2014-05-02, 02:51 PM
Now that I think about it, I do have a question. I have 2 data sets for pulse rates of the same college women before and after a fright. I'm testing the hypothesis that the mean of college women's pulse rates is higher after a fright, with a 5% significance level. According to my research, in situations like this you should use a T test, as Z tests are biased. Can I still use a Z test, or will the bias make it too risky? Do I have the right formulas for the T test?
T = (Xbar1 - Xbar2) / sqrt((S1^2 / N1) + (S2^2 / N2))
S1 = sqrt((1 / (N1 - 1)) * Σ(X1 - Xbar1)^2)

EDIT: I know my end answers are correct, as I can use a calculator, but I'm trying to figure out how to show my work. It's a lot harder without a good understanding of what I'm doing.

warty goblin
2014-05-02, 03:26 PM
I think that what you're supposed to do is: assume 7 is the mean, then make a 95% confidence interval (for the Poisson distribution), which should give you a 95% confidence interval for the actual mean.
That doesn't quite work, simply because if you assume you know the mean, you can't meaningfully do a confidence interval. What you can do is use the fact that you know the distribution of X as a function of the parameter to produce a confidence interval. The theoretical justification for this is based on the probability integral transform. This is pretty much what the OP was proposing.

What you describe is what it looks like you do for confidence intervals for the means of normal distributions. Under the probabilistic hood, however, that isn't what's really going on; it only looks that way because the sampling distribution of the sample mean from a normal distribution is exactly normal, with the same mean as the data distribution and a scaled variance. For the Poisson, however, the sample mean is generally not Poisson distributed (in fact it cannot be for sample sizes greater than one).


Now that I think about it, I do have a question. I have 2 data sets for pulse rates of college women before and after a fright. I'm testing the hypothesis that the mean of college women's pulse rates is higher after a fright, with a 5% significance level. According to my research, in situations like this, you should use a T test, as Z tests are biased. Can I still use a Z test, or will the bias make it too risky? Do I have the right formulas for the T test?
T = (Xbar1 - Xbar2) / sqrt((S1^2 / N1) + (S2^2 / N2))
S1 = sqrt((1 / (N1 - 1)) * Σ(X1 - Xbar1)^2)

Are these the same women? If so, I do not think the standard two-sample T test is appropriate, since it assumes independence between the first and second samples. Two observations on the same woman should not be viewed as independent - one would expect somebody with a relatively high pulse before the fright to have a relatively high pulse after the fright as well. What you do then is create a single dataset of the differences in before and after pulse rate for each woman, and test whether there's significant evidence that the average difference is greater than zero. In this case, assuming your data isn't badly skewed, suffering from weird outliers, or other maladies (or, if it is, that you have a reasonably large sample), a one-sample T test on the differences is appropriate.
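As a concrete sketch of that paired-difference approach (not from the post; it assumes Python with NumPy/SciPy, and the function name is made up for illustration):

# One-sample T test on the after-minus-before differences, testing whether
# the mean difference is greater than zero. A sketch only.
import numpy as np
from scipy.stats import t

def paired_t_greater(before, after):
    d = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = len(d)
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # (dbar - 0) / (s_d / sqrt(n))
    p_value = t.sf(t_stat, df=n - 1)                  # one-sided: P(T_{n-1} > t_stat)
    return t_stat, p_value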

If you do in fact have two samples of different women, you can use a two-sample T test. However, which T test you use depends on whether you believe you have equal variances in the first and second groups. There are test statistics and reference distributions for checking this, but I don't have an enormous amount of faith in them, so it's something of a judgement call.

If you conclude your variances are the same, you use the pooled variance estimator: Sp = sqrt(((N1 - 1)*Sx^2 + (N2 - 1)*Sy^2) / (N1 + N2 - 2)), and your denominator becomes Sp*sqrt(1/N1 + 1/N2). Your degrees of freedom will be N1 + N2 - 2. Here Sx^2 and Sy^2 are the first and second sample variances.

If your variances are not the same, use S = sqrt(Sx^2/N1 + Sy^2/N2) as the denominator. In this case, you need to use an approximate degrees of freedom given by the adorable Satterthwaite approximation (http://en.wikipedia.org/wiki/T_test#Independent_.28unpaired.29_samples). Which I'm not typing.
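For comparison, here is a sketch of the two denominators and degrees of freedom just described (again not from the post; it assumes NumPy/SciPy, and the function name is made up):

# Two-sample T statistic with either the pooled-variance or the Welch
# (Satterthwaite) treatment of the denominator and degrees of freedom.
import numpy as np
from scipy.stats import t

def two_sample_t(x, y, pooled=True):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    s2x, s2y = x.var(ddof=1), y.var(ddof=1)
    if pooled:
        # Equal-variance case: pooled standard deviation Sp.
        sp = np.sqrt(((n1 - 1) * s2x + (n2 - 1) * s2y) / (n1 + n2 - 2))
        se = sp * np.sqrt(1.0 / n1 + 1.0 / n2)
        df = n1 + n2 - 2
    else:
        # Unequal variances: Welch denominator and Satterthwaite approximate df.
        se = np.sqrt(s2x / n1 + s2y / n2)
        df = (s2x / n1 + s2y / n2) ** 2 / (
            (s2x / n1) ** 2 / (n1 - 1) + (s2y / n2) ** 2 / (n2 - 1))
    t_stat = (x.mean() - y.mean()) / se
    p_two_sided = 2 * t.sf(abs(t_stat), df)
    return t_stat, df, p_two_sided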

Using the standard normal Z as a reference distribution instead of the T is not legitimate in cases like this. Theoretically it's simply the wrong distribution, since the T is the exact distribution of a normal divided by the root of a scaled chi squared. Under the assumption of a normal data distribution, this is exactly what (Xbar - mu0)/(S/sqrt(n)) is. Intuitively, the normal is appropriate if you know the population variance, and the thicker tails of the T account for the additional uncertainty due to estimating sigma squared. The effect of using the Z anyway is that a hypothesis test will reject when in fact it should not, and your confidence interval will be too narrow and have coverage under the 1 - alpha rate. Since the T converges to the standard normal as the degrees of freedom go to infinity, for very large datasets where the variance can be well estimated the difference is extremely minor, but for small samples it makes a big difference.
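A quick way to see that small-sample vs. large-sample difference (a sketch assuming SciPy, not part of the original post) is to compare T and Z critical values:

# 97.5th percentiles: the T critical value shrinks toward the normal one
# as the degrees of freedom grow.
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)                   # about 1.96
for df in (2, 5, 12, 30, 120):
    print(df, round(t.ppf(0.975, df), 3))  # about 4.30, 2.57, 2.18, 2.04, 1.98
# Using the Z cutoff of 1.96 with only a handful of observations therefore
# rejects too easily.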

Hope that helps!

The Random NPC
2014-05-02, 03:48 PM
Yes, they are the same women, here is the data if you'd like it.
Before  After
64      68
100     112
80      84
60      68
92      104
80      92
68      72
84      88
80      80
68      92
60      76
68      72
68      80

So if I understand correctly, I subtract the After column from the Before column, and then do a T test on it? Assuming you have access to a TI-84 Plus, would that be the T test under Stat->Tests->T-Test? Assuming that is the test, and that mu0 is 0, with a frequency of 1 (no idea what that is) and mu>mu0, I get a T value of -4.9 and a p value of 1. That seems rather large to me.

warty goblin
2014-05-02, 04:33 PM
Yes, they are the same women, here is the data if you'd like it.
Before  After
64      68
100     112
80      84
60      68
92      104
80      92
68      72
84      88
80      80
68      92
60      76
68      72
68      80

So if I understand correctly, I subtract the After column from the Before column, and then do a T test on it? Assuming you have access to a TI-84 Plus, would that be the T test under Stat->Tests->T-Test? Assuming that is the test, and that mu0 is 0, with a frequency of 1 (no idea what that is) and mu>mu0, I get a T value of -4.9 and a p value of 1. That seems rather large to me.

Yes, you can do before - after, although I'd probably do after - before, simply because we're interested in whether there has been a positive change; formulated correctly, either will give the correct answer. Since we're interested in whether the mean heart rate has increased, you want to test

H0: mu <= 0 against
HA: mu > 0

if you do after - before. If you do before - after, you would test

H0: mu >= 0 vs
HA: mu < 0.

I think what you are doing is before - after, then testing H0: mu <= 0 against HA: mu > 0. Needless to say, your data is consistent with a lower heart rate before the fright than after (your test statistic is negative), so you fail to reject the null under that test.

The Random NPC
2014-05-02, 04:55 PM
So I think I was doing it wrong before, as when I tried mu < 0 I didn't see the ×10^-4 and thought I got a probability greater than 1. If I understand correctly now, the formula for T should be http://www.cliffsnotes.com/assets/267199.png, where delta is 0 and s is the standard deviation of the sample. Putting it into the calculator gives me a T value of -4.9 and a P value of 0.00018. Does that look about right?
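For what it's worth, a quick sketch (assuming NumPy/SciPy; not something anyone in the thread actually ran) reproduces those numbers from the posted data, using after - before so the statistic comes out positive:

import numpy as np
from scipy.stats import t

before = np.array([64, 100, 80, 60, 92, 80, 68, 84, 80, 68, 60, 68, 68])
after  = np.array([68, 112, 84, 68, 104, 92, 72, 88, 80, 92, 76, 72, 80])
d = after - before                              # paired differences

t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
p_one_sided = t.sf(t_stat, df=len(d) - 1)       # test HA: mu > 0
print(round(t_stat, 2), round(p_one_sided, 5))  # about 4.9 and 0.00018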

EDIT: Coincidentally, all of the tests I've been doing tell me to reject the null, except this one, and I think it's because I'm getting confused about whether it should be mu < mu0 or mu > mu0.