View Full Version : Statistic refresher needed



Cikomyr
2013-05-02, 04:45 AM
Hi people. My Statistics classes are long behind me, and this rust is currently biting me in the ass in my current essay research. I was hoping some of you could help me out with something.

We are currently running a multivariate regression analysis. Thing is, I've started to suspect some of our variables might not be valid to consider. In our 296 observations for each variable, we have encountered quite a large number of "0" values (more than half of the observations).

Each of these variables is based on a scale of 1 to 10. They came out as such:

Var  Mean  StDev  Min  Max
1    1.5   2.6    0    9
2    1.0   2.5    0    10
3    0.8   2.3    0    10
4    4.2   2.4    0    10
5    3.4   3.3    0    10
6    1.5   3.4    0    10


We will run correlation analysis of these with a much more bell-curved statistic. Here is my question:

Could the heavy-loading of results of "0" invalidate the use of variables in a regression analysis? Isn't there a way to determine if a statistic is statistically analyzable or not? Something like Mean/StDev?

Since for at least 4 of my 6 variables, the StDev is much bigger than the mean, and there is an absolute minimum of 0, I am worried that the data collected might be erroneous.

Brother Oni
2013-05-02, 06:07 AM
There's nothing wrong with the SD being larger than the mean - it just means you have highly variable data.
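As a rough sanity check on whether the reported numbers are even possible, here is a minimal Python sketch. It uses the Bhatia-Davis inequality, which caps the variance of any variable confined to a range [min, max] at (max - mean) * (mean - min). The figures are copied from the table in the first post; the assumption that the reported StDev uses the n - 1 denominator is mine, and the means are rounded, so treat the result as approximate.

```python
import math

# Summary statistics copied from the table in the first post:
# variable -> (mean, st.dev., min, max). Values are rounded to one decimal.
SUMMARY = {
    1: (1.5, 2.6, 0, 9),
    2: (1.0, 2.5, 0, 10),
    3: (0.8, 2.3, 0, 10),
    4: (4.2, 2.4, 0, 10),
    5: (3.4, 3.3, 0, 10),
    6: (1.5, 3.4, 0, 10),
}
N_OBS = 296  # observations per variable, as stated in the first post


def max_possible_sd(mean, lo, hi, n):
    """Largest sample SD a variable confined to [lo, hi] can have for this mean.

    Bhatia-Davis bounds the population variance by (hi - mean) * (mean - lo).
    The n / (n - 1) factor converts that to a sample variance (assumption:
    the reported SD was computed with the n - 1 denominator).
    """
    return math.sqrt((hi - mean) * (mean - lo) * n / (n - 1))


def check_summary(summary, n):
    """Return {variable: (reported_sd, upper_bound, is_feasible)}."""
    rows = {}
    for var, (mean, sd, lo, hi) in summary.items():
        bound = max_possible_sd(mean, lo, hi, n)
        rows[var] = (sd, bound, sd <= bound)
    return rows


for var, (sd, bound, ok) in check_summary(SUMMARY, N_OBS).items():
    print(f"Var {var}: reported SD {sd:.1f}, max possible {bound:.2f}, feasible: {ok}")
```

An SD above the bound would point to an error in the data or the summary. An SD close to the bound means the values pile up at the two ends of the range, which is what a big cluster of 0s plus a few high scores would produce.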

I suspect you may need to throw your data through an outlier test and exclude any values that are anomalous.

One possible place to start - is a value of 0 indicative of 'no data available' or a placeholder for missing data? If it's valid to do so, excluding these values from the data set for that variable may help make some sense of the numbers.

Cikomyr
2013-05-02, 07:24 AM
Nah. A value of 0 means no effective measurable law.

Finlam
2013-05-02, 07:37 AM
I think this (https://groups.google.com/forum/?fromgroups=#!topic/sci.stat.math/DQmbV8axc5I) thread is a good reference.

What I gather from your standard deviation being higher than the mean is that there are two likely possibilities: A) the original data is garbage (probably not the case), or B) the data has a tendency to fall toward the upper end of the spectrum. Needless to say, as you've already pointed out, you have a significant number of values that are 0; these values will skew your mean and make it lower.

I would also second the idea of trying to remove outliers. I would not remove only the 0 values, though; I would try to remove outliers at the top of the spectrum too. Still, it may be useful to analyze the data without all the 0s impacting the mean.

warty goblin
2013-05-02, 12:59 PM
Nah. A value of 0 means no effective measurable law.

Is your data count data, e.g. counts of how many of something you observed? Because if so, a standard multiple linear regression is completely the wrong model to fit. You need a Generalized Linear Model.
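To give a feel for what a Generalized Linear Model for counts looks like, here is a minimal sketch of a Poisson regression with a log link, fitted by Newton-Raphson in plain Python. The function name `fit_poisson_glm` and the `x`/`y` values are made up purely for illustration; they are not the data from this thread, and whether scores on a 1 to 10 scale should be treated as counts at all is an open assumption. In practice you would use a statistics package (R's `glm` with `family = poisson`, or the GLM class in Python's statsmodels) rather than hand-rolled code.

```python
import math


def fit_poisson_glm(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    # Start from the intercept-only fit: every mean equals the sample mean.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve information * delta = score.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical example data: one predictor and a count outcome with some zeros.
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 1, 3, 4]
b0, b1 = fit_poisson_glm(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

The slope is on the log scale: a one-unit increase in x multiplies the expected count by exp(slope). Unlike ordinary least squares, the fitted means can never go below zero, which is the main reason this family of models suits data that sits at or near 0.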

I suggest consulting with a statistician. If you're doing your research through a college, you can probably just get in touch with somebody in the stat department. They'll be able to ensure your analysis is reasonable, and that you are testing for the things you are interested in.


And the standard deviation being higher than the mean? Generally that doesn't mean anything about the validity of your data, although it does suggest you'll have low power for things like t-tests.

In fact, if your data is count data, a standard deviation higher than the mean is pretty typical. The mean and variance of a Poisson are the same parameter, and the st. dev. is just the square root of the variance, so the st. dev. exceeds the mean whenever the mean is below 1.
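A quick Python sketch of that relationship, using the means and standard deviations from the table in the first post. The helper names are mine, and treating these scores as counts is an assumption for illustration only. For a Poisson the variance-to-mean ratio is exactly 1, so a ratio well above 1 would suggest more spread (or more zeros) than a plain Poisson allows.

```python
import math

# (mean, st.dev.) per variable, copied from the table in the first post.
REPORTED = {
    1: (1.5, 2.6),
    2: (1.0, 2.5),
    3: (0.8, 2.3),
    4: (4.2, 2.4),
    5: (3.4, 3.3),
    6: (1.5, 3.4),
}


def poisson_sd(mean):
    """SD of a Poisson with this mean: variance equals the mean, so SD is its root."""
    return math.sqrt(mean)


def dispersion_ratio(mean, sd):
    """Observed variance divided by the mean; equals 1 for a Poisson."""
    return sd ** 2 / mean


for var, (mean, sd) in REPORTED.items():
    print(
        f"Var {var}: mean {mean:.1f}, reported SD {sd:.1f}, "
        f"Poisson SD {poisson_sd(mean):.2f}, "
        f"variance/mean {dispersion_ratio(mean, sd):.2f}"
    )
```

If the ratios come out far above 1, a plain Poisson is probably too tight a fit, and that is worth raising with the statistician: overdispersed or zero-inflated count models exist for exactly this situation.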