I know there's a fair number of Math and Stats guys out there, so let's try this. I was asking myself whether it is accurate to use the Central Limit Theorem (confidence interval) to calculate the margin of error (say, in pre-election polls) when there are more than two "serious options" (say, >2 candidates with a chance of winning). The margin of error on one percentage is calculated this way: ±2 × [p(1-p)/n]^0.5 (19 times out of 20, hence the 2). But does it really make sense to use a centered model (50-50 on each side of the curve)? Especially when there are >2 probable answers? Wouldn't modeling with some other probability law, multidimensional and/or not centered, be more accurate? I'm probably confused on some point; I'm not very good at this. One hint that makes me think I'm just wrong somewhere is that this way of calculating only gives the margin of error with regard to the sample size, not other stuff. (But this raises the question: are there better ways?) Also, are there ways of calculating the margin of error that include the different percentages? I've seen it done with one and two percentages but not more. My statistics and polling books/manuals are too introductory to help me with this, hence me asking. Help?
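For reference, here is a minimal Python sketch of the formula in the post, plus one standard extension for the multi-candidate case: the margin of error on the lead between two candidates measured in the same poll, assuming simple random sampling (a multinomial model). The poll shares, the sample size, and the function names are invented for illustration; 1.96 is the normal quantile that the post rounds to 2.

```python
import math

Z_95 = 1.96  # 95% normal quantile (the "2" in the formula, unrounded)


def margin_of_error(p, n, z=Z_95):
    """Normal-approximation margin of error for one share p in a sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_of_error_lead(p1, p2, n, z=Z_95):
    """Margin of error for the difference p1 - p2 within the same poll.

    The two shares are negatively correlated (a vote for one candidate is a
    vote not given to the other), so under a multinomial model
    Var(p1_hat - p2_hat) = [p1 + p2 - (p1 - p2)^2] / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


# Hypothetical three-way poll, invented for illustration only.
shares = {"A": 0.40, "B": 0.35, "C": 0.25}
n = 1000

for name, p in shares.items():
    print(f"{name}: {p:.0%} +/- {margin_of_error(p, n):.1%}")

lead = shares["A"] - shares["B"]
moe_lead = margin_of_error_lead(shares["A"], shares["B"], n)
print(f"A leads B by {lead:.0%} +/- {moe_lead:.1%}")
```

Each share gets its own margin of error, so the formula does depend on the percentage as well as on n, and the margin on the lead comes out wider than the margin on either individual share.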

Most likely, it is considered a "safe" way to calculate a margin of error. I was simply wondering if there were more precise ways and if my logic was flawed for the >2 part. edit: I think I mixed up statistical error and margin of error for the >2 thing... I'm still wondering why we always assume it follows a Gaussian law... One of my manuals on polls says it "most often works" but doesn't add any details or sources to that statement. I'm curious.

Don't knock it, my German Shepherd (the valiant Sir Zachary) understands Klingon very well. I read it in one of those trekky books, and decided I'd teach him Klingon - people find it rather odd, but they never criticize lest I give him the Klingon word for "attack"! I should say though, when your carotid artery is being ripped out, language becomes completely irrelevant.

Considering the way it started, that's no surprise. Statistical analysis - few real humans understand that stuff. For those that do understand it, I'd love to have a game of chess with you.

I was mixed up, but I managed to sort this out. And who the hell speaks Klingon anyway? I'd rather speak numbers.

I left my house yesterday without a t-shirt, yadda yadda yadda, I woke up on a subway surrounded by Asian schoolgirls doing math homework.

Don't be boring, use your imagination. How many schoolgirls does it take to fill a subway car, and could any of them answer the quiz?

Statistical tests are built on assumptions, and many of them are built on the assumption that the sample follows a Gaussian curve. Of course, life isn't perfect, and contrary to what the stats books tell you, many sample distributions don't fall into a nice normal curve. Luckily, you can sometimes smooth a non-Gaussian distribution with logarithms to a certain extent. It's a really handy technique when you're dealing with certain regression tests that require normal distributions. But you do have to be careful with interpretation then, because the data is no longer in raw form, and you have to transform the variables back for interpretation. On the other hand, some statistical methods, like robust regression, can deal fairly well with non-normal distributions.
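To make the logarithm trick concrete, here is a small sketch using only the Python standard library. It assumes, purely for illustration, that the raw data are lognormal (so the log is exactly normal); real data will rarely be that tidy. The `skewness` helper and the variable names are my own, not from any particular stats package.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean of cubed z-scores; near 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


random.seed(0)  # fixed seed so the run is repeatable

# Simulated right-skewed data (lognormal is an assumption made for the demo).
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")

# Interpretation caveat: transforming the mean of the logs back with exp()
# gives the geometric mean of the raw data, not the arithmetic mean.
back_transformed = math.exp(statistics.fmean(logged))
arithmetic_mean = statistics.fmean(raw)
print(f"exp(mean of logs): {back_transformed:.2f}")
print(f"mean of raw data:  {arithmetic_mean:.2f}")
```

The last two lines are the "careful with interpretation" part: a result estimated on the log scale does not map back to the raw-scale mean, so say which one you are reporting.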

I prefer regression, cloud cluster analysis and metaheuristics. The Gaussian curve is cheap but sadly needed for many regressions. I intend to get good at stat modeling, and maybe in a lifetime or three I'll be able to create metaheuristic models.

Besides making the math easy, Gaussian distributions are often a good assumption in the physical sciences because many observables are the result of averaging processes. For instance the voltage on the terminals of a resistor is the average of a bazillion electrons whanging back and forth, and follows Gaussian statistics. I trust a plain 100k resistor to function as a Gaussian noise source. On the other hand, the characteristics of components such as tubes and JFETs are not Gaussian because they go through a sorting process and out-of-spec parts are thrown away. Given that I'm not a statistician, I usually cheat and find confidence intervals by Monte Carlo simulation. If nothing else, it's a useful check for whether the formula that I'm using is reasonable.
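A Monte Carlo check along those lines, applied to the polling question from the start of the thread, might look like the sketch below. It assumes simple random sampling from a hypothetical three-candidate electorate (the shares, sample size, and function name are invented), simulates many polls, and counts how often the usual ±1.96 × sqrt(p(1-p)/n) interval contains the true share.

```python
import math
import random

Z_95 = 1.96  # 95% normal quantile


def simulate_coverage(true_shares, n, trials, rng):
    """Fraction of simulated polls whose normal-approximation interval
    contains the true share, reported separately for each candidate."""
    names = list(true_shares)
    weights = [true_shares[name] for name in names]
    hits = {name: 0 for name in names}
    for _ in range(trials):
        # One simulated poll: n independent respondents.
        sample = rng.choices(names, weights=weights, k=n)
        for name in names:
            p_hat = sample.count(name) / n
            moe = Z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
            if abs(p_hat - true_shares[name]) <= moe:
                hits[name] += 1
    return {name: hits[name] / trials for name in names}


# Hypothetical electorate, invented for illustration only.
true_shares = {"A": 0.40, "B": 0.35, "C": 0.25}
coverage = simulate_coverage(true_shares, n=1000, trials=2000,
                             rng=random.Random(42))

for name, rate in coverage.items():
    print(f"{name}: interval covered the true share in {rate:.1%} of polls")
```

If the formula is reasonable for the setup, each rate should land near the nominal 95%. Shrinking n or pushing a share toward 0 or 1 is a quick way to see where the normal approximation starts to struggle.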

Haha, have fun with that. My undergraduate training was in psychology, so I was raised on a steady diet of ANOVA, t-tests, chi-squares, and correlations, with the occasional OLS or logistic regression thrown in. My doctoral training has a heavy sociological and program evaluation slant to it, which means way more regression in order to get statistical control. But I like to say that you can take the psychologist out of psychology, but you can't take the psychology out of the psychologist.* The lengths I'll go to in designing true experiments are almost comical due to my obsession with internal validity. *I realize that makes absolutely no sense, but I thought it sounded cool. P.S. Oh, and if you are planning on doing the level of analysis that you're talking about, I don't know what statistical software you're using, but if it's SPSS, you're eventually going to find it limiting. SPSS is ANOVA-based. For high-level stats, Stata will take you a lot further. Or if you are really a super-duper genius, you can start using SAS.

Gaussian distributions are definitely not the same in the natural sciences. If I recall correctly, most stats books I've checked introduce the notion with examples taken from the natural sciences, then it gets a bit fishy with lines like "while it's not as perfect in social science, it generally works pretty okay".