Statistics question
I know there are a fair number of math and stats people out there, so let's try this. I was asking myself whether it is accurate to use the Central Limit Theorem (confidence interval) to calculate the margin of error (say, in pre-election polls) when there are more than two "serious options" (say, >2 candidates with a chance of winning). The margin of error on one percentage is calculated this way: ±2 × sqrt(p(1-p)/n), where p is the observed proportion and n is the sample size. (The 2 is the rounded 1.96 that goes with the 19-times-out-of-20, i.e. 95%, confidence level.) But does it really make sense to use a symmetric model (50-50 on each side of the curve)? Especially when there are >2 probable answers? Wouldn't some other probability law, multidimensional and/or not centered, be more accurate? I'm probably confused on some point; I'm not very good at this. One hint that makes me think I'm just wrong somewhere is that this way of calculating gives the margin of error only with regard to the sample size, not other stuff. (But this raises the question: are there better ways?) Also, are there ways of calculating the margin of error that include the different percentages? I've seen it done with one and two percentages but not more. My statistics and polling books/manuals are too introductory to help me on this, hence me asking :) Help? |
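For what it's worth, the formula in the post treats one candidate's share on its own, and that part carries over to more than two candidates, because each share is still a yes/no proportion. What changes is the margin on the gap between two candidates, since their shares come from the same sample and are negatively correlated. Here is a minimal Python sketch, assuming simple random sampling and z = 1.96 for the 19-out-of-20 level. The function names `moe_share` and `moe_lead` and the poll figures are made up for illustration.

```python
import math

def moe_share(p, n, z=1.96):
    """Margin of error for a single candidate's share p in a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_lead(p1, p2, n, z=1.96):
    """Margin of error for the gap p1 - p2 between two candidates in the same poll.

    Under multinomial sampling, Var(p1_hat - p2_hat) = (p1 + p2 - (p1 - p2)**2) / n,
    which includes the negative covariance -p1*p2/n between the two shares.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

if __name__ == "__main__":
    # Hypothetical three-way race, n = 1000 respondents (illustrative numbers only).
    n = 1000
    shares = {"A": 0.40, "B": 0.35, "C": 0.25}
    for name, p in shares.items():
        print(f"{name}: {p:.0%} +/- {moe_share(p, n):.1%}")
    print(f"A - B lead: {shares['A'] - shares['B']:.0%} "
          f"+/- {moe_lead(shares['A'], shares['B'], n):.1%}")
```

Note that the margin on the lead is larger than the margin on either share alone, so a gap smaller than the single-share margin is usually well inside the noise.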
Would a confidence interval sufficiently solve your problem? |
Edit: I think I mixed up statistical error and margin of error for the >2 thing... I'm still wondering why we always assume it follows a Gaussian law... One of my manuals on polls says it "most often works" but doesn't add any details or sources for that statement. I'm curious. |
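On why the Gaussian assumption "most often works": the share of respondents picking one option is an average of n yes/no answers, so the Central Limit Theorem applies as n grows. One way to see how good the approximation is for a given poll is to use the exact binomial distribution to compute how often the sampled share lands within the ± margin around the true share. This is a rough sketch, assuming simple random sampling. `exact_coverage` and `binom_log_pmf` are made-up helper names.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def exact_coverage(n, p, z=1.96):
    """Exact probability that the sample share k/n falls within
    z * sqrt(p(1-p)/n) of the true share p."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= half_width:
            total += math.exp(binom_log_pmf(k, n, p))
    return total

if __name__ == "__main__":
    # If the Gaussian approximation is good, this should be close to 0.95.
    print(exact_coverage(1000, 0.30))
```

For typical poll sizes and shares that are not close to 0 or 1, the result sits near the nominal 95%. Trying small n or extreme p shows where the approximation starts to drift.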
Hing fow matug. Patuk ila midta farang. |
That's Malaysian for "watchuonaboutwillis?" |
Actually, I was speaking Klingon, but your interpretation was almost perfect. |
This is going to wind up very uninteresting. |
I was mixed up, but I managed to sort this out. :D And who the hell speaks Klingon anyway? I'd rather speak numbers :p |
I left my house yesterday without a t-shirt, yadda yadda yadda, I woke up on a subway surrounded by Asian schoolgirls doing math homework. |
Whatdoyoumeanschoolgirls? |
Don't be boring, use your imagination. How many schoolgirls does it take to fill a subway car, and could any of them answer the quiz? |
I prefer regression, cloud cluster analysis and metaheuristics. The Gaussian curve is cheap but sadly needed for many regressions :p I intend to get good at stat modeling. And maybe in a lifetime or three I'll be able to create metaheuristic models. |
Besides making the math easy, Gaussian distributions are often a good assumption in the physical sciences because many observables are the result of averaging processes. For instance the voltage on the terminals of a resistor is the average of a bazillion electrons whanging back and forth, and follows Gaussian statistics. I trust a plain 100k resistor to function as a Gaussian noise source. On the other hand, the characteristics of components such as tubes and JFETs are not Gaussian because they go through a sorting process and out-of-spec parts are thrown away. Given that I'm not a statistician, I usually cheat and find confidence intervals by Monte Carlo simulation. If nothing else, it's a useful check for whether the formula that I'm using is reasonable. |
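The Monte Carlo check mentioned above can be sketched for the poll question in this thread: simulate many polls from assumed "true" shares, take the middle 95% of the simulated estimates, and compare that with the ±1.96 × sqrt(p(1-p)/n) formula. This is only an illustration under simple random sampling. The shares, the function name `simulate_share_interval`, the number of trials and the seed are all made up.

```python
import math
import random

def simulate_share_interval(shares, n, candidate, trials=2000, seed=0):
    """Simulate `trials` polls of size n and return the empirical
    2.5% and 97.5% quantiles of one candidate's estimated share."""
    rng = random.Random(seed)
    names = list(shares)
    weights = [shares[name] for name in names]
    estimates = []
    for _ in range(trials):
        # Each respondent picks a candidate with the assumed true probabilities.
        sample = rng.choices(names, weights=weights, k=n)
        estimates.append(sample.count(candidate) / n)
    estimates.sort()
    lo = estimates[int(0.025 * trials)]
    hi = estimates[int(0.975 * trials) - 1]
    return lo, hi

if __name__ == "__main__":
    shares = {"A": 0.40, "B": 0.35, "C": 0.25}  # hypothetical true shares
    n = 1000
    lo, hi = simulate_share_interval(shares, n, "A")
    formula = 1.96 * math.sqrt(shares["A"] * (1 - shares["A"]) / n)
    print(f"simulated 95% range for A: {lo:.3f} to {hi:.3f}")
    print(f"formula: {shares['A'] - formula:.3f} to {shares['A'] + formula:.3f}")
```

If the two ranges roughly agree, the formula is reasonable for that setup. If they don't, the simulation is the one to trust, provided the sampling model in it is realistic.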
Haha, have fun with that. :p My undergraduate training was in psychology, so I was raised on a steady diet of ANOVA, t-tests, chi-squares, and correlations, with the occasional OLS or logistic regression thrown in. My doctoral training has a heavy sociological and program evaluation slant to it, which means way more regression in order to get statistical control. But I like to say that you can take the psychologist out of psychology, but you can't take the psychology out of the psychologist.* The lengths I'll go to in order to design true experiments are almost comical, due to my obsession with internal validity. :D *I realize that makes absolutely no sense, but I thought it sounded cool. P.S. And oh, if you are planning on doing the level of analysis that you're talking about, I don't know what statistical software you're using, but if it's SPSS, you're eventually going to find it limiting. SPSS is ANOVA-based. For high-level stats, Stata will take you a lot further. Or if you are really a super-duper genius, you can start using SAS. |
Gaussian distributions definitely don't behave the same way outside the natural sciences. If I recall correctly, most stats books I've checked introduce the notion with examples taken from the natural sciences, and then it gets a bit fishy with lines like "while it's not as perfect in social science, it generally works pretty okay" :p |