We've all seen media polls. What we don't
often see are the details of the polling apparatus; most
significantly "Who and how many are polled?" A
second issue is hidden from view. . .polling reliability
(or confidence). Some of these issues are discussed
below.

We take polls for one big reason: collecting data from everyone is too time-consuming and (most importantly) too expensive. In a perfect world, poll-taking would be simple. In the real world it is not. Let's discuss the perfect world first.

In the perfect world we'd put the names of all voters in a great big hat, shake the hat up, and draw names that would constitute the sample. These people would be contacted and surveyed; their responses would lead to the "results" (something like: Clint Billon has 54% of the vote with a polling error of ± 3%). If you've followed election polling, one thing you learn is that each poll seems to give a different result. This is due to the randomness of the selection process.

There are n sampled people. Suppose X of them have some property (such as "will vote for Clint Billon"). We estimate the fraction of all people having this property with

ESTIMATE = X ÷ n

Multiply by 100 to obtain a percentage. Using this method, and operating in this perfect world, the polling error is related to the sample size n through (approximately)

POLLING ERROR = 1 ÷ SQRT(n)

where SQRT stands for "square root of." Again, multiply by 100 to obtain the value in % form. (This "recipe" or "formula" should only be used when the fraction is somewhere near half -- 50%. There are adjustments for other cases.)

So if the sample size is n = 100, then the polling error is approximately ± 10%. If n = 1000, we get a polling error of approximately ± 3.2%. In fact, it takes about 1000 people sampled in this fashion (drawn from the big hat) to achieve a margin of error of approximately ± 3%.

So, suppose we sample (randomly, of course) 1000
voters; 541 favor candidate Clint Billon. That's 54.1%
with a polling error of ± 3.2%. It's always good to look
at the endpoints of the implied interval: 54.1 - 3.2 =
50.9, 54.1 + 3.2 = 57.3. So, we might also state our
interval as the range from 50.9% to 57.3%. Of course, if
we are the media, we round these values off to avoid
scaring those readers who don't like decimals: 51% to
57%.

Then, what does this 51% to 57% mean? There's no
uncertainty about the polled people; the polling error
must reflect something about those people left unpolled
(almost everyone it turns out). Most people understand
that the range of values given by the poll estimates a
percentage of all voters.

The best question you might ask yourself is: How can
they be so sure? After all, if 60% of all voters intend
to vote for Billon, it certainly seems possible (if
improbable) that 54% of the sample would favor him.

They're not! They're only 95% sure! This (95%) is the number that
nobody ever tells you about. There's no way of knowing
(before the fact) whether the poll result (including
margin of error) is correct or not. In our example, we've
got a successful poll IF the percentage of ALL voters
favoring Billon is between 51% and 57% (within 3% of
54%). Otherwise we have a failed poll. It's impossible to
determine which type of poll we have. However. . .and
this is the point. . .it is the case that 95% of all
polls conducted in this fashion are successful ones. The
95% refers to the reliability -- or confidence -- of the polling procedure.

For instance, I have 20 pennies in my pocket; 19 (95%) are U.S., the other is Canadian. I mix up the coins and pull one out of the pocket. I look at it; you do not. You feel the same way about that penny being U.S. as you do about the results of any (media) election poll. That is: the poll either is or is not correct (the penny either is or is not U.S.), but in 95% of all cases it is correct (in 95% of all cases I will have a U.S. coin in my hand).

To test your understanding, here's a little assignment!
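Both the 1 ÷ SQRT(n) recipe and the hidden 95% reliability figure can be checked with a short computer simulation of the perfect-world poll. Here is a minimal sketch in Python; the helper name run_poll, the "true" fraction of 0.54, and the number of trials are illustrative choices of mine, not part of the discussion above:

```python
import random
import math

def run_poll(p_true, n):
    """Simulate one perfect-world poll: draw n voters at random from a
    population in which a fraction p_true favor the candidate; return
    the estimated fraction X/n and the polling error 1/sqrt(n)."""
    x = sum(1 for _ in range(n) if random.random() < p_true)
    return x / n, 1 / math.sqrt(n)

# The worked example above: n = 1000 sampled voters, X = 541 for Billon.
n = 1000
print(f"polling error for n={n}: +/- {100 / math.sqrt(n):.1f}%")  # about 3.2%
print(f"estimate for X=541: {100 * 541 / n:.1f}%")                # 54.1%

# The hidden reliability figure: if we repeat the whole poll many times,
# roughly 95% of the intervals (estimate +/- error) should contain the
# true percentage -- those are the "successful" polls.
random.seed(1)          # fixed seed so the run is repeatable
p_true = 0.54           # assumed true fraction favoring Billon
trials = 10_000
successes = 0
for _ in range(trials):
    est, err = run_poll(p_true, n)
    if abs(est - p_true) <= err:
        successes += 1
print(f"successful polls: {100 * successes / trials:.1f}%")  # roughly 95%
```

Each simulated poll either is or is not successful (like the penny either is or is not U.S.), and there is no way to tell which from the poll alone; only across many repetitions does the 95% show itself.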