Friday, October 31, 2008

More Fun With Poll Numbers

The election is coming to a close, or at least we hope so (thank you Al Gore for proving that sometimes the nightmare just continues). All along, I have been saying that the poll numbers are invalid by their own standards, and once again I have found another reason to repeat that claim: the state polls contradict many of the national polls.

Those who like the polls have generally argued that they cannot all be wrong, and that a consensus of the polls should be trusted. I hardly agree, because of a factor in statistics known as collinearity. Here’s the formal definition from statistics.com: “In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.”


Informally, collinearity is a warning to statisticians to make sure that the data they use is truly independent of other data. When data is redundant or correlated, counting it again gives it extra weight it has not earned, corrupting the results. Tests have been created to detect multicollinearity, such as the Farrar-Glauber test (most commonly used in econometrics), but it does not appear that such testing is commonly practiced in opinion poll analysis.
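
To put a rough number on that warning, here is a minimal sketch. It assumes, purely for illustration, that every pair of polls shares the same correlation `rho` and the same variance; neither the function name nor the `rho` values come from any pollster. Under that assumption, the variance of an average of n polls is (sigma^2 / n) * (1 + (n - 1) * rho), so n correlated polls are only worth n / (1 + (n - 1) * rho) independent ones.

```python
def effective_poll_count(n_polls: int, rho: float) -> float:
    """Independent polls that n equally correlated polls are worth.

    Assumes every pair of polls has the same correlation rho and the
    same variance (an illustrative simplification, not a measured fact).
    """
    return n_polls / (1 + (n_polls - 1) * rho)


# rho values below are hypothetical, chosen only to show the trend.
for rho in (0.0, 0.25, 0.5):
    worth = effective_poll_count(13, rho)
    print(f"rho={rho:.2f}: 13 polls are worth about {worth:.2f} independent polls")
```

With no correlation the 13 polls count in full; as the shared correlation grows, the "consensus" shrinks toward the information in a handful of polls.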

The math in that line of testing tends to get a bit complex for a casual discussion, so here I will come back to another point of opinion polling: the statistical level of confidence. That is a critical test for an opinion poll, and it serves as a quick check on whether the poll is valid. “Valid” does not mean right or wrong, it means the poll’s method is considered trustworthy. “Invalid” means that whatever the poll says, you should not rely on it. Again, I refer the reader to the National Council on Public Polls (NCPP), and their criteria for polling and their principles of disclosure. In short, when a poll will not tell you who paid for it, hides how many people refused to take the poll when contacted, or refuses to release the internal demographics used in the poll and from the response pool, that poll is in direct violation of NCPP rules and should not be taken seriously, even if you find its results believable. The bad news is that almost none of the publicly released polls are in full compliance with NCPP standards.

Going back to the question of the confidence level, though, it’s a simple test for validity. All of the major polls use – or claim to use – similar methodologies and demographic weighting, with the exception of party affiliation weighting. Some of these groups insist that party affiliation is not a static demographic, and therefore should not be weighted at all, so here we will use their logic in applying the numbers. The polls all claim a 95% confidence level. In statistical terms, they are saying that if the same method is used, results should fall within the margin of error in at least 19 of every 20 polls. So, it should not be difficult to test that claim.
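
For readers who want to see where a 3-point margin comes from, here is a minimal sketch using the textbook formula for a simple random sample, MOE = z * sqrt(p * (1 - p) / n). The sample size of 1,000 is a hypothetical round number, not any pollster's actual figure, and real polls that weight their samples will have somewhat larger effective margins.

```python
import math

Z_95 = 1.96  # standard normal critical value for 95% confidence


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Margin of error for a proportion p from a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)


def expected_misses(n_polls: int, confidence: float = 0.95) -> float:
    """Average number of polls expected to land outside the margin of error."""
    return n_polls * (1 - confidence)


# Hypothetical sample of 1,000 respondents with support near 50%.
moe = margin_of_error(0.5, 1000)
print(f"MOE for n=1000: {moe * 100:.1f} points")
print(f"Expected misses in 20 polls: {expected_misses(20):.1f}")
```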

Here are the polls listed at Real Clear Politics for the last ten days (where a poll has been done more than once in that period, the most recent results are used). I am listing these in descending order of support for Barack Obama, then of support for John McCain, applying the claimed 3% margin of error (MOE) and noting how many of the other polls agree or disagree with the stated poll:

Pew Research – Oct 26 – Obama 53% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – Obama 53% (agree 8, disagree 4) FAIL
ABC News/WaPo – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
CBS News/NYT – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
Rasmussen - Oct 30 - Obama 51% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – Obama 51% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby - Oct 30 - Obama 50% (agree 12, disagree 0)
Gallup (Traditional) – Oct 29 – Obama 50% (agree 12, disagree 0)
Ipsos/McClatchy – Oct 27 - Obama 50% (agree 12, disagree 0)
GWU/Battleground – Oct 30 – Obama 49% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
IBD/TIPP – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
FOX News – Oct 29 – Obama 47% (agree 6, disagree 6) FAIL


Rasmussen - Oct 30 - McCain 47% (agree 7, disagree 5) FAIL
GWU/Battleground – Oct 30 – McCain 45% (agree 9, disagree 3) FAIL
Gallup (Traditional) – Oct 29 – McCain 45% (agree 9, disagree 3) FAIL
Ipsos/McClatchy – Oct 27 – McCain 45% (agree 9, disagree 3) FAIL
FOX News – Oct 29 – McCain 44% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – McCain 44% (agree 11, disagree 1)
ABC News/WaPo – Oct 29 – McCain 44% (agree 11, disagree 1)
IBD/TIPP – Oct 29 – McCain 44% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby - Oct 30 - McCain 43% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – McCain 42% (agree 10, disagree 2) FAIL
CBS News/NYT – Oct 29 – McCain 41% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – McCain 41% (agree 8, disagree 4) FAIL
Pew Research – Oct 26 – McCain 38% (agree 2, disagree 10) FAIL


Note that every polling agency fails one side or the other of this validity test. Every one of them.
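
For anyone who wants to check the counts, here is a minimal sketch of the tally. Two poll results "agree" when they are within 3 points of each other. The FAIL cutoff (more than one disagreeing poll out of the other twelve) is an assumption inferred from the flags in the lists above, and the function and variable names are mine.

```python
MOE = 3            # claimed margin of error, in points
MAX_DISAGREE = 1   # assumed cutoff, inferred from the FAIL flags above

OBAMA = {
    "Pew Research": 53, "Newsweek": 53, "ABC News/WaPo": 52,
    "CBS News/NYT": 52, "Rasmussen": 51, "Gallup (Expanded)": 51,
    "Reuters/C-SPAN/Zogby": 50, "Gallup (Traditional)": 50,
    "Ipsos/McClatchy": 50, "GWU/Battleground": 49,
    "Diageo/Hotline": 48, "IBD/TIPP": 48, "FOX News": 47,
}
MCCAIN = {
    "Rasmussen": 47, "GWU/Battleground": 45, "Gallup (Traditional)": 45,
    "Ipsos/McClatchy": 45, "FOX News": 44, "Gallup (Expanded)": 44,
    "ABC News/WaPo": 44, "IBD/TIPP": 44, "Reuters/C-SPAN/Zogby": 43,
    "Diageo/Hotline": 42, "CBS News/NYT": 41, "Newsweek": 41,
    "Pew Research": 38,
}


def agreement(results: dict, moe: int = MOE) -> dict:
    """Map each poll to (agree, disagree) counts against all other polls."""
    counts = {}
    for name, value in results.items():
        others = [v for other, v in results.items() if other != name]
        agree = sum(1 for v in others if abs(v - value) <= moe)
        counts[name] = (agree, len(others) - agree)
    return counts


for label, results in (("Obama", OBAMA), ("McCain", MCCAIN)):
    for name, (agree, disagree) in agreement(results).items():
        flag = " FAIL" if disagree > MAX_DISAGREE else ""
        print(f"{name} - {label} {results[name]}% "
              f"(agree {agree}, disagree {disagree}){flag}")
```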

But let’s move on. We can look at the RCP averages from one of two perspectives. The RCP folks take the polls from the last week by polling date (not release date) and average them. That gives a claim that Obama is leading McCain 49.7% to 43.8%, with a 3 point MOE. If we extend that back to polls taken October 20 or later, then it becomes Obama 50.3%, McCain 43.3%. So, RCP’s national polls, if aggregated as they like it, show a 5.9% lead or a 7.0% lead.
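
The October 20 or later figures can be reproduced from the thirteen polls listed above with a plain unweighted average, which is what I understand RCP to do. A minimal sketch (the variable names are mine):

```python
from statistics import mean

# Support figures from the thirteen polls listed above, in listed order.
OBAMA = [53, 53, 52, 52, 51, 51, 50, 50, 50, 49, 48, 48, 47]
MCCAIN = [47, 45, 45, 45, 44, 44, 44, 44, 43, 42, 41, 41, 38]

obama_avg = mean(OBAMA)
mccain_avg = mean(MCCAIN)
lead = obama_avg - mccain_avg

print(f"Obama {obama_avg:.1f}%, McCain {mccain_avg:.1f}%, lead {lead:.1f}")
```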

OK, now let’s take a look at the RCP state polling. There are dozens of polling groups which have put out state polls, and I cannot speak here to their total authenticity. That, of course, is also a problem with some of the national polls, but for consistency we can use the RCP numbers. Now, take each state’s aggregate claimed level of support for Obama or McCain and weight it by that state’s proportional share of the national vote (using 2004 voting statistics). If RCP’s state averages are right, plugging those numbers in gives Obama 46.9% of the popular vote, to 43.9% for McCain. The aggregation of the state polls, if we are going to accept them as valid, shows that the national polls are overstating Obama’s support. Once again, a simple check for validity shows that the confidence level test fails for the national polls.
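
The weighting itself is simple arithmetic. Here is a minimal sketch of the method using three made-up states; the state names, support figures, and vote totals are all hypothetical placeholders, not the RCP averages or the 2004 turnout figures.

```python
def national_share(state_support: dict, state_votes: dict) -> float:
    """Vote-weighted national support from state-level support figures.

    state_support: candidate's support in each state, in percent.
    state_votes:   votes cast in each state in the reference election.
    """
    total_votes = sum(state_votes[s] for s in state_support)
    weighted = sum(state_support[s] * state_votes[s] for s in state_support)
    return weighted / total_votes


# Hypothetical inputs for illustration only.
support = {"State A": 52.0, "State B": 44.0, "State C": 48.0}
votes_millions = {"State A": 5.0, "State B": 3.0, "State C": 2.0}

print(f"Projected national support: {national_share(support, votes_millions):.1f}%")
```

A large state moves the national number more than a small one, which is why a simple average of state polls would not do here.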

One last thing. The state polls have assumed a significant shift from 2006 towards increased Democratic participation, but even if that happens, the state polling indicates that Obama will still fail to reach 50% support. If those polls are reweighted according to 2006 turnout proportions and then plugged in to project national numbers, it becomes Obama 46.3% and McCain 47.1%, with 6.6% undecided. Take from that what you will.

12 comments:

Anonymous said...

DJ--

I appreciate your analysis.

I was shocked to discover that the rate of refusal is so high (80% or more).

Can you direct me to a post here where you explain the methodology the pollsters use to compensate for the high rates of refusal?

And isn't it likely that this year is going to exacerbate the problem? I think there's a huge gap in rate of refusal between Obama supporters and McCain supporters.

Do you have current rate of refusals for the major pollsters?

Anonymous said...

Good work, but the same flaw in ALL these polls has not been addressed: the internals, when produced, show a markedly different number of undecideds than the published "Horse race" number.

Regionally, for instance, IBD indicates undecideds at a cumulative 37%--more than four times that of the head-to-head 8%.

It really doesn't matter who is agreeing with whom if the essential number of undecideds is so far off.

Anonymous said...

DJ, and Deep,

Thank you so much for confirming what I've been thinking. Obama may win, but unless the many undecideds break his way, it will not be in a landslide.

McCain still has a real chance, so all of his supporters: turn out, turn out, TURN OUT!

vnjagvet said...

DJD:

Karl Rove said last night that significantly more polls have been published this year than ever before.

From reading your analysis, I have concluded that it is likely that this year the polls in the aggregate will show the most deviation from actual results in the history of polling.

Am I missing something in coming to that conclusion?

Anonymous said...

Article from AP (of all people) stated that about 1/7 or 14% of voters are not set on their decision.
At timesonline is an article about Obama and Co. trying to tone down expectations of what he can do if he wins the election. On another blog someone referred to that as "Bait & Switch".
Jerusalem Post has articles about Iran's Al-Quds force and new efforts to get highly enriched uranium.
World recession is really hitting the various 3rd world nations.
It will be an interesting and dangerous four years. Are any of the "average" voters paying attention outside the blogs?
rifle308

Bob said...

So this is like a conspiracy then. All these pollsters, some who got it dead on in 2004 for Bush, are in the tank for Obama. Even Real Clear Politics, huh? Come on folks, all the polls show Obama ahead. It means something, don't you understand? McCain is in deep trouble. To imply otherwise is denial. As a McCain supporter and one who believes he is solidly about 6-7 points behind, the only thing that can save him is if all Republicans turn out in record numbers, thus killing the Dem ID thing. And a lot of help from Hillary people. Pray for that before you invent a conspiracy.

DJ Drummond said...

Gee bob, since I never said, implied, or hinted at a conspiracy, but gave a detailed and supported explanation of how things got this way, I'd be inclined to think you didn't bother to really read what I wrote.

Maybe you let your own assumptions drive your comments?

Unknown said...

Bob, it doesn't do justice to DJ's analysis to call it a conspiracy theory. You may want to read what he said and rethink your silly post. He is merely saying we just don't know because many of the base assumptions being used are fallacious. That doesn't mean that Obama isn't going to win, just that the premises that lead towards this (possibly correct) conclusion are erroneous.

Anonymous said...

Bob, it's not about conspiracy, it's about entering completely new terrain politically and using models that are not equipped to accurately gauge voter behavior.

There are at least THREE novel variables this cycle: a black American at the top of one ticket; an equally charismatic woman (despite what the MSM would have you believe about Palin's dwindling popularity, she outdrew Bill Clinton at Penn State Friday) on the bottom of the opposing ticket; and a kinetic international economic crisis.

There is simply no way to accurately gauge the concomitant effect of these unknowns on the outcome of the race, so pollsters are guessing.

Here's what one surprisingly candid pollster admitted:

"The varying size of Mr Obama’s lead in different polls reflects differences in methodology between rival pollsters as they grapple with uncertainties that make this year’s election unusually difficult to predict.

“It’s driving us all insane,” says Matt Towery, chairman of Insider Advantage, an Atlanta-based polling company. “Anybody who says they have the right model for this election is a liar because nobody knows for sure.”

Were this not true, Gallup would not be attempting to play both ends against the middle by using three polls, Zogby would not change his opinion three times in less than 10 days, and Obama would not be taking calls from Governor Rendell to get back to Pennsylvania and to bring Bill Clinton with him.

Look at the internals of the TIPP/IBD or the Harris organization yourself to see how many voters are still on the fence. The number will not fall to 8%--10% until Monday. As of today--Saturday--it's probably still in the mid-20's, and that's an indication voters can't as of the moment commit to Obama. McCain's electorate is already locked and loaded.

Anonymous said...

What do you suggest are the variables causing a problem with multicollinearity? Are you sure that the methodology isn't correcting for issues of multicollinearity? It's a pretty common issue in social scientific statistics, not just isolated to economics.

Anonymous said...

Re: the variables

The greatest unknown is Obama himself. Notwithstanding his status as the first black American to be on the top of a presidential ticket, I cannot remember a candidate who was at the same time the object of such adoration and contempt.

Nixon was hated by many but accepted by most during two election cycles.

George Bush, also, was despised by many but supported by more.

There is no emotional neutrality concerning Obama: democrats deify him; republicans feel intense dislike for him.

I believe, reasonably, that the current models simply cannot accurately reflect how many people will openly confess to being at the very least suspicious about the guy and vote on that basis.

I am completely convinced that as of now, Sunday afternoon, there are still 20%-30% undecideds, that will be shaved to 8%-10% by Monday night and between 2% and 8% on Tuesday.

Many more people than pollsters want to admit are still on the fence. The profile of the run-up to election day looks very much like that of voters the weekend before the New Hampshire primary: Obama was ahead by double digits in all but two polls that had it close.

Nine months later pollsters still will not confess their models were insufficient to assess the way the public eventually voted--for Hillary Clinton by two points.

They have all sorts of clever justifications for why they had it wrong--none of which suggested a hole in their methodology.

I see no reason now to assume they have acknowledged the "experimenter bias" in New Hampshire which resulted in the greatest polling disaster since 1948. That being so, I see no reason to believe that Rasmussen, Zogby, Gallup or any other mainstream pollster has got it right today.