Wednesday, October 24, 2012

Polling Fudge

I don’t work for any polling group.  On the one hand, this means I don’t have first-hand knowledge of how a poll determines its methodology, sets its weighting, or resolves conflicts, but on the other it also means I am free from any pressure to excuse or cover up mistakes or false claims.  One of the boldest of those claims is not stated explicitly, but is allowed to fly unchallenged: that the polls are accurate measures of voter opinion, confirmed by election results close to their predictions.  That’s just not so, when you check it out.

Here are some interesting final poll results from Presidential elections:

Gallup 1992:  Clinton 49, Bush 37 (off by 6.4 points)
Gallup 1980:  Reagan 47, Carter 44 (off by 6.8 points)
Harris 1992:  Clinton 49, Bush 37 (off by 6.4 points)
Harris 1984:  Reagan 56, Mondale 44 (off by 6.4 points)
CBS/NYT 2008:  Obama 51, McCain 42 (off by 6.0 points)
CBS/NYT 2000:  Bush 44, Gore 45 (off by 7.3 points)
CBS/NYT 1996:  Clinton 53, Dole 35 (off by 9.5 points)
CBS 1980: Reagan 44, Carter 43 (off by 8.8 points)
CBS 1976:  Carter 45, Ford 41 (off by 12.2 points)
USA Today 2008:  Obama 50, McCain 42 (off by 7.0 points)
USA Today 1992:  Clinton 49, Bush 37 (off by 6.4 points)
USA Today 1988:  Bush 52, Dukakis 42 (off by 5.1 points)
USA Today 1984:  Reagan 61, Mondale 34 (off by 8.6 points)
NBC/WSJ 2008:  Obama 51, McCain 43 (off by 5.0 points)
NBC 2000:  Bush 47, Gore 44 (off by 5.3 points)
NBC 1980:  Reagan 42, Carter 36 (off by 13.8 points)
Fox 2000:  Bush 43, Gore 43 (off by 10.3 points)
Ipsos 2004:  Bush 46, Kerry 49 (off by 5.4 points)
IBD 2004:  Bush 46.9, Kerry 44.3 (off by 7.8 points)
Rasmussen 2000:  Bush 40, Gore 49 (off by 8.5 points)
Pew 1996:  Clinton 52, Dole 38 (off by 5.5 points)
Marist 2000:  Bush 49, Gore 44 (off by 5.5 points)
Newsweek 2000:  Bush 45, Gore 43 (off by 8.3 points)
LA Times 2008:  Obama 50, McCain 41 (off by 8.0 points)

What should be noticed in these polls is not only how far off they ended up, but the fact that polls often over- or under-estimated one candidate’s support.  That, by the way, is also why my review of their errors is different from what you will hear from some polls.  Some will simply compare the margin in their final poll to the election margin, while others will take the total variance between their poll and the election result and cut it in half to call it an “average”, but neither is statistically correct.  Polls measure specific levels of support for each candidate, and so their margin of error is actually the total distance between their call and the result for each candidate.  As an example, let’s say PollCo releases a poll saying candidate A will beat candidate B 53% to 45%, but in fact candidate A wins 51% to 48%.  The poll might claim that it was +2 for A and -3 for B, so on average it was only 0.5 points off, but in fact the actual margin of error would be 5 points: 2 points on A plus 3 points on B.  This is often trivial in itself, but let’s say PollCo is usually off between 4 and 5 points, understating one candidate while overstating another.  Shouldn’t you know that history when, say, in another year PollCo says candidate G is leading candidate H 49% to 48%?  In a close race, a poll with a history of missing the mark by a sizable chunk is not really reliable, is it?
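The PollCo arithmetic above can be sketched in a few lines of Python.  This is only an illustration of the three ways of scoring a poll described in the paragraph, not any polling group’s actual method; the function name `poll_miss` and its return layout are made up for this example, and the numbers are the hypothetical PollCo figures.

```python
def poll_miss(poll, result):
    """Compare a final poll to the election result for two candidates.

    poll, result: dicts mapping candidate name -> percent support.
    (Hypothetical helper for illustration only.)
    """
    # Signed miss per candidate: positive means the poll overstated support.
    signed = {name: poll[name] - result[name] for name in poll}

    # Total distance between call and result, summed over candidates.
    total = sum(abs(miss) for miss in signed.values())

    # The "average" some polls quote: signed misses cancel each other out.
    signed_average = sum(signed.values()) / len(signed)

    # Comparing only the poll's margin to the election margin.
    first, second = list(poll)[:2]
    margin_miss = abs((poll[first] - poll[second])
                      - (result[first] - result[second]))

    return signed, total, signed_average, margin_miss


# The PollCo example: poll says A 53, B 45; the election comes in A 51, B 48.
signed, total, signed_average, margin_miss = poll_miss(
    {"A": 53, "B": 45}, {"A": 51, "B": 48}
)
print(signed)          # per-candidate misses: A overstated, B understated
print(total)           # total distance across both candidates
print(signed_average)  # the misleadingly small "average"
print(margin_miss)     # margin-only comparison
```

In this example the margin-only comparison happens to match the total, because the poll overstated one candidate and understated the other.  When a poll misses both candidates in the same direction, the margin-only number shrinks while the total distance does not.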

Ah, but there’s more.  Only polling nerds like myself would recognize the name Warren Mitofsky, but this gentleman was a legend in opinion polling.  Having worked for the Census Bureau and then CBS News, Mitofsky for all practical purposes created the Exit Poll as we know it.  Mitofsky knew how polls worked, and significantly observed that there is a chronic bias in favor of Democratic candidates in opinion polling, not just once in a while, but all the time.

There are exceptions, but in general polls tend to undervalue support for both Republican candidates and challengers.  The implications for this election are rather obvious.

I have said before that polls try to get the results right, but we should be very careful to test their headline claims, which are often driven by the narrative of the moment.  This week it’s amusing to hear the excuses being thrown out by Democrats, that the state polls are correct while the national polls, somehow, are not.  Republicans, in some cases, say the opposite, that the state polls are wrong while the national polls are right.  What I think is something else entirely – a lot of people do not realize what the polls are really telling us, and so assumptions drive emotion to error.

Let’s start with three obvious facts:

1.  In general, poll groups try very hard to publish accurate representations of voter intent.  This point often gets lost in all the emotion, but looking for conspiracies or attempts to mislead voters is a mistake.  Errors happen, but they are honest mistakes due to faulty (sometimes common) assumptions.

2.  All polls have errors and unknowns.  Expecting a poll to be perfect means you expect voters to have no doubts, to never change their minds, and to respond to poll queries in the exact proportions that they will vote, demographically.  To understand a poll, you do not count it as part of an average of polls, you do not assume that the margin held at any date a week or more out will hold through the end, and you do not ignore the internal data.  To understand a poll, you note shifts in trends and momentum, you observe weak and strong demographic groups for each candidate, and you make sure the poll has not changed its methodology or demographic weights since the last poll release.

3.  The state and national polls are inextricably linked.  If there is disagreement in the topline conclusion between state and national polls, either the national polls will correct to be in line with the correct state polls, or state polls will correct to be in line with the national polls.  This does not happen because someone wants to avoid embarrassment, but because math requires it.  Four quarts always make a gallon, sixteen ounces always make up a pound, and the fifty states plus D.C. have to make up the national total.  Resolution is inevitable.
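The “math requires it” point in item 3 can be shown with a small sketch: a candidate’s national share is just the state shares weighted by how many votes each state casts.  The three states and their numbers below are invented purely for illustration, and `national_share` is a hypothetical helper, not anything a polling group publishes.

```python
def national_share(states):
    """Combine state-level results into a national percentage.

    states: list of (votes_cast, candidate_share_percent) pairs,
            one pair per state (plus D.C.).
    (Hypothetical helper for illustration only.)
    """
    total_votes = sum(votes for votes, _ in states)
    # Each state's share counts in proportion to its votes cast.
    weighted = sum(votes * share for votes, share in states)
    return weighted / total_votes


# Invented example: one large state and two smaller ones.
example_states = [
    (1_000_000, 52.0),
    (3_000_000, 48.0),
    (1_000_000, 50.0),
]
print(national_share(example_states))
```

If state polls and national polls imply different answers to this same sum, one set or the other has to move, because the states cannot add up to anything other than the national total.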

So why is there argument?  For one thing, a lot of otherwise intelligent people do not seem to understand that 2012 is not 2008.  The economy, world conditions, and social and legal issues are all different from four years ago.  The candidates are different from four years ago, including Barack Obama.  Mister Obama cannot run as the fresh young challenger this year; he has to run on his record and accept that Mister Romney will be on offense this time.  What that means in the polls is that many assumptions have to change to match the new paradigm, especially in demographic terms.  A rather large number of polls were set up on assumptions which are clearly in dispute now.  As a result, state and national polls are sometimes in sharp disagreement about the party participation by Democrats, Republicans, and Independents in this election.  Also, Pew reported earlier this year that the response rate by voters to polling groups has plummeted below ten percent, indicating that a large portion of voters do not respond to poll queries.  That calls poll results into question, especially when sample sizes are low, as is generally the case with state polling.  Finally, state polls are performed far less frequently than national polls, and even then by a variety of agencies rather than by the same groups on a regular schedule.  Consequently, if something shifts within the dynamics of a state, the state polls tend to lag behind national polls in observing and reporting the new trend.

