Saturday, October 18, 2008

A Few Thoughts About the State Polls

On Friday I posted a review of where things stand right now in the state races. As I wrote at the time, I do not believe the races will shake out exactly that way in the end, for a number of reasons, but the numbers are valid, even if they are a bit unexpected in places.

Some states have been big surprises for people, which in turn has led to polite requests to explain how I got there, as well as some less-than-polite suggestions that the numbers must be wrong because they are surprising. So today’s article is a brief discussion of reweighting polls and the difficulty of finding the “true” numbers.

Let’s start with the fact that there are a lot of polls out there. Besides all the groups doing national polls, there are over four dozen groups doing state polling. Some are professional and thorough, others much less so. The trouble is that most people do not know how to read poll results, so what usually happens is that the polls are all thrown together, an average is taken, and that average is fed to the public as the true standings. The problem is that this is statistically invalid, because polls built on different samples and different weighting models are not measuring the same thing.

Statistics is a mathematical science, and opinion polls are some of the trickiest statistics to manage, since human behavior is sometimes erratic and even deliberately disruptive. As a result, effective polling is still very much a work in progress. You will hear some people say that polling has become very accurate, but that is not really true. As I have written many times, many polls taken before the end of a political campaign careen all over the place, making moves that appear to have no connection to actual events. Worse, there are many instances where even the final poll from a respected group turned out to be badly wrong. Take 1936, for instance. At the time, the most respected presidential opinion poll was the one run by the Literary Digest. The LD poll was mailed out to literally millions of people at a time, and it had correctly predicted the winner of every presidential election from 1916 through 1932. So when LD picked Alf Landon to beat Franklin Roosevelt with 57% of the popular vote, based on more than two million ballots returned out of roughly 10 million mailed (and to think, we are impressed when more than a thousand people answer a poll these days!), it was considered very bad news for Roosevelt. But a new company formed by George Gallup, which used demographic weighting for its projections, predicted that FDR would win with 56% of the vote, even though its sample of about 50,000 people was a small fraction of the Digest’s. When Roosevelt actually won with 61% of the vote, it was a stunning blow for the Literary Digest, and it launched George Gallup’s long career as a polling authority. It was, in fact, the Gallup Organization that first understood that a respondent pool had to be reweighted to match demographic norms in order to produce a valid reflection of the national mood.

But Gallup is a business, and pollsters discovered that reports with no surprises and no drama did not sell well to the media. So, while I cannot prove it, I find it intriguing that as Gallup’s revenues became significant, the polling trends became more of a roller coaster, including the odd claim of convention “bounces”. The problem with bounces is that they mistake the enthusiasm of a party base following its convention for an increase in voter support. That is not really the case, but the polling groups continue to report this false “bounce”, which always fades until a more nominal trend is seen again. I don’t want to sound cynical, but this is a version of the observer effect often loosely credited to Dr. Heisenberg: in observing human behavior, the observer often forgets that he too is affected by the event, and this creates bias in his reports. Polling groups fall prey to media spin just like anyone else, and there are numerous cases where they missed the real trend because they were not looking for it. Good examples can be found in the 1948 and 1976 elections, where the sitting president was unpopular and tied to scandal, and the polls reflected the media expectation that he was doomed. In 1948 the polls wrote off Harry Truman, and in 1976 they wrote off Gerald Ford. Truman’s 1948 win is political legend, but many people do not remember that Ford came back from a 33-point poll deficit to briefly lead the race, before narrowly losing to Jimmy Carter. Similar polling blunders occurred in the 1980, 1988, and 2000 presidential races.

My point in this article is that if national polls by well-respected polling groups can blunder, we should hardly be surprised to find that state polls can also be inaccurate. The mistake made this year is the assumption that when all the polls say the same thing, they must be right. However, if all the polls are making the same mistake, they can be in complete agreement yet still be completely wrong. That’s a bold statement, but it is the heart of this year’s polling problem.
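A small simulation can illustrate why agreement among polls is no proof of accuracy. This is a hypothetical sketch, not the method behind Friday’s numbers: it assumes every poll shares the same systematic bias (say, a common party-weighting error) on top of its own independent sampling noise. Averaging many such polls shrinks the random noise but leaves the shared bias untouched. The function name and all parameter values are invented for the example.

```python
import random
import statistics

def average_of_polls(true_share, shared_bias, poll_noise, n_polls, seed=0):
    """Average n_polls simulated poll results (hypothetical model).

    Each poll = true_share + shared_bias + its own random error.
    shared_bias is identical for every poll (e.g. the same flawed party
    weighting); the random error is independent from poll to poll.
    """
    rng = random.Random(seed)
    results = [true_share + shared_bias + rng.gauss(0, poll_noise)
               for _ in range(n_polls)]
    return statistics.mean(results), statistics.stdev(results)

avg, spread = average_of_polls(true_share=50.0, shared_bias=3.0,
                               poll_noise=2.0, n_polls=1000)
print(f"average of polls: {avg:.1f}  (true value 50.0)")
print(f"typical disagreement between polls: {spread:.1f} points")
```

With these made-up inputs the average lands near 53 rather than the true 50, while the polls disagree with one another by only about two points, so they look consistent even though they are all off in the same direction.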

Let’s go back to the national polls for just a moment. Some polls show Barack Obama way ahead, while others show the race growing very tight again, in some cases well within the published margin of error. My point here is not to say which is right, but to point out that the very fact that such a range exists is evidence of statistical invalidity. Gallup recognized this, which is why it has backed off from the “expanded voter” model it was using all summer, a model it had never used before this year. Gallup has finally recognized that the range of results cannot pass the validity tests for a 95% confidence level. (The 95% confidence level is the common standard for opinion polls; in simple terms it promises reproducibility: if the same poll were repeated many times under the same conditions and method, about 95% of the results would fall within the margin of error of the true value.) Evidence of collinearity has also been discovered, which further damages Gallup’s confidence in the model. So Gallup is actually publishing results from three different models using the same data, which is essentially an admission that it has no idea what the true condition really is. If Gallup is admitting this collapse of the system, however tacitly, then it may safely be assumed that all polling this year using non-historical models is statistically invalid. The reason they will not say so plainly is that they do not want to have to pay back their clients and subscribers. To the point, however: knowing that there are serious errors in the models used in national polls, it is reasonable to expect similar errors in the state polls, especially since these polls use smaller respondent pools, are taken less often by each polling group, and often involve small-budget operations that cannot afford to go back and validate past work.
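For reference, the margin of error usually quoted with a poll comes from the normal approximation for a proportion at the 95% confidence level, where z is about 1.96. The sketch below assumes a simple random sample, which real polls only approximate; weighting and nonresponse make the true error larger than this figure.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the 95% confidence interval for a proportion p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for a few common sample sizes.
for n in (500, 1000, 2500):
    print(n, f"±{100 * margin_of_error(0.5, n):.1f} points")
# 500 ±4.4 points
# 1000 ±3.1 points
# 2500 ±2.0 points
```

In a two-candidate race the margin on the gap between the candidates is roughly twice the figure for either candidate’s share, which is why a lead of a few points in a 1,000-person poll can sit “within the margin of error”.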

So what went wrong? In short, the political affiliation weighting. There are three schools of thought about party weighting. Some would argue that there should be no weighting at all. That runs into problems, however, when you consider that there are clearly places where strong political bias exists. For example, I could set up a poll that met a lot of the criteria for demographic variety by polling a political party’s national convention. I could get both genders, all races, folks from urban and suburban locations, young and old, and so on. On paper it would look just fine, but of course such a poll would be a crock. The same problem exists anywhere you do not make sure to use historically valid weighting. That is why some of those polls are way out of line: they are not paying attention to ridiculously unbalanced political participation. The second school uses an even worse idea, what one poll calls “dynamic weighting”. They take the average party response from a number of polls, assume that this average represents the “new” political affiliation, and weight future polls by it for a while. To see how absurd that is, imagine a series of polls in which black respondents made up only 5% of the raw data. Would you conclude that the Census figures were wrong, and that blacks made up only 5% of the United States population? Of course not, yet that is the rationalization used by the polls that play this stunt. A similar and even more ridiculous tactic is the subjective weighting used by some polls: a weight with no basis whatsoever in past election participation, or even in poll participation, but set artificially by the polling group’s management. That is every bit as dishonest as running short of respondents for a target pool and making up responses to fill the rest.
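To make the objection to “dynamic weighting” concrete, here is a hypothetical sketch; the function names and every figure in it are illustrative, not taken from any pollster’s published method. It sets the party target to the average raw party mix of recent polls, then weights a new poll to that target. If the recent polls share a skew, the target inherits it and the new poll’s weights stay close to 1, so nothing gets corrected. A historical target, by contrast, pulls the sample back toward past turnout.

```python
def dynamic_target(recent_raw_mixes):
    """Average the raw party mix of recent polls (the "dynamic" target)."""
    parties = recent_raw_mixes[0].keys()
    return {p: sum(mix[p] for mix in recent_raw_mixes) / len(recent_raw_mixes)
            for p in parties}

def weights_for(sample_mix, target_mix):
    """Per-party weight = target share / share observed in this sample."""
    return {p: target_mix[p] / sample_mix[p] for p in sample_mix}

# Hypothetical raw party shares from three recent polls, all leaning the same way.
recent = [
    {"D": 0.44, "R": 0.30, "I": 0.26},
    {"D": 0.45, "R": 0.29, "I": 0.26},
    {"D": 0.43, "R": 0.31, "I": 0.26},
]
# Hypothetical historical party ID for the same state.
historical = {"D": 0.36, "R": 0.36, "I": 0.28}

new_poll = {"D": 0.45, "R": 0.30, "I": 0.25}
dynamic = weights_for(new_poll, dynamic_target(recent))
fixed = weights_for(new_poll, historical)
for p in new_poll:
    print(p, f"dynamic weight {dynamic[p]:.2f}", f"historical weight {fixed[p]:.2f}")
```

Under the dynamic target the weights come out at roughly 0.98, 1.00, and 1.04, so the skewed sample is essentially accepted as-is; under the hypothetical historical target they are 0.80, 1.20, and 1.12.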
The third option, used by very few polls but the one defensible method, is to take actual political affiliation from previous elections, especially since it has been shown that the proportional split among Democrats, Republicans, and independents has been remarkably consistent for over forty years. When polls refuse to use the numbers known to be valid from a generation of actual elections, it should be no surprise when their projected results fail to match the actual election results.
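The third approach is a form of what survey statisticians call post-stratification: each party group is weighted so that its share of the sample matches a target share, here assumed to come from past elections. The sketch below is a minimal illustration with an invented 100-person sample and invented targets; the function names are hypothetical, and real polls usually weight on several variables at once, which this ignores.

```python
from collections import Counter

def party_weights(respondents, target_shares):
    """Weight = target share / observed share for each party group."""
    counts = Counter(party for party, _ in respondents)
    n = len(respondents)
    return {party: target_shares[party] / (counts[party] / n) for party in counts}

def weighted_topline(respondents, weights):
    """Each candidate's share of the weighted sample."""
    totals = Counter()
    for party, choice in respondents:
        totals[choice] += weights[party]
    grand = sum(totals.values())
    return {choice: totals[choice] / grand for choice in totals}

# Hypothetical raw sample: (party ID, candidate choice) for 100 respondents.
respondents = ([("D", "A")] * 40 + [("D", "B")] * 5 +
               [("R", "A")] * 4 + [("R", "B")] * 26 +
               [("I", "A")] * 13 + [("I", "B")] * 12)
target = {"D": 0.35, "R": 0.35, "I": 0.30}  # hypothetical historical mix

unweighted = weighted_topline(respondents, {"D": 1.0, "R": 1.0, "I": 1.0})
weighted = weighted_topline(respondents, party_weights(respondents, target))
print(f"unweighted A: {unweighted['A']:.1%}, weighted A: {weighted['A']:.1%}")
```

In this made-up sample Democrats are over-represented (45% against a 35% target), so candidate A’s 57% unweighted share falls to about 51.4% once party is weighted to the target.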

For my analysis, I had to make certain assumptions. First, I had to assume that there would be no deliberate attempt to publish false reports or to mislead respondents. Second, I had to assume that the historical norms which have been valid for forty years would also be valid in 2008. And third, I had to assume that backing the polls out to their original numbers and reworking the data using historical weights would produce valid and useful information. The procedure was simple enough. I looked up the most recent state polls that also made their internal data available, backed out the numbers to unweighted versions, then reweighted them using the historical party affiliation for each state, collected from CNN’s archive. That produced the results in Friday’s article.
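The back-out-and-reweight step can be sketched as follows, under the simplifying assumption that party ID is the only weighting variable being changed and that the poll reports each candidate’s support within each party group. Under that assumption the within-party splits can be kept as published and simply recombined with a different party mix. The function name, crosstabs, and party shares below are all hypothetical, chosen only to show how far a change in the assumed party mix can move a topline.

```python
def reweight_crosstabs(support_by_party, party_shares):
    """Combine within-party candidate support using a chosen party mix.

    support_by_party: {party: {candidate: share of that party's voters}}
    party_shares: {party: share of the electorate}; should sum to 1.
    """
    topline = {}
    for party, share in party_shares.items():
        for cand, pct in support_by_party[party].items():
            topline[cand] = topline.get(cand, 0.0) + share * pct
    return topline

# Hypothetical published crosstabs (undecided voters make up the remainder).
crosstabs = {
    "D": {"A": 0.85, "B": 0.10},
    "R": {"A": 0.08, "B": 0.88},
    "I": {"A": 0.48, "B": 0.42},
}
poll_mix = {"D": 0.42, "R": 0.30, "I": 0.28}        # hypothetical mix the poll used
historical_mix = {"D": 0.36, "R": 0.36, "I": 0.28}  # hypothetical historical mix

published = reweight_crosstabs(crosstabs, poll_mix)
reweighted = reweight_crosstabs(crosstabs, historical_mix)
for cand in ("A", "B"):
    print(cand, f"published {published[cand]:.1%}", f"reweighted {reweighted[cand]:.1%}")
```

With these invented numbers the poll’s own party mix gives A about 51.5% to 42.4%, while the historical mix turns the very same crosstabs into a near tie, roughly 46.9% to 47.0%.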

Some of the states, however, were big surprises with the reweights. Vermont going to McCain? West Virginia within the margin of error? New York closer than Pennsylvania? Illinois less than six points apart? That just looked screwy, but I published them as the results displayed them. This is not to say that I believe the states will play out this way; it actually speaks to the quality of the polls. Remember, the numbers I used were not my polling, but merely the reweighted results from the state polls themselves. Three reasons come to mind right away for why the numbers changed so much. First, the state polls used some very weird party numbers in some states, so correcting to historical bases produced very different results. It’s a basic rule of statistics that large variance lowers reliability, so changing party affiliation by a large amount, in either direction, produces a less reliable result. That does not mean the original published results were correct, though, only that they were skewed to a degree that could not be corrected simply by backing out and applying a more valid affiliation factor. Second, in past elections I have seen states that never panned out the way the polls expected. Bush did much better in Oregon’s 2004 polling, for example, than he did in the election results. The third reason is the simple fact that polls are sometimes just plain wrong, and there is not enough information for an analyst to demonstrate what the correct value should be. There are myriad possible causes of polling error, including question wording, question order, time of contact, RDD (random digit dialing) failure, training of poll takers (one reason the 2004 exit polls were so bad early on was that many of the poll takers had received little training or supervision), influence by the sponsor, data corruption, and poor response rates.
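One way to put a number on “large variance lowers reliability” is Kish’s effective sample size, a standard survey approximation (not something used in Friday’s article): when weights vary a lot, a weighted sample behaves like a smaller unweighted one. The sketch below applies it to two hypothetical 1,000-person samples, one mildly and one heavily out of line with an assumed 35/35/30 party target; the helper names are illustrative.

```python
import math

def cell_weights(sample_counts, target_shares):
    """One weight per respondent: target share / observed share of their party."""
    n = sum(sample_counts.values())
    weights = []
    for party, count in sample_counts.items():
        weights.extend([target_shares[party] / (count / n)] * count)
    return weights

def kish_effective_n(weights):
    """Kish's approximation to effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

def margin_of_error(n, p=0.5, z=1.96):
    """Simple-random-sample margin of error at the 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

target = {"D": 0.35, "R": 0.35, "I": 0.30}   # hypothetical historical mix
mild = {"D": 450, "R": 300, "I": 250}        # hypothetical samples of 1,000
heavy = {"D": 600, "R": 200, "I": 200}

for label, counts in (("mild", mild), ("heavy", heavy)):
    n_eff = kish_effective_n(cell_weights(counts, target))
    print(label, round(n_eff), f"±{100 * margin_of_error(n_eff):.1f} points")
```

Here the mild correction leaves an effective sample of about 961, while the heavy one drops it to about 789, so the nominal margin of error grows from about 3.2 to about 3.5 points, and that is before asking whether the target itself is right.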

It also helps to know a state’s history. New Hampshire, for example, is a bit contrarian compared to the rest of New England, which is one reason George W. Bush took New Hampshire in 2000. In other states, the large proportion of independent voters means that the results at any one time may be much more volatile than expected. This is due to the effect of the undecideds. To understand that, let’s say we have 1,000 voters, of whom 260 are independents, and the candidates are split 50-50 between their party bases among the other 740 voters, 370 to 370. If the independents are all decided voters, they will decide the election, and each independent voter will have about 0.38% of the decision. If candidate A has a 50-voter advantage among the independents (155 to 105), he has a 525 to 475 advantage, or 52.5% of the vote. Now let’s say that we are polling the group, and only 200 of the independents have made up their minds. Each decided independent now has 0.50% of the decision, and a 50-voter advantage (125 to 75) becomes a 495 to 445 advantage, or 52.7% of the 940 decided voters. But if the 200 who have made up their minds are split evenly, then the last 60 will make the difference, and each of their votes is worth 1.67% of the decision, or four and one-third times what it was to start with. So whether or not you count the undecideds in a poll matters to its results, and we have not even touched on what a refusal to participate does to a poll.
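The undecided arithmetic above can be checked in a few lines. This sketch simply restates the paragraph’s hypothetical 1,000-voter example; the function name is illustrative.

```python
def share_of_decided(base_each, indep_decided, indep_margin):
    """Return (A votes, B votes, A's share of decided voters).

    Both party bases are tied at base_each; candidate A leads among the
    decided independents by indep_margin votes.
    """
    a = base_each + (indep_decided + indep_margin) / 2
    b = base_each + (indep_decided - indep_margin) / 2
    return a, b, a / (a + b)

# All 260 independents decided: 525 to 475, or 52.5%.
print(share_of_decided(370, 260, 50))
# Only 200 independents decided: 495 to 445, or about 52.7% of the decided.
print(share_of_decided(370, 200, 50))

# Weight of a single independent vote in each scenario.
for pool in (260, 200, 60):
    print(pool, f"{100 / pool:.2f}% each")
print(f"ratio: {260 / 60:.2f}x")
```

The last lines reproduce the per-voter shares of about 0.38%, 0.50%, and 1.67%, and the ratio of roughly 4.33 between the first and last cases.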