Stolen Thunder

Saturday, November 01, 2008

Funky Gallup

Now that the election is entering its final days, I had expected the polls to start tightening the race in order to reflect actual demonstrated conditions. For several days this has been happening in a number of major polls, but today Gallup posted a surprising number; they show Obama leading McCain by double-digit margins in all three models of their polling.

I will admit that when I first saw this, I was shocked and a bit dismayed. For all the criticism I have thrown at them, Gallup has always appeared to me to be the most professional of the polling outfits, and if they showed such a strong and consistent Obama surge at the end, then maybe I was wrong and we should expect a rout to conclude on Tuesday.

Then my brain kicked in and said, ‘hold on there, wait just a minute’. You see, there are some weird things going on here with Gallup, and yes they are important. First off, Gallup used to be simple enough; they took a poll and announced the results and internals, just as they had for decades. But this year, Gallup is running three different models, in one of which they have admitted punching in inflated youth and minority participation at unprecedented levels (their ‘expanded voter’ model). They stepped back from that when it became obvious that this model was giving numbers which did not jibe with any reasonable judgment, and tossed out a ‘traditional’ model which played the numbers with a more nominal weighting. So, for some time now we have seen three models, which have tossed out a range of support in which the ‘expanded’ model favors Obama more than the ‘traditional’ model. Yet today we see Gallup claiming 52-41 Obama in its Daily Tracking of all registered voters, 52-42 in the ‘expanded voter’ model, and 52-42 in the ‘traditional voter’ model.

Now, stop and think about why that almost has to be bogus. First, Gallup is saying that McCain lost 5 points of support and Obama gained 3 points of support in just 5 days. Does that heavy swing of support make sense? And if it does, why does Fox say McCain gained six points in the last week?

And why does Zogby show that McCain led Obama in Friday’s one-day polling, yet claim that in three-day tracking he’s still down by 5 points?

You get the idea; the volatility of the polls is a warning sign that they are not to be trusted. The trends are going different directions, and they do not even always agree with their own headlines.

Weird.

But Gallup is getting a trip to the woodshed for today’s stunt. You see, they’re not being honest with you and I think I can prove it.

Gallup has been using three different models for their reports. The first one just takes registered voters and weights them only for Census norms. The second is the ‘expanded’ model, which weights the results to show heavy participation by blacks and young voters. The third model is what they are calling the ‘traditional’ model, but in fact it is not the same as in past years; it is just the ‘expanded’ model with the extra black and youth votes reweighted back to historical norms, and it fails to adjust for assumptions made in the polling process and respondent pool construction.

Now think about this. Gallup claims to be using three models, yet is claiming they are producing identical results, as well as showing volatile changes in both candidates’ support levels going into the weekend. How is this possible? The only way this can be possible is if Gallup is claiming that youths and black voters are voting the exact same way as voters overall. There’s no real way that the math works out, otherwise.

And what does Gallup say about youths and the black vote? Well, starting with the youth vote, there is not much to say. Gallup has admitted that the youth vote is not doing anything special this year.

So we should be seeing the ‘expanded’ model recede a bit, not show Obama’s lead growing, at least not because of the kids. What about the black vote? Gallup is all kinds of geeked about the black vote this year, saying they expect about a three percent increase from 2004 participation. OK, I can agree with that, but since Gallup has said they were already weighting blacks more heavily in their ‘expanded’ model, how do they explain that model surging this week, and why would the other models change as well? Frankly, the most likely possibility is that Gallup has recognized that their polling methodology used this year was in line with the ‘expanded’ model they made so much of earlier this year, and they are simply reinforcing the oversamples in anticipation of a rout which may not in fact exist.

Gallup is also getting goofy on another count: Early Voting. We’ve been hearing three things all this season about turnout – first, that we should expect around 130 million voters this year, that early voting will top 30% of all voting, and that the youth and black vote will break records this year. Gallup is reporting that as of October 31, 27% of their respondents say they have already voted and another 8% say they will vote early. Got those numbers? OK, with them in mind, let’s go visit Dr. McDonald again.

Dr. Michael McDonald at George Mason University has been tracking the early voting results. Now, we are not going to see exit polling data before the polls close on November 4, much less the actual election results, but we are getting some interesting details. Once again, I recommend everyone spend some time at his site to see the numbers for yourself.

OK, so looking at the numbers as of Saturday at 5:54 PM Texas time, we see that a total of 22,498,237 votes have been cast in early voting, known absentee and in-person votes combined. Now, if Gallup is right and 27% of the voters have done it already, that projects a total national vote of 83,326,804 voters, or a drop of 33% from 2004’s voting tallies. Dr. McDonald’s numbers come from the states’ official offices, so they’re as reliable as you will find. So, you have a choice of believing that only 83 million people are going to vote this year, or Gallup is wrong to claim that 27% of the voters voted early. If the actual tally is 130 million, then the early voters only made up about 17% of the total voters, and November 4 is going to be a madhouse.
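The arithmetic behind that choice is easy to check for yourself. Here is a minimal Python sketch using only the figures quoted above; the variable names are mine, and the 130 million figure is the turnout expectation being quoted this season, not a result.

```python
# Figures quoted in the post; the names are illustrative only.
early_votes_cast = 22_498_237    # Dr. McDonald's tally, Saturday 5:54 PM Texas time
gallup_already_voted = 0.27      # share of Gallup respondents who say they have voted
expected_turnout = 130_000_000   # widely quoted turnout expectation (an assumption)

# If 27% of all voters really have voted, this is the implied total turnout.
implied_turnout = early_votes_cast / gallup_already_voted

# If turnout really reaches 130 million, this is the early share so far.
early_share = early_votes_cast / expected_turnout

print(f"Implied total turnout: {implied_turnout:,.0f}")
print(f"Early share of 130 million: {early_share:.1%}")
```

Either the first number is right or the second one is, which is the whole point of the comparison.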

And about that 8% who have not yet voted but plan to vote early? If we’re going to get to 130 million, then the 17% who have voted early did so over about a two-week period so far, or just about 8.5% a week. With that pace, three days of potential ‘early’ votes would project about another 3.6% of eligible voters will actually vote early, assuming the same early voting conditions exist.
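The pace projection can be sketched the same way. This assumes, as the paragraph does, a steady rate over roughly two weeks and three remaining days of early voting; the names are illustrative.

```python
early_share_so_far = 0.17   # share of a 130 million turnout already cast
weeks_elapsed = 2           # "about a two-week period"
days_remaining = 3          # days of early voting assumed to be left

# Steady-pace assumption: the same weekly rate continues to the end.
weekly_pace = early_share_so_far / weeks_elapsed
additional_share = weekly_pace * days_remaining / 7

print(f"Weekly pace: {weekly_pace:.1%}")
print(f"Projected additional early vote: {additional_share:.1%}")
```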

So, Gallup’s assumptions about early voting may not be as big as they expected. Before I discuss what that means for November 4 conditions, let’s consider the black vote and the early voting so far.

Dr. McDonald shows that nine states are reporting voting by party affiliation, and three by racial demographic (only North Carolina is reporting results by age group, and as was reported earlier, the kids are not showing up this year either). Among black voters, turnout where reported is indeed healthy.

Georgia is reporting that 35.1% of its early voters are black (versus 29.9% of the population and 25.7% of all registered voters), Louisiana is reporting that 36.3% of its early voters are black (versus 31.7% of its population and 31.2% of all registered voters), and North Carolina is reporting that 26.3% of its early voters are black (versus 21.7% of its population and 20.7% of all registered voters). So for those three states, early voting is averaging 4.8% ahead of population levels and 6.7% ahead of registration totals. Given the 11% representation of blacks relative to the total voter participation in 2004, an increase of 6.7% to that demographic would raise their portion of the total voter poll to 12%. Therefore, the demonstrated performance by blacks in early voting this year does not justify the heavy weighting used by Gallup.
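For anyone who wants to rerun those averages, here is a short Python sketch built from the three states' figures above. It follows the paragraph in treating the 6.7-point gap as a 6.7% relative increase on the 2004 share; that reading, and all the names, are assumptions made for illustration.

```python
# (early-vote share, population share, registration share), all in percent
states = {
    "Georgia": (35.1, 29.9, 25.7),
    "Louisiana": (36.3, 31.7, 31.2),
    "North Carolina": (26.3, 21.7, 20.7),
}

# Simple unweighted averages of the gaps across the three states.
avg_over_population = sum(e - p for e, p, _ in states.values()) / len(states)
avg_over_registration = sum(e - r for e, _, r in states.values()) / len(states)

share_2004 = 11.0  # black share of all voters in 2004, as quoted above
projected_share = share_2004 * (1 + avg_over_registration / 100)

print(f"Ahead of population:   {avg_over_population:.1f}")
print(f"Ahead of registration: {avg_over_registration:.1f}")
print(f"Projected share of the vote: {projected_share:.1f}%")
```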

Now, let’s look at that early voting number. Nine states are reporting participation by party affiliation. Here’s how that turns out so far:

Colorado: D 37.7%, R 35.9% (registration 32.8% D, 33.1% R)
Florida: D 45.6%, R 37.8% (registration 42.0% D, 36.1% R)
Iowa: D 47.3%, R 28.8% (registration 32.4% D, 27.8% R)
Louisiana: D 58.5%, R 28.4% (registration 52.5% D, 25.3% R)
Maine: D 42.9%, R 28.2% (registration 31.1% D, 28.1% R)
Nevada: D 49.6%, R 33.0% (registration 44.0% D, 35.6% R)
New Mexico: D 53.4%, R 32.9% (registration 50.1% D, 31.7% R)
North Carolina: D 51.8%, R 30.0% (registration 44.8% D, 34.3% R)
West Virginia: D 59.4%, R 31.5% (registration 55.7% D, 29.2% R)

For these nine states on average, the democrats are early voting at a rate 2.6 points higher than their registration, while republicans are early voting at a rate 3.4 points lower than their registration. Since the early voting currently represents 17% of the anticipated turnout this year, this works out to a total voting advantage by party of 1.02 points. Obviously, if the democrats enjoy a similar +2.6 to -3.4 turnout advantage in actual voting on November 4th, this would inflate their party advantage (assuming democrats support Obama in equal degree that republicans support McCain) by six points, which appears to explain Gallup’s sudden shift: Gallup has decided that the trend in early voting will be reflected in the November 4th turnout, which is a dangerous assumption, for the following reasons:

1. The 6-point advantage for democrats is reported in just 9 states out of 34 which have early voting; there is no clear information on party participation on the other 25 states which have early voting, and these numbers may be significantly different.

2. The record on early voting is too short to establish a statistically valid trend, but even the last two elections have shown significantly different levels of participation in voter turnout by party between early and election-day voting. There is no basis for presuming that early voting turnout will be reflected the same way on November 4.

3. Obama has urged his supporters all year long to vote early, while McCain has not made the same push. A slightly higher percentage of republicans this year than democrats have stated an intention to vote on November 4 rather than early.

4. Voters who participate in early voting will not also be participating in election-day voting. This datum is significant with regard to black voters. Black voters have been shown to be participating in the three states which release that detail, at a rate 6.7 points ahead of registration proportions. While increased participation overall by blacks may produce a modest increase (roughly 1 percent) to Obama’s support, the ceiling level of the black voter demographic necessarily means that black voter participation will decline significantly on November 4. Consequently, even if all other conditions are the same, republican participation on November 4 should be expected to improve measurably.
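For reference, the six-point and 1.02-point figures quoted before this list come from simple arithmetic on the two averages. A minimal sketch, taking the +2.6 and -3.4 averages as given rather than recomputing them from the state table:

```python
dem_vs_registration = 2.6    # democrats' early-vote share vs. registration, in points
rep_vs_registration = -3.4   # republicans' early-vote share vs. registration, in points
early_share_of_turnout = 0.17

# Net swing between the parties among early voters.
party_gap = dem_vs_registration - rep_vs_registration

# Scaled down to the whole electorate, since early votes are only part of it.
overall_advantage = party_gap * early_share_of_turnout

print(f"Gap among early voters: {party_gap:.1f} points")
print(f"Effect on total turnout so far: {overall_advantage:.2f} points")
```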

In conclusion, Gallup is assuming that because some democrats in some states are showing up strong in early voting, a blow-out is coming. In truth the lower-than-expected totals of actual voting, combined with reports that no state so far is reporting blow-out numbers, demonstrate that the election is highly volatile and far from over, and depends as it has all along on the three key components of voter turnout, who wins the independent voter support, and which way the undecideds break. Don’t be fooled, this race is still red hot.

Friday, October 31, 2008

Things That Make Polls Go D’Oh

It should be obvious by now that I will never get a job offer from Gallup, Rasmussen, or Survey USA. I’ve been pretty hard on them regarding the way they’ve weighted their party affiliation demographics, and I have repeatedly pointed out that ALL of the major polls are failing to comply with NCPP standards for disclosure and transparent practices. Frankly, I once held polling groups in much higher respect than I can do right now. And besides reporting what the invalid polls mean for this election, I also feel compelled to warn readers that opinion polling in general has lost its ethical core. I hope it will return to its commitment to accuracy and honest reporting, but for now polling seems to have gone the way of responsible mainstream journalism.

Liberal critics of my articles, and those who still trust the polling groups because of past work which was accurate and appeared trustworthy, have asked a very legitimate question: What if I am wrong? Isn’t it possible that I just cannot accept that Obama is going to win this election, and I am grasping at straws for moral support? I would consider answering that they could be right and I could be wrong, but even then I’d have to start by asking for clarification on exactly what they mean to ask.

Do they mean the Associated Press/Gfk poll which says Obama will win by one, or the Pew Research poll which says Obama will win by fourteen?

Do they mean the Battleground poll which says Obama will win by three, or the CBS/NYT poll which says Obama will win by thirteen?

You get the idea. The polls simply do not agree with each other. And yes, those margins are significant evidence of invalidity. I read a professor’s blog earlier this week, who is assuming that since all the polls say Obama is going to win, then they really do agree with each other and the margins do not matter. He contends that the polls which show a close race are really just the low end of the range, the wide lead polls are the upper end, and the average is really how things are going now. These assumptions, however, are invalid because the confidence level tests show the polls do not agree closely enough to avoid evidence of collinearity, and if collinearity exists then the results of the poll cannot be accepted, regardless of whether they appear believable or not.

Also, each poll has its own margin of error, usually around three percent, which is to say that Obama and McCain could each be as much as three points lower or greater in support than the poll shows. As a result, any poll which shows less than a six point lead for Obama is, statistically, saying that McCain could possibly be winning. Whether or not McCain is shown to be in the lead is not statistically relevant, except that we can say the polls do not indicate a McCain lead outside the MOE. However, even then we have to be careful to note that because of the invalid range of poll results, no valid conclusions can be made at all. None.
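The six-point rule of thumb can be written as a one-line check. This is only a sketch of the reasoning above, assuming the same margin of error applies to each candidate's number; the function name is made up, and the example figures are the Rasmussen and Pew numbers from the poll list further down this page.

```python
def lead_is_outside_moe(leader_pct, trailer_pct, moe=3.0):
    """True only if the leader still leads in the worst case for him:
    his number drops by the full margin and the trailer's rises by it."""
    return (leader_pct - moe) > (trailer_pct + moe)

# Rasmussen, Oct 30: Obama 51, McCain 47 (a four-point lead)
print(lead_is_outside_moe(51, 47))
# Pew Research, Oct 26: Obama 53, McCain 38 (a fifteen-point lead)
print(lead_is_outside_moe(53, 38))
```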

We also need to observe what’s been going on with the poll trends. In the last ten days, for example, Rasmussen has shown swings of up to 5 points, or a half-point per day. He’s saying that more than a half-million people on average are changing their minds every day. Does this sound reasonable to you?

The latest Fox poll shows McCain closing six points in just a week. That’s 7.8 million voters changing their minds in that time. Has McCain’s campaign done anything different that would explain that shift to you? And if not, why is the poll changing so drastically now that the race is coming to an end?
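Those voter counts come from converting poll points into people. A quick sketch, assuming the 130 million turnout expectation that the 7.8 million figure implies:

```python
EXPECTED_TURNOUT = 130_000_000  # assumed total turnout

def voters_for_points(points):
    """Number of voters represented by a swing of `points` percentage points."""
    return EXPECTED_TURNOUT * points / 100

fox_weekly_shift = voters_for_points(6)       # six points in a week
rasmussen_daily_shift = voters_for_points(0.5)  # half a point per day

print(f"Fox, six points: {fox_weekly_shift:,.0f} voters")
print(f"Rasmussen, half a point a day: {rasmussen_daily_shift:,.0f} voters per day")
```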

Gallup is still admitting they are clueless, as they continue to publish three separate models of voter opinion. You really should ask yourself, if Gallup was on top of things this year, why did they trash the original model in favor of one using unprecedented demographic assumptions, then use that same data to backtrack and try to reflect a “traditional” model? What did they see that made it clear they were wrong? And having been wrong not once but twice in fundamental operations this year, why should you assume they got lucky on the third guess, which in any case is built on the same methodological decisions they have tacitly admitted were wrong before?

The first rule the NCPP says any journalist should ask about a poll, is who is paying for it. With that in mind, shouldn’t you be skeptical that the polls reporting the largest leads for Obama are sponsored by agencies known to be pro-Obama and anti-McCain, specifically CBS News, the New York Times, ABC News, the Washington Post, and Newsweek? And shouldn’t you wonder if the community of pollsters just might be letting itself be influenced by Obama’s big-dollar media machine? Half a billion dollars of media publicity is bound to have an effect, and why wouldn’t it affect people who run the polling groups? People like Zogby, who called the 2004 election for Kerry months before the actual voting? People like Scott Rasmussen, who is getting serious coin to sell the story of this election by subscription? One area where I can tell you I am clearly more worthy of your trust, is that no one is paying me anything for what I do on the blogs. Not a penny. So, while I’d like to be rich someday, it doesn’t look like I’m going to get there by blogging on polls, but that means that you will be getting my honest opinion, based on my reasoning and the evidence, not on what effect it will have on my bank account. Sorry, but a pollster who refuses to show internal data to the public is a mercenary, not a professional, and a pollster who lets any media outfit decide what questions will be asked, what order they will be in, and which respondents are appropriate and how/when they will be contacted, is a media whore and his analysis is inherently dishonest.

OK, that’s pretty harsh, and I want to emphasize that many polls are indeed trying to be professional and accurate, as much as the business will let them be. And even in the media whore groups, there are individuals who are honest and honorable (and probably miserable) and trying to put out a solid product. The problem comes from two directions. First, polling has become a business more than a profession, meaning that the guys directing the polls have become too willing to sell a story, even if that story is not exactly true. This becomes apparent when polls report shifts which are not caused by valid events, most easily seen in the phenomenon of convention ‘bounces’. It’s one thing to expect a party’s base to become energized when the nominee is finally known and he comes out formally in a way that shows confidence and capability, but in recent years the pollsters have also decided this somehow affects the opposing party’s support levels, a patently absurd notion on its face. I mean, what did Obama do at his convention that is supposed to have won over some Republicans, and just why should we believe that a number of Democrats, even briefly, supported McCain because he chose Sarah Palin for his running mate? That’s manipulation of the data, folks, and cannot be explained any other way. It’s been going on a while, that roller-coasting of the numbers, since polls in the media need to keep attention, and to do that they need to be exciting, even if it means being dishonest. They get away with it because they have a lot of time to worry about closing in on accuracy in the late weeks. Of course, some years they blow that, too. It needs to be said, repeated and repeated again, that polls blow the call by more than their published margin of error about 40% of the time.

The other problem is the Obama Machine. There are a lot of unprecedented conditions in this election, and I do not think the polling groups ever really sat down and thought about what the new conditions would be. Well, actually they did, but they did not test their conclusions, and as a result bought into some pretty tall tales from the Obama people. The polls assumed the following things would be very different this year:

1. Barack Obama being the first black to receive a major party nomination for President, black voters would be greatly motivated to register and vote, and this would swing decisively towards Obama. This led some polls to over-sample black voters, in the expectation that their influence would be more significant this year.

It’s true and false. Black voters have indeed become more motivated this year, but as a demographic group blacks have always been enthusiastic, and have always overwhelmingly supported the democrats’ nominee in presidential elections. As a result, it is mathematically impossible for black voters to significantly change the outcome of the election by supporting Obama. In a tight race, the increased participation could make the difference in some states, but nationally the effect is minimal and polling models should not be changed because of it.

2. Barack Obama would greatly inspire and motivate young voters to register and vote, and this demographic would swing decisively towards Obama. This led some polls to over-sample young voters and to count more newly-registered voters as likely voters.

This one has been difficult to prove, since only the actual election can confirm or disprove the theory. However, John Kerry saw a strong rise in democratic party registrations in 2004, in part due to the primary efforts of Howard Dean. This created an apparently significant advantage for the fall campaign, which was one of the reasons that Zogby called the election for Kerry early in the summer. In the actual election, however, under-30 voters’ proportion of the vote did not change from the 2000 election, and many of the newly registered voters simply did not vote, which is also consistent with historical behavior. Accordingly, it is not reasonable to alter polling models to behave in a manner inconsistent with historical norms.

3. The combination of excitement over Obama’s campaign, coupled with the nation’s dissatisfaction with President Bush and the Economy would lead to a great increase in democrats’ participation relative to republicans, as more people would see themselves as democrats and republicans would be likely to stay home. This led almost all polls to report results which either left democrat-heavy respondent pools unweighted, or which weighted polls to reflect heavy democrat advantages.

As with rumor 2, this cannot really be confirmed or disproven until the election is finished. However, history indicates the rumor is unfounded. In 1976, the republicans were expected to be dispirited, Richard Nixon having resigned in disgrace just two years previously. This was one reason that just after the party conventions, Governor Carter of Georgia led Ford by 33 points, a seemingly undeniable blow-out. Yet in the actual election, Carter won by only two percentage points, and some political experts believe that if the election had been held a week to ten days later, Ford would have won. Part of the reason was that republicans in 1976 did show up to vote, less than the democrats but in far greater numbers than pollsters had expected. The same thing happened in 1948, when democrats were supposed to have given up, yet the record shows something far different. If a poll’s model is based on known history rather than pure speculation, then that model should not deviate from historical norms.

In my opinion, the polling groups allowed themselves to believe unfounded myths in all three of the cases I just mentioned. But they also failed to consider the influence of the half-billion dollars being spent by the Obama campaign, the rock-star behavior of his cadre (and a comparable level of professional knowledge and interest in middle America) in influencing and intimidating the media and public image (‘vote for Obama or you’re a racist’), and the heavily-urbanized character of his campaign and publicity efforts. The polling groups failed to note the dichotomy between the tone of Obama’s early primary victories and the voter response as the campaign wore on, failed to adjust their weighting to reflect actual results from primary elections and track with historical norms in each state and nationally. A massive effort by the Obama campaign to cast this election as unprecedented resulted in every major polling group abandoning historical models to create unproven models based on assumptions. What we are seeing now is the result of these models failing as key assumptions fail.

More Fun With Poll Numbers

The election is coming to a close, or at least we hope so (thank you Al Gore for proving that sometimes the nightmare just continues). All along, I have been saying that the poll numbers are invalid on their own standards, and once again I found another reason to repeat that claim: The state polls contradict many of the national polls.

The claim made by those who like the polls has generally run along the lines that they cannot all be wrong, and that a consensus of the polls should be trusted. I hardly agree, because of a factor in statistics known as collinearity. Here’s the formal definition from statistics.com: “In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.”

Informally, collinearity is a warning to statisticians to make sure that they are using data which is truly independent of other data. When data is redundant or co-related, using the additional data gives an invalid additional weight to the data used, corrupting the results. Tests have been created to detect multicollinearity, such as the Farrar-Glauber test (most commonly used in econometrics), but it does not appear that vector testing is commonly practiced in opinion poll analysis.

The math in that line of testing tends to get a bit complex for a casual discussion, so for here I will come back to another point of opinion polling: the statistical level of confidence. That is a critical test for an opinion poll, and it serves as a quick reference on whether the poll is valid. “Valid” does not mean right or wrong, it means the poll’s method is considered trustworthy. “Invalid” means that whatever the poll says, you should not rely on it. Again, I refer the reader to the National Council on Public Polls (NCPP), and their criteria for polling and their principles of disclosure. In short, when a poll will not tell you who paid for the poll, hides how many people refused to take the poll when contacted, or refuses to release internal demographics used in the poll and from the response pool, that poll is in direct violation of NCPP rules and should not be taken seriously, even if you find their results believable. The bad news there is that almost none of the publicly-released polls are in full compliance with NCPP standards.

Going back to the question of the confidence level, though, it’s a simple test for validity. All of the major polls use – or claim to use – similar methodologies and demographic weighting, with the exception of party affiliation weighting. Some of these groups insist that party affiliation is not a static demographic, and therefore should not be weighted at all, so for here we will use their logic in applying the numbers. The polls all claim a 95% confidence level. In statistics, they are saying that if the same method is used, polls should produce results within the margin of error 19 times or more out of every 20 polls. So, it should not be difficult to test that claim.

Here are the polls listed at Real Clear Politics for the last ten days (where a poll has been done more than once in that period, the most recent results are used). I am listing these in descending order of support for Barack Obama, then in support for John McCain, noting a 3% claim for MOE and how many polls agree or disagree with the stated poll:

Pew Research – Oct 26 – Obama 53% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – Obama 53% (agree 8, disagree 4) FAIL
ABC News/WaPo – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
CBS News/NYT – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
Rasmussen – Oct 30 – Obama 51% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – Obama 51% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – Obama 50% (agree 12, disagree 0)
Gallup (Traditional) – Oct 29 – Obama 50% (agree 12, disagree 0)
Ipsos/McClatchy – Oct 27 – Obama 50% (agree 12, disagree 0)
GWU/Battleground – Oct 30 – Obama 49% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
IBD/TIPP – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
FOX News – Oct 29 – Obama 47% (agree 6, disagree 6) FAIL

Rasmussen – Oct 30 – McCain 47% (agree 7, disagree 5) FAIL
GWU/Battleground – Oct 30 – McCain 45% (agree 9, disagree 3) FAIL
Gallup (Traditional) – Oct 29 – McCain 45% (agree 9, disagree 3) FAIL
Ipsos/McClatchy – Oct 27 – McCain 45% (agree 9, disagree 3) FAIL
FOX News – Oct 29 – McCain 44% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – McCain 44% (agree 11, disagree 1)
ABC News/WaPo – Oct 29 – McCain 44% (agree 11, disagree 1)
IBD/TIPP – Oct 29 – McCain 44% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – McCain 43% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – McCain 42% (agree 10, disagree 2) FAIL
CBS News/NYT – Oct 29 – McCain 41% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – McCain 41% (agree 8, disagree 4) FAIL
Pew Research – Oct 26 – McCain 38% (agree 2, disagree 10) FAIL

Note that every polling agency fails one side or the other of this validity test. Every one of them.
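The agree/disagree counts in the two lists can be reproduced with a few lines of Python. This is a sketch of the test as I read it from the lists: two polls "agree" when their numbers for a candidate are within the 3-point MOE of each other, and a poll is marked FAIL when more than one of the other twelve disagrees with it. That pass threshold is inferred from the FAIL labels above, so treat it as an assumption.

```python
obama = {
    "Pew Research": 53, "Newsweek": 53, "ABC News/WaPo": 52, "CBS News/NYT": 52,
    "Rasmussen": 51, "Gallup (Expanded)": 51, "Reuters/C-SPAN/Zogby": 50,
    "Gallup (Traditional)": 50, "Ipsos/McClatchy": 50, "GWU/Battleground": 49,
    "Diageo/Hotline": 48, "IBD/TIPP": 48, "FOX News": 47,
}
mccain = {
    "Rasmussen": 47, "GWU/Battleground": 45, "Gallup (Traditional)": 45,
    "Ipsos/McClatchy": 45, "FOX News": 44, "Gallup (Expanded)": 44,
    "ABC News/WaPo": 44, "IBD/TIPP": 44, "Reuters/C-SPAN/Zogby": 43,
    "Diageo/Hotline": 42, "CBS News/NYT": 41, "Newsweek": 41, "Pew Research": 38,
}

def agreement(polls, name, moe=3):
    """Return (agree, disagree) counts for one poll against all the others."""
    mine = polls[name]
    others = [value for key, value in polls.items() if key != name]
    agree = sum(abs(value - mine) <= moe for value in others)
    return agree, len(others) - agree

def passes(polls, name, max_disagree=1):
    """Assumed pass rule: at most one other poll falls outside the MOE."""
    return agreement(polls, name)[1] <= max_disagree

for label, polls in (("Obama", obama), ("McCain", mccain)):
    for name, value in polls.items():
        agree, disagree = agreement(polls, name)
        verdict = "" if passes(polls, name) else " FAIL"
        print(f"{name} - {label} {value}% (agree {agree}, disagree {disagree}){verdict}")
```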

But let’s move on. We can look at the RCP averages from one of two perspectives. The RCP folks take the polls from the last week by polling date (not release date) and average them. That gives a claim that Obama is leading McCain 49.7% to 43.8%, with a 3 point MOE. If we extend that back to polls taken October 20 or later, then it becomes Obama 50.3%, McCain 43.3%. So, RCP’s national polls, if aggregated as they like it, show a 5.9% lead or a 7.0% lead.

OK, now let’s take a look at the RCP state polling. There are dozens of polling groups which have put out state polls, and I cannot speak here to their total authenticity. That, of course, is also a problem with some of the national polls, but for consistency we can use the RCP numbers. Now, if each state’s aggregate claimed level of support for Obama or McCain is applied to the state’s proportional level of the national vote (using 2004 voting statistics), we find that if the state aggregations are right for RCP’s state averages, plugging those numbers in gives Obama 46.9% of the popular vote, to 43.9% for McCain. The aggregation of the state polls, if we are going to accept them as valid, shows that the national polls are overstating Obama’s support. Once again, a simple check for validity shows that the confidence level test fails for the national polls.
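The state-to-national calculation is a weighted average: each state's poll number counts in proportion to that state's share of the 2004 vote. Here is a minimal sketch of the method. The two "states" and their figures are hypothetical placeholders to show the mechanics, not RCP's averages or real 2004 vote totals, and the function name is my own.

```python
def national_share(state_support, state_votes_2004):
    """Weight each state's support (in percent) by its 2004 vote total."""
    total_votes = sum(state_votes_2004[state] for state in state_support)
    weighted = sum(
        state_support[state] * state_votes_2004[state] for state in state_support
    )
    return weighted / total_votes

# Hypothetical inputs, for illustration only.
support = {"State A": 50.0, "State B": 40.0}
votes_2004 = {"State A": 3_000_000, "State B": 1_000_000}

print(f"National estimate: {national_share(support, votes_2004):.1f}%")
```

The larger state pulls the estimate toward its own number, which is why a plain average of state polls would give a different answer.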

One last thing. The state polls have assumed a significant shift from 2006 towards increased democratic participation, but even if that happens, the state polling indicates that Obama will still fail to reach 50% support. If those polls are reweighted according to 2006 turnout proportions and then plugged in to project national numbers, it becomes Obama 46.3% and McCain 47.1%, with 6.6% undecided. Take from that what you will.

Wednesday, October 29, 2008

Thoughts About the Early Voting

There was a time when we would count down to election day. In fact, a lot of media is doing just that. But this year, by the start of ‘election day’, perhaps more than thirty percent of the voters will have already voted. Recent changes in absentee and early voting laws have created an opportunity for voters to have a much more convenient chance to vote. As a matter of fact, I voted over a week ago myself, because Texas opened early voting back on October 20th. A lot of pundits and media have been talking about early voting, which makes sense, but there has also been a lot of opinion tossed about which turns out not to have firm foundation under it. For example, I have read and heard about a supposed historical tendency for republicans to have an advantage over democrats in early voting. That’s true to a degree, because historically more seniors vote early and they have tended to be republicans. However, that trend was established with the restricted absentee votes, and since no-excuse absentee and early voting have begun, that trend evaporated. 34 states offer early voting this year, many for the first time, which is one reason why there is so little history for the practice as a national exercise. In 2000, roughly 14% of voters voted before the designated election day. In 2004, that portion rose to 22%, and this year election officials expect that portion to climb above 30%. Barack Obama has repeatedly urged his supporters to vote early and not wait for election day.

A lot of talk has focused on the results from early voting. That data is necessarily limited, by law as well as ethical rationale. It’s been long noted, for example, that some folks like to vote for a winner, and if they are persuaded that a candidate has locked up the win, they will go along rather than feel that they backed a loser. As a result, election results – especially vote tallies – are not supposed to be released until after all the polls close in a state. Poll results are often used to hint at the results, which may or may not be cheating, depending on whom you ask and how that information is presented, and we’re seeing a version of that in looking at the early voting results so far. Before we look at those results, I need to emphasize that there is no established standard to prove the meaning of a particular event in early voting. I had thought, myself, early on that it would be a good sign for McCain if republicans voted in numbers equal to democrats. It turns out that I had not thought that one through.

I read an interesting paper on the significance of early voting, by Kate Kenski writing about the Annenberg Election Survey for the 2000 and 2004 presidential elections. For example, Kenski noted that early voting by black voters was low (2.8%) in 2000, but more than quadrupled in 2004. From that trend, it should not be surprising that black voters continue to increase participation in early voting, especially with Obama on the ticket. Another point of interest was that in 2000, Bush earned a much higher percentage of the vote in early voting, but in 2004 the percentages were much closer to election-day voters, possibly due to the extraordinary turnout in the overall election. 2008 will provide a lot of useful information about early voting demographics, but for now we are limited in what we can say from the existing record.

Dr. Michael McDonald at George Mason University has a website up for easy reference on early voting. It shows that already, more than sixteen million early votes have been cast. Party-specific figures can be found for just nine of the thirty-four states offering early voting, so we should be careful about assuming the information is true for the whole nation, but it does indicate that democrats have been better-organized so far than republicans, as the following state results show:

West Virginia: 59.4% democrats, 31.5% republicans
North Carolina: 54.0% democrats, 28.6% republicans
New Mexico: 55.1% democrats, 32.3% republicans
Nevada: 53.7% democrats, 29.6% republicans
Maine: 44.5% democrats, 28.6% republicans
Louisiana: 58.4% democrats, 28.5% republicans
Iowa: 48.9% democrats, 28.5% republicans
Florida: 45.4% democrats, 39.0% republicans
Colorado: 38.6% democrats, 37.9% republicans

Except for Colorado, the states which are reporting results by party affiliation show a strong turnout by democrats relative to republicans. One possible reason for this is the high proportion of black voters. These states have reported the following percentages of black voters among all early voters so far:

North Carolina: 27.6%
Louisiana: 36.0%
Georgia: 35.2%

This news is likely to be taken, indeed has already been reported by some media, as evidence of a wave of Obama support. To some degree this is true, since it is quite reasonable to expect that the heavy majority of democrats will vote for Obama, and therefore a large proportion of democrats means a lot of Obama votes. However, the reader should be reminded that each of these voters in the early count is a voter who will not be voting on election day; the high proportion of black voters now will, mathematically, require a lower proportion on election day, since no demographic can exceed the ceiling of its representative total. It benefits Obama insofar as a voter who has submitted their ballot represents the surest kind of voter turnout. But it should be remembered that 59 million votes were not enough for John Kerry to win in 2004, so the 16 to 17 million votes submitted so far can provide a head start for a candidate, but are far from all he will need.

Also worth considering is the behavior of voters. Gallup has a nice article up on its site, and while it tilts a bit towards Obama, it notes that except for the West, most voters still plan to vote on Election Day. Kenski’s paper also observed that most early voters vote less than seven days before election day, meaning that we could see a wild finish to early voting, one that could significantly change what we are seeing now in demographic terms. Also, while North Carolina is the only state which has released early voting behavior by age group, I found it interesting to see that only 12.3% of its early voters were under 30, with 20.7% coming from the 30-44 group, 41.7% from the 45-64 age group, and 25.3% from the 65 and older group.

In conclusion, you can expect the Obama supporters to use this early information to claim they are winning easily, but there are still several more days of early voting, and even if it’s record-setting in scale, the numbers from November 4 will still be the ones which do the most to decide the election.

To see why McCain supporters could still take hope, let’s play a little bit with the numbers we have available. I emphasize that these are not hard numbers nationally; I am merely using the same extrapolation that Obama supporters would use for their own encouragement, but applying it here to encourage McCain supporters. The nine states which are reporting party affiliation numbers are indicating an average of 48.1% democrats among early voters, and 28.6% republicans. The three states reporting black voter participation are reporting an average of 31.5% participation. Projected nationally, that would be 5,200,862 votes placed by black voters out of 16,514,867 total early votes. Since the polling data says that essentially all black voters are democrats this year, that means that there may have been 11,314,005 votes by non-black voters so far, of which 24.2% would be democrats and 41.9% republicans. Further, if we assume that there will be roughly 130 million voters this year and that black voters represent about 11% of those voters, then we project that 14.3 million black voters will vote this year. With 5.2 million already having voted out of 16.3 million early votes so far, the remaining black vote would project to 9.1 million out of 113.7 million, or 8.0% of the remaining vote. Accordingly, the overall democrat percentage is going to drop as the vote progresses, as will the republican portion of the remaining non-black vote. As I have said before, the numbers may seem heavy in one direction now, but in the end the independents and late-deciders will make the difference.
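The arithmetic in that paragraph can be retraced with a short Python sketch. It uses only figures quoted above and assumes, as the paragraph does, that every black early voter is counted as a democrat. Note that the paragraph uses 16,514,867 early votes in one step and a rounded 16.3 million in another; the sketch keeps each figure where the text uses it. The variable names are my own.

```python
# Figures quoted in the paragraph above.
TOTAL_EARLY_VOTES = 16_514_867
BLACK_EARLY_VOTES = 5_200_862
DEM_SHARE_EARLY = 0.481          # democrats as a share of all early voters

# Non-black early voters, and the democratic share among them,
# assuming every black early voter is counted as a democrat.
non_black_early = TOTAL_EARLY_VOTES - BLACK_EARLY_VOTES
dem_early = DEM_SHARE_EARLY * TOTAL_EARLY_VOTES
non_black_dem_share = (dem_early - BLACK_EARLY_VOTES) / non_black_early

# Remaining vote, in millions, using the rounded figures from the text.
projected_turnout = 130.0
black_total = 0.11 * projected_turnout      # projected black voters overall
black_remaining = black_total - 5.2         # black voters yet to vote
votes_remaining = projected_turnout - 16.3  # all voters yet to vote
remaining_black_share = black_remaining / votes_remaining

print(f"Non-black early votes: {non_black_early:,}")
print(f"Democratic share of non-black early vote: {non_black_dem_share:.1%}")
print(f"Black share of remaining vote: {remaining_black_share:.1%}")
```

The point of the exercise is the last line: a group that makes up an outsized share of the early vote must, by simple subtraction, make up a smaller share of what is left.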

The Secret Poll October 29 2008 edition

Hello again, and time for another edition of the Secret Poll. The election remains winnable by either John McCain or Barack Obama, the keys still coming down to turnout, the independents, and just plain not giving up. This being the final week, it is unlikely that either candidate could say or do anything to significantly improve his profile, although a badly-timed gaffe or surprise piece of bad news could influence the remaining undecideds, still over 10% of all voters. With early voting projected to represent more than 30% of all votes this year, Obama’s night-before election infomercial is unlikely to change anyone’s mind, although Obama is likely doing this as a last-step effort to maintain the high emotion on which his campaign has run.

So once again, here’s the recap of where I think the true numbers have played out, and where we are now:

August 31: McCain 41.77%, Obama 41.06%

September 7: McCain 42.45%, Obama 42.04%

September 14: McCain 45.71%, Obama 39.62%

September 21: McCain 44.48%, Obama 42.06%

September 28: McCain 42.73%, Obama 41.62%

October 5: McCain 44.09%, Obama 43.96%

October 12: McCain 42.68%, Obama 45.31%

October 19: McCain 43.49%, Obama 46.03%

October 26: McCain 44.50%, Obama 44.48%

McCain gained more support among independents and is eating into ‘soft democrat’ territory, the first possible indications of PUMA support. However, republicans are still less active than democrats in GOTV efforts, so winning may well depend on genuine last-minute efforts to encourage republicans, to focus on Sarah Palin’s future and the consequences of an all-Democrat government. Independents appear to be responsive to tax and stability issues, especially as Obama has neglected answering them in any depth, apparently believing they can only hurt him if he addresses them. The undecided portion has increased slightly to 11.02%, indicating that the emotion-based voters have begun to lose excitement, especially as these numbers appear to have fallen directly off Obama’s support, which had been at a campaign-high 46.03% last week.

The keys, again, are the following:

Turnout – if one party clearly does a better job getting its base to vote, that party will clearly win. More than ever, your vote matters.

Independents – Right now, the Independent vote is essentially tied, with about 25% of Independents still undecided. Whoever wins the most of that vote will win the election.

Undecideds – Overall, 11.02% of voters are still undecided. It’s slowly resolving itself, but there will still be a large pool of voters waiting to be convinced just before election day. Finishing strong could make all the difference.

States and Shadow

Earlier this season, I wrote about the statistical effect of what I call “shadow”, the combination of a poll’s margin of error and the undecideds. In today’s article, I apply this again to the state polls and address the errors of aggregation and over-simplification.

Several national polls are showing a tightening race, notably Gallup, Battleground, and the AP-GfK poll. Of course, other polls are claiming a large lead for Obama, notably Pew and Newsweek. The state polls are also showing some movement, although it’s not as rapid for a number of reasons, not the least being that state polling is not done as regularly as national polling. Obama supporters have greatly enjoyed the RCP aggregate numbers for Obama, which they would, since the RCP aggregates at both the national and state level indicate Obama is winning easily over McCain. The problem, of course, comes when you start to look closely at the support for that belief.

If the methodology is sound, national and state polls should track in similar fashion. This does not mean that every state poll will reflect national support to the same degree, but if a national poll is done properly, it will include proportionate responses from every region of the country, ideally from every state, and so the national numbers will reflect the sum of the state supports. So, the tightening of the national race has to mean – assuming the polls are valid – that McCain is gaining support in some large states or in enough small to medium states to be reflected in national numbers. But as I said, major polling is done less often at the state level; most state-level polling is done less than once a month by polling agencies. Survey USA, for example, which has done more state polls than any other agency, has not done a state poll in the last two weeks in 35 states, and has not done a state poll in the last 10 days in 41 states. That’s important to keep in mind.

I need to address the problem of aggregation in polling now. Aggregates are popular because they are easy to read, and seem to be helpful in telling how much someone is ahead. After all, you don’t want to be fooled by paying attention to an outlier, and there is a sense that if most of the polls say the same thing, that’s most likely what’s really going on. The problem with that is the assumption that all of the polls in an aggregate are valid, that all can be accepted with equal confidence. But that would be erroneous. First of all, not every polling group is really professional at what they do. Remember the disastrous early exit polling in 2004? In that case, a lot of brand new poll interviewers were hired and hustled out without proper training, orientation, or supervision. Does anyone really think that was the only occasion where that happened? The fact is, a lot of polling errors get made without the public ever hearing about it, for a number of reasons, not the least being that if the results are what is expected, the error is not obvious. Also, even professional polling groups may look for different characteristics, such as polling adults, registered voters, people who have voted in recent elections for their ‘likely voter’ category, people who simply claim they are ‘likely’ to vote, and so on. Take a look at some of these state polls, and you will also find that it can be difficult to see how they arrived at their numbers; many simply do not provide access to the raw data or their internal demographics. As a result, a significant portion of the state polls are likely to be flawed in a functional manner, and aggregating such polls tends to magnify such errors, not eliminate them.

The next problem is over-simplification. This shows up most often in the way that polls are reported. Whether you like the results from a poll or not, it’s very important to understand that polls are sometimes just plain wrong. Even if a poll is valid, it’s only valid to the extent that it demonstrates a trend against its earlier reports using consistent questions and methods. Polls have also never predicted the surprise results, because they are modeled in a way which reflects the public’s assumptions far more often than the actual condition. Polls are opinion polls, after all, not predictors of future events. Polls only “predict” the results of an election to the degree that the voters behave in line with the poll’s assumptions.

So, with that said, I am addressing the state polls with respect to the statistical phenomenon of shadow. ‘Shadow’ is the total amount of uncertainty in a poll, the combination of the undecideds plus two times the published margin of error. For example, let’s say candidate A is leading candidate B in a poll, 51-44 with a published margin of error of 4%. Game over, it seems. But that 4% MOE means that either candidate could be as much as 4 points stronger or weaker, meaning it’s candidate A at 47 to 55, and candidate B at 40 to 48. Also, 5% of the poll is undecided, so while B looks to be out of it, it’s mathematically possible for the actual condition to be A 47, B 53. It could also end up being A 60, B 40, with the same level of probability as the other extreme. And of course, this does not consider the possibility of some voters changing their minds. I do not think that happens as wildly as the polling groups seem to claim, but it is a valid factor. Considering that, we can now examine the state polling condition.
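The 51-44 example can be worked through in a small Python sketch. The shadow formula (undecideds plus twice the margin of error) is the one defined above; the function names are my own, and the final check, which treats any lead smaller than the shadow as still open, is an assumed reading of how the rule is applied rather than a stated definition.

```python
def shadow(undecided, moe):
    """Total uncertainty: the undecideds plus twice the margin of error."""
    return undecided + 2 * moe

def possible_range(share, undecided, moe):
    """Lowest and highest result a candidate could end up with.

    Lowest: the poll overstated him by the full MOE and he wins no undecideds.
    Highest: the poll understated him by the full MOE and he wins every undecided.
    """
    return (share - moe, share + moe + undecided)

def lead_is_inside_shadow(leader, trailer, undecided, moe):
    # Assumption: a lead smaller than the shadow means the race is still open.
    return (leader - trailer) < shadow(undecided, moe)

a, b, moe = 51, 44, 4
undecided = 100 - a - b

print("Shadow:", shadow(undecided, moe))
print("Candidate A range:", possible_range(a, undecided, moe))
print("Candidate B range:", possible_range(b, undecided, moe))
print("Still open:", lead_is_inside_shadow(a, b, undecided, moe))
```

Here the shadow is 13 points against a 7-point lead, which is why a race that looks finished on the topline is not.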

I took the RCP aggregates (I know, I know, but I do not have the time or space to examine each and every state poll for validity, I don’t need to have anyone whining about ‘cherry picking’ polls, and I can make my point even by using the aggregate reports) and applied the percentages claimed to the 2004 voting results as a two-party vote split. If we count all of the states according to who leads according to the RCP aggregates, Barack Obama would take 50.2% of the popular vote to 43.0% for John McCain, and 364 electoral votes to 174. However, even using those aggregates, the numbers change considerably if we consider the effect of shadow. Applying the shadow rule (undecided plus double MOE), it becomes 200-118, Obama still in good shape but with 220 electoral votes still to be decided.

Before ending this article, I also looked at the trends and outliers in the polling I have seen, especially given certain key internals. I will not call it definitive, but in my opinion if the demographic weighting is corrected the popular vote becomes Obama 46.9%, McCain 46.6%, but with McCain taking the electoral vote 278-260. When the shadow effect is applied, the electoral numbers change to 147-71 McCain, with 320 to be decided. The message is clear then, that the race remains to be decided.

Tuesday, October 28, 2008

Demographic Thresholds

I have been saying all along, that McCain was closer to Obama in the election campaign than the polls were indicating. However, I have been doing so by focusing on party affiliation, the demographic most fiddled with by the polls. Those critiquing my analysis have sometimes pointed to internal demographics which show problems for McCain. With just a week until the end of the season, let’s see where the thresholds for election are, with regard to demographics. That is, what is the minimum performance in each area which was enough to get the win? Here are the numbers:

In 1992, Clinton won with just 41% of the vote from male voters. Right now Barack Obama is tracking 40% of the male vote in Battleground, and 47% in Gallup, while McCain is tracking 44% in Battleground and 46% in Gallup. Before I go further, I want to note that the party skewing could affect these numbers, and also that the numbers in a poll may end up being a bit different in the actual election results. Therefore, all we are doing is seeing whether the candidates are roughly where they want to be.

In 1968, Nixon won with just 43% of the vote from women voters. Right now Obama is tracking 54-55% with Battleground and Gallup, while McCain is tracking 35-39%. As I said, however, while this is a problem area it may be artificially low, as republican women have much higher support for McCain than do democratic women polled.

In 1992, Clinton won with just 39% of the White vote. Right now Obama is tracking 39-44% with White voters, while McCain is tracking 47-50% with White voters.

In 2000, Bush won with just 9% of Black voters. Right now Obama is tracking 82-91% with Black voters, but McCain is tracking only 3% with Black voters in both Battleground and Gallup. It’s very unlikely that McCain will reach the 9% mark, so either this measure will prove to be meaningless, or it will be a key demographic since Obama has locked it up.

In 1968, Nixon won with just 38% of the under-30 vote. Right now Obama is tracking 56-59% of that demographic, while McCain is tracking 29-38% with that group.

In 1968, Nixon won with just 41% of the 30-49 vote. Right now Obama is tracking 43-50% with that group while McCain is tracking 43-45%.

In 2000, Bush won with just 45% of the 50+ vote. Right now, Obama is tracking 35-45% with that group while McCain is tracking 44-50%.

In 1980, Reagan won with just 86% support from republicans. Right now McCain is tracking at 83-92% from republicans.

In 1992, Clinton won with just 82% support from democrats. Right now Obama is tracking at 80-89% from democrats.

In 1960, Kennedy won with just 5% support from republicans. Right now Obama is tracking at 5-7% support from republicans.

In 2000, Bush won with 10% support from democrats. Right now McCain is tracking at 7% support from democrats (PUMA influence not known).

In 2000, Bush won with 42% support in the East. Right now Obama is tracking at 52-58% and McCain is tracking at 35-36% in the East.

In 1992, Clinton won with 44% support in the Midwest. Right now Obama is tracking at 46-52% there and McCain is tracking at 37-39% in the Midwest.

In 1968, Nixon won with 38% support in the South. Right now Obama is tracking at 36-42% there, while McCain is tracking at 51% support in the South.

In 2000, Bush won with 47% support in the West (Clinton won with the same level in 1992). At this time Obama is tracking at 44-54% in the West, while McCain is tracking at 38-40% support.

As I wrote at the beginning, these numbers are comparing poll numbers to election results, and at this time in 2004 both Bush and Kerry were more than 5 percentage points away from their final results in many categories. The undecideds play a key role in the final tallies, and they will do so again this year. Also worth noting at this time are comparisons in these additional demographics:

Urban voters: 55-31 Obama
Suburban voters: 48-39 McCain
Rural voters: 41-40 Obama or 44-40 McCain depending on the poll

Single voters: 61-24 Obama
Married voters: 47-39 McCain

And finally, for some reason no major poll seems to be releasing any internal demographics for Asian voters. Sure, we’re talking between 1 and 2 percent, but in some places they could matter.

Monday, October 27, 2008

From Hillbuzz, Why Democrats Should Support McCain

1. A new kind of politics = voting for the best candidate, regardless of party

2. McCain won’t raise taxes

3. Experience and accomplishments matter

4. Bipartisan record of working with Democrats

5. In 35 years, no Republican president has threatened Roe v. Wade

6. The president can only nominate judges, while a Democrat-controlled Congress will appoint them

7. Energy independence for the United States, using all means available

8. 100% open government and unfettered press access

9. Risked political career to do what he thought was best for the country

10. Never earmarked a single pork barrel project

11. Sarah Palin for Vice President

Gallup In The Tank?

Back in 2004, I jumped pretty hard on John Zogby. Zogby did two things which I considered, and still do, to be unacceptable conduct for a pollster. The first was that Zogby flat-out called the election for Kerry back in May of 2004, a prediction he hung onto through the rest of the campaign. The second was that Zogby started mixing results from his telephone polls with his online polls, which invalidates the results from both methods. I would also point out to the reader that in 2004 and 2005, I was unhappy with political affiliation weighting at the time, and had adjusted my own expectations by reversing the bias from polls. My point is that even four years ago I was challenging poll methodology when it deviated from NCPP guidelines, and even if Zogby is publishing prettier headlines now, that does not change my wariness from past experience. I will challenge any behavior at odds with valid practices.

This year, all of the major polls show Obama ahead in the presidential campaign right now, some saying he is well ahead. I found serious problems in their fundamental assumptions, not the least being the heavy weighting of democrats in the polls (and let’s not mince words – any poll weights by party affiliation, the ones which simply accept what is called in are just accepting the raw data as demographically accurate, which is just as absurd in terms of party affiliation, as it would be if they assumed that race, gender, age, or educational demographics did not need to be reweighted). I have wondered two things as the campaign moved along – what would I say if I turned out to be completely wrong, and what would these polling groups say if I turned out to be right and they were the ones who blew it? For my case, I intend to review the election from a statistical standpoint, and if Obama wins in a landslide because the nation really did decide it was 48-25-27 DRI, then I will admit it plainly and take my lumps. I suspect the polling groups will have a harder time being forthright if my argument turns out to be correct. One reason for that is today’s polling discussion from Gallup.

Gallup has noted the strength of early voting this year. The most significant points from that article are these: early voting is stronger than expected this year, and so far republicans have been just as eager to vote early as democrats. A third point is the most important signal of all. Says Gallup: “Early voting ranges from 14% of voters 55 and older (in aggregated data from Friday through Wednesday) to 5% of those under age 35. Plus, another 22% of voters aged 55 and up say they plan to vote early, meaning that by Election Day, over a third of voters in this older age group may already have cast their ballots.”

The last two statements are very good news for McCain and bad news for Obama. This is because it demonstrates that enthusiasm to actually vote by republicans is equal to enthusiasm to vote by democrats. This runs directly against claims made in polling up to now, demonstrating that participation in polls is not directly related to voting this year. Second, the higher participation by senior voters and weaker participation by younger voters is directly in line with historical norms, again running against the poll expectations that this year would see a wave of young people voting but seniors staying at home. Gallup’s own data proves this is not happening as they predicted, and the polls are therefore invalid in those respects, in addition to obvious flaws in the party weighting. The reasonable expectation from these facts, would be for Gallup to back down and correct its weighting to match the observed behavior. As of yet, Gallup has not taken that step. They did, I note, tacitly admit that the “expanded voter” model they introduced this year is invalid, but now they are running no less than three models of polling, which makes me wonder if they are going to wait to see which one comes out the closest (or the least embarrassing) and call that one their ‘official’ call – when a major polling group throws out three guesses instead of just one judgment, you can be sure they have lost confidence in their system.

Saturday, October 25, 2008

The Two-Track Hypothesis of Voter Decisioning

Earlier this campaign season, I began to question the polling methodology being used, especially when compared to historical norms. Polls released at the same time, claiming to use the same methodology, were publishing results well outside the range of their margins of error, demonstrating fundamental mistakes in their models. Some of those polls played fast and loose with racial, economic, and age demographics; at least one major poll grossly over-sampled unemployed adults, and another poll, which had published its internal demographic data through 2006, stopped revealing that data for this election. But the most common disparity between poll models has been political party affiliation. These have come in two flavors – polling groups which have oversampled democrats in the belief that democrats will dominate the actual voting to a degree not seen in most Americans’ lifetimes, and polling groups which do not weight their samples for party affiliation but merely report the proportion of party affiliation of the people contacted. The first assumption is based completely on subjective prejudice and in some states is wildly variant from the actual election support from the 2006 (last federal) and 2004 (last presidential) elections. The second assumption is absurd on its face. To illustrate, I could have taken a poll at the Democratic and Republican conventions this year, and covered pretty much all of the census demographics, including gender, age, education, work background, geographic hometown, urban/suburban/rural split, religion, and so on. Yet I think we can safely say that polling only democrats or only republicans would produce a poll which would be absolutely useless in telling us how the nation really felt; political affiliation is undeniably a significant vector in voter support for a candidate. 
You do not have to be an expert in political analysis, to understand that democrats and republicans will overwhelmingly support their party’s nominee for president, and so increasing the proportions to favor one party in representation will unavoidably skew the results in favor of that party’s nominee.

I have noted before that history shows a remarkably stable proportion of party affiliation, the democrats generally outnumbering the republicans by between 2 and 4 percent. A poll, therefore, which assigns 10 to 15 percent higher participation nationally by democrats in a presidential election is simply unsupported by any historical sample in decades. This raises a valid question, though: Why then are so many people taking part in polls calling themselves democrats? The answer to this question is important to understanding not only why I believe the polls are wrong for the most part, but also why the election strategies of Barack Obama and John McCain have always been different by need as well as design.

If you look at the kinds of arguments between democrats and republicans, especially between liberals and conservatives, you may note that the dialogue generally breaks down early. This is not only because common ground is so hard to find, but because the motivations are different. Someone rang a bell in my head earlier this season, when they noted that Obama supporters generally support him because of how he makes them feel. The person I was speaking with, was explaining this as one reason why Hillary Clinton did not win the primaries early on; she did not make democrats feel excited the way that Barack Obama did. I have also noted that people who are still undecided, frequently say that they have not yet made up their minds, that they have questions for which they want answers from the candidate they are considering. In a nutshell, these are the two types of decisioning with voters; some make their decision largely on emotion, while others make their decision largely on intellect. That’s not a democrat/republican thing all the time, nor is it that one type produces winners more often than the other, and it’s not that people are one or the other; I believe we all react both emotionally and intellectually for or against a candidate, and our personal balance makes the decision. But it does explain how support is collected for a candidate, as the emotional commitment is made far earlier than the intellectual buy-in; in fact I suspect that almost all last-minute deciders are heavily influenced by intellect rather than emotion. If a candidate is charismatic he can win over the emotional base, but an experienced candidate is more likely to claim the intellectual base. When a candidate is able to address both types well (a Reagan or an FDR, for example) then you see landslides. If a candidate is grossly unqualified in one of those venues, then he may lose in a landslide (like McGovern or Goldwater).

If this theory is correct, then obviously Barack Obama has the advantage in emotion-based campaigning, while John McCain has the advantage in intellect-based campaigning. Obama’s lack of experience makes it very difficult to build a case for him on accomplishments; he simply has no resume. McCain’s lack of glamour makes it very hard for him to gain, much less hold, the attention of anyone not already inclined to give him a chance to make his case; he simply does not sparkle. The question at hand, however, is which approach is more effective in this year’s campaign. The polls would seem to indicate that Obama grabbed most voters’ attention, won them over, and they never gave McCain a serious look. That, however, ignores an obvious side-effect of the emotion-based voter. Pollsters this year have – among themselves – remarked about the difficulty in getting responses from people they contact. Some of this is blamed on new technology and the fact that many people spend less time at home to be contacted, but it is also an important historical fact that democrats have traditionally always been more interested in taking part in polls than republicans, and this year the emotion-based voter is much more inclined to take part in a poll to discuss how he feels, than an intellect-based voter who wants to make sure of his vote before he tells anyone else about it, and who in any case has no particular interest in talking to a stranger about how he feels. As a result, the polls may be feeding off their own assumptions, using the circular logic that the results from their skewed polls justify the bias. If I am correct, more than a few polling groups will be doing a lot of work come December and January to try to figure out what went wrong. The fact that the practices at these groups do not include applying a Deming loop, is a warning sign they missed years ago, so I am skeptical about their ability to learn. 
The worst-case scenario from my point-of-view, is that the effect of these invalid polls might dismay republicans enough to stay home and not vote, creating the sort of disparity in voting patterns to indicate the polls were right, so that they might never consider that their bias could be creating the effect. We shall see. Obviously, I have to admit that I could be wrong, so in the event that Obama wins all 60 of the states he calls America (where he gets the additional ten states, he has never made clear), I will be reviewing my own work in the interest of honesty and that same Deming loop I was just talking about.

I would like to make a few points in closing this article. If I am wrong and Obama is really crushing McCain, there are certain indicators which will show this. First, the early voting should be much, much heavier among democrats than republicans, and the youth vote we heard so much about should be a big part of the early voting. Second, we should start to see pan-demographic support in the polls for Obama in all geographic and age groups, since this happened in Reagan’s 1984, Nixon’s 1972, and Johnson’s 1964 landslide victories. And third, since Obama enjoys strong support in heavy-population states like New York and California, if he is going to collect 340 or more electoral votes, we should see evidence indicating he will reach 57 or 58 percent popular support nationally. Conversely, strong republican turnout in early voting is an indicator of stronger McCain support than has been indicated. If McCain continues to hold support in the same demographic groups he held in mid-September, again this indicates a much closer race, and since so many of the “red” states are less dense in population, any indication of a close popular race would support expectation of a close electoral race as well, since McCain could strategically win the electoral race as Bush did in 2000, with less popular support nationally than Obama but winning the necessary electoral votes.

Friday, October 24, 2008

Accountability

Across the top of my personal blog, I have the following axiom: “A man must be accountable, else everything he does counts for nothing.” I did not write that in hopes of lecturing anyone so much as to remind myself that I am responsible for what I write, and that I have a duty to remember what I owe to my readers and who I am meant to be. I put that at the top of my blog after I received an email from a Marine on active duty in Iraq, who had read my work and found it uplifting. I owe that guy an honest report every time, as I owe it to everyone who takes the time to read my thoughts and analysis. I’m not perfect at it, but it’s there to remind me what I’m doing here as a blogger.

So, this week I have suddenly received a lot more attention than a guy like me ever usually gets, because some of the bigger luminaries have mentioned me. Some on the left have misquoted me and mocked me (so now I can give folks an accurate sense of how Sarah Palin must feel after a Couric interview), some on the right have taken comfort in my work, and some are just giving readers a chance to hear me out (thank you, Mr. Blogosphere).

Some folks cannot resist sending me emails to reinforce their point. A few folks, in emails and in comments to articles, have been trying to goad me into betting on the election. One fellow in particular tried to claim that if I did not put money up against him, that this would belie my ‘accountable’ claim. Of course he’s quite wrong; betting is not about accountability, it’s about greed, and about the morality of gambling on the outcomes of pivotal events in human history. I suppose such people could find an excuse to gamble on anything. Perhaps such people watch the news for a body count from the day’s murders, in some grotesque version of a ‘numbers’ game. Perhaps the war in Iraq and Afghanistan is, for them, amusement and an opportunity to gain some coin. Perhaps for every family praying for a lost child to return, there is a gambler putting money down on when the body will be found. For me, a national election is a solemn duty, a responsibility to put the best-qualified candidate into office, not an occasion for turning profit or focusing on personal gain.

The reason I am writing today is that accountability is something this country badly needs to see more often from leaders. The present financial crisis has come about because so many people in finance, banking, and Congress have hidden their actions and lied to cover their tracks. We are threatened by enemies who hate America for its existence and founding principles, yet there are those whose first cuts would be against the defenders who have prevented 9/11 from happening again. It is up to us who are regular citizens to speak out against the lies and for the defenders of our nation and its infrastructure.

As I wrote before, Accountable Americans understand duty. Accountable Americans respect sacrifice. Accountable Americans recognize the men who have put others first, and they recognize those who would put themselves first, and they are not fooled. In times of hardship, it is not McCain but Obama who would increase taxes and create higher unemployment by making it harder for smaller companies to hire and keep employees. It is not McCain but Obama who would increase opportunities for illegals in the U.S., who would principally take jobs held by minority citizens. It is not McCain but Obama who has taken hundreds of millions of dollars from private sponsors, whom he refuses to even identify, since the public would naturally wonder what sort of promises he made to get all that money.

It is not Obama but McCain who has suffered in service to his country. It is not Obama but McCain who has given generously to charity out of his own pocket. It is not Obama but McCain who has fought to protect the lives of unborn children. And it is not Obama but McCain, who when made aware of vicious comments about his opponent, immediately, directly and repeatedly called for his campaign to operate by ideals of respect and courtesy, asking hard questions but refraining from personal smears. These facts are undeniable, and make clear which sorts of character are present in each man.

Accountability means we recognize which candidate hides where he gets his money, and which has been honest about his past. Accountability means we vote according to what is right for the nation, not what we can get out of it. Accountability means we think about our families and the ideals of our nation, not the opportunity to punish some folks for success and threaten those who ask inconvenient questions.

Thursday, October 23, 2008

The Secret Poll October 23 2008

Hello again, and time for another edition of the Secret Poll. The election remains winnable by either John McCain or Barack Obama, the keys coming down to turnout, the independents, and just plain not giving up. It will take at least 10-14 days for results to show up in the polls; anything showing up in the near future will be in response to prior activities and statements, and some things done now may be too late to show up in time for voters to recognize their meaning. Then again, Obama's 30-minute infomercial on November 3rd could be important in making up minds.

So, once again, here’s the recap of where I think the true numbers have played out, and where we are now:

August 31: McCain 41.77%, Obama 41.06%

September 7: McCain 42.45%, Obama 42.04%

September 14: McCain 45.71%, Obama 39.62%

September 21: McCain 44.48%, Obama 42.06%

September 28: McCain 42.73%, Obama 41.62%

October 5: McCain 44.09%, Obama 43.96%

October 12: McCain 42.68%, Obama 45.31%

October 19: McCain 43.49%, Obama 46.03%

McCain regained support among independents, but republicans are beginning to fade, possibly believing the ‘cannot win’ hype. The undecided portion remains very important at 10.48%, and with just two weeks to go it appears that the undecideds are waiting to be sold on a candidate, or they may decide not to vote. The keys, again, are the following:

Turnout – if one party clearly does a better job getting its base to vote, that party will clearly win. More than ever, your vote matters.

Independents – Right now, the Independent vote is essentially tied, with about 24% of Independents still undecided. Whoever wins the most of that vote will win the election.

Undecideds – Overall, 10.48% of voters are still undecided. It’s slowly resolving itself, but there will still be a large pool of voters waiting to be convinced just before election day. Finishing strong could make all the difference.
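For readers who want to check the undecided figure against the weekly recap above, here is a minimal sketch of the arithmetic. It assumes the undecided share is simply whatever is left after subtracting both candidates' numbers from 100%; that is an assumption about how the 10.48% was derived, though it matches the October 19 line. The names `weekly` and `undecided_share` are made up for this illustration.

```python
# Weekly estimates as listed in the recap above: (McCain %, Obama %).
weekly = {
    "August 31": (41.77, 41.06),
    "September 7": (42.45, 42.04),
    "September 14": (45.71, 39.62),
    "September 21": (44.48, 42.06),
    "September 28": (42.73, 41.62),
    "October 5": (44.09, 43.96),
    "October 12": (42.68, 45.31),
    "October 19": (43.49, 46.03),
}


def undecided_share(mccain, obama):
    """Share left over after both candidates, assuming everything sums to 100%."""
    return round(100.0 - mccain - obama, 2)


for week, (mccain, obama) in weekly.items():
    print(f"{week}: undecided {undecided_share(mccain, obama):.2f}%")
```

Run as written, this prints one undecided figure per week, which makes it easy to see how slowly that pool has been shrinking.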

Cooking Polls – State Poll Edition

My articles on polling have received a great deal of discussion. Most of it has been emotional in focus rather than factual, and I am amused by those blogs which reference the articles but ignore the context and substance in order to attack something I never even said. There have been a few reasonable questions, however, and one of them addresses the state polls. The national polls are all over the place, but what about the state polls? Don't they show Obama leading in most states, and don't the state polls basically agree? Those questions are good ones, and so state polling is the focus of today's article.

The first thing that jumps out at you if you read the state polls, is that there are a lot more polling groups doing polls at the state level, than at the national level. Also, most of the polling groups which do national polls, do not also do state polling, probably because it is expensive and difficult to try to cover all of the states on a consistent and timely basis. I have written before that national polls often focus on urban centers, which means that many of the states would require a functionally different methodology to work than what is used nationally. State polling tends to be smaller in respondent pool size, smaller in budget, and less frequent. Some polling groups only do one poll for the whole campaign, and it's common for even major groups to do a poll only once a month.

So anyway, I'm looking at the state polls and I notice that there's quite a range of opinion there, just like the national polls. California, for example, is all about Obama, but it ranges from +24 down to +16, which is statistically significant. No, it hardly means Cali is in play, but that degree of variance in a deep blue state indicates on the state level much of what I have been noting on the national level. Moving ahead alphabetically, Colorado looks pretty stable, but on the other hand RCP only shows a single poll done there all month. Something to think about, that. Even the Obama people admit Florida is hard to call; polls there taken in October show anything from Obama +8 to McCain +5. Indiana is just as weird, running from Obama +10 to McCain +7 in October polls. Iowa is like Colorado, an important state but with only two polls done there this month. Minnesota is strange as well, with a range of 18 points between reports from the eleven polls done there in October. Looking at Missouri, there have been nine polls done and they range from Obama +8 to McCain +3. Even in safe states there's some hinkiness, as New Jersey has a range of 15 points between polls taken in October. Like California, it's not in doubt but the volatile range of results is sending a signal about the polls' validity, just like the national polls. I don't think I need to go through all of the states to show what I'm saying here; go check out RCP and drill down to specific polls on specific states. The state polls are showing the same volatility that I noted in the national polls, and there's likely a common reason for it.

A reader mentioned Survey USA earlier this week, and I'd like to use them as an example of what I mean. First off, I like Survey USA for making internal data available; it really helps me take apart their process to see what they were thinking. And I found an interesting trend, something which is consistent with the national polls and which explains both the volatility and the invalidity of the current model.

2006 was a bad year for republicans, a year when republicans stayed home and democrats used the opportunity to win a number of close races and take over control of the House and Senate. In a number of states, therefore, it's not surprising that democratic party supporters gained a few points (usually 1 to 3 points) relative to 2004 in voter participation. So I went back and looked at voters by party affiliation, and compared those balances to this year's weighting by Survey USA. In thirty-six states, the party affiliation weights for democrats used by SUSA were five points or more higher than in 2006, a high-water mark for democrats. In twenty states, the party affiliation weights for democrats used by SUSA were ten points or more higher than in 2006, and in eight states, the party affiliation weights used for democrats by SUSA were thirteen points or more higher than in 2006. Significant battleground states affected by this bias are as follows:

Pennsylvania: D+5 in 2006, SUSA using D+19, 15 point variance
Indiana: R+14 in 2006, SUSA using R+1, 13 point variance
Nevada: R+7 in 2006, SUSA using D+6, 13 point variance
Colorado: R+3 in 2006, SUSA using D+9, 12 point variance
Iowa: R+2 in 2006, SUSA using D+10, 12 point variance
Virginia: R+3 in 2006, SUSA using D+9, 12 point variance
Ohio: D+3 in 2006, SUSA using D+13, 10 point variance
Missouri: R+1 in 2006, SUSA using D+7, 8 point variance
North Carolina: R+1 in 2006, SUSA using D+5, 6 point variance

I've looked at the publicly available records on historical election participation, 2008 new voter registrations, and the Census information on these states, but I can find no valid reason for such large and arbitrary changes in political affiliation weightings. I would therefore submit that the models being used for many of the state polls have design flaws, which threaten the credibility of their published results.
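The variance column in the state list above is just the distance between the 2006 party balance and the balance SUSA is using, and it can be recomputed from the two endpoints. The sketch below does that, treating a democratic advantage as positive and a republican advantage as negative; the helper names `signed_margin` and `shift` are invented for this illustration. One thing it turns up: the Pennsylvania endpoints as listed (D+5 and D+19) work out to 14 points rather than the 15 shown, and nothing in the list says which of the three numbers is the slip.

```python
def signed_margin(label):
    """Turn 'D+5' or 'R+14' into a signed number (democratic advantage is positive)."""
    party, points = label.split("+")
    value = int(points)
    return value if party == "D" else -value


def shift(base, used):
    """Points of movement toward democrats between the 2006 balance and the weighting used."""
    return signed_margin(used) - signed_margin(base)


# (state, 2006 balance, SUSA weighting, variance as listed above)
states = [
    ("Pennsylvania", "D+5", "D+19", 15),
    ("Indiana", "R+14", "R+1", 13),
    ("Nevada", "R+7", "D+6", 13),
    ("Colorado", "R+3", "D+9", 12),
    ("Iowa", "R+2", "D+10", 12),
    ("Virginia", "R+3", "D+9", 12),
    ("Ohio", "D+3", "D+13", 10),
    ("Missouri", "R+1", "D+7", 8),
    ("North Carolina", "R+1", "D+5", 6),
]

mismatches = []
for state, base, used, listed in states:
    computed = shift(base, used)
    if computed != listed:
        mismatches.append(state)
    print(f"{state}: {base} -> {used}, computed {computed}, listed {listed}")
```

Every other row agrees with its listed variance, so the overall pattern described above holds either way.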

Wednesday, October 22, 2008

Odds at Ends – The Pew and Battleground Polls, with a Gallup Chaser

I’ve laid out a pretty harsh accusation against the polls this year, by claiming that all the major polls are far from accurate. The cause of this, in essence, has been that the polls made some key assumptions about turnout, the independents, and the undecided voters, assumptions which they never tested and which they are now finding cannot be trusted. Poll results vary wildly from one another, and not just at different times. The variance for current polls listed at Real Clear Politics for this morning ranges from the Pew poll which advertises a 14-point lead for Obama, to the Battleground poll which says the lead is only 2 points. The variance is too great (and there are polls relatively close to both ends, demonstrating proof of statistical invalidity for the published confidence level) for even the casual observer to accept as reasonable. There are four polls which show a 10 point lead or greater for Obama, and another five which show a 6 point lead or less. It is mathematically impossible for so many polls to be valid, yet disagree to such a degree with valid methodology. I said this when McCain was ahead, again when Obama climbed in front, and I am repeating it yet again. The starting point to discussing the polls this year, is understanding that the methodology in common use is flawed, and is producing results which cannot be depended upon.

A quick word here about validity. Opinion polling relies on statistical math, which depends on certain key tests. When a group of respondents exceeds a certain size, a pattern of responses emerges which is generally symmetrical, with few outliers. A p-test can be done to confirm that the results are consistent with the requisite conditions. This produces what is known as a confidence level, which in common words means the likelihood that the process, if repeated using samples from the same data pool and using the same method, will result in the same conclusion. This is called reproducibility, and it is the most significant test for human behavior testing. The confidence level basically sets out how often the same results should be expected to repeat. The most common published level of confidence in opinion polling is 95%, which predicts that the same method used at the same time will produce results within the published margin of error no less than 19 out of 20 times. RCP is listing twelve major polls with variances from each other which cannot be covered by the MOE, which proves the model is invalid by its own definition. One poll as an outlier could be explained, but the range is too great to explain the variance between the rest.
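To put rough numbers on the margin-of-error argument, here is a minimal sketch using the textbook formula for a simple random sample at 95% confidence. The sample size of 1,000 per poll is purely an assumption for illustration, since the actual sample sizes are not given here, and the function names are invented. Real polls also weight their samples, which this ignores, so treat it as a back-of-envelope check and not as any pollster's actual calculation.

```python
import math

Z_95 = 1.96  # two-sided critical value for a 95% confidence level


def margin_of_error(sample_size, share=0.5, z=Z_95):
    """Margin of error, in points, on one candidate's share (simple random sample)."""
    return 100.0 * z * math.sqrt(share * (1.0 - share) / sample_size)


def lead_margin_of_error(sample_size, share_a, share_b, z=Z_95):
    """Margin of error, in points, on the lead (a minus b) within one poll."""
    variance = (share_a + share_b - (share_a - share_b) ** 2) / sample_size
    return 100.0 * z * math.sqrt(variance)


ASSUMED_SAMPLE_SIZE = 1000  # hypothetical; not taken from either poll

# Toplines quoted in this article: Pew 52-38, Battleground 49-47.
pew_lead_moe = lead_margin_of_error(ASSUMED_SAMPLE_SIZE, 0.52, 0.38)
battleground_lead_moe = lead_margin_of_error(ASSUMED_SAMPLE_SIZE, 0.49, 0.47)

# Sampling error on the difference between two independent leads.
combined_moe = math.hypot(pew_lead_moe, battleground_lead_moe)
gap_between_leads = (52 - 38) - (49 - 47)

print(f"MOE on a single share: {margin_of_error(ASSUMED_SAMPLE_SIZE):.1f} points")
print(f"Gap between leads: {gap_between_leads} points, combined MOE {combined_moe:.1f} points")
```

Note that the margin of error on a lead is nearly double the margin on a single candidate's share, so the fair comparison is between the gap in leads and the combined figure. Under the assumed sample size, the twelve-point gap still falls outside it.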

People have specifically asked me about the Pew and Battleground polls, since their twelve-point variance is the largest. I will have to say that in both cases, the error has been the same – disregarding historical norms in favor of introducing subjectively chosen demographics, things like over-sampling urban areas, younger voters, and democrats, which creates a false image relative to the voting demographic. But to get a sense of the numbers, I would like to examine the Pew, Battleground, and Gallup polls in the context of their direct movement, and in reweighting their party affiliation to historic norms.

First, here are the recent results from Pew:

Sept 17 – 46% Obama, 44% McCain, 10% undecided
Sept 30 – 49% Obama, 42% McCain, 9% undecided
Oct 13 – 50% Obama, 40% McCain, 10% undecided
Oct 20 – 52% Obama, 38% McCain, 10% undecided

Pew is showing what is effectively a zero-sum game, with Obama gaining directly at McCain’s expense, with around 10 percent remaining unsure each time.

Now, the Battleground trend:

Sept 25 – 45% Obama, 47% McCain, 8% undecided
Oct 3 – 45% Obama, 41% McCain, 14% undecided
Oct 9 – 48% Obama, 38% McCain, 14% undecided
Oct 16 – 47% Obama, 40% McCain, 13% undecided
Oct 22 – 49% Obama, 47% McCain, 4% undecided

Battleground shows support movement more independent between the two candidates, and the most recent undecided numbers are much lower than what we saw before.
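One way to see the difference between the two trends is to look at poll-to-poll changes. In a zero-sum pattern, one candidate's gain is the other's loss and the combined total barely moves; when the movement is independent, the combined total swings as undecideds come and go. The sketch below uses only the Pew and Battleground numbers listed above; the function name `changes` is made up for this illustration.

```python
# (date, Obama %, McCain %) as listed above.
pew = [
    ("Sept 17", 46, 44),
    ("Sept 30", 49, 42),
    ("Oct 13", 50, 40),
    ("Oct 20", 52, 38),
]
battleground = [
    ("Sept 25", 45, 47),
    ("Oct 3", 45, 41),
    ("Oct 9", 48, 38),
    ("Oct 16", 47, 40),
    ("Oct 22", 49, 47),
]


def changes(series):
    """Poll-to-poll change as (date, Obama change, McCain change, combined change)."""
    result = []
    for (_, obama_prev, mccain_prev), (date, obama, mccain) in zip(series, series[1:]):
        obama_change = obama - obama_prev
        mccain_change = mccain - mccain_prev
        result.append((date, obama_change, mccain_change, obama_change + mccain_change))
    return result


for name, series in (("Pew", pew), ("Battleground", battleground)):
    for date, obama_change, mccain_change, combined in changes(series):
        print(f"{name} {date}: Obama {obama_change:+d}, McCain {mccain_change:+d}, combined {combined:+d}")
```

The combined column is the thing to watch: it stays within a point for Pew, while Battleground's latest poll shows both candidates rising at once.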

OK, with that in mind, let’s check the Gallup numbers for the same range of dates:

Gallup Daily Tracking
Sept 14: Obama 47%, McCain 45%, 8% undecided
Sept 21: Obama 48%, McCain 44%, 8% undecided
Sept 28: Obama 50%, McCain 42%, 8% undecided
Oct 5: Obama 50%, McCain 42%, 8% undecided
Oct 12: Obama 51%, McCain 41%, 8% undecided
Oct 19: Obama 52%, McCain 41%, 7% undecided

This model appears to be similar to Pew’s, zero-sum balancing with a constant undecided portion.

Gallup ‘expanded voter’
Oct 8: Obama 52%, McCain 43%, 5% undecided
Oct 15: Obama 51%, McCain 45%, 4% undecided
Oct 21: Obama 52%, McCain 42%, 6% undecided

This model allows a smaller undecided portion, suggesting that undecideds are pressed for a clear decision. Also, Gallup has admitted that this model has no precedent, and uses over-samples of urban and youth voters, in the presumption that they will sharply increase participation this year.

Gallup ‘traditional’
Oct 8: Obama 50%, McCain 45%, 5% undecided
Oct 15: Obama 49%, McCain 47%, 4% undecided
Oct 21: Obama 51%, McCain 44%, 5% undecided

This model removes the urban and youth voter overweights, but otherwise is the same as the ‘expanded voter’ model. This is because Gallup abandoned the true historical model, and so can only attempt to recreate it to some degree by using data sets it knows have been corrupted by invalid methodology.

Now, let’s see what happens to these results when the internal data is reweighted to historical party affiliation norms:

Pew:
Sept 17: 46-44 Obama becomes 45-46 McCain
Sept 30: 49-42 Obama becomes 48-44 Obama
Oct 13: 50-40 Obama becomes 48-43 Obama
Oct 20: 52-38 Obama becomes 50-42 Obama

Bear in mind that this accepts Pew’s polling methodology, which may have over-sampled other demographics besides just democrats. For example, in the Oct 20 poll Pew undersamples seniors and oversamples the 50-64 age group, oversamples high school only education by a large amount, and fails to note regional breakdowns or the urban/suburban/rural split. These are critical points which Pew fails to address, and which should make the reader wary.

Battleground:
Sept 25: 45-47 McCain becomes 44-47 McCain
Oct 3: 45-41 Obama becomes 46-44 Obama
Oct 9: 48-38 Obama becomes 48-41 Obama
Oct 16: 47-40 Obama becomes 49-42 Obama
Oct 22: 49-47 Obama (no internal data to reweight yet)

Battleground does not change much, but this reinforces that something odd has happened in the recent poll, most likely among independents.

Next to check is the reweight of Gallup’s polling:

Gallup Daily Tracking (d=daily, e=expanded, t=traditional)
Sept 14: 47-45 Obama (d) becomes 43-44 McCain
Sept 21: 48-44 Obama (d) becomes 43-42 Obama
Sept 28: 50-42 Obama (d) becomes 43-42 Obama
Oct 5: 50-42 Obama (d) becomes 44-41 Obama
Oct 12: 51-41 (d), 53-43 (e), 51-44 (t) Obama becomes 47-39 Obama
Oct 19: 52-41 (d), 52-42 (e), 51-44 (t) Obama becomes 45-42 Obama

Note that the reweights of these polls using historical norms are much more consistent with each other.
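For anyone unfamiliar with what reweighting means mechanically, here is a minimal sketch of the general technique: take each party group's support for a candidate, then recombine those figures using a different party mix. Every number in it is hypothetical and chosen only to show the arithmetic; the party mixes below are not the historical norms used for the reweights above, and the support figures are not any poll's internals. The names `topline`, `sample_mix`, and `baseline_mix` are invented for this illustration.

```python
def topline(support_by_party, party_mix):
    """Overall support (%) as the average of group support weighted by party mix."""
    total_weight = sum(party_mix.values())
    weighted = sum(support_by_party[party] * weight for party, weight in party_mix.items())
    return weighted / total_weight


# Hypothetical support for one candidate within each party group, in percent.
candidate_support = {"democrat": 87, "republican": 7, "independent": 38}

# Hypothetical party mixes, in percent of respondents.
sample_mix = {"democrat": 39, "republican": 29, "independent": 32}    # as polled
baseline_mix = {"democrat": 37, "republican": 35, "independent": 28}  # reference norm

as_published = topline(candidate_support, sample_mix)
reweighted = topline(candidate_support, baseline_mix)

print(f"As polled: {as_published:.2f}%  Reweighted: {reweighted:.2f}%")
```

The point of the sketch is that the group-level answers never change; only the assumed party mix does, and a shift of a few points in that mix moves the topline by a few points as well.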

The next thing I suggest is looking at comparable metrics. First, base party support:

Support for Obama by Democrats
Pew: 87% 92% 91% 91%
Battleground: 82% 81% 83% 86%
Gallup: 86% 86% 86% 87% 88%

Support for McCain by Republicans
Pew: 90% 86% 91% 89%
Battleground: 85% 82% 80% 81%
Gallup: 84% 84% 82% 82% 84%

Note that Pew reports the highest support within each party. Note also that McCain’s support by republicans is reported to be dropping.

Next, independent support:

Independents

Pew: (for Obama) 38% 38% 45% 51%
Pew: (for McCain) 45% 46% 37% 33%

Battleground: (for Obama) 32% 38% 41% 43%
Battleground: (for McCain) 40% 36% 34% 33%

Gallup: (for Obama) 22% 22% 23% 33% 27%
Gallup: (for McCain) 31% 31% 32% 25% 34%

Note that Pew’s most recent numbers report a commanding lead for Obama among independents, while Gallup shows McCain ahead in four of its five readings.

Undecideds

Pew: 08% 07% 08% 07%
Battleground: 15% 15% 15% 14%
Gallup: 16% 16% 15% 14% 14%

Battleground and Gallup agree that there is a lot of the population still waiting to be won over. The election is certainly well within any reasonable boundaries of doubt.

Conclusion

It’s difficult to work with limited internal data, especially when the polling group has altered more than one category of demographic. But it is interesting to note that when the reported data is reweighted to a consistent historical norm, even the varied results from three different polling groups start to trend in the same way. Obama supporters can take comfort from the indications that his lead stands up to inspection, albeit not as large, while McCain supporters can take heart in the evidence that the election is not at all decided, that the level of turnout, the choice of the independents, and which way the undecideds break (including the choice to stay home) is a vital part of the decision still to be resolved.

Pew’s data comes from here

Battleground data comes from here

Gallup’s polling reports come from here

Gallup’s internal affiliation numbers are reported here

Tuesday, October 21, 2008

Gallup and New Coke

The polls are wrong this year, very wrong. I have been saying this for months, and I have backed up my claim with both statistical and anecdotal support. The claims I have made have inspired some, caused others to laugh in derision, and brought others to test their assumptions and revisit the hard data. Along the way, there have been a lot of questions about how and why the polls could be wrong. The most common complaint, is that for all of the polls to be wrong, there would need to be some sort of conspiracy, or else an incredibly stupid decision made across the board. Well, I am not a big believer in conspiracies, but I do think that the polling groups have fallen into a groupthink condition. I wrote earlier about the fact that of the major polling groups handling national and state polls, all of them are based deep in pro-Liberal, anti-Conservative territories.

Here’s that list of headquarters again, just to punch in that point again:

Poll: Headquarters
ABC News: 77 W 66th St, #13, New York City, New York
CBS News: 524 W 57th St, New York City, New York
FOX News: 1211 Avenue of the Americas, New York City, New York
Gallup: 901 F St NW, Washington DC
Hotline: 88 Pine St, 32nd floor, New York City, New York
IBD: 12655 Beatrice St., Los Angeles, California
The Los Angeles Times: 202 W 1st St, Los Angeles, California
Marist Institute: 3399 North Rd, Poughkeepsie, New York
Mason-Dixon: 1250 Connecticut Ave #200, Washington DC
Newsweek: 251 W 57th St, New York City, New York
The New York Times: 1 City Hall, New York City, New York
Pew Research Center: 1615 L St NW, #700, Washington DC
Quinnipiac: 275 Mount Carmel Ave., Hamden, Connecticut
Rasmussen: 625 Cookman, #2, Asbury Park, New Jersey
Reuters: 3 Times Square, New York City, New York
Survey USA: 15 Bloomfield Ave., Verona, New Jersey
TIPP: 690 Kinderkamack Rd, Oradell, New Jersey
Washington Post: 1150 15th St NW, Washington DC
Zogby: 901 Broad St, Utica, New York

As I wrote then, it needs noting that all of the major polling organizations are based in locations where liberals are strongest and conservatives weakest, where ‘democrat’ and ‘republican’ take on meanings wildly different from the rest of the country. The people making the executive decisions at these polls, most likely including the wording and order of polling questions, whether to focus on urban or suburban areas, the weighting of political affiliation, and the definition of ‘likely voter’, are most likely in regular contact and association with the most liberal factions of politics. It does not mean that they have deliberately skewed their decisions to support Obama, but it is obvious that there is an apparent conflict of interest in their process modality.

I want to stop here and direct the reader back to the ethics of polling. The National Council on Public Polling is, and I got this from their site’s welcome page, “an association of polling organizations established in 1969. Its mission is to set the highest professional standards for public opinion pollsters, and to advance the understanding, among politicians, the media and general public, of how polls are conducted and how to interpret poll results.”

NCPP members identified on the front page include ABC News, CBS News, Gallup, Hotline, Ipsos, the Los Angeles Times, Marist, NBC News, Pew, Princeton Survey Associates, and Survey USA, along with many others. This effectively promises that the major polls will abide by NCPP rules, something you should consider when matching the principles against the polls’ actual statements.

The NCPP has also posted a list of “20 Questions a Journalist Should Ask About Poll Results”, which I strongly recommend every one to read and memorize. Those questions include these very important queries, that I fear most people do not often consider:

2. Who paid for the poll?
In many cases, the poll we see in the papers and on television, was paid for by an agency known to be biased. For example, does anyone really expect CBS News or the New York Times to be even-handed, especially in light of their behavior since 2002?

7. Who should have been interviewed and was not? Or do response rates matter?
This is a sore spot for polling groups, because frankly most people do not have the interest to stop and take an 8-to-10 minute interview, especially from someone they do not know calling them up when they are likely to be busy doing something else. It’s been established as well, that democrats in recent years are more willing to take part in polls than republicans, possibly due to perceived bias on the part of the media. But it is quite important to know if the pollsters were getting one person in ten to take the poll, or only one person in fifty, because the people not interviewed matter just as much as those who do participate. Yet I have never yet seen a poll this year that publishes response rates.

14. What questions were asked?
This is a big one that a lot of folks miss. I have noticed in the details, that all of the polls are asking about the public’s opinion of the economy, and of their opinion of President Bush, even though he is not running this time. Also, I have noticed that many polls ask a question about John McCain just after asking about the voter’s opinion of President Bush, subtly linking the two men. For comparison, no questions have been asked about approval of the specific performance of either Majority Leader Reid or Speaker Pelosi, and no other politician is linked to Barack Obama in the same way that polls link President Bush to John McCain. This is a clear violation of the NCPP’s guidelines, yet it is done in absolutely every poll I have seen. Further, polls taken since Labor Day have not mentioned foreign policy at all. There are no questions regarding Russia’s invasion of Georgia, nor of Iran’s nuclear weapons programs, nor about China’s intentions vis-à-vis Taiwan, even though these are current events which have great significance in a presidential race, yet all of the polls are ignoring them. Again, the economy-only focus betrays a bias which violates the principles of the NCPP.

I have already written extensively about polling groups’ manipulation of demographic weights, so I will only summarize here that in addition to party affiliation, various polling groups this year have produced polls out of demographic balance with Census norms for urban/suburban/rural participation, minority race representation, age, employment status, and income range. It should not be difficult to imagine how these manipulations might invalidate the results published by the polling groups.

When people reach this point in the discussion, an obvious question comes up; surely the polls want to be accurate, and they would have to understand that this fiddling with internal data to create a false image would destroy their credibility? And the answer to that can be phrased in a two-word reminder of just how stupid people can be – “New Coke”.

In 1985, the Coca-Cola company dominated the beverage industry around the world, and its flagship product was its first, the Coca-Cola soft drink, literally an icon of Americana. It would seem to be the most obvious of strategic decisions, to leave the base of the company alone. Instead, in a move never explained let alone justified by the company, Coca-Cola announced that they were eliminating Coca-Cola, and replacing their number 1 product with a new formula, called “New Coke”. Everything about the promotion was an unmitigated disaster, and later that year Coca-Cola re-introduced what they claimed was the “original formula”, named “Coke Classic”. The company tried to push “New Coke” on a public that never wanted it, and eventually gave up the next year. The “New Coke” strategy and promotion have become textbook lessons on the worst possible way to listen to customers and meet their expectations. Pretty much everything was done the wrong way, especially the arrogant way that Coca-Cola assumed their customers would accept the elimination of their favorite drink. Near as I can figure it, the essential problem came down to the fact that the company’s marketing people made all the key decisions internally, without once stepping out into the real world to test their assumptions. What seemed a great idea in development, failed miserably in Reality. Obviously, Coca-Cola never wanted to enrage its customers, to drive them to Pepsi, or to put a bullet in their stock value, but that all happened because they made an incalculably stupid strategic decision, and they lacked an effective Deming loop to test assumptions and correct the process.

This is actually not all that uncommon in business. So many people saw Enron as a company made up of crooks, that they failed to notice that Enron did have a Code of Ethics; the problem was similar to Coke, in that too many people never tested their assumptions, and by the time they realized something was wrong, it was too late to repair the damage.

This brings us back to the polls. The thing most folks forget about polls which get published in the media is that the polls’ first need is not to accurately reflect the election progress and report on actual support levels; it’s about business. A poll needs clients to survive, and the media – always – want a good story more than they want facts. So polls sell that story, and what would actually be a gradual development of support, with modest changes brought about as the public learned about candidates’ records and positions, is instead sold as an exciting roller-coaster race, careening madly all over the place. If a candidate appears to be popular and charismatic, he might be allowed a strong lead, or the poll might tighten things from time to time just to keep attention on the polls. That’s where that whole “bounce” thing after the conventions comes from – do you really think republicans or independents got more excited about Obama because of his convention, or that democrats and independents were more likely to vote for McCain because of the GOP convention? When you think about it, it should be obvious that these bumps are artificial unless there is a clear cause to show a change in support. And when you take apart the polls and drill down to the raw data, what you find is a close race with a gradually declining but still large pool of undecided voters, which is consistent with the known facts and actions we see from both campaigns.

Obviously, the polls want to finish as close as possible to the actual results, but this year they have a problem. There has been unprecedented manipulation of demographics, corrupting even the raw data to the point where effective resolution of public opinion is doubtful. This might be described as an honest mistake, if one is willing to accept greed as an honest motive. Gallup, for example, has more experience than any other polling group, and therefore should have known better than anyone else not to fiddle with the weights. In several past elections, Gallup and other polls have learned from operational blunders.

In 1948, Gallup screwed with the weighting, assuming the republicans would turn out in much larger numbers than the democrats, but they were wrong, and badly miscalled the election. In 1952 Gallup assumed the other way, that the race would be tight and down to the wire, but they blew that call as well. In 1976, Gallup assumed the opposite, that democrats would overwhelm republicans because of Watergate, but when it became obvious that republicans would vote anyway, Gallup had to change its model to show their participation more accurately. In 1980, Gallup called Carter ahead until the very end, when they grudgingly granted Reagan a small lead, yet another case where Gallup’s assumptions were well off the mark. In 1996, Gallup overstated Clinton’s support and understated Dole’s support throughout the campaign, and in 2004 Gallup called the race too close to call. This year, trying to gauge the effects of Barack Obama’s ‘rock star’ charisma, Gallup decided to abandon historical norms and overweight urban and youth voters, and to over-sample democrats all campaign long. This model, dubbed the “expanded voter”, has proven a disaster for Gallup, so much so that the group reintroduced a more historically balanced model, which they call the “traditional” model. The problem for Gallup, however, is that their methodology has become so skewed over the course of the campaign that it may be impossible for Gallup to correct its procedures before the final election poll. In the light of past blunders, it may not be unreasonable at all to expect Gallup to miss the call this year.

So OK, Gallup is having a bad year, but what about the rest? Well, there the phrase to consider is follow the leader. Gallup has been doing this stuff for longer than anyone else, and the other polls have often fallen into the habit of chasing what they see Gallup do. But for an objective look at their performance, I direct you to another of my past articles, where I noted the NCPP’s record on poll accuracy. From what I see here, if Gallup is having problems, it’s likely just as bad or worse for everyone else.

So, could I be wrong? I have to be honest and admit that I could. But in that case, we’d have to ask why the polls do not generally agree with each other, why Gallup is trying to spin three different models at the same time to get a grasp of the picture, why McCain and Obama are both so interested in Pennsylvania, yet neither is working very hard in Ohio right now. We’d have to explain why McCain-Palin rallies are now attracting thousands more people than Obama-Biden rallies, why Letterman suddenly found it cool to have McCain on his show and SNL decided they wanted Palin on theirs. We’d have to explain why there are not a lot of Obama signs visible, but we hear about his army of lawyers getting ready. We’d have to explain why McCain and Palin appear to be so relaxed while Obama and Biden look like they’re worried.

What I think is happening, is this – the polls’ headquarters were based deep in liberal territory, where the assumption was that Obama’s candidacy would actually create a groundswell of pro-democrat voters unseen in the country since 1932. That McCain is more experienced with the key issues than Obama was ignored. That, historically, the effects of the debates appear only several weeks later was also ignored. That the economy could be as reasonably blamed on the democrat-controlled Congress as on the republican President was never considered. That character would be a salient factor in the decisions of voters was rejected out of hand.

The polls are wrong. Make your own mind up, because your vote will matter.