Stolen Thunder: 10/26/2008

Saturday, November 01, 2008

Funky Gallup

Now that the election is entering its final days, I had expected the polls to start tightening the race in order to reflect actual demonstrated conditions. For several days this has been happening in a number of major polls, but today Gallup posted a surprising number; they show Obama leading McCain by double-digit margins in all three models of their polling.

I will admit that when I first saw this, I was shocked and a bit dismayed. For all the criticism I have thrown at them, Gallup has always appeared to me to be the most professional of the polling outfits, and if they showed such a strong and consistent Obama surge at the end, then maybe I was wrong and we should expect a rout to conclude on Tuesday.

Then my brain kicked in and said, ‘hold on there, wait just a minute’. You see, there are some weird things going on here with Gallup, and yes they are important. First off, Gallup used to be simple enough; they took a poll and announced the results and internals, just as they have for decades. But this year, Gallup is running three different models, including one in which they have admitted punching in inflated youth and minority race participation at unprecedented levels (their ‘expanded voter’ model). They stepped back from that when it became obvious that this model was giving numbers which did not jibe with any reasonable judgment, and tossed out a ‘traditional’ model which played the numbers with a more nominal weighting. So, for some time now we have seen three models, which have tossed out a range of support in which the ‘expanded’ model favors Obama more than the ‘traditional’ model. Yet today we see Gallup claiming 52-41 Obama in its Daily Tracking of all registered voters, 52-42 in the ‘expanded voter’ model, and 52-42 in the ‘traditional voter’ model.

Now, stop and think about why that almost has to be bogus. First, Gallup is saying that McCain lost 5 points of support and Obama gained 3 points of support in just 5 days. Does that heavy swing of support make sense? And if it does, why does Fox say McCain gained six points in the last week?

And why does Zogby show that McCain led Obama in Friday’s one-day polling, yet claim that in three-day tracking he’s still down by 5 points?

You get the idea; the volatility of the polls is a warning sign that they are not to be trusted. The trends are going different directions, and they do not even always agree with their own headlines.

Weird.

But Gallup is getting a trip to the woodshed for today’s stunt. You see, they’re not being honest with you and I think I can prove it.

Gallup has been using three different models for their reports. The first one just takes registered voters and only weights it for Census norms. The second is the ‘expanded’ model which weights the results to show heavy participation by blacks and young voters. The third model is what they are calling the ‘traditional’ model, but it is not the same as in past years. It is just the ‘expanded’ model with the extra black and youth votes reweighted back to historical norms, and it fails to adjust for assumptions made in the polling process and respondent pool construction.

Now think about this. Gallup claims to be using three models, yet is claiming they are producing identical results, as well as showing volatile changes in both candidates’ support levels going into the weekend. How is this possible? The only way this can be possible, is that Gallup is claiming that youths and black voters are voting the exact same way as voters overall. There’s no real way that the math works out, otherwise.
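The reweighting argument can be sketched with a two-group model. The numbers below are hypothetical placeholders, not Gallup's internals, and the function name `topline` is mine: if giving a group more weight leaves the topline unchanged, that group must be splitting the same way as everyone else.

```python
def topline(group_share, group_support, rest_support):
    """Overall support when one group makes up group_share of the sample."""
    return group_share * group_support + (1 - group_share) * rest_support

# Hypothetical figures for illustration only (assumption, not poll data).
REST_SUPPORT = 0.50          # support among everyone outside the group
LOW_WEIGHT, HIGH_WEIGHT = 0.11, 0.14   # two different weights for the group

for group_support in (0.50, 0.90):
    low = topline(LOW_WEIGHT, group_support, REST_SUPPORT)
    high = topline(HIGH_WEIGHT, group_support, REST_SUPPORT)
    print(f"group support {group_support:.0%}: "
          f"topline {low:.1%} at low weight, {high:.1%} at high weight")
```

Only in the first case, where the group votes exactly like the rest of the sample, does the topline stay put when the weight changes.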

And what does Gallup say about youths and the black vote? Well, starting with the youth vote, there is not much to say. Gallup has admitted that the youth vote is not doing anything special this year.

So we should be seeing the ‘expanded’ model recede a bit, not show Obama’s lead growing, at least not because of the kids. What about the black vote? Gallup is all kinds of geeked about the black vote this year, saying they expect about a three percent increase from 2004 participation. OK, I can agree with that, but since Gallup has said they were already weighting blacks more heavily in their ‘expanded’ model, how do they explain that model surging this week, and why would the other models change as well? Frankly, the most likely possibility is that Gallup has recognized that their polling methodology used this year was in line with the ‘expanded’ model they made so much of earlier this year, and they are simply reinforcing the oversamples in anticipation of a rout which may not in fact exist.

Gallup is also getting goofy on another count: Early Voting. We’ve been hearing three things all this season about turnout – first, that we should expect around 130 million voters this year, that early voting will top 30% of all voting, and that the youth and black vote will break records this year. Gallup is reporting that as of October 31, 27% of their respondents say they have already voted and another 8% say they will vote early. Got those numbers? OK, with them in mind, let’s go visit Dr. McDonald again.

Dr. Michael McDonald at George Mason University has been tracking the early voting results. Now, we are not going to see exit polling data before the polls close on November 4, much less the actual election results, but we are getting some interesting details. Once again, I recommend everyone spend some time at his site to see the numbers for yourself.

OK, so looking at the numbers as of Saturday at 5:54 PM Texas time, we see that a total of 22,498,237 votes have been cast in early voting, known absentee and in-person votes combined. Now, if Gallup is right and 27% of the voters have done it already, that projects a total national vote of 83,326,804 voters, or a drop of 33% from 2004’s voting tallies. Dr. McDonald’s numbers come from the states’ official offices, so they’re as reliable as you will find. So, you have a choice of believing that only 83 million people are going to vote this year, or Gallup is wrong to claim that 27% of the voters voted early. If the actual tally is 130 million, then the early voters only made up about 17% of the total voters, and November 4 is going to be a madhouse.

And about that 8% who have not yet voted but plan to vote early? If we’re going to get to 130 million, then the 17% who have voted early did so over about a two-week period so far, or just about 8.5% a week. At that pace, three more days of potential ‘early’ voting would project to about another 3.6% of the total vote being cast early, assuming the same early voting conditions exist.
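For anyone who wants to check the arithmetic, here is a minimal sketch of the two projections above. The inputs are the figures quoted in the text; the variable names are mine, and the two-week window is the rough estimate used above rather than an exact count of days.

```python
EARLY_VOTES = 22_498_237        # early votes counted so far (from the text)
GALLUP_EARLY_SHARE = 0.27       # share of Gallup respondents who say they voted
EXPECTED_TURNOUT = 130_000_000  # turnout forecast quoted in the text

# If 27% of all voters have voted, the early count implies this total turnout.
implied_turnout = EARLY_VOTES / GALLUP_EARLY_SHARE

# If turnout really is 130 million, early votes are this share of the total.
early_share = EARLY_VOTES / EXPECTED_TURNOUT

# Pace: about 17% over roughly two weeks, extended for three more days.
weekly_pace = 0.17 / 2
three_more_days = weekly_pace * 3 / 7

print(f"implied turnout: {implied_turnout:,.0f}")
print(f"early share at 130 million: {early_share:.1%}")
print(f"three more days at this pace: {three_more_days:.1%}")
```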

So, Gallup’s assumptions about early voting may not be as big as they expected. Before I discuss what that means for November 4 conditions, let’s consider the black vote and the early voting so far.

Dr. McDonald shows that nine states are reporting voting by party affiliation, and three by racial demographic (only North Carolina is reporting results by age group, and as was reported earlier, the kids are not showing up this year either). Among black voters, turnout where reported is indeed healthy.

Georgia is reporting that 35.1% of its early voters are black (versus 29.9% of the population and 25.7% of all registered voters), Louisiana is reporting that 36.3% of its early voters are black (versus 31.7% of its population and 31.2% of all registered voters), and North Carolina is reporting that 26.3% of its early voters are black (versus 21.7% of its population and 20.7% of all registered voters). So for those three states, early voting is averaging 4.8 points ahead of population levels and 6.7 points ahead of registration totals. Given the 11% representation of blacks relative to the total voter participation in 2004, an increase of 6.7% to that demographic would raise their portion of the total voter pool to about 12% (11% × 1.067 ≈ 11.7%). Therefore, the demonstrated performance by blacks in early voting this year does not justify the heavy weighting used by Gallup.
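The averages above can be reproduced from the three states' figures. This is a rough sketch with names of my own choosing; the last step follows the text in treating the 6.7-point gap as a 6.7% relative increase on the 2004 share, which is an assumption of the argument rather than a measured result.

```python
# (early-vote share, population share, registration share), in percent
black_share = {
    "Georgia": (35.1, 29.9, 25.7),
    "Louisiana": (36.3, 31.7, 31.2),
    "North Carolina": (26.3, 21.7, 20.7),
}

gaps_vs_population = [early - pop for early, pop, _ in black_share.values()]
gaps_vs_registration = [early - reg for early, _, reg in black_share.values()]

avg_vs_population = sum(gaps_vs_population) / len(gaps_vs_population)
avg_vs_registration = sum(gaps_vs_registration) / len(gaps_vs_registration)

# Assumption from the text: apply the gap as a relative increase to 2004's 11%.
SHARE_2004 = 11.0
projected_share = SHARE_2004 * (1 + avg_vs_registration / 100)

print(f"average gap vs population:   {avg_vs_population:.1f} points")
print(f"average gap vs registration: {avg_vs_registration:.1f} points")
print(f"projected share of the vote: {projected_share:.1f}%")
```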

Now, let’s look at that early voting number. Nine states are reporting participation by party affiliation. Here’s how that turns out so far:

Colorado: D 37.7%, R 35.9% (registration 32.8% D, 33.1% R)
Florida: D 45.6%, R 37.8% (registration 42.0% D, 36.1% R)
Iowa: D 47.3%, R 28.8% (registration 32.4% D, 27.8% R)
Louisiana: D 58.5%, R 28.4% (registration 52.5% D, 25.3% R)
Maine: D 42.9%, R 28.2% (registration 31.1% D, 28.1% R)
Nevada: D 49.6%, R 33.0% (registration 44.0% D, 35.6% R)
New Mexico: D 53.4%, R 32.9% (registration 50.1% D, 31.7% R)
North Carolina: D 51.8%, R 30.0% (registration 44.8% D, 34.3% R)
West Virginia: D 59.4%, R 31.5% (registration 55.7% D, 29.2% R)

For these nine states on average, the democrats are early voting at a rate 2.6 points higher than their registration, while republicans are early voting at a rate 3.4 points lower than their registration. Since the early voting currently represents 17% of the anticipated turnout this year, that combined six-point gap works out to a total voting advantage by party of 1.02 points (6.0 × 0.17). Obviously, if the democrats enjoy a similar +2.6 to -3.4 turnout advantage in actual voting on November 4th, this would inflate their party advantage (assuming democrats support Obama in equal degree that republicans support McCain) by six points, which appears to explain Gallup’s sudden shift: Gallup has decided that the trend in early voting will be reflected in the November 4th turnout, which is a dangerous assumption, for the following reasons:

1. The 6-point advantage for democrats is reported in just 9 states out of 34 which have early voting; there is no clear information on party participation for the other 25 states which have early voting, and these numbers may be significantly different.

2. The record on early voting is too short to establish a statistically valid trend, but even the last two elections have shown significantly different levels of participation in voter turnout by party between early and election-day voting. There is no basis for presuming that early voting turnout will be reflected the same way on November 4.

3. Obama has urged his supporters all year long to vote early, while McCain has not made the same push. A slightly higher percentage of republicans this year than democrats have stated an intention to vote on November 4 rather than early.

4. Voters who participate in early voting will not also be participating in election-day voting. This datum is significant with regard to black voters. Black voters have been shown to be participating in the three states which release that detail, at a rate 6.7 points ahead of registration proportions. While increased participation overall by blacks may produce a modest increase (roughly 1 percent) to Obama’s support, the ceiling level of the black voter demographic necessarily means that black voter participation will decline significantly on November 4. Consequently, even if all other conditions are the same, republican participation on November 4 should be expected to improve measurably.
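A quick sketch for readers who want to work with the nine-state list themselves. It computes each party's early-vote share minus its registration share, then applies the 17% scaling described above. The variable names are mine. Note that the simple unweighted means computed this way do not have to match the 2.6 and 3.4 figures quoted above, which may rest on a different weighting or on a different snapshot of the data.

```python
# (early D, early R, registered D, registered R), in percent, as listed above
party_share = {
    "Colorado": (37.7, 35.9, 32.8, 33.1),
    "Florida": (45.6, 37.8, 42.0, 36.1),
    "Iowa": (47.3, 28.8, 32.4, 27.8),
    "Louisiana": (58.5, 28.4, 52.5, 25.3),
    "Maine": (42.9, 28.2, 31.1, 28.1),
    "Nevada": (49.6, 33.0, 44.0, 35.6),
    "New Mexico": (53.4, 32.9, 50.1, 31.7),
    "North Carolina": (51.8, 30.0, 44.8, 34.3),
    "West Virginia": (59.4, 31.5, 55.7, 29.2),
}

dem_gaps = {s: d - rd for s, (d, r, rd, rr) in party_share.items()}
rep_gaps = {s: r - rr for s, (d, r, rd, rr) in party_share.items()}

# Unweighted means of the rows as listed (every state counts equally).
mean_dem_gap = sum(dem_gaps.values()) / len(dem_gaps)
mean_rep_gap = sum(rep_gaps.values()) / len(rep_gaps)

# Scaling step from the text: a combined gap applied to the early-vote share.
COMBINED_GAP = 2.6 + 3.4   # points, as quoted in the text
EARLY_SHARE = 0.17         # early vote as a share of anticipated turnout
net_advantage = COMBINED_GAP * EARLY_SHARE

for state in party_share:
    print(f"{state}: D {dem_gaps[state]:+.1f}, R {rep_gaps[state]:+.1f}")
print(f"unweighted means: D {mean_dem_gap:+.2f}, R {mean_rep_gap:+.2f}")
print(f"net advantage from early voting: {net_advantage:.2f} points")
```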

In conclusion, Gallup is assuming that because some democrats in some states are showing up strong in early voting, a blow-out is coming. In truth the lower-than-expected totals of actual voting, combined with reports that no state so far is reporting blow-out numbers, demonstrate that the election is highly volatile and far from over, and depends as it has all along on three key components: voter turnout, who wins the independent voter support, and which way the undecideds break. Don’t be fooled, this race is still red hot.

Friday, October 31, 2008

Things That Make Polls Go D’Oh

It should be obvious by now that I will never get a job offer from Gallup, Rasmussen, or Survey USA. I’ve been pretty hard on them regarding the way they’ve weighted their party affiliation demographics, and I have repeatedly pointed out that ALL of the major polls are failing to comply with NCPP standards for disclosure and transparent practices. Frankly, I once held polling groups in much higher respect than I can do right now. And besides reporting what the invalid polls mean for this election, I also feel compelled to warn readers that opinion polling in general has lost its ethical core. I hope it will return to its commitment to accuracy and honest reporting, but for now polling seems to have gone the way of responsible mainstream journalism.

Liberal critics of my articles, and those who still trust the polling groups because of past work which was accurate and appeared trustworthy, have asked a very legitimate question: What if I am wrong? Isn’t it possible that I just cannot accept that Obama is going to win this election, and I am grasping at straws for moral support? I would consider answering that they could be right and I could be wrong, but even then I’d have to start by asking for clarification on exactly what they mean to ask.

Do they mean the Associated Press/Gfk poll which says Obama will win by one, or the Pew Research poll which says Obama will win by fourteen?

Do they mean the Battleground poll which says Obama will win by three, or the CBS/NYT poll which says Obama will win by thirteen?

You get the idea. The polls simply do not agree with each other. And yes, those margins are significant evidence of invalidity. Earlier this week I read the blog of a professor who assumes that since all the polls say Obama is going to win, they really do agree with each other and the margins do not matter. He contends that the polls which show a close race are really just the low end of the range, the wide lead polls are the upper end, and the average is really how things are going now. These assumptions, however, are invalid because the confidence level tests show the polls do not agree closely enough to avoid evidence of collinearity, and if collinearity exists then the results of the poll cannot be accepted, regardless of whether they appear believable or not.

Also, each poll has its own margin of error, usually around three percent, which is to say that Obama and McCain could each be as much as three points lower or greater in support than the poll shows. As a result, any poll which shows less than a six point lead for Obama is, statistically, saying that McCain could possibly be winning. Whether or not McCain is shown to be in the lead is not statistically relevant, except that we can say the polls do not indicate a McCain lead outside the MOE. However, even then we have to be careful to note that because of the invalid range of poll results, no valid conclusions can be made at all. None.
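The six-point rule above can be written as a one-line check. This is a sketch of the simplified rule as described in this post (each candidate's number can move by the full MOE in either direction), not a formal significance test on the lead; the function name is mine, and the sample figures are the Pew and FOX results from the Real Clear Politics list used elsewhere on this page.

```python
def lead_is_outside_moe(leader, trailer, moe=3.0):
    """True if the leader's low end still beats the trailer's high end."""
    return (leader - moe) > (trailer + moe)

# Pew Research: Obama 53, McCain 38. FOX News: Obama 47, McCain 44.
for name, obama, mccain in [("Pew Research", 53, 38), ("FOX News", 47, 44)]:
    verdict = "outside" if lead_is_outside_moe(obama, mccain) else "inside"
    print(f"{name}: lead of {obama - mccain} is {verdict} the margin of error")
```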

We also need to observe what’s been going on with the poll trends. In the last ten days, for example, Rasmussen has shown swings of up to 5 points, or a half-point per day. He’s saying that more than a half-million people on average are changing their minds every day. Does this sound reasonable to you?

The latest Fox poll shows McCain closing six points in just a week. That’s 7.8 million voters changing their minds in that time. Has McCain’s campaign done anything different that would explain that shift to you? And if not, why is the poll changing so drastically now that the race is coming to an end?
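The voter counts above come from converting poll points into people. A minimal sketch, assuming a turnout of 130 million (the forecast commonly quoted this season; it is an assumption, and the function name is mine):

```python
ASSUMED_TURNOUT = 130_000_000  # assumption: commonly quoted turnout forecast

def voters_for_swing(points, turnout=ASSUMED_TURNOUT):
    """Number of voters represented by a swing of `points` percentage points."""
    return points / 100 * turnout

# Rasmussen: 5 points over ten days, or half a point per day.
print(f"half a point per day: {voters_for_swing(0.5):,.0f} voters")
# Fox: six points in a week.
print(f"six points in a week: {voters_for_swing(6):,.0f} voters")
```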

Gallup is still admitting they are clueless, as they continue to publish three separate models of voter opinion. You really should ask yourself, if Gallup was on top of things this year, why did they trash the original model in favor of one using unprecedented demographic assumptions, then use that same data to backtrack and try to reflect a “traditional” model? What did they see that made it clear they were wrong? And having been wrong not once but twice in fundamental operations this year, why should you assume they got lucky on the third guess, which in any case is built on the same methodological decisions they have tacitly admitted were wrong before?

The first rule the NCPP says any journalist should ask about a poll, is who is paying for it. With that in mind, shouldn’t you be skeptical that the polls reporting the largest leads for Obama are sponsored by agencies known to be pro-Obama and anti-McCain, specifically CBS News, the New York Times, ABC News, the Washington Post, and Newsweek? And shouldn’t you wonder if the community of pollsters just might be letting itself be influenced by Obama’s big-dollar media machine? Half a billion dollars of media publicity is bound to have an effect, and why wouldn’t it affect people who run the polling groups? People like Zogby, who called the 2004 election for Kerry months before the actual voting? People like Scott Rasmussen, who is getting serious coin to sell the story of this election by subscription? One area where I can tell you I am clearly more worthy of your trust, is that no one is paying me anything for what I do on the blogs. Not a penny. So, while I’d like to be rich someday, it doesn’t look like I’m going to get there by blogging on polls, but that means that you will be getting my honest opinion, based on my reasoning and the evidence, not on what effect it will have on my bank account. Sorry, but a pollster who refuses to show internal data to the public is a mercenary, not a professional, and a pollster who lets any media outfit decide what questions will be asked, what order they will be in, and which respondents are appropriate and how/when they will be contacted, is a media whore and his analysis is inherently dishonest.

OK, that’s pretty harsh, and I want to emphasize that many polls are indeed trying to be professional and accurate, as much as the business will let them be. And even in the media whore groups, there are individuals who are honest and honorable (and probably miserable) and trying to put out a solid product. The problem comes from two directions. First, polling has become a business more than a profession, meaning that the guys directing the polls have become too willing to sell a story, even if that story is not exactly true. This becomes apparent when polls report shifts which are not caused by valid events, most easily seen in the phenomenon of convention ‘bounces’. It’s one thing to expect a party’s base to become energized when the nominee is finally known and he comes out formally in a way that shows confidence and capability, but in recent years the pollsters have also decided this somehow affects the opposing party’s support levels, a patently absurd notion on its face. I mean, what did Obama do at his convention that is supposed to have won over some Republicans, and just why should we believe that a number of Democrats, even briefly, supported McCain because he chose Sarah Palin for his running mate? That’s manipulation of the data, folks, and cannot be explained any other way. It’s been going on a while, that roller-coasting of the numbers, since polls in the media need to keep attention, and to do that they need to be exciting, even if it means being dishonest. They get away with it because they have a lot of time to worry about closing in on accuracy in the late weeks. Of course, some years they blow that, too. It needs to be said, repeated and repeated again, that polls blow the call by more than their published margin of error about 40% of the time.

The other problem is the Obama Machine. There are a lot of unprecedented conditions in this election, and I do not think the polling groups ever really sat down and thought about what the new conditions would be. Well, actually they did, but they did not test their conclusions, and as a result bought into some pretty tall tales from the Obama people. The polls assumed the following things would be very different this year:

1. Barack Obama being the first black to receive a major party nomination for President, black voters would be greatly motivated to register and vote, and this would swing decisively towards Obama. This led some polls to over-sample black voters, in the expectation that their influence would be more significant this year.

It’s true and false. Black voters have indeed become more motivated this year, but as a demographic group blacks have always been enthusiastic, and have always overwhelmingly supported the democrats’ nominee in presidential elections. As a result, it is mathematically impossible for black voters to significantly change the outcome of the election by supporting Obama. In a tight race, the increased participation could make the difference in some states, but nationally the effect is minimal and polling models should not be changed because of it.

2. Barack Obama would greatly inspire and motivate young voters to register and vote, and this demographic would swing decisively towards Obama. This led some polls to over-sample young voters and to count more newly-registered voters as likely voters.

This one has been difficult to prove, since only the actual election can confirm or disprove the theory. However, John Kerry saw a strong rise in democratic party registrations in 2004, in part due to the primary efforts of Howard Dean. This created an apparently significant advantage for the fall campaign, which was one of the reasons that Zogby called the election for Kerry early in the summer. In the actual election, however, under-30 voters’ proportion of the vote did not change from the 2000 election, and many of the newly registered voters simply did not vote, which is also consistent with historical behavior. Accordingly, it is not reasonable to alter polling models to behave in a manner inconsistent with historical norms.

3. The combination of excitement over Obama’s campaign, coupled with the nation’s dissatisfaction with President Bush and the Economy would lead to a great increase in democrats’ participation relative to republicans, as more people would see themselves as democrats and republicans would be likely to stay home. This led almost all polls to report results which either left democrat-heavy respondent pools unweighted, or which weighted polls to reflect heavy democrat advantages.

As with rumor 2, this cannot really be confirmed or disproven until the election is finished. However, history indicates the rumor is unfounded. In 1976, the republicans were expected to be dispirited, Richard Nixon having resigned in disgrace just two years previously. This was one reason that just after the party conventions, Governor Carter of Georgia led Ford by 33 points, a blow-out seemingly undeniable. Yet in the actual election, Carter won by only two percentage points, and some political experts believe that if the election had been held a week to ten days later, Ford would have won. Part of the reason was that republicans in 1976 did show up to vote, less than the democrats but in far greater numbers than pollsters had expected to show. The same thing happened in 1948, when democrats were supposed to have given up, yet the record shows something far different. If a poll’s model is based on known history rather than pure speculation, then that model should not deviate from historical norms.

In my opinion, the polling groups allowed themselves to believe unfounded myths in all three of the cases I just mentioned. But they also failed to consider the influence of the half-billion dollars being spent by the Obama campaign, the rock-star behavior of his cadre (and a comparable level of professional knowledge and interest in middle America) in influencing and intimidating the media and public image (‘vote for Obama or you’re a racist’), and the heavily-urbanized character of his campaign and publicity efforts. The polling groups failed to note the dichotomy between the tone of Obama’s early primary victories and the voter response as the campaign wore on, failed to adjust their weighting to reflect actual results from primary elections and track with historical norms in each state and nationally. A massive effort by the Obama campaign to cast this election as unprecedented resulted in every major polling group abandoning historical models to create unproven models based on assumptions. What we are seeing now is the result of these models failing as key assumptions fail.

More Fun With Poll Numbers

The election is coming to a close, or at least we hope so (thank you Al Gore for proving that sometimes the nightmare just continues). All along, I have been saying that the poll numbers are invalid on their own standards, and once again I found another reason to repeat that claim: The state polls contradict many of the national polls.

The claim made by those who like the polls, has generally run along the lines that they cannot all be wrong, and that a consensus of the polls should be trusted. I hardly agree, because of a factor in statistics known as collinearity. Here’s the formal definition from statistics.com: “In regression analysis, collinearity of two variables means that strong correlation exists between them, making it difficult or impossible to estimate their individual regression coefficients reliably.”

Informally, collinearity is a warning to statisticians to make sure that they are using data which is truly independent of other data. When data is redundant or co-related, using the additional data gives an invalid additional weight to the data used, corrupting the results. Tests have been created to detect multicollinearity, such as the Farrar-Glauber test (most commonly used in econometrics), but it does not appear that vector testing is commonly practiced in opinion poll analysis.

The math in that line of testing tends to get a bit complex for a casual discussion, so for here I will come back to another point of opinion polling: the statistical level of confidence. That is a critical test for an opinion poll, and what it means is a quick reference on whether the poll is valid. “Valid” does not mean right or wrong, it means the poll’s method is considered trustworthy. “Invalid” means that whatever the poll says, you should not rely on it. Again, I refer the reader to the National Council on Public Polls (NCPP), and their criteria for polling and their principles of disclosure. In short, when a poll will not tell you who paid for the poll, hides how many people refused to take the poll when contacted, or refuses to release internal demographics used in the poll and from the response pool, that poll is in direct violation of NCPP rules and should not be taken seriously, even if you find their results believable. The bad news there, is that almost none of the publicly-released polls are in full compliance with NCPP standards.

Going back to the question of the confidence level, though, it’s a simple test for validity. All of the major polls use – or claim to use – similar methodologies and demographic weighting, with the exception of party affiliation weighting. Some of these groups insist that party affiliation is not a static demographic, and therefore should not be weighted at all, so for here we will use their logic in applying the numbers. The polls all claim a 95% confidence level. In statistics, they are saying that if the same method is used, polls should produce results within the margin of error 19 times or more out of every 20 polls. So, it should not be difficult to test that claim.

Here are the polls listed at Real Clear Politics for the last ten days (where a poll has been done more than once in that period, the most recent results are used). I am listing these in descending order of support for Barack Obama, then in support for John McCain, noting a 3% claim for MOE and how many polls agree or disagree with the stated poll:

Pew Research – Oct 26 – Obama 53% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – Obama 53% (agree 8, disagree 4) FAIL
ABC News/WaPo – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
CBS News/NYT – Oct 29 – Obama 52% (agree 9, disagree 3) FAIL
Rasmussen – Oct 30 – Obama 51% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – Obama 51% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – Obama 50% (agree 12, disagree 0)
Gallup (Traditional) – Oct 29 – Obama 50% (agree 12, disagree 0)
Ipsos/McClatchy – Oct 27 – Obama 50% (agree 12, disagree 0)
GWU/Battleground – Oct 30 – Obama 49% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
IBD/TIPP – Oct 29 – Obama 48% (agree 8, disagree 4) FAIL
FOX News – Oct 29 – Obama 47% (agree 6, disagree 6) FAIL

Rasmussen – Oct 30 – McCain 47% (agree 7, disagree 5) FAIL
GWU/Battleground – Oct 30 – McCain 45% (agree 9, disagree 3) FAIL
Gallup (Traditional) – Oct 29 – McCain 45% (agree 9, disagree 3) FAIL
Ipsos/McClatchy – Oct 27 – McCain 45% (agree 9, disagree 3) FAIL
FOX News – Oct 29 – McCain 44% (agree 11, disagree 1)
Gallup (Expanded) – Oct 29 – McCain 44% (agree 11, disagree 1)
ABC News/WaPo – Oct 29 – McCain 44% (agree 11, disagree 1)
IBD/TIPP – Oct 29 – McCain 44% (agree 11, disagree 1)
Reuters/C-SPAN/Zogby – Oct 30 – McCain 43% (agree 10, disagree 2) FAIL
Diageo/Hotline – Oct 29 – McCain 42% (agree 10, disagree 2) FAIL
CBS News/NYT – Oct 29 – McCain 41% (agree 8, disagree 4) FAIL
Newsweek – Oct 23 – McCain 41% (agree 8, disagree 4) FAIL
Pew Research – Oct 26 – McCain 38% (agree 2, disagree 10) FAIL

Note that every polling agency fails one side or the other of this validity test. Every one of them.
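The agree/disagree counts in the lists above can be reproduced with a short script. This is a sketch of the test as I have applied it here: two polls ‘agree’ when their numbers for a candidate are within the 3-point MOE of each other. The FAIL threshold (more than one disagreement out of twelve comparisons) is inferred from which lines carry the FAIL mark, and the names in the code are mine.

```python
MOE = 3            # claimed margin of error, in points
MAX_DISAGREE = 1   # inferred: 95% confidence allows about 1 miss in 20

obama = {
    "Pew Research": 53, "Newsweek": 53, "ABC News/WaPo": 52,
    "CBS News/NYT": 52, "Rasmussen": 51, "Gallup (Expanded)": 51,
    "Reuters/C-SPAN/Zogby": 50, "Gallup (Traditional)": 50,
    "Ipsos/McClatchy": 50, "GWU/Battleground": 49, "Diageo/Hotline": 48,
    "IBD/TIPP": 48, "FOX News": 47,
}
mccain = {
    "Rasmussen": 47, "GWU/Battleground": 45, "Gallup (Traditional)": 45,
    "Ipsos/McClatchy": 45, "FOX News": 44, "Gallup (Expanded)": 44,
    "ABC News/WaPo": 44, "IBD/TIPP": 44, "Reuters/C-SPAN/Zogby": 43,
    "Diageo/Hotline": 42, "CBS News/NYT": 41, "Newsweek": 41,
    "Pew Research": 38,
}

def agreement(results, moe=MOE, max_disagree=MAX_DISAGREE):
    """For each poll, return (agree, disagree, failed) against all the others."""
    table = {}
    for name, value in results.items():
        others = [v for other, v in results.items() if other != name]
        agree = sum(abs(value - v) <= moe for v in others)
        disagree = len(others) - agree
        table[name] = (agree, disagree, disagree > max_disagree)
    return table

for label, results in (("Obama", obama), ("McCain", mccain)):
    for name, (agree, disagree, failed) in agreement(results).items():
        mark = " FAIL" if failed else ""
        print(f"{name} – {label} {results[name]}% "
              f"(agree {agree}, disagree {disagree}){mark}")
```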

But let’s move on. We can look at the RCP averages from one of two perspectives. The RCP folks take the polls from the last week by polling date (not release date) and average them. That gives a claim that Obama is leading McCain 49.7% to 43.8%, with a 3 point MOE. If we extend that back to polls taken October 20 or later, then it becomes Obama 50.3%, McCain 43.3%. So, RCP’s national polls, if aggregated as they like it, show a 5.9-point lead or a 7.0-point lead.

OK, now let’s take a look at the RCP state polling. There are dozens of polling groups which have put out state polls, and I cannot speak here to their total authenticity. That, of course, is also a problem with some of the national polls, but for consistency we can use the RCP numbers. Now, if each state’s aggregate claimed level of support for Obama or McCain is applied to the state’s proportional level of the national vote (using 2004 voting statistics), we find that if the state aggregations are right for RCP’s state averages, plugging those numbers in gives Obama 46.9% of the popular vote, to 43.9% for McCain. The aggregation of the state polls, if we are going to accept them as valid, shows that the national polls are overstating Obama’s support. Once again, a simple check for validity shows that the confidence level test fails for the national polls.
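The state-to-national aggregation described above is a weighted average: each state's poll average is multiplied by that state's share of the national vote. Here is a minimal sketch of the method using three made-up states. The state names, vote shares and poll numbers are placeholders for illustration only, not RCP data or 2004 results, and the function name is mine.

```python
# (state, share of national vote, Obama %, McCain %)
# Hypothetical placeholder rows; a real run would list every state.
state_polls = [
    ("State A", 0.50, 48.0, 44.0),
    ("State B", 0.30, 45.0, 47.0),
    ("State C", 0.20, 50.0, 40.0),
]

def national_estimate(rows):
    """Vote-share-weighted average of state poll numbers for both candidates."""
    total_weight = sum(weight for _, weight, _, _ in rows)
    obama = sum(weight * o for _, weight, o, _ in rows) / total_weight
    mccain = sum(weight * m for _, weight, _, m in rows) / total_weight
    return obama, mccain

obama_pct, mccain_pct = national_estimate(state_polls)
print(f"Obama {obama_pct:.1f}%, McCain {mccain_pct:.1f}%")
```

Dividing by the total weight keeps the result sensible even when the shares listed do not add up to exactly 1, for example when a few states have no polling.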

One last thing. The state polls have assumed a significant shift from 2006 towards increased democratic participation, but even if that happens, the state polling indicates that Obama will still fail to reach 50% support. If those polls are reweighted according to 2006 turnout proportions and then plugged in to project national numbers, it becomes Obama 46.3% and McCain 47.1%, with 6.6% undecided. Take from that what you will.

Wednesday, October 29, 2008

Thoughts About the Early Voting

There was a time when we would count down to election day. In fact, a lot of media is doing just that. But this year, by the start of ‘election day’, perhaps more than thirty percent of the voters will have already voted. Recent changes in absentee and early voting laws have created an opportunity for voters to have a much more convenient chance to vote. As a matter of fact, I voted over a week ago myself, because Texas opened early voting back on October 20th. A lot of pundits and media have been talking about early voting, which makes sense, but there has also been a lot of opinion tossed about which turns out not to have firm foundation under it. For example, I have read and heard about a supposed historical tendency for republicans to have an advantage over democrats in early voting. That’s true to a degree, because historically more seniors vote early and they have tended to be republicans. However, that trend was established with the restricted absentee votes, and since no-excuse absentee and early voting have begun, that trend evaporated. 34 states offer early voting this year, many for the first time, which is one reason why there is so little history for the practice as a national exercise. In 2000, roughly 14% of voters voted before the designated election day. In 2004, that portion rose to 22%, and this year election officials expect that portion to climb above 30%. Barack Obama has repeatedly urged his supporters to vote early and not wait for election day.

A lot of talk has focused on the results from early voting. That data is necessarily limited, by law as well as ethical rationale. It’s been long noted, for example, that some folks like to vote for a winner, and if they are persuaded that a candidate has locked up the win, they will go along rather than feel that they backed a loser. As a result, election results – especially vote tallies – are not supposed to be released until after all the polls close in a state. Poll results are often used to hint at the results, which may or may not be cheating, depending on whom you ask and how that information is presented, and we’re seeing a version of that in looking at the early voting results so far. Before we look at those results, I need to emphasize that there is no established standard to prove the meaning of a particular event in early voting. I had thought, myself, early on that it would be a good sign for McCain if republicans voted in numbers equal to democrats. It turns out that I had not thought that one through.

I read an interesting paper on the significance of early voting, by Kate Kenski writing about the Annenberg Election Survey for the 2000 and 2004 presidential elections. For example, Kenski noted that early voting by black voters was low (2.8%) in 2000, but more than quadrupled in 2004. From that trend, it should not be surprising that black voters continue to increase participation in early voting, especially with Obama on the ticket. Another point of interest was that in 2000, Bush earned a much higher percentage of the vote in early voting, but in 2004 the percentages were much closer to those of election-day voters, possibly due to the extraordinary turnout in the overall election. 2008 will provide a lot of useful information about early voting demographics, but for now we are limited in what we can say from the existing record.

Dr. Michael McDonald at George Mason University has a website up for easy reference on early voting. It shows that already, more than sixteen million early votes have been cast. Party-specific references can be found for just nine of the thirty-four states offering early voting, so we should be careful about assuming the information is true for the whole nation, but so far it does indicate that democrats have been better organized than republicans, from the following state results:

West Virginia: 59.4% democrats, 31.5% republicans
North Carolina: 54.0% democrats, 28.6% republicans
New Mexico: 55.1% democrats, 32.3% republicans
Nevada: 53.7% democrats, 29.6% republicans
Maine: 44.5% democrats, 28.6% republicans
Louisiana: 58.4% democrats, 28.5% republicans
Iowa: 48.9% democrats, 28.5% republicans
Florida: 45.4% democrats, 39.0% republicans
Colorado: 38.6% democrats, 37.9% republicans

Except for Colorado, the states which are reporting results by party affiliation show a strong turnout by democrats relative to republicans. One possible reason for this is the strong proportion of black voters. The following states have reported the following percentages of black early voters to all early voters so far:

North Carolina: 27.6%
Louisiana: 36.0%
Georgia: 35.2%

This news is likely to be taken, indeed has already been reported by some media, as evidence of a wave of Obama support. To some degree this is true, since it is quite reasonable to expect that the heavy majority of democrats will vote for Obama, and therefore a large proportion of democrats means a lot of Obama votes. However, the reader should be reminded that each of these voters in the early count is a voter who will not be voting on election day; the high proportion of black voters now will, mathematically, require a lower proportion on election day, since no demographic can exceed the ceiling of its representative total. It benefits Obama insofar as a voter who has submitted their ballot represents the surest kind of voter turnout, but it should be remembered that 59 million votes was not enough for John Kerry to win in 2004, so the 16 to 17 million votes submitted so far can provide a head start for a candidate, but are far from all he will need.

Also worth considering is the behavior of voters. Gallup has a nice article up on its site, and while it tilts a bit towards Obama, it notes that except for the West, most voters still plan to vote on Election Day. It should also be noted that in her paper, Kenski observed that most early voters vote less than seven days before election day, meaning that we could see a wild finish to early voting, one that could significantly change what we are seeing now in demographic terms. Also, while North Carolina is the only state which released early voting behavior by age group, I found it interesting to see that there, only 12.3% of the voters were under 30, with 20.7% coming from the 30-44 group, 41.7% from the 45-64 age group, and 25.3% from the 65 and older group.

In conclusion, you can expect the Obama supporters to use this early information to claim they are winning easily, but there are still several more days of early voting, and even if it’s record-setting in scale, the numbers from November 4 will still be the ones which do the most to decide the election.

To see why McCain supporters could still take hope, let’s play a little bit with the numbers we have available. I emphasize that these are not hard numbers nationally; I am merely using the same extrapolation that Obama supporters would use for their own encouragement, but applied here to encourage McCain supporters. The nine states which are reporting party affiliation numbers are indicating an average of 48.1% democrats among early voters, and 28.6% republicans. The three states reporting black voter participation are reporting an average of 31.5% participation. Projected nationally, that would be 5,200,862 votes placed by black voters out of 16,514,867 total early votes. Since the polling data says that essentially all black voters are democrats this year, that means that there may have been 11,314,005 votes by non-black voters so far, of which 24.2% would be democrats and 41.9% republicans. Further, if we assume that there will be roughly 130 million voters this year and that black voters represent about 11% of those voters, then we project that 14.3 million black voters will vote this year. With 5.2 million already having voted out of 16.3 million early votes so far, the remaining black vote would be 9.1 million out of 113.7 million, or 8.0% of the remaining vote. Accordingly, the overall democrat percentage is going to drop as the vote progresses, as will the republican portion of the remaining non-black vote. As I have said before, the numbers may seem heavy in one direction now, but in the end the independents and late-deciders will make the difference.
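
For anyone who wants to retrace that arithmetic, here is a rough Python sketch. It takes the figures quoted in the paragraph above as inputs and assumes, as the paragraph does, that every black early voter is counted as a democrat. The function and variable names are made up for illustration. Because the inputs are rounded and the exact early-vote total is used throughout, the output lands near the quoted figures rather than exactly on them (for example, about 41.7% republicans among non-black early voters, and about 113.5 million votes remaining).

```python
# Inputs are the figures quoted in the post; names are illustrative only.
TOTAL_EARLY = 16_514_867          # early votes cast so far
BLACK_EARLY = 5_200_862           # projected early votes by black voters
DEM_SHARE_EARLY = 0.481           # democrats among all early voters
REP_SHARE_EARLY = 0.286           # republicans among all early voters
EXPECTED_TURNOUT = 130_000_000    # assumed total turnout
BLACK_SHARE_TOTAL = 0.11          # assumed black share of all voters


def early_vote_projection():
    non_black_early = TOTAL_EARLY - BLACK_EARLY

    # Assumption from the post: all black early voters are democrats,
    # so they are removed from the democrat count only.
    dem_early = DEM_SHARE_EARLY * TOTAL_EARLY
    rep_early = REP_SHARE_EARLY * TOTAL_EARLY
    non_black_dem_share = (dem_early - BLACK_EARLY) / non_black_early
    non_black_rep_share = rep_early / non_black_early

    # Whatever part of a group has already voted cannot vote again later.
    black_total = BLACK_SHARE_TOTAL * EXPECTED_TURNOUT
    remaining_black = black_total - BLACK_EARLY
    remaining_total = EXPECTED_TURNOUT - TOTAL_EARLY

    return {
        "non_black_early": non_black_early,
        "non_black_dem_share": non_black_dem_share,
        "non_black_rep_share": non_black_rep_share,
        "remaining_black_share": remaining_black / remaining_total,
    }


for name, value in early_vote_projection().items():
    print(name, round(value, 4))
```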

The Secret Poll October 29 2008 edition

Hello again, and time for another edition of the Secret Poll. The election remains winnable by either John McCain or Barack Obama, the keys still coming down to turnout, the independents, and just plain not giving up. This being the final week, it is unlikely that either candidate could say or do anything to significantly improve his profile, although a badly-timed gaffe or surprise piece of bad news could influence the remaining undecideds, still over 10% of all voters. With early voting projected to represent more than 30% of all votes this year, Obama’s night-before election infomercial is unlikely to change anyone’s mind, although Obama is likely doing this as a last-step effort to maintain the high emotion on which his campaign has run.

So once again, here’s the recap of where I think the true numbers have played out, and where we are now:

August 31: McCain 41.77%, Obama 41.06%

September 7: McCain 42.45%, Obama 42.04%

September 14: McCain 45.71%, Obama 39.62%

September 21: McCain 44.48%, Obama 42.06%

September 28: McCain 42.73%, Obama 41.62%

October 5: McCain 44.09%, Obama 43.96%

October 12: McCain 42.68%, Obama 45.31%

October 19: McCain 43.49%, Obama 46.03%

October 26: McCain 44.50%, Obama 44.48%

McCain gained more support among independents and is eating into ‘soft democrat’ territory, the first possible indications of PUMA support. However, republicans are still less active than democrats in GOTV efforts, so winning may well depend on genuine last-minute efforts to encourage republicans, to focus on Sarah Palin’s future and the consequences of an all-Democrat government. Independents appear to be responsive to tax and stability issues, especially as Obama has neglected answering them in any depth, apparently believing they can only hurt him if he addresses them. The undecided portion has increased slightly to 11.02%, indicating that the emotion-based voters have begun to lose excitement, especially as these numbers appear to have fallen directly off Obama’s support, which had been at a campaign-high 46.03% last week.

The keys, again, are the following:

Turnout – if one party clearly does a better job getting its base to vote, that party will clearly win. More than ever, your vote matters.

Independents – Right now, the Independent vote is essentially tied, with about 25% of Independents still undecided. Whoever wins the most of that vote will win the election.

Undecideds – Overall, 11.02% of voters are still undecided. It’s slowly resolving itself, but there will still be a large pool of voters waiting to be convinced just before election day. Finishing strong could make all the difference.

States and Shadow

Earlier this season, I wrote about the statistical effect of what I call “shadow”, the combination of a poll’s margin of error and the undecideds. In today’s article, I apply this again to the state polls and address the errors of aggregation and over-simplification.

Several national polls are showing a tightening race, notably Gallup, Battleground, and the AP-GfK poll. Of course, other polls are claiming a large lead for Obama, notably Pew and Newsweek. The state polls are also showing some movement, although it’s not as rapid for a number of reasons, not the least being that state polling is not done as regularly as national polling. Obama supporters have greatly enjoyed the RCP aggregate numbers, as they would, since the RCP aggregates at both the national and state level indicate Obama is winning easily over McCain. The problem, of course, comes when you start to look closely at the support for that belief.

If the methodology is sound, national and state polls should track in similar fashion. This does not mean that every state poll will reflect national support to the same degree, but if a national poll is done properly, it will include proportionate responses from every region of the country, ideally from every state, and so the national numbers will reflect the sum of the state supports. So, the tightening of the national race has to mean, assuming the polls are valid, that McCain is gaining support in some large states or in enough small to medium states to be reflected in national numbers. But as I said, major polling is done less often at the state level; most polling agencies poll a given state less than once a month. Survey USA, for example, which has done more state polls than any other agency, has not done a state poll in the last two weeks in 35 states, and has not done a state poll in the last 10 days in 41 states. That’s important to keep in mind.

I need to address the problem of aggregation in polling now. Aggregates are popular because they are easy to read, and seem to be helpful in telling how much someone is ahead. After all, you don’t want to be fooled by paying attention to an outlier, and there is a sense that if most of the polls say the same thing, that’s most likely what’s really going on. The problem with that is the assumption that all of the polls in an aggregate are valid, that all can be accepted with equal confidence. But that would be erroneous. First of all, not every polling group is really professional at what they do. Remember the disastrous early exit polling in 2004? In that case, a lot of brand new interviewers were hired and hustled out without proper training, orientation, or supervision. Does anyone really think that was the only occasion where that happened? The fact is, a lot of polling errors get made without the public ever hearing about it, for a number of reasons, not the least being that if the results are what is expected, the error is not obvious. Also, even professional polling groups may sample different populations, such as adults, registered voters, people who have voted in recent elections for their ‘likely voter’ category, people who simply claim they are ‘likely’ to vote, and so on. Take a look at some of these state polls, and you will also find that it can be difficult to see how they arrived at their numbers; many simply do not provide access to the raw data or their internal demographics. As a result, a significant portion of the state polls are likely to be flawed in a functional manner, and aggregating such polls tends to magnify such errors, not eliminate them.

The next problem is over-simplification. This shows up most often in the way that polls are reported. Whether you like the results from a poll or not, it’s very important to understand that polls are sometimes just plain wrong. Even if a poll is valid, it’s only valid to the extent that it demonstrates a trend against its earlier report using consistent questions and methods. And polls have never predicted the surprise results, because they are modeled in a way which reflects the public’s assumptions far more often than the actual condition. Polls are opinion polls, after all, not predictors of future events. Polls only “predict” the results of an election to the degree that the voters behave in line with the poll’s assumptions.

So, with that said, I am addressing the state polls with respect to the statistical phenomenon of shadow. ‘Shadow’ is the total amount of uncertainty in a poll, the combination of the undecideds plus two times the published margin of error. For example, let’s say candidate A is leading candidate B in a poll, 51-44 with a published margin of error of 4%. Game over, it seems. But that 4% MOE means that either candidate could be as much as 4 points stronger or weaker, meaning it’s candidate A at 47 to 55, and candidate B at 40 to 48. Also, there are 5% undecideds in the poll, so while B looks to be out of it, it’s mathematically possible for the actual condition to be A 47, B 53. It could also end up being A 60, B 40, with the same level of probability as the other extreme. And of course, this does not consider the possibility of some voters changing their minds. I do not think that happens as wildly as the polling groups seem to claim, but it is a valid factor. Considering that, we can now examine the state polling condition.
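
The 51-44 example works out as follows in a short Python sketch. The function names are invented for illustration; the only rule taken from the text is that shadow equals the undecideds plus twice the margin of error. The worst-case margin is simply the published lead minus the shadow, which is why a 7-point lead can turn into a 6-point deficit.

```python
def shadow(undecided, moe):
    """Total uncertainty in a poll: undecideds plus twice the margin of error."""
    return undecided + 2 * moe


def plausible_range(share, moe, undecided):
    """Lowest and highest share a candidate could hold, if the MOE goes
    fully against or for them and every undecided breaks their way."""
    return share - moe, share + moe + undecided


def worst_case_margin(leader, trailer, moe, undecided):
    """The leader's margin if everything breaks toward the trailer."""
    return (leader - trailer) - shadow(undecided, moe)


a, b, moe = 51, 44, 4
undecided = 100 - a - b            # 5 points

print(shadow(undecided, moe))                    # 13
print(plausible_range(a, moe, undecided))        # (47, 60)
print(plausible_range(b, moe, undecided))        # (40, 53)
print(worst_case_margin(a, b, moe, undecided))   # -6, i.e. A 47, B 53
```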

I took the RCP aggregates (I know, I know, but I do not have the time or space to examine each and every state poll for validity, I don’t need to have anyone whining about ‘cherry picking’ polls, and I can make my point even by using the aggregate reports) and applied the percentages claimed to the 2004 voting results as a two-party vote split. If we count all of the states according to who leads according to the RCP aggregates, Barack Obama would take 50.2% of the popular vote to 43.0% for John McCain, and 364 electoral votes to 174. However, even using those aggregates, the numbers change considerably if we consider the effect of shadow. Applying the shadow rule (undecided plus double MOE), it becomes 200-118, Obama still in good shape but with 220 electoral votes still to be decided.
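
A tally like the one above could be sketched this way in Python. I am assuming here that a state counts for a candidate only when the lead is larger than the shadow, and goes into the "to be decided" column otherwise; that cutoff is my reading of the rule, and the three states and all of their numbers are invented placeholders, not RCP data.

```python
# Hypothetical data: state names, electoral votes, and poll numbers are
# placeholders chosen only to show each of the three possible outcomes.

def tally_with_shadow(states):
    """Split electoral votes into (candidate A, candidate B, undecided).

    `states` is a list of (name, electoral_votes, a_pct, b_pct, moe).
    A state is awarded only if the lead exceeds the shadow
    (undecideds plus twice the margin of error).
    """
    a_ev = b_ev = open_ev = 0
    for _name, ev, a_pct, b_pct, moe in states:
        undecided = 100 - a_pct - b_pct
        state_shadow = undecided + 2 * moe
        lead = a_pct - b_pct
        if lead > state_shadow:
            a_ev += ev
        elif -lead > state_shadow:
            b_ev += ev
        else:
            open_ev += ev
    return a_ev, b_ev, open_ev


example = [
    ("State X", 10, 55, 38, 3),   # lead 17 > shadow 13: goes to A
    ("State Y", 20, 49, 45, 4),   # lead 4 < shadow 14: undecided
    ("State Z", 5, 35, 58, 3),    # deficit 23 > shadow 13: goes to B
]

print(tally_with_shadow(example))   # (10, 5, 20)
```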

Before ending this article, I also looked at the trends and outliers in the polling I have seen, especially given certain key internals. I will not call it definitive, but in my opinion if the demographic weighting is corrected the popular vote becomes Obama 46.9%, McCain 46.6%, but with McCain taking the electoral vote 278-260. When the shadow effect is applied, the electoral numbers change to 147-71 McCain, with 320 to be decided. The message is clear then, that the race remains to be decided.

Tuesday, October 28, 2008

Demographic Thresholds

I have been saying all along that McCain was closer to Obama in the election campaign than the polls were indicating. However, I have been doing so by focusing on party affiliation, the demographic most fiddled with by the polls. Those critiquing my analysis have sometimes pointed to internal demographics which show problems for McCain. With just a week until the end of the season, let’s see where the thresholds for election are, with regard to demographics. That is, what is the minimum performance in each area which was enough to get the win? Here are the numbers:

In 1992, Clinton won with just 41% of the vote from male voters. Right now Barack Obama is tracking 40% of the male vote in Battleground, and 47% in Gallup, while McCain is tracking 44% in Battleground and 46% in Gallup. Before I go further, I want to note that the party skewing could affect these numbers, and also that the numbers in a poll may end up being a bit different in the actual election results. Therefore, all we are doing is seeing whether the candidates are roughly where they want to be.

In 1968, Nixon won with just 43% of the vote from women voters. Right now Obama is tracking 54-55% with Battleground and Gallup, while McCain is tracking 35-39%. As I said, however, while this is a problem area it may be artificially low, as republican women have much higher support for McCain than do democratic women polled.

In 1992, Clinton won with just 39% of the White vote. Right now Obama is tracking 39-44% with White voters, while McCain is tracking 47-50% with White voters.

In 2000, Bush won with just 9% of Black voters. Right now Obama is tracking 82-91% with Black voters, but McCain is tracking only 3% with Black voters in both Battleground and Gallup. It’s very unlikely that McCain will reach the 9% mark, so either this measure will prove to be meaningless, or it will be a key demographic since Obama has locked it up.

In 1968, Nixon won with just 38% of the under-30 vote. Right now Obama is tracking 56-59% of that demographic, while McCain is tracking 29-38% with that group.

In 1968, Nixon won with just 41% of the 30-49 vote. Right now Obama is tracking 43-50% with that group while McCain is tracking 43-45%.

In 2000, Bush won with just 45% of the 50+ vote. Right now, Obama is tracking 35-45% with that group while McCain is tracking 44-50%.

In 1980, Reagan won with just 86% support from republicans. Right now McCain is tracking at 83-92% from republicans.

In 1992, Clinton won with just 82% support from democrats. Right now Obama is tracking at 80-89% from democrats.

In 1960, Kennedy won with just 5% support from republicans. Right now Obama is tracking at 5-7% support from republicans.

In 2000, Bush won with 10% support from democrats. Right now McCain is tracking at 7% support from democrats (PUMA influence not known).

In 2000, Bush won with 42% support in the East. Right now Obama is tracking at 52-58% and McCain is tracking at 35-36% in the East.

In 1992, Clinton won with 44% support in the Midwest. Right now Obama is tracking at 46-52% there and McCain is tracking at 37-39% in the Midwest.

In 1968, Nixon won with 38% support in the South. Right now Obama is tracking at 36-42% there, while McCain is tracking at 51% support in the South.

In 2000, Bush won with 47% support in the West (Clinton won with the same level in 1992). At this time Obama is tracking at 44-54% in the West, while McCain is tracking at 38-40% support.
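
One rough way to line these comparisons up is sketched below in Python, using the first seven thresholds and tracking ranges from the list above. The three labels ("above", "below", "straddles") are my own shorthand for illustration and are not a standard measure; a poll range is also not an election result, so this only shows whether a candidate is roughly where he wants to be.

```python
def classify(threshold, low, high):
    """Compare a candidate's tracking range to the historical minimum."""
    if low >= threshold:
        return "above"       # whole range at or over the winning minimum
    if high < threshold:
        return "below"       # whole range under the winning minimum
    return "straddles"       # the minimum falls inside the range


# (group, threshold, Obama low, Obama high, McCain low, McCain high),
# all taken from the list above.
groups = [
    ("Men",      41, 40, 47, 44, 46),
    ("Women",    43, 54, 55, 35, 39),
    ("White",    39, 39, 44, 47, 50),
    ("Black",     9, 82, 91,  3,  3),
    ("Under 30", 38, 56, 59, 29, 38),
    ("30-49",    41, 43, 50, 43, 45),
    ("50+",      45, 35, 45, 44, 50),
]

for name, threshold, o_lo, o_hi, m_lo, m_hi in groups:
    print(f"{name:9} Obama {classify(threshold, o_lo, o_hi):10}"
          f" McCain {classify(threshold, m_lo, m_hi)}")
```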

As I wrote at the beginning, these numbers are comparing poll numbers to election results, and at this time in 2004 both Bush and Kerry were more than 5 percentage points away from their final results in many categories. The undecideds play a key role in the final tallies, and they will do so again this year. Also worth noting at this time are comparisons in these additional demographics:

Urban voters: 55-31 Obama
Suburban voters: 48-39 McCain
Rural voters: 41-40 Obama or 44-40 McCain depending on the poll

Single voters: 61-24 Obama
Married voters: 47-39 McCain

And finally, for some reason no major poll seems to be releasing any internal demographics for Asian voters. Sure, we’re talking between 1 and 2 percent, but in some places they could matter.

Monday, October 27, 2008

From Hillbuzz, Why Democrats Should Support McCain

1. A new kind of politics = voting for the best candidate, regardless of party

2. McCain won’t raise taxes

3. Experience and accomplishments matter

4. Bipartisan record of working with Democrats

5. In 35 years, no Republican president has threatened Roe v. Wade

6. The president can only nominate judges, while a Democrat-controlled Congress will appoint them

7. Energy independence for the United States, using all means available

8. 100% open government and unfettered press access

9. Risked political career to do what he thought was best for the country

10. Never earmarked a single pork barrel project

11. Sarah Palin for Vice President

Gallup In The Tank?

Back in 2004, I jumped pretty hard on John Zogby. Zogby did two things which I considered, and still do, to be unacceptable conduct for a pollster. The first was that Zogby flat-out called the election for Kerry back in May of 2004, a prediction he hung onto through the rest of the campaign. The second was that Zogby started mixing results from his telephone polls with his online polls, which invalidates the results from both methods. I would also point out to the reader that in 2004 and 2005, I was unhappy with political affiliation weighting at the time, and had adjusted my own expectations by reversing the bias from polls. My point is that even four years ago I was challenging poll methodology when it deviated from NCPP guidelines, and even if Zogby is publishing prettier headlines now, that does not change my wariness from past experience. I will challenge any behavior at odds with valid practices.

This year, all of the major polls show Obama ahead in the presidential campaign right now, some saying he is well ahead. I found serious problems in their fundamental assumptions, not the least being the heavy weighting of democrats in the polls (and let’s not mince words: every poll weights by party affiliation in effect; the ones which simply accept whatever is called in are just accepting the raw data as demographically accurate, which is just as absurd in terms of party affiliation as it would be if they assumed that race, gender, age, or educational demographics did not need to be reweighted). I have wondered two things as the campaign moved along – what would I say if I turned out to be completely wrong, and what would these polling groups say if I turned out to be right and they were the ones who blew it? For my case, I intend to review the election from a statistical standpoint, and if Obama wins in a landslide because the nation really did decide it was 48-25-27 DRI, then I will admit it plainly and take my lumps. I suspect the polling groups will have a harder time being forthright if my argument turns out to be correct. One reason for that is today’s polling discussion from Gallup.

Gallup has noted the strength of early voting this year. The most significant points from that article are these: early voting is stronger than expected this year, and so far republicans have been just as eager to vote early as democrats. The third point is the most important signal of all. Says Gallup: “Early voting ranges from 14% of voters 55 and older (in aggregated data from Friday through Wednesday) to 5% of those under age 35. Plus, another 22% of voters aged 55 and up say they plan to vote early, meaning that by Election Day, over a third of voters in this older age group may already have cast their ballots.”

The last two statements are very good news for McCain and bad news for Obama. This is because it demonstrates that enthusiasm to actually vote by republicans is equal to enthusiasm to vote by democrats. This runs directly against claims made in polling up to now, demonstrating that participation in polls is not directly related to voting this year. Second, the higher participation by senior voters and weaker participation by younger voters is directly in line with historical norms, again running against the poll expectations that this year would see a wave of young people voting but seniors staying at home. Gallup’s own data proves this is not happening as they predicted, and the polls are therefore invalid in those respects, in addition to obvious flaws in the party weighting. The reasonable expectation from these facts, would be for Gallup to back down and correct its weighting to match the observed behavior. As of yet, Gallup has not taken that step. They did, I note, tacitly admit that the “expanded voter” model they introduced this year is invalid, but now they are running no less than three models of polling, which makes me wonder if they are going to wait to see which one comes out the closest (or the least embarrassing) and call that one their ‘official’ call – when a major polling group throws out three guesses instead of just one judgment, you can be sure they have lost confidence in their system.
