Friday, I posted a review of where things stand right now in the state races. As I wrote at the time, I do not believe they will actually shake out exactly that way in the end, for a number of reasons, but the numbers are valid, even if they are a bit unexpected in places.
Some states have been big surprises for people, which in turn leads to polite requests to explain how I got there, as well as some less than polite suggestions that because the numbers are sometimes surprising, that makes them wrong. So, today’s article is a brief discussion about reweighting polls and the difficulties in finding the “true” numbers.
Let’s start with the fact that there are a lot of polls out there. Besides all the people doing national polls, there are over four dozen groups doing state polling. Some are professional and thorough, others much less so. The trouble is that people generally do not know how to read poll results, so what usually happens is that the polls are all thrown together, an average is taken, and that average gets fed to the public as the true standings. The problem is that this is statistically invalid.
Statistics is a mathematical science, and opinion polls are some of the trickiest statistics to manage, since human behavior is sometimes erratic and even deliberately disruptive. As a result, effective polling is a work still very much in progress. You will hear some people say that polling has gotten very accurate, but that is not really true. As I have written many times, a lot of polls before the end of a political campaign careen all over the place, making moves which appear to have no connection to the actual events. Worse, there are many instances where even the final poll from a respected group turned out to be very wrong. Take 1936, for instance. In that day, the most respected presidential opinion poll was the one done by the Literary Digest. The LD poll was mailed out to literally millions of people at a time, and had correctly predicted the results of every presidential election from 1916 to 1932. So, when LD picked Alf Landon to beat Franklin Roosevelt with 57% of the popular vote, on the strength of ballots mailed to some 10 million people, of which more than 2 million came back (and to think, we are impressed when more than a thousand people answer a poll these days!), it was considered a very bad bit of news for Roosevelt. But a new company, formed by George Gallup, used a different system. Relying on demographic weighting for his projections, Gallup predicted that FDR would win with 54% of the vote, even though his sample was a tiny fraction of the size of the Digest’s. When Roosevelt did actually win with 61% of the vote, it was a stunning blow for the Literary Digest, and it launched George Gallup’s long career as a polling authority. It was, in fact, the Gallup Organization which first understood that a respondent pool had to be reweighted to match demographic norms in order to produce a valid reflection of the national mood.
But Gallup is a business, and pollsters discovered that reports which had no surprises, no drama, did not sell well to the media. So, while I cannot prove it, I find it intriguing that as Gallup’s revenues became significant, the polling trends became more of a roller coaster, including the odd claim of convention “bounces”. The problem with bounces is that they mistake enthusiasm by a party base following a party convention for an increase in voter support. This is not really the case, but the polling groups continue to report this false “bounce”, which always fades until a more nominal trend is seen again. I don’t want to sound cynical, but this is a case of what Dr. Heisenberg warned, that in observing human behavior, the observer often forgets that he too is affected by the event and this creates bias in his reports. Polling groups fall prey to media spin just like anyone else, and there are numerous cases where they missed the real trend because they were not looking for it. A good example of this can be found in the 1948 and 1976 elections, where the president in the White House was unpopular and tied to a scandal, and the polls reflected the media expectation that the president was doomed. In 1948 the polls wrote off Harry Truman, and in 1976 they wrote off Gerald Ford. The 1948 win by Harry Truman is political legend, but many people do not remember that Ford came back from a 33-point poll deficit to briefly lead the race, before narrowly losing to Jimmy Carter. Similar polling blunders also occurred in the presidential races of 1980, 1988, and 2000.
My point for this article is that if national polls by well-respected polling groups can blunder, we should hardly be surprised to find that state polls can also be inaccurate. The mistake made this year is the assumption that when all the polls say the same thing, they must be right. However, if all the polls are making the same mistake, they can be in complete agreement yet still be completely wrong. That’s a bold statement to make, but it is the heart of this year’s polling condition.
Let’s go back to the national polls for just a moment. Some polls show Barack Obama way ahead, while others show the race growing very tight again, in some cases well within the published margin of error. My point here is not to say which is right, but to point out that the very fact that such a range exists is evidence of statistical invalidity. Gallup recognized this, which is why they have backed off from the “expanded voter” model they were using all summer but never before this year. They have finally recognized that the range of results cannot comply with the validity tests for a 95% confidence level. (The 95% confidence level is the commonly used standard of opinion polls; in simple terms it promises reproducibility, that under the same conditions and method, results will be within the margin of error at least 95% of the time.) Evidence of collinearity has also been discovered, which further damages Gallup’s satisfaction with the model. So Gallup is actually publishing poll results from three different models using the same data, which is essentially the same thing as admitting they have no idea what the true condition really is. If Gallup is admitting this collapse of the system, however tacitly, then it may safely be assumed that all polling this year using non-historical models is statistically invalid. The reason they will not say so plainly is that they do not want to have to pay back their clients and subscribers. To the point, however: knowing that there are serious errors in the models used in national polls, it is reasonable to expect to see similar errors in the state polls, especially as these polls use smaller respondent pools, are taken less often by each polling group, and often involve small-budget operations which cannot afford to go back and validate past operations.
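For readers who want to see where a published margin of error comes from, here is a minimal sketch. It assumes a simple random sample and the textbook formula for a proportion at 95% confidence; the function name and the sample sizes are mine, for illustration only, and a weighted poll will generally have a somewhat larger effective margin than this.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion,
    in percentage points, assuming a simple random sample of size n.
    p=0.5 is the worst case; z=1.96 is the usual 95% multiplier."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes (assumptions, not taken from any real poll)
for n in (500, 830, 1000, 2500):
    print(f"n={n:>5}: +/- {margin_of_error(n):.2f} points")
```

Under these assumptions a pool of roughly 830 respondents is what it takes to get a margin of about 3.4 points, and quadrupling the pool only halves the margin.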
So what went wrong? In short, the political affiliation weighting. There are three schools of thought about party weighting. Some would argue that there should be no weighting. That runs into problems, however, when you consider that there are clearly places where strong political bias exists. For example, I could set up a poll which met a lot of the criteria for demographic variety by polling a political party’s national convention. I could get both genders, all races, folks from urban and suburban locations, young and old, and so on. On paper it would look just fine, but of course such a poll would be a crock. Well, the same problems exist anywhere that you do not make sure to use historically valid weighting. That’s the reason why some of those polls are way out of line: they are not paying attention to ridiculously unbalanced political participation. The second group uses an even worse idea, what one poll calls “dynamic weighting”. They take the average response from a number of polls, assume that the average of those responses represents the “new” political affiliation, and weight future polls for a while by that weighting. To see how absurd that is, imagine if you had a series of polls in which black respondents were only 5% of the raw data. Would you conclude that the Census figures were wrong, and that the percentage of blacks in the United States was only 5%? Of course not, yet that is the rationalization used by the polls which play that stunt. A similar and even more ridiculous tactic is the subjective weighting used by some polls, a weight which has no basis whatsoever in past election participation, nor even in poll participation, but is artificially determined by the polling group’s management. Such behavior is every bit as dishonest as if they had run short of respondents to reach a target pool and had made up responses to fill the rest.
The third option, used by very few polls but the one defensible method, is to take actual political affiliation from previous elections, especially since it has been shown that the proportional split between Democrats, Republicans, and independents has been remarkably consistent for over forty years. When polls refuse to use the numbers known to be valid from a generation of actual elections, it should be no surprise when their projected results fail to match the actual election results.
For my analysis, I had to make certain assumptions. First, I had to assume that there would be no deliberate attempt to publish false reports or to mislead respondents. Second, I had to assume that the historical norms which have been valid for forty years would also be valid in 2008. And third, I had to assume that backing out the polls to get back to the original numbers and reworking the data using historical weights, would result in valid and useful information. The procedure was simple enough. I looked up the most recent state polls which also made their internal data available, then backed out the numbers to unweighted versions, then reweighted them using the historical value for each state, collected from CNN’s archive. That produced the results in Friday’s article.
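The procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the crosstab, the published party split, and the historical split below are all invented numbers, and the function name is mine. The idea is simply that once you have each candidate's support within each party group, the topline is a weighted sum, so swapping the party weights gives the reweighted result.

```python
def topline(crosstab, party_split):
    """Combine within-party candidate support (percent) into an overall
    percentage, using the given party split as weights."""
    total = sum(party_split.values())  # normalise in case the split is not exactly 100
    result = {}
    for party, support in crosstab.items():
        weight = party_split[party] / total
        for candidate, pct in support.items():
            result[candidate] = result.get(candidate, 0.0) + weight * pct
    return result

# Hypothetical crosstab: support for each candidate inside each party group
crosstab = {
    "D": {"Obama": 88.0, "McCain": 8.0},
    "R": {"Obama": 9.0, "McCain": 87.0},
    "I": {"Obama": 44.0, "McCain": 42.0},
}
published_split = {"D": 40, "R": 33, "I": 27}   # hypothetical pollster weights
historical_split = {"D": 37, "R": 37, "I": 26}  # hypothetical past-election weights

print(topline(crosstab, published_split))
print(topline(crosstab, historical_split))
```

With these made-up inputs, the same responses show a lead of nearly seven points under the first split and a lead of barely more than one point under the second, which is the whole effect in miniature.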
Some of the states, however, were big surprises with the reweights. Vermont going to McCain? West Virginia within the Margin of Error? New York closer than Pennsylvania? Illinois less than six points apart? That just looked screwy, but I published them as the results displayed them. This is not to say that I believe the states will play out this way; it actually speaks to the quality of the polls. Remember, the numbers I used were not my polling, but merely the reweighted results from the state polls themselves. There are three reasons which come to mind right away as to why the numbers changed so much. First, the state polls used some very weird party numbers in some states, and so the correction to historical bases made for very different results. It’s a basic rule of statistics that large variance lowers reliability, so changing party affiliation by a large number, in either direction, produces a less reliable result. It does not mean that the original published results were correct though, only that they skewed results to a degree that could not be corrected by simply backing out and using a more valid affiliation factor. Second, in past elections I have seen states that never panned out the way the polls expected. Bush did much better in Oregon’s 2004 polling, for example, than he did in the election results. The third reason is the simple fact that polls are sometimes just plain wrong, and there is not enough information for an analyst to demonstrate what the correct value should be. There are myriad possible causes for polling error, including question wording, order, time of contact, RDD failure, training of poll takers (one reason the 2004 exit polls were so bad early on was that many of the poll takers had received little training or supervision), influence by the sponsor, data corruption, and poor response rates.
It also helps to know a state’s history. New Hampshire, for example, is a bit contrarian to the rest of New England, which is one reason George W. Bush took New Hampshire in 2000. In other states, the large proportion of independent voters means that the results at any one time may be much more volatile than expected. This is due to the effect of the undecided. To understand that, let’s say we have 1,000 voters, of whom 260 are independents. Let’s say that the candidates are split 50-50 among their party bases for the other 740 voters, 370 to 370. If the independents are all decided voters, they will decide the election, but each independent voter will have about 0.38% of the decision. If candidate A has a 50-voter advantage among the independents, he has a 525 to 475 advantage, or 52.5% of the vote. Now let’s say that we are polling the group, and only 200 of the independents have made up their minds. That would mean that each decided independent voter has 0.50% of the decision. A 50-voter advantage now becomes a 495 to 445 advantage, or 52.7%. But if the 200 who have made up their minds are split evenly, then the last 60 will make the difference and each of their votes is worth 1.67% of the decision, or four and one-third times what it was to start with. So whether or not you count the undecideds in a poll is important to its results, and we have not even touched on what a refusal to participate does to a poll.
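The arithmetic in that example is easy to check. Here is a small sketch that reproduces it; the function names are mine and the numbers are the ones from the paragraph above (370 voters in each party base, 260 independents).

```python
def share_of_decision(deciders):
    """Percent of the deciding bloc that a single voter represents."""
    return 100 / deciders

def split_with_edge(base_each, decided_independents, edge):
    """Votes for A and B, and A's percent of decided voters, when A leads
    among the decided independents by `edge` voters."""
    a = base_each + (decided_independents + edge) / 2
    b = base_each + (decided_independents - edge) / 2
    return a, b, 100 * a / (a + b)

print(share_of_decision(260))         # all 260 independents decided
print(split_with_edge(370, 260, 50))  # 50-voter edge, everyone decided
print(share_of_decision(200))         # only 200 independents decided
print(split_with_edge(370, 200, 50))  # same edge, 60 still undecided
print(share_of_decision(60))          # the last 60 hold the balance
print(share_of_decision(60) / share_of_decision(260))  # weight relative to the start
```

The last line is the point of the exercise: the same voter counts for a bit more than four times as much once the decision is down to the final 60.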
Saturday, October 18, 2008
Friday, October 17, 2008
Where Things Stand
I try not to get into predictions, seeing as I argue so much that you cannot do that sort of thing accurately without the help of God, and while the Lord gives me many blessings for which I am grateful and try to note, I am pretty sure that foretelling the future is not in my package. However, I am still seeing a lot of talk about how we cannot win, or on the other hand how McCain is going to crush Obama, and I do have some information which could help put some perspective on those points.
Before I go into it, I will have to make clear that this projection is not based on any information not available to the public, and that the conclusions here – while I believe them to be sound – are my opinion of things right now, and there is a lot which could change between now and the end of the election (I use that phrase because early voting has begun in many places). It’s intended to correct what I believe is an inaccurate portrayal of the election due to over-weighting of Democrats in the state polls, and it reweights those polls according to the actual participation from 2004 and 2006.
First, the states where Obama’s lead is significant (10 points or greater), alphabetically:
Connecticut (19.71 pts)
District of Columbia (82.00 pts)
Delaware (14.84 pts)
Hawaii (35.86 pts)
Maine (10.00 pts)
Maryland (16.57 pts)
Massachusetts (14.285 pts)
Rhode Island (50.925 pts)
That’s 7 states plus DC, for 47 EV for Obama. Now McCain’s states where he has a 10 point lead or more:
Alabama (22.40 pts)
Alaska (25.73 pts)
Arizona (13.505 pts)
Arkansas (21.14 pts)
Georgia (14.34 pts)
Idaho (42.78 pts)
Indiana (12.28 pts)
Kansas (17.75 pts)
Kentucky (19.44 pts)
Louisiana (22.72 pts)
Mississippi (20.30 pts)
Montana (10.06 pts)
Nebraska (28.71 pts)
North Dakota (17.39 pts)
Oklahoma (36.81 pts)
South Carolina (11.42 pts)
South Dakota (22.02 pts)
Tennessee (25.39 pts)
Texas (25.305 pts)
Utah (49.475 pts)
Wyoming (48.14 pts)
That’s 21 states for 169 EV for McCain. Moving now to states where Obama has a lead of 6.00 to 9.99 points, what I call ‘double margin’ leads:
California (9.98 pts)
Michigan (9.21 pts)
Minnesota (7.82 pts)
Oregon (7.44 pts)
Washington (6.315 pts)
Wisconsin (9.87 pts)
That’s another 6 states for another 110 EV, bringing Obama up to 13 states plus DC and 157 EV. Now, the double-margin states for McCain:
Vermont (7.945 pts) yeah, I know that won’t really happen, but that’s what the numbers say
1 more state for 3 EV, bringing McCain’s total up to 22 states and 172 EV. That brings us to the ‘single margin’ states, where Obama has at least a 3.41 point lead (outside the 3.4 percent MOE), but no more than a 5.99% lead:
Illinois (5.53 pts)
New Jersey (5.01 pts)
New Mexico (5.46 pts)
New York (4.785 pts)
Pennsylvania (5.785 pts)
That’s 5 more states and 93 more EV for Obama, bringing him to 18 states plus DC and 250 EV. For McCain, he can add one more state to his tally as a single-margin state:
Ohio (4.380 pts)
That brings McCain’s total up to 23 states and 192 EV.
The remaining 9 states are within the margin of error:
Colorado
Florida
Iowa
Missouri
Nevada
New Hampshire
North Carolina
Virginia
West Virginia
There we are; McCain has five more states, but Obama has a 58-point electoral vote lead if things played out this way, with everything riding on how those last nine states are decided.
Playing a bit with the numbers, if McCain were to lose 5% of the support he has now, Iowa would move to clear Obama, Ohio would move to within the MOE, and Virginia would move to clear Obama, creating a 270-172 Obama electoral lead. On the other hand, if McCain were to gain an additional 5% support, then Colorado would move to clear McCain, Florida would move to clear McCain, Illinois would move to within the MOE, Minnesota would move to within the MOE, Missouri would move to clear McCain, Nevada would move to clear McCain, New Hampshire would move to clear McCain, New Jersey would move to within the MOE, New Mexico would move to within the MOE, New York would move to within the MOE, North Carolina would move to clear McCain, Oregon would move to within the MOE, Pennsylvania would move to within the MOE, Virginia would move to clear McCain, Washington would move to within the MOE, and West Virginia would move to clear McCain, creating a 281-129 electoral advantage for McCain. As we have seen in past elections, a five-point movement in one week is very doable, so such movement in the remaining 18 days of this election must be considered well within the range of scenarios.
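For anyone who wants to check the running totals, here is a short tally sketch. The state groupings are the ones listed above; the electoral vote counts are the 2004-2008 apportionment, which the lists do not show, so treat those figures and the variable names as my additions.

```python
# Electoral votes per state under the 2004-2008 apportionment
obama = {
    "10+": {"CT": 7, "DC": 3, "DE": 3, "HI": 4, "ME": 4, "MD": 10, "MA": 12, "RI": 4},
    "double": {"CA": 55, "MI": 17, "MN": 10, "OR": 7, "WA": 11, "WI": 10},
    "single": {"IL": 21, "NJ": 15, "NM": 5, "NY": 31, "PA": 21},
}
mccain = {
    "10+": {"AL": 9, "AK": 3, "AZ": 10, "AR": 6, "GA": 15, "ID": 4, "IN": 11,
            "KS": 6, "KY": 8, "LA": 9, "MS": 6, "MT": 3, "NE": 5, "ND": 3,
            "OK": 7, "SC": 8, "SD": 3, "TN": 11, "TX": 34, "UT": 5, "WY": 3},
    "double": {"VT": 3},
    "single": {"OH": 20},
}
within_moe = {"CO": 9, "FL": 27, "IA": 7, "MO": 11, "NV": 5,
              "NH": 4, "NC": 15, "VA": 13, "WV": 5}

def total(groups):
    """Sum electoral votes across every lead category for one candidate."""
    return sum(sum(states.values()) for states in groups.values())

obama_ev, mccain_ev, tossup_ev = total(obama), total(mccain), sum(within_moe.values())
print("Obama:", obama_ev)
print("McCain:", mccain_ev)
print("Within MOE:", tossup_ev)
print("All:", obama_ev + mccain_ev + tossup_ev)  # should come to 538
```

Moving a state from one dictionary to another and rerunning is a quick way to try scenarios like the five-point swings described above.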
Thursday, October 16, 2008
Not Over By A Long Shot
One of the commenters at StolenThunder was a troll playing his wicked-witch routine. You know, “surrender now the polls all say its over give up give up boooooooo …”. All he needed were the flying monkeys.
Well, I beg to differ. Loudly. As usual. So, while it amounts to repeating myself, here are the reasons why this election remains very much undecided, with a great deal left to be hammered down:
First, in a nutshell, a mathematical certainty of any event involving human behavior cannot possibly exist prior to that event;
Next is the idea that opinion polls are predictive. By definition, they are not, in the same way that charting a stock's past performance and present price is in no way predictive of its future value;
Next, it needs saying again that the history of polls shows instability and unreliability. The elections of 1936, 1948, 1968, 1976, 1988, and 2000 in particular were very different from what the polls predicted just a couple weeks before the election, sometimes even closer than that;
Also, a lot of media is pointing to sites which post aggregates for projections. The problem there is that this causes collinearity, which invalidates the conclusions. It's a common error but a significant one;
Further, it has to be noted that neither the Obama campaign nor the McCain campaign is behaving in a manner consistent with the published conditions, particularly with regard to Pennsylvania;
Next, it needs saying that the political affiliation weights being used in major polls in no way match the historical participation at either the national or state levels. Those polls which take response levels without weighting to demographic norms create a circular logic which is inconsistent with NCPP guidelines and documented election results;
And finally, conditions this year are so unstable that Gallup, who has longer experience than anyone in opinion polling, has tacitly admitted it has no idea where the race stands, as it has developed no fewer than three weight models to try to capture a sense of what's going on. However, the fact that each is at variance with the other two to a degree beyond their stated margin of error demonstrates error beyond the boundaries established for the published level of significance (5%), which is to say, the math fails a 2-tailed validity test, and ergo all results are invalid by definition.
Turnout – if one party clearly does a better job getting its base to vote, that party will clearly win. More than ever, your vote matters.
Independents – Right now, the Independent vote is essentially tied, with about 28% of Independents still undecided. Whoever wins the most of that vote will win the election.
Undecideds – Overall, 12.01% of voters are still undecided. It’s slowly resolving itself, but there will still be a large pool of voters waiting to be convinced just before election day. Finishing strong could make all the difference.
The Secret Poll October 16 2008
Hello again, and time for another edition of the Secret Poll. The election remains winnable by either John McCain or Barack Obama, the keys coming down to turnout, the independents, and just plain not giving up. Last night was the third debate between the presidential candidates, but it will take at least 10-14 days for results to show up in the polls; anything showing up in the near future will be in response to prior activities and statements.
So, once again, here’s the recap of where I think the true numbers have played out, and where we are now:
August 31: McCain 41.77%, Obama 41.06%
September 7: McCain 42.45%, Obama 42.04%
September 14: McCain 45.71%, Obama 39.62%
September 21: McCain 44.48%, Obama 42.06%
September 28: McCain 42.73%, Obama 41.62%
October 5: McCain 44.09%, Obama 43.96%
October 12: McCain 42.68%, Obama 45.31%
McCain lost support among both republicans and independents, possibly a matter of voters turning away from a perceived loser, but also likely influenced by the economic news and the lack of a clear answer in the near future. That said, the 12.01% undecided portion makes this race still very much in doubt, although it is also clear that McCain is losing ground.
The keys, again, are the following:
Turnout – if one party clearly does a better job getting its base to vote, that party will clearly win. More than ever, your vote matters.
Independents – Right now, the Independent vote is essentially tied, with about 28% of Independents still undecided. Whoever wins the most of that vote will win the election.
Undecideds – Overall, 12.01% of voters are still undecided. It’s slowly resolving itself, but there will still be a large pool of voters waiting to be convinced just before election day. Finishing strong could make all the difference.
Wednesday, October 15, 2008
One Obvious Reason the Polls Are Biased
I have said and will say again, that the opinion polls this year are simply wrong. They have fiddled with weighting and wording and various pieces of the demographics to create a false impression. You can either believe them or not, but as I have shown in the numbers for weeks now, believing the polls would be naïve at the very least.
But if the polls have been so biased, one may reasonably ask why that is so. I myself have commended groups like Gallup for a very professional job over many years, and even though I strongly disagree with the conclusions published by groups like CBS News, I applaud their open way of reporting at least some of the significant internal data. In fact, it is CBS News which reveals how this bias is operating, and how even well-intentioned pollsters can make major blunders in their assumptions.
I disagree with CBS News because of how it weights its respondent pool. And lately, what I have seen is a trend, verging on the ridiculous, of far too many Democrats in the pool to make any sense at all. This has been happening in both national polls and in state polls. For national polls, I mentioned some weeks back how Gallup managed to show Obama declining or staying steady in every political affiliation group over a week while McCain was steady or gained in every such group, yet Gallup’s headline claimed Obama was gaining support overall, a mathematical impossibility without manipulating the proportionate weights.
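To see why the weights are the only place such a headline can come from, here is a small sketch. Every number in it is invented for illustration (none of it is Gallup's data), and the function name is mine. A candidate's support drops or holds steady inside every party group, yet the overall figure rises, but only when the group weights are shifted.

```python
def weighted_topline(shares, weights):
    """Overall support (percent) from within-group support and group weights."""
    total = sum(weights.values())
    return sum(shares[g] * weights[g] for g in shares) / total

# Hypothetical within-group support for one candidate, two weeks running
week1 = {"D": 86.0, "R": 10.0, "I": 45.0}
week2 = {"D": 85.0, "R": 10.0, "I": 44.0}   # down or flat in every group

old_weights = {"D": 36, "R": 36, "I": 28}   # hypothetical party split
new_weights = {"D": 41, "R": 32, "I": 27}   # hypothetical shifted split

print(weighted_topline(week1, old_weights))  # starting point
print(weighted_topline(week2, old_weights))  # same weights: topline falls
print(weighted_topline(week2, new_weights))  # shifted weights: topline rises
```

With the weights held fixed, the overall number has to fall when every group falls; the rise only appears once the proportions between the groups are changed.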
For the states, Survey USA’s polls also show a strong pro-Democrat bias, as shown in the following states (2004 and 2006 DRI splits come from actual elections, SUSA’s 2008 split is arbitrary):
Ohio – 2004 DRI split was 35%/40%/25%, 2006 was 40%/37%/23%
SUSA in 2008 is using 46%/33%/20%
North Carolina – 2004 DRI split was 39%/40%/21%, 2006 was 39%/40%/21%
SUSA in 2008 is using 42%/37%/18%
Virginia – 2004 DRI split was 35%/39%/26%, 2006 was 36%/39%/26%
SUSA in 2008 is using 39%/30%/25%
Pennsylvania – 2004 DRI split was 41%/39%/20%, 2006 was 43%/38%/19%
SUSA in 2008 is using 54%/35%/10%
Florida – 2004 DRI split was 37%/41%/23%, 2006 was 36%/39%/25%
SUSA in 2008 is using 40%/42%/16%
Survey USA is using weights which have no historical validity whatsoever in their state polling. “Garbage” is not too strong a word to describe their published results.
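For readers who want to see how much the DRI split alone can move a result, here is a rough sketch of reweighting. The Ohio weights are the ones listed above; the within-party support figures are hypothetical placeholders I made up for the example, not SUSA's actual crosstabs, and the helper names are mine.

```python
# Party-ID weights for Ohio as listed above (Democrat / Republican / Independent).
SUSA_2008 = {"D": 46, "R": 33, "I": 20}
ACTUAL_2004 = {"D": 35, "R": 40, "I": 25}

# HYPOTHETICAL within-group support (%), invented for illustration only.
obama_support = {"D": 88.0, "R": 8.0, "I": 45.0}
mccain_support = {"D": 9.0, "R": 89.0, "I": 45.0}

def weighted_share(support, weights):
    """Weighted average of group support; weights are normalized first
    because published splits do not always sum to exactly 100."""
    total = sum(weights.values())
    return sum(weights[g] * support[g] for g in weights) / total

def margin(weights):
    """Obama minus McCain, in points, under a given party-ID split."""
    return (weighted_share(obama_support, weights)
            - weighted_share(mccain_support, weights))

margin_susa = margin(SUSA_2008)
margin_2004 = margin(ACTUAL_2004)

print(f"SUSA 2008 weights: Obama {margin_susa:+.2f}")
print(f"2004 weights:      Obama {margin_2004:+.2f}")
```

Under these made-up crosstabs, the identical set of answers shows Obama ahead by roughly 9.7 points with the SUSA split and McCain ahead by 4.75 points with the 2004 split. The respondents did not change; only the assumed turnout mix did.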
So what’s the deal? Something is happening to skew the polling groups’ perception of how they think voters will turn out, and in publishing invalid conclusions as they have, they are – intentionally or not – misleading the public about the election conditions. Since the reputation of the polling group is essential in attracting future business clients, it hardly seems reasonable to consider these blunders to be deliberate. Although I have written that polls fall into the unethical habit of selling a roller coaster story which they know is not accurate, polls do try to stay close enough to be plausible. One must conclude that they have come to believe their own hype, forgetting Heisenberg’s warning that observing a behavioral event not only influences the event, but also affects the observer as well.
So, in looking around for a cause, I found something all major polls have in common. Look at their headquarters locations:
Poll Headquarters
ABC News 77 W 66th St, #13, New York City, New York
CBS News 524 W 57th St, New York City, New York
FOX News 1211 Avenue of the Americas, New York City, New York
Gallup 901 F St NW, Washington DC
Hotline 88 Pine St, 32nd floor, New York City, New York
IBD 12655 Beatrice St, Los Angeles, California
LA Times 202 W 1st St, Los Angeles, California
Marist 3399 North Rd, Poughkeepsie, New York
Mason-Dixon 1250 Connecticut Ave #200, Washington DC
Newsweek 251 W 57th St, New York City, New York
NY Times 1 City Hall, New York City, New York
Pew 1615 L St NW, #700, Washington DC
Quinnipiac 275 Mount Carmel Ave., Hamden, Connecticut
Rasmussen 625 Cookman, #2, Asbury Park, New Jersey
Reuters 3 Times Square, New York City, New York
Survey USA 15 Bloomfield Ave., Verona, New Jersey
TIPP 690 Kinderkamack Rd, Oradell, New Jersey
WaPo 1150 15th St NW, Washington DC
Zogby 901 Broad St, Utica, New York
All of them are deep in “blue” territory, many packed together up in the northeast corner of Obama territory. The only non-east-coast members of this group are IBD and the LA Times, both located in Los Angeles, the most liberal section of California and also solid blue in perspective. This is not a coincidence: all of the major polling organizations are based in locations where liberals are strongest and conservatives weakest, where ‘democrat’ and ‘republican’ take on meanings wildly different from the rest of the country. As a result, it is obvious that the prevailing culture in this limited part of the country has an undue influence on the focus applied by these polling groups. Democrats, especially liberal democrats, are over-represented in the poll reports because the culture of New York and Northeast America over-represents liberals. Republicans, especially conservative republicans, are suppressed in the poll reports because the culture at the polling groups’ headquarters suppresses republican opinion.
I learned long ago that when a manager displays certain personality traits, those traits are soon reflected by the employees at that company. A relaxed manager who is confident tends to improve the mood of his staff, while a tense micro-manager creates the same attitude in his employees. Knowing this, it’s not at all hard to imagine the conversations between headquarters and the staff at these polling groups. They like Obama and expect him to win, so – what a surprise! – the polls they control reflect that same attitude.
Polls are useful for investigating trends and movement within a specific demographic, provided the polling group is ethical enough to publish its internals. But trusting them for an honest topline report amounts to trusting Obama’s campaign to honestly report how the election is really going.
Tuesday, October 14, 2008
Battlegrounds
Yesterday’s article explored the way state polls can verify the accuracy (or its lack) in national polls. But the reverse is also true; once we have a reasonable idea of how the national race looks, we can address where the states really stand in terms of the race. With just three weeks to go, it’s no surprise that a number of races have been decided, but what’s interesting is the number still to be decided. Now, I’m an accountant, not a psychic, and I have been frustrated by the obvious fact that the campaigns’ inner circles have some delicious statistical information that nobodies like me are not allowed to glimpse, but here’s how I see the states playing out at this time.
Let’s start with the states which Senator Obama has locked up, that is, states which are extremely unlikely to be anything but blue come November 5:
District of Columbia (3 EV)
Hawaii (4 EV)
Illinois (21 EV)
Maryland (10 EV)
Massachusetts (12 EV)
New York (31 EV)
Rhode Island (4 EV)
Vermont (3 EV)
Sub-tally: seven states plus DC, 88 EV
Now, let’s look at the states which McCain has locked up by the same definition:
Alabama (9 EV)
Alaska (3 EV)
Arizona (10 EV)
Georgia (15 EV)
Idaho (4 EV)
Indiana (11 EV) [it may be close but it will be red]
Kansas (6 EV)
Kentucky (8 EV)
Mississippi (6 EV)
Montana (3 EV)
Nebraska (5 EV)
North Dakota (3 EV)
Oklahoma (7 EV)
South Carolina (8 EV)
South Dakota (3 EV)
Tennessee (11 EV)
Texas (34 EV)
Utah (5 EV)
Wyoming (3 EV)
Sub-tally: nineteen states, 154 EV
If we stopped there, that would indicate McCain was in much stronger shape than Obama, but that image would be misleading. To see why, let’s look next at states where the state is not locked up, but the odds are at least 7 to 1 in favor of one candidate.
Obama heavy advantage states:
California (55 EV) [32 to 1 odds]
Connecticut (7 EV) [49 to 1 odds]
Delaware (3 EV) [11 to 1 odds]
Iowa (7 EV) [7 to 1 odds]
Maine (4 EV) [7 to 1 odds]
Minnesota (10 EV) [24 to 1 odds]
New Jersey (15 EV) [7 to 1 odds]
Oregon (7 EV) [7 to 1 odds]
Washington (11 EV) [11 to 1 odds]
Sub-tally: nine states, 119 EV
Running total: sixteen states plus DC, 207 EV
McCain heavy advantage states:
Arkansas (6 EV) [7 to 1 odds]
Louisiana (9 EV) [19 to 1 odds]
North Carolina (15 EV) [7 to 1 odds]
West Virginia (5 EV) [7 to 1 odds]
Sub-tally: four states, 35 EV
Running total: twenty-three states, 189 EV
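A quick aside on reading the odds figures in the lists above: odds of "N to 1" in favor correspond to a probability of N / (N + 1). This is just the standard conversion, sketched here with a helper name of my own choosing.

```python
def odds_to_probability(odds_for, odds_against=1):
    """Convert 'odds_for to odds_against' in favor into a probability."""
    return odds_for / (odds_for + odds_against)

# The odds values that appear in the heavy-advantage lists above.
for odds in (7, 11, 19, 24, 32, 49):
    print(f"{odds} to 1 -> {odds_to_probability(odds):.1%}")
```

So the 7 to 1 cutoff for a "heavy advantage" state works out to a probability of 87.5%.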
Do the math, and this leaves us with eleven states for our battleground. Here are those eleven states, and where they appear to stand with the state polls reweighted to match historical norms:
Colorado (9 EV): Range is Obama +5 to McCain +7, probability is 55% Obama at this time
Florida (27 EV): Range is Obama +5 to McCain +8, probability is 65% McCain at this time [turnout especially crucial here]
Michigan (17 EV): Range is Obama +9 to McCain +3, probability is 82% Obama at this time
Missouri (11 EV): Range is Obama +3 to McCain +5, probability is 65% McCain at this time
Nevada (5 EV): Range is Obama +4 to McCain +8, probability is 70% McCain at this time
New Hampshire (4 EV): Range is Obama +11 to McCain +7, probability is 61% Obama at this time
New Mexico (5 EV): Range is Obama +8 to McCain +4, probability is 71% Obama at this time
Ohio (20 EV): Range is Obama +4 to McCain +6, probability is 57% McCain at this time [I keep hearing how McCain is much, much stronger here, but I have no hard data]
Pennsylvania (21 EV): Range is Obama +15 to McCain +3, probability is 80% Obama at this time [however, both campaigns are spending a lot of resources and time here, indicating things are very tight here]
Virginia (13 EV): Range is Obama +8 to McCain +10, probability is 67% McCain at this time
Wisconsin (10 EV): Range is Obama +10 to McCain +4, probability is 78% Obama at this time
If all eleven states fall in according to present probabilities, Obama collects six states and 66 EV, to reach twenty-two states plus DC and 273 electoral votes, while McCain collects five states and 76 EV, to reach twenty-eight states and 265 electoral votes. That’s good news for Obama, except for a few details.
First, all of the eleven battleground states are in play and could go to either candidate. Present conditions are very likely to change.
Second, in all of these states turnout will be very important, and in Florida, Colorado, and Pennsylvania, turnout will be the most important factor.
And last, there is a significant representation of independent voters in all of the eleven battleground states. And at last check, just about one-third of the independent voters have not yet decided if they will vote, and for whom they would cast their ballot.
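For anyone who wants to check my accounting, the tally above can be reproduced with a short sketch like the one below. The electoral votes and probabilities are the ones listed in this article; the structure and names (`BASE`, `battlegrounds`, `tally`) are simply my own illustration, and it assigns each state outright to whichever candidate is currently favored.

```python
# Running totals from the locked-up and heavy-advantage states above.
BASE = {"Obama": 207, "McCain": 189}

# state: (electoral votes, current favorite, probability for that favorite)
battlegrounds = {
    "Colorado": (9, "Obama", 0.55),
    "Florida": (27, "McCain", 0.65),
    "Michigan": (17, "Obama", 0.82),
    "Missouri": (11, "McCain", 0.65),
    "Nevada": (5, "McCain", 0.70),
    "New Hampshire": (4, "Obama", 0.61),
    "New Mexico": (5, "Obama", 0.71),
    "Ohio": (20, "McCain", 0.57),
    "Pennsylvania": (21, "Obama", 0.80),
    "Virginia": (13, "McCain", 0.67),
    "Wisconsin": (10, "Obama", 0.78),
}

def tally(base, states):
    """Give each battleground's EV to its current favorite."""
    totals = dict(base)
    for ev, favorite, _prob in states.values():
        totals[favorite] += ev
    return totals

totals = tally(BASE, battlegrounds)
print(totals)
```

Note how thin the margin is: with these numbers, flipping Colorado alone (the weakest Obama lean at 55%) would move 9 EV and change the winner.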
Monday, October 13, 2008
Using State Poll Performance Metrics to Verify National Support
Last week in the comment section to one of my articles at Wizbang, reader Andrew Byler made an important observation regarding the over/underperformance of state polls. I think it got lost by most readers, so I am posting today’s article on that subject.
The more observant readers have noticed that the election depends more heavily on state results than the national tally, so they have repeatedly asked about what the state polls say. There is a range of opinion coming out of those polls, but one thing that should be observed, is that there is a clear relationship between the state and national performance. For example, in the state of Ohio, it has long been said that no Republican can win without it. That is the case, it turns out, because Ohio tends to track relatively close to national performance; if a republican does poorly in Ohio, he will likely do poorly across the nation. Over the course of election history, most states have demonstrated a tendency to favor the republican or democrat candidate, and by a certain range of support. This allows us to use the state polls to test the veracity of the national polls, and vice versa.
Let’s have a look at a few states to see how this plays out:
Let’s start with California. RCP’s average shows Obama ahead by 14.5 points in California. RCP also says Obama is ahead 7.1 points nationally. Based on past election results, California favors the democrat candidate by an average of 4.0 points more than the national average. Accordingly, California’s poll numbers suggest that Obama’s real lead nationally is 10.5 points.
The next state is Texas. RCP’s average shows McCain ahead by 12.7 points in Texas. On average, Texas favors the republican by 7.0 points more than the national average, suggesting that McCain is leading the race, by 5.7 points.
Next up is New York. RCP’s average shows Obama ahead by 18.0 points in New York. On average, New York favors the democrat by 8.8 points more than the national average, suggesting that Obama is leading the race nationally by 9.2 points.
Next up is Alabama. RCP’s average shows McCain ahead by 23.8 points in Alabama. On average, Alabama favors the republican by 7.9 points more than the national average, suggesting that McCain has the national lead, by 15.9 points.
Next up is Florida. RCP’s average shows Obama up by 3.8 points in Florida. On average, Florida favors the republican by 2.8 points more than the national average, suggesting that Obama is leading the race nationally by 6.6 points.
Next up is Tennessee. RCP’s average shows McCain up by 15.7 points in Tennessee. On average, Tennessee favors the republican by 3.2 points more than the national average, suggesting that McCain is leading the race nationally by 12.5 points.
As you can see, there is no clear consensus from these results. Taken all together, the 50 states project an Obama lead of 0.99 points right now. When the range of skewing is considered, the national result could be anywhere from Obama by 3.05 to McCain by 4.71 points.
Once again, the indications from the math are that the race is both close and fluid.
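The state-by-state check above is simple subtraction: take the state poll margin and remove the state's historical lean relative to the nation. Here is a small sketch using several of the states discussed, with margins and leans as given in this article. The sign convention (positive for Obama or a democrat lean, negative for McCain or a republican lean) and the function name are my own choices for the example.

```python
# state: (RCP state margin, historical lean vs. the national result)
# Positive = Obama / democrat, negative = McCain / republican.
state_data = {
    "Texas": (-12.7, -7.0),
    "New York": (18.0, 8.8),
    "Alabama": (-23.8, -7.9),
    "Florida": (3.8, -2.8),
    "Tennessee": (-15.7, -3.2),
}

def implied_national_margin(state_margin, state_lean):
    """National margin implied by one state's polling, rounded to 0.1 point."""
    return round(state_margin - state_lean, 1)

implied = {
    state: implied_national_margin(margin, lean)
    for state, (margin, lean) in state_data.items()
}

for state, value in implied.items():
    leader = "Obama" if value > 0 else "McCain"
    print(f"{state}: {leader} by {abs(value)} nationally")
```

The spread in the output, from a double-digit McCain lead to a high single-digit Obama lead, is the lack of consensus I am talking about.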