Stolen Thunder: statistics

Thursday, December 01, 2016

Modern Bunk

The Presidential election of 2016 is over, even if millions of people seem to be in serious denial about that fact. An appalling example of this trend-to-tantrum is evident in the recount efforts in Wisconsin, Michigan, and to a much lesser degree Pennsylvania. A number of pablum excuses have been offered up to explain the recount demands, usually based on some assumption that any hacking can be discovered and ‘corrected’ by the recount, that recounts mean improving ballot validity, or that denying the recount is somehow proof that Trump is unethical. This article is written to address those canards.

Let me start with the stated reason for the recount demands – in laughable hypocrisy, the Clinton campaign has said they will “participate” in the recounts, even as they assure people that they are not disputing the election results, and are not the parties asking for the recounts in the first place.

Bunk.

Jill Stein, for example, only raised $3.5 million by the end of October for her campaign, yet by last Friday she found $5 million for the recount effort.

http://www.foxnews.com/politics/2016/11/25/jill-stein-raises-more-funds-for-recount-than-entire-presidential-campaign.html

There is no realistic way her supporters suddenly decided to become that generous. The obvious truth is that someone funded the recount through Stein, to hide the true purpose of the recount.

Let’s also look closer at the ‘hacker’ claim that was used to excuse demanding a recount. Speaking for former Senator Clinton, Jill Stein said that ‘hacking concerns’ were the reason for the recount demand, which claim does not begin to stand up to inspection.

http://www.foxbusiness.com/politics/2016/11/28/jill-stein-cites-hacking-concerns-as-reason-for-recounts-not-election-outcome.html

Especially amusing to me is that even as she demands recounts, Ms. Stein admits she has no evidence to support her claims. She wants multiple states to do recounts just because she claims there was possible hacking, even though she has no evidence, even though experts agree the recounts won’t change the election results, and even though it wastes time and resources.

But let’s play along for a little bit. Let’s look at the three states where Stein wants recounts: Pennsylvania, Michigan and Wisconsin. If someone were to hack the election in those three states, how would they do it?

It’s not like they show on TV. TV is there to tell you a story, not get into the hard facts of how something is done, especially illegal activities. Let’s skip the question of why someone would want to hack an election (besides the candidates) and look at the logistics.

The recount crowd insists that there is something wrong with those vote totals in Pennsylvania, Michigan and Wisconsin. Unfortunately for them, their argument is classic circular fallacy – ‘we lost, so someone must have cheated’. A lot of folks on the left seem to be claiming people’s votes were changed in the machine, so let’s start with the machines used. If the election was hacked, it matters a great deal whether the hackers faced the same kind of machine everywhere, since machines from different companies use different operating codes and processes, meaning what works on one type won’t work somewhere else.

Hmm.

Pennsylvania uses the Sequoia AVC Advantage voting machine.

http://www.montcopa.org/1388/Voting-Machines

OK so far, let’s see what the next two states use.

Michigan uses the Automark ES&S M100.

http://www.michigan.gov/sos/0,4670,7-127-1633_8716_45458---,00.html

So already our hackers have to know coding for two different kinds of voting machines. On to Wisconsin.

Wisconsin does not use voting machines in all places, and the state uses six different kinds of machines – the Optech Command Central Eagle, the Sequoia Insight, the Automark ES&S M100, the Automark ES&S DS200, the Premier Accuvote US, and the Dominion Imagecast Evolution.

http://elections.wi.gov/elections-voting/voting-equipment/voting-equipment-use

So now our poor hackers are up to five different makers of voting machine, and seven different specific models (the M100 turns up in both Michigan and Wisconsin), each with its own process code.
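For anyone who wants to check the tally, here is a minimal sketch in Python. The maker assigned to each model is my reading of the model names listed above, not something the state pages spell out, so treat that grouping as an assumption. The count of models depends on whether the M100, which appears in both Michigan and Wisconsin, is counted once or twice.

```python
# Machines named above, grouped by state as (maker, model).
# ASSUMPTION: the maker for each model is inferred from the model name.
machines = {
    "Pennsylvania": [("Sequoia", "AVC Advantage")],
    "Michigan": [("ES&S", "M100")],
    "Wisconsin": [
        ("Optech", "Command Central Eagle"),
        ("Sequoia", "Insight"),
        ("ES&S", "M100"),
        ("ES&S", "DS200"),
        ("Premier", "Accuvote US"),
        ("Dominion", "Imagecast Evolution"),
    ],
}

# Flatten the per-state lists into one list of (maker, model) entries.
all_entries = [entry for models in machines.values() for entry in models]

makers = {maker for maker, _ in all_entries}   # distinct makers
distinct_models = set(all_entries)             # the M100 is counted once here

print("entries listed:", len(all_entries))
print("distinct makers:", len(makers))
print("distinct models:", len(distinct_models))
```

Either way you count it, an attacker would need working exploits for several unrelated code bases, which is the point of the argument.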

That pretty much kiboshes the idea that precinct-level machines were hacked. And here’s a fun piece of trivia: Sequoia, the company that makes all those voting machines used in Pennsylvania, also supplies voting machines to fifteen other states, but for some reason Stein and Clinton don’t want to recount votes in other states using Sequoia machines, probably because that would mean recounting states Clinton already won.

The next argument would be that someone must have hacked the votes at the state level. You just need three access points, right?

That doesn’t work either. Before each state certifies its results, it manually verifies the precinct totals against the state tallies, so even if someone got in and changed the totals at the state level, they’d also have to fudge the reported numbers from the precincts … which would be caught when the manual review was done.

But this all becomes plain when you go all the way back to the argument made by those ‘computer scientists’, who claimed that the votes looked irregular when compared to the polling data. I went to Real Clear Politics and looked up the state polls. These are the state polls reported by RCP over the last month before the election.

http://www.realclearpolitics.com/epolls/latest_polls/state/

I then counted just the most recent poll from each agency for a state. In all there were 204 ‘final’ polls reported for 37 contests. Here are the results for six states, three of which are the states for which Stein/Clinton want recounts, and three are for states Clinton won:

State A:

Trump poll averages 7.1 points below his election result

Clinton poll averages 2.2 points below her election result

Undecideds make up between 0% and 15% of poll results, average 6.7%

State B:

Trump poll averages 0.8 points below his election result

Clinton poll averages 7.3 points below her election result

Undecideds make up between 4% and 8% of poll results, average 6.3%

State C:

Trump poll averages 2.0 points below his election result

Clinton poll averages 2.4 points below her election result

Undecideds make up between 3% and 10% of poll results, average 5.8%

State D:

Trump poll averages 7.2 points below his election result

Clinton poll averages 5.1 points below her election result

Undecideds make up between 7% and 11% of poll results, average 8.7%

State E:

Trump poll averages 7.3 points below his election result

Clinton poll averages 0.7 points above her election result

Undecideds make up between 0% and 10% of poll results, average 6.5%

State F:

Trump poll averages 5.9 points below his election result

Clinton poll averages 0.2 points below her election result

Undecideds make up between 2% and 13% of poll results, average 5.0%

Note that in all six states, Trump outperformed his poll averages, and there was significant average undecided response to polls even at the end of the campaign. Note that in five of six states, Clinton also outperformed her poll averages. The data is consistent with two candidates with high unfavorability numbers whose supporters may not want to confirm support for the candidate. Note also that the vote to poll performance is consistent within standard margins of error used in polls, debunking the claim that the vote results were incompatible with the state polls. In fact, Trump’s performance against state polls was very consistent; in 204 polls Trump’s performance beat poll numbers by an average of 5.69 points, and Undecideds made up an average of 7.11% of all poll responses.

The noise about recounts is just more whining from Democrats and Socialists who cannot accept that they did, in fact, lose the election.

* State A is Michigan, Trump 47.6% to Clinton 47.4% in vote results

* State B is California, Trump 32.8% to Clinton 61.6% in vote results

* State C is Nevada, Trump 45.5% to Clinton 47.9% in vote results

* State D is Massachusetts, Trump 33.5% to Clinton 60.8% in vote results

* State E is Wisconsin, Trump 47.3% to Clinton 46.5% in vote results

* State F is Pennsylvania, Trump 48.4% to Clinton 47.3% in vote results
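The six-state comparison can be reproduced with a short Python sketch. It uses only the vote shares and poll gaps quoted above. The implied poll averages are back-calculated from those two figures, so they are a derived quantity, not numbers taken from RCP directly.

```python
# (state, Trump result, Clinton result, Trump gap, Clinton gap)
# gap = election result minus poll average, in points; a positive gap
# means the candidate finished above the poll average.
states = [
    ("Michigan",      47.6, 47.4, 7.1,  2.2),
    ("California",    32.8, 61.6, 0.8,  7.3),
    ("Nevada",        45.5, 47.9, 2.0,  2.4),
    ("Massachusetts", 33.5, 60.8, 7.2,  5.1),
    ("Wisconsin",     47.3, 46.5, 7.3, -0.7),
    ("Pennsylvania",  48.4, 47.3, 5.9,  0.2),
]

# Back out the poll average each candidate must have had in each state.
implied = {
    name: (round(t - t_gap, 1), round(c - c_gap, 1))
    for name, t, c, t_gap, c_gap in states
}

# Count the states where each candidate finished above the poll average.
trump_beat = sum(1 for s in states if s[3] > 0)
clinton_beat = sum(1 for s in states if s[4] > 0)

# Average Trump gap across these six states only.
mean_trump_gap = round(sum(s[3] for s in states) / len(states), 2)

for name, (t_poll, c_poll) in implied.items():
    print(f"{name}: Trump polled {t_poll}, Clinton polled {c_poll}")
print(trump_beat, clinton_beat, mean_trump_gap)
```

Note that the six-state average will not match the 5.69-point figure, which covers all 204 polls across 37 contests.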

Thursday, November 10, 2016

Pollsters Ignored Their “Check Assumptions” Lights

Back in 2000 and again in 2004, I enjoyed a small piece of influence through political opinion poll analysis. Statistics is an intriguing science, all the more because it tries to quantify and predict human behavior. But that same human behavior also skews how people think, including analysts, and in 2008 and 2012 it caused me to miss important trends in American politics. I was embarrassingly wrong in predicting the Presidential elections, especially missing the energy of Obama’s 2008 run. So I backed off, paid more attention to my regular job and family, and paid less attention to statistics. Others enjoyed the attention of poll mavens, especially Nate Silver, who turned his statistical devotion to baseball into political prominence on the back of Obama’s wins. But Silver made the same mistake I did, and in his case the embarrassment is greater because as a professional statistician, he really ought to have known better. Silver let his enthusiasm for Democrat opinion cause him to ignore warning signs until it was too late to avoid a face plant.

Let’s have a quick review of how polls saw the 2016 Presidential Election, and also how polls work, and finally how predictive analysis is created.

Hillary Clinton announced her decision to run for the White House on April 12, 2015. This is important because Clinton already enjoyed significant name recognition and with the roles of First Lady, Senator and Secretary of State on her resume, she would start as an obvious front-runner for the Democrats’ nomination. Nate Silver gave her a 59.9% chance of winning the party nomination at the beginning (I’m using Silver here for two reasons – first, his projections are built from aggregates of major national polls, and second, Silver was the most prominent poll analyst quoted in the media). She enjoyed media support through the end of 2015 as the presumptive front-runner, but by the end of October 2015 Clinton’s lead over Sanders in Silver’s chart was down to 46.8% to 26.1%, notable not for Sanders’ strength but Hillary’s weakness. By February 2016, Silver put the race at 49.6% Clinton to 39.1% Sanders – note that Hillary’s campaign was failing to win over most of the undecideds, losing them to Sanders more than four to one. By April 23, 2016 Silver had the race 49.6% Clinton to 41.5% Sanders; note two important factors apparent, first that Hillary appeared to have a lead bigger than Sanders could close, but second that Sanders had more momentum than Clinton, and had enjoyed higher energy for some months. By the end of June, Silver showed the race 55.4% Clinton to 36.5% Sanders, essentially a done deal for the Democratic Party nomination.

http://projects.fivethirtyeight.com/election-2016/national-primary-polls/democratic/
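The “more than four to one” figure is simple arithmetic on the October and February numbers quoted above. Here is a rough sketch. It assumes, as the paragraph does, that every point either candidate gained came out of the undecided pool, which is a simplification.

```python
# Support figures quoted above (percent): late October 2015 and February 2016.
oct_2015 = {"Clinton": 46.8, "Sanders": 26.1}
feb_2016 = {"Clinton": 49.6, "Sanders": 39.1}

# Points gained by each candidate over the period.
gains = {name: round(feb_2016[name] - oct_2015[name], 1) for name in oct_2015}

# How many points Sanders picked up for each point Clinton picked up.
ratio = gains["Sanders"] / gains["Clinton"]

print(gains)
print(f"Sanders gained {ratio:.1f} points for every point Clinton gained")
```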

Donald Trump announced his candidacy for the office of the President on June 16, 2015. At that time Silver put him at a 3.6% chance of winning the GOP nomination. Let’s stop there and consider that this meant the polls showed Hillary Clinton’s chances of winning her party’s nomination were more than sixteen times greater than Donald Trump’s chances of winning his party’s nomination. Part of this was due to the heavy number of candidates for the Republican nod, but also Donald Trump – while known as a face and name – was unknown as a political contender, so he had to establish his bona fides with both the GOP and the voters. Trump’s campaign quickly gained support, however, as he passed the 20% threshold on July 26, 2015, and the 40% threshold on March 21, 2016. This means that Donald Trump had not won over most voters until after his Super Tuesday wins in Alabama, Arkansas, Georgia, Massachusetts, Tennessee, Virginia and Vermont. On March 22, Trump claimed another 58 delegates by winning the Arizona primary. By the end of May, Trump had essentially locked up the GOP nomination.

http://projects.fivethirtyeight.com/election-2016/national-primary-polls/republican/

Both Clinton and Trump finished the win-the-nomination part of their campaigns with damage, however. Trump’s problems were obvious – to energize his base, Trump attacked establishment Republicans and demographics aligned with opponents of populist theory, and this cost him nationally in polls. In early June, polls showed Trump’s support at 38.1%, compared to 42.1% for Clinton. But Clinton had obvious problems, too. The way Clinton won the Democrats’ nomination left many Sanders supporters convinced the primary had been rigged, which may be one reason Trump made similar claims as the General Election reached its resolution. But also, given the many demographic groups Trump had – allegedly – attacked, a four-point lead for Clinton was a clear warning sign that something was not as described.

Call it a poll version of that annoying “check engine” light on your dashboard. Until you have someone get under the hood, you don’t know what exactly has gone wrong, but you can’t ignore it unless you don’t mind spending hours on the side of the road beside your smoking vehicle, at the mercy of passing traffic. There is science behind a poll that is put together and analyzed properly, but laziness or assumptions in your data or procedures can invalidate your conclusions, and make you look a fool in public.

By the way, Nate Silver uses an aggregate of polls, but he is also guilty of some subjectivity in his source selection. For example, Silver’s aggregate shows Clinton had a wire-to-wire lead over Trump in polling, with Trump never enjoying a lead in the aggregate polling at any time:

http://projects.fivethirtyeight.com/2016-election-forecast/national-polls/

Real Clear Politics, however, which also uses an aggregate of polls, showed Donald Trump with an aggregate lead on May 24 and from July 25 through July 28 of this year.

http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton-5491.html

That’s not to say one aggregate is ‘better’ than the other, but to illustrate the fact that any aggregate is subjective and contains implicit bias. Ironically, Silver was aware of this bias and tried to correct for it – he calls this “trend line adjustment” – but in the end Silver’s own bias still influenced his conclusions.

http://www.huffingtonpost.com/entry/nate-silver-election-forecast_us_581e1c33e4b0d9ce6fbc6f7f

It’s important to remember that Silver was wrong about Trump winning the GOP nomination. After Trump won the GOP nomination, Silver admitted “we basically got the Republican race wrong.”

http://fivethirtyeight.com/features/why-republican-voters-decided-on-trump/

There was no sign that Silver went back to find the evidence he overlooked in his initial analyses, which could have corrected his results in the General Campaign. But there is, at least, evidence that Silver knew something in the numbers was wrong. Just before the final day of the election, Silver put out his “final election update”, giving Clinton a 71% chance of winning.

http://fivethirtyeight.com/features/final-election-update-theres-a-wide-range-of-outcomes-and-most-of-them-come-up-clinton/?ex_cid=2016-forecast

This ran contrary to far more aggressive forecasts elsewhere. The New York Times gave Clinton an 82% probability of winning.

http://www.nytimes.com/elections/forecast/president

The Princeton Election Consortium gave Clinton a 93% chance to win the White House.

http://election.princeton.edu/2016/11/08/final-mode-projections-clinton-323-ev-51-di-senate-seats-gop-house/

Left-leaning pundit Larry Sabato did not offer a probability, but called for Clinton to win 347 Electoral Votes.

http://ijr.com/2016/08/667335-famed-election-predictor-with-97-100-track-record-reveals-his-trump-vs-hillary-2016-results/

And of course the Huffington Post posted that Clinton had a 98% chance to win the Oval Office.

http://elections.huffingtonpost.com/2016/forecast/president

Anyone who turned on ABC, NBC, CBS, CNN, or Fox was also flooded with assurances that Clinton was poised to win by large margins. That all of these analysts were wrong, and to such a large degree, is amusing given their hubris, but concerning given their prominence in media coverage of the election.

The last week of the election, Nate Silver’s concerns about the polling data caused him to scale back his probability for Clinton (he initially had Clinton at 89%, but as the election approached he walked it back to 71%), while Ryan Grim of the Huffington Post kept Clinton at a 98% chance to win. This led to some ill-advised words on Twitter between the two men about each other’s methodology.

http://www.vox.com/2016/11/6/13542328/nate-silver-huffpo-polls
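To see how far apart these forecasts really were, it helps to flip them around and look at the chance each one left for Trump. This is a minimal sketch using the four probabilities quoted above. It assumes a two-way race, so Trump’s chance is simply whatever is left over, which may differ slightly from what each outlet actually published.

```python
# Clinton win probabilities quoted above, in percent.
clinton_win_pct = {
    "FiveThirtyEight": 71,
    "New York Times": 82,
    "Princeton Election Consortium": 93,
    "Huffington Post": 98,
}

# ASSUMPTION: two-way race, so Trump's chance is the remainder.
trump_win_pct = {source: 100 - pct for source, pct in clinton_win_pct.items()}

# How much more room Silver left for a Trump win than the Huffington Post did.
ratio = trump_win_pct["FiveThirtyEight"] / trump_win_pct["Huffington Post"]

for source, pct in trump_win_pct.items():
    print(f"{source}: Trump {pct}%")
print(f"Silver gave Trump {ratio:.1f} times the chance the Huffington Post did")
```

Seen this way, a 71% forecast is a very different statement from a 98% forecast, even though both named the same favorite.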

Ironically, while Silver was correct that weighting Clinton’s advantage beyond anything supported by poll data was foolish, he failed to properly test the underlying assumptions built into his own model.

I found it intriguing to notice that neither Gallup nor Pew published polls for the Presidential election, each focusing on issues rather than candidates. A business reason was provided,

http://time.com/4067019/gallup-horse-race-polling/

but given the long history and prominence Gallup and Pew enjoyed in polling Presidential races, the reason given rings false. A more likely explanation is the difficulty in addressing behavior changes in the voting public. In addition to the shift from landline phones to cell phones, voters are more likely to discuss opinions online than in a phone interview, but there is no statistically sound means to randomly contact respondents online, and the results of online polls vary as widely as the opinions they report. Pew observed that online polls are “non-probability” polls, which by definition lack the random sampling that polls depend on, and that calls into question any political conclusion presented by such a poll.

http://www.pewresearch.org/fact-tank/2014/07/28/qa-what-the-new-york-times-polling-decision-means/

Pew also posted an article yesterday about why the polls were essentially wrong, but was wrong to pretend weighting mistakes were not a big part of the blunder.

http://www.pewresearch.org/fact-tank/2016/11/09/why-2016-election-polls-missed-their-mark/

Forbes boasted that analysts predicting a Hillary win “used the most advanced aggregating and analytical modeling techniques available”

http://www.forbes.com/sites/startswithabang/2016/11/09/the-science-of-error-how-polling-botched-the-2016-election/#4d6c04257da8

but that is a false claim on its face. What happened was not a “statistical error”, but human error. Weighting for party affiliation or other demographics is risky at best and often leads to unreliable results. To see what I mean, let’s start with the exit poll from the 2012 Presidential Election, by party affiliation, gender, race, and age:

Party Affiliation: Democrats 38%, Republicans 32%, Independents 29%

Gender: Women 53%, Men 47%

Race: White 72%, African American 13%, Hispanic 10%, Asian 3%, Other 2%

Age: 45-64 38%, 30-44 27%, 18-29 19%, 65 & over 16%

http://ropercenter.cornell.edu/polls/us-elections/how-groups-voted/how-groups-voted-2012/

And from 1984 through 2014:

Party Affiliation: Democrats 38.6%, Republicans 32.6%, Independents 27.5%

Gender: Women 53%, Men 47%

Race: White 76%, African American 13%, Hispanic 7%, Asian 2%, Other 1%

Age: 45-64 33%, 30-44 28%, 18-29 14%, 65 & over 25%

http://www.electproject.org/home/voter-turnout/demographics

http://ropercenter.cornell.edu/polls/us-elections/how-groups-voted/

Any poll with demographics different from these numbers is fiddling with the numbers out of clear bias. I won’t waste time going through them one by one, but this skewing invalidates polls from ABC News, the Wall Street Journal, Fox News, NBC News, CNN, and CBS. If you want to check for yourself, simply find one of their polls and drill down to the demographics, which are usually included at the end of the topline detail.

Weighting is not supposed to produce the “right” answer, but to line information up according to known population demographics. Sadly, a lot of polls screw up the results by trying to sell a message, rather than accurately report the current situation. This is not an attempt to “rig” an election, I believe, but simple human laziness and a habit of using assumptions instead of due diligence.
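Here is a rough sketch of what lining a sample up with known demographics looks like in practice. The target shares are the 2012 exit poll party numbers quoted above. Everything else, including the sample counts, the support rates and “Candidate A”, is hypothetical and made up purely for illustration. This is a bare-bones post-stratification on one variable, not any pollster’s actual procedure.

```python
# Target shares: 2012 exit poll party affiliation quoted above.
# They sum to 99, so they are renormalized to sum to 1.
target_pct = {"Democrat": 38, "Republican": 32, "Independent": 29}
total = sum(target_pct.values())
target = {group: pct / total for group, pct in target_pct.items()}

# HYPOTHETICAL raw sample: respondents per group, and the share of each
# group backing a made-up "Candidate A".
sample_n = {"Democrat": 450, "Republican": 250, "Independent": 300}
support_a = {"Democrat": 0.90, "Republican": 0.08, "Independent": 0.45}

n = sum(sample_n.values())

# Weight = target share / sample share. Over-sampled groups get a weight
# below 1, under-sampled groups a weight above 1.
weights = {group: target[group] / (sample_n[group] / n) for group in target}

# Unweighted and weighted estimates of Candidate A's support.
raw = sum(sample_n[g] * support_a[g] for g in target) / n
weighted_n = sum(sample_n[g] * weights[g] for g in target)
weighted = sum(sample_n[g] * weights[g] * support_a[g] for g in target) / weighted_n

print({g: round(w, 3) for g, w in weights.items()})
print(f"raw: {raw:.3f}  weighted: {weighted:.3f}")
```

In this made-up sample, the party targets alone move the topline by more than five points, which is why the choice of targets matters so much.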

This becomes even more salient when you realize that the aggregates used by analysts like Silver and Grim incorporate these biased reports, which invalidates their own analyses. Aggregation is really just group-think, even if some people publish such results with impressive names like “meta-sampling”. Everything that goes into an analysis should be tested for its own veracity, and while this is very difficult for a national report, at the very least you should be candid if you are trusting someone else’s report as a source for your own analysis. Yes, Silver claims he ‘unskews’ polls by other agencies, but that’s kind of like a guy admitting someone spit into your drink but he scooped it out and it’s fine for you to drink. If you know the source is biased, it does not belong in your own work, none of it.

One last thought on polling. The Presidential Election is not a national race, no matter what the media tells you. It’s actually fifty-one different races, whose results are summed up to produce the champion, in this case the President-Elect of the United States. So the polls you ought to have watched are the state polls, especially according to the respective electoral vote value of each state. Most media ignored the state-level polling, and when it was reported it was usually just from a single source that the media found reliable. I will be publishing a report on the accuracy of the state polls for the 2016 Election when I have all the data, but for now it’s important to know the limits of what analysts even can tell you, and keep in mind that most media people are there to sell you entertainment, not facts.

Sunday, August 02, 2009

A Clear Sign, But What Does It Say?

I have not written much at all about President Obama’s Approval Ratings in the polls since he was inaugurated in January, but noting the recent trend it seems appropriate to do so now.

I have said many times that for me, the gold standard in opinion polling is the Gallup Organization. This is due not only to Gallup’s long history, but also because Gallup follows a very consistent methodology and set of questions. This allows interested researchers the opportunity to track support within a poll over a period of time, to better gauge the actual cause and effect of the President’s policies and decisions.

The people at Real Clear Politics provide a very useful resource, where general polling support can be easily tracked.

Among them, the following polls provide a track of Obama’s job approval since January:

Gallup: 68% when sworn in, 56% now, loss of 12 points
Rasmussen: 62% when sworn in, 50% now, loss of 12 points
CBS/NYT: 63% February 22, 58% now, loss of 5 points
NBC/WSJ: 60% March 1, 53% now, loss of 7 points
Pew: 64% February 8, 54% now, loss of 10 points
NPR: 59% March 14, 53% now, loss of 6 points
FOX: 65% when sworn in, 54% now, loss of 11 points
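The losses listed above are easy to verify and average with a few lines of Python. This sketch uses only the numbers in the list. Keep in mind that the starting dates differ from poll to poll, so the average is a rough summary and not a like-for-like comparison.

```python
# (starting approval, current approval) for each poll, as listed above.
approval = {
    "Gallup": (68, 56),
    "Rasmussen": (62, 50),
    "CBS/NYT": (63, 58),
    "NBC/WSJ": (60, 53),
    "Pew": (64, 54),
    "NPR": (59, 53),
    "FOX": (65, 54),
}

# Points lost in each poll since its starting reading.
drops = {poll: start - now for poll, (start, now) in approval.items()}

# Simple unweighted average across the seven polls.
average_drop = sum(drops.values()) / len(drops)

for poll, drop in drops.items():
    print(f"{poll}: loss of {drop} points")
print(f"average loss: {average_drop:.1f} points")
```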

In every case of long-term tracking, President Obama’s levels of job approval are the lowest overall he has seen since taking office, across the board.

But a closer look shows the problem may be more serious, nothing to worry about, or paradoxically, both.

In addition to a high-level overview, the Gallup Organization also publishes support by demographic groups.

An examination of those 28 demographic groups, determined by gender, age, geographic region, race, education, wages, political affiliation and orientation shows that in 22 of 28 demographic categories, support for President Obama is at its lowest or tied for the lowest level since he took office. The six demographic areas where support for President Obama is not at its nadir are Non-White voters (85% support highest on April 26, 75% lowest on April 5, presently at 79%), Black voters (96% highest on July 5, May 4, and March 8, lowest at 86% on January 25, presently at 95%), Hispanic voters (85% highest on April 26, 70% lowest on April 5 and March 22, presently at 72%), Voters making below $24,000 a year (76% highest on May 4, 66% lowest on June 21, presently at 68%), Republicans (41% highest on January 25, 20% lowest on July 12, presently at 21%), and Liberals (90% highest on June 28, May 31, May 24, and April 26, 83% lowest on January 25, presently at 86%). All of those demographics are relative minorities to the voting population at this time.

Even with the loss of support, however, President Obama still enjoys support levels above 50% across the board, indicating that his personal popularity is strong and the general theme of his administration is well-received. Therefore, it may be reasonable to consider the loss of support nothing more than a shaking out of the fair-weather support, one that reveals a strong core of support for the President. That is, of course, assuming his numbers do not continue to fall.

It should, however, be noted that President Obama has lost significant support among major demographic groups. Between February 1 and July 26, President Obama lost twelve points of support from female voters, who were the dominant gender in the 2008 election. White voters made up 74% of the electorate in the 2008 election, and since taking office President Obama’s support among whites has fallen sixteen points according to Gallup. The largest demographic age group in the 2008 election was the 30-49 age group; among this group President Obama has lost twelve points since taking office. Among moderates, the largest political philosophy demographic, President Obama has lost ten points since taking office. The South was the most important geographic region in the 2008 election, and among Southern voters, President Obama has lost twelve points of support since taking office. In the 2008 election, the largest demographic by education was the ‘Some College’ category, and in that category President Obama has lost fifteen points of support since taking office. And among voters earning between sixty thousand and ninety thousand dollars a year, again the largest demographic in their section, President Obama has seen his support fall by twenty points since taking office. The conclusion is unavoidable that, if this loss of support is not rebuilt and assuming the Republicans can present a credible candidate, President Obama has at this time seriously damaged his re-election chances, since every dominant demographic group from the 2008 election has significantly reduced support for the President since his Inauguration.
