Tuesday, September 02, 2008

Weighting and Other Poll Controversies

I have always found opinion polls fascinating, and yet I often mistrust the way polls are built and the way their results are reported. It also occurs to me that most folks do not really understand polling as a science, and so take it as, well, the political version of a horoscope. When I extrapolate that to the election, it explains quite a lot about how some folks vote, but that’s beside the point here. What I want to do today is explain how someone can take the opinions of a few to portray the opinion of the many, and which factors most influence how a poll’s results are reported.

Let’s say you want to know how a group of a thousand people is likely to vote on an issue, say shareholders considering a potential merger with another company. If you ask one person, you will get one opinion, and that opinion would obviously represent 1/1000th or 0.1% of the group, so while you might be interested in that person’s opinion, you would not take it as a solid indicator of how the whole group feels. But at the same time, you really don’t want to have to ask all one thousand people just to get an idea of what they would say now. If you ask a second person, you have still only addressed 0.2% of the whole, but your respondent pool has doubled in size, and your result depends that much less on any one person’s view.

To put it another way, let’s say that 665 of the 1,000 people would vote in favor of a proposed merger. It’s possible that you might get through all 335 people opposed to the merger before you get to anyone in favor of it, which would falsely indicate strong opposition, but if you make sure your queries are random, you are likely to start seeing the rough shape of the group’s opinion by the time you get to just ten people, though still with a wide margin of error. Why? The key is partly how many people you ask: if you do it right, each additional person who answers your question narrows the likely error in your result, by an amount that depends on how large your respondent pool already is and on how well it covers the categories of interest. In the example of the stockholders, regional location, length of experience with the company, and preference for stock price or dividends might be relevant to how they would vote. That is, all of the stockholders who prefer a higher stock value to a higher dividend payment would be likely to vote the same way on the merger decision, and so the opinion of a relative few who have similar characteristics can reasonably represent the opinion of everyone in their group.
Therefore, if the respondent pool includes a proportional representation of the whole population concerned, then statistically the small group may be expected to reflect the larger group’s opinion in scale. Over the course of the last seventy years, polling groups have found that once a respondent pool reaches eight hundred or more, the margin of error in a national contest is generally below four percentage points, meaning that in a two-candidate race each candidate’s true support lies within four points of the reported figure; if ‘A’ and ‘B’ poll at 42% and 48%, for example, A’s true level of support could actually be anywhere from 38 to 46 percent, while B’s could be anywhere from 44 to 52 percent. Because those two ranges overlap, in most elections this margin of error means that no clear message can or should be taken in terms of who is winning or by how much. The poll, however, is a valid tool for measuring how support develops over time, when the questions and methodology used in the poll are consistent, and when the weighting used is consistent with Census norms.
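
For readers who want to check the arithmetic, here is a rough sketch of the usual textbook margin-of-error calculation. It assumes a simple random sample, a 95% confidence level (the 1.96 multiplier), and the worst-case 50/50 split; none of those assumptions come from any particular polling group, and the function names are only illustrative.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    Assumptions (not from any specific pollster): 95% confidence
    (z = 1.96) and, by default, the worst-case 50/50 split.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def support_range(poll_share, moe):
    """Lowest and highest plausible support given a margin of error."""
    return (poll_share - moe, poll_share + moe)


# Eight hundred respondents: roughly 3.5 points, under the four-point mark.
moe_800 = margin_of_error(800)

# The 42% vs. 48% example, using a four-point margin on each candidate.
range_a = support_range(0.42, 0.04)
range_b = support_range(0.48, 0.04)

# If A's upper bound reaches B's lower bound, the poll cannot say who leads.
ranges_overlap = range_a[1] >= range_b[0]

print(f"Margin of error at n=800: {moe_800:.3f}")
print(f"A: {range_a[0]:.2f} to {range_a[1]:.2f}")
print(f"B: {range_b[0]:.2f} to {range_b[1]:.2f}")
print(f"Ranges overlap: {ranges_overlap}")
```

Note that quadrupling the respondent pool only halves the margin, which is one reason pollsters rarely go far beyond a thousand or so respondents.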

This brings us back to weighting. By now it should be obvious that the weighting of a poll is critical to its results. For example, let’s say you have a poll with exactly one thousand respondents: 500 Whites, 250 Blacks, 150 Asians, and 100 Hispanics. Simplified to these four groups, the breakdown from the 2000 US Census comes to roughly 71.6% White, 12.3% Black, 12.5% Hispanic, and 3.6% Asian. To match these demographic statistics, each group’s results are divided by its percentage share of the sample and then multiplied by its percentage share of the population:

The results from White respondents would be divided by 50.0 then multiplied by 71.6;
The results from Black respondents would be divided by 25.0 then multiplied by 12.3;
The results from Hispanic respondents would be divided by 10.0 then multiplied by 12.5; and
The results from Asian respondents would be divided by 15.0 then multiplied by 3.6.
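
The divide-then-multiply step above can be sketched in a few lines. This is only an illustration of the idea, not any polling firm’s actual procedure: the sample shares are hypothetical and chosen so that they sum to 100, the support rates are invented purely to show the effect, and the function names are my own.

```python
# Population targets (percent), the Census-based figures quoted in the post.
CENSUS_TARGET = {"White": 71.6, "Black": 12.3, "Hispanic": 12.5, "Asian": 3.6}

# Hypothetical sample shares (percent), chosen here so that they sum to 100.
SAMPLE_SHARE = {"White": 50.0, "Black": 25.0, "Hispanic": 10.0, "Asian": 15.0}

# Invented support rates for some ballot question, for illustration only.
SUPPORT_RATE = {"White": 0.40, "Black": 0.80, "Hispanic": 0.60, "Asian": 0.50}


def weight_factors(sample_share, target_share):
    """Divide by the sample share, multiply by the population share."""
    return {group: target_share[group] / sample_share[group]
            for group in sample_share}


def overall_support(sample_share, support_rate, factors):
    """Overall support after scaling each group by its weight factor."""
    weighted_yes = sum(sample_share[g] * factors[g] * support_rate[g]
                       for g in sample_share)
    weighted_total = sum(sample_share[g] * factors[g] for g in sample_share)
    return weighted_yes / weighted_total


factors = weight_factors(SAMPLE_SHARE, CENSUS_TARGET)
no_weighting = {group: 1.0 for group in SAMPLE_SHARE}

unweighted = overall_support(SAMPLE_SHARE, SUPPORT_RATE, no_weighting)
weighted = overall_support(SAMPLE_SHARE, SUPPORT_RATE, factors)

for group, factor in factors.items():
    print(f"{group}: multiply results by {factor:.3f}")
print(f"Unweighted support: {unweighted:.3f}")  # about 0.535
print(f"Weighted support:   {weighted:.3f}")    # about 0.478
```

With these made-up numbers the same thousand answers produce a result nearly six points lower once weighted, which shows how much rides on the weights themselves.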

This, of course, is only the racial weighting. Similar adjustments would be made so that male and female responses match Census norms, and responses would also be adjusted to match other relevant demographics, like age, geographic location, education, job category, military experience, and so on. The intent is to create an image aligned as closely as possible with the national model. The problem, of course, is that every national poll is therefore manipulated to some degree.

There are three key problems with weighting polls. First, polls are driven by budget and time constraints, and as a result the weighting is often generalized, and not done by the same method in each case. Some political polls, for example, start their age breakdown with a broad “18-34” category, while others use a narrower “18-24” or even “18-22” category to show college-age support. Worse, the ranges sometimes fluctuate even within the same polling group, so that consistent methodology is lost, making the poll significantly less valid.

Next, some polls have been known to fudge their weighting to match a different standard than the last Census. CBS and the New York Times, for example, have often ignored Census norms in favor of some arbitrary measure, which also violates the standard used in legitimate polling.

And then there are the categories which defy clear definition. Almost no two major polls agree exactly about what proportions of Republican, Democratic, and Independent respondents should be used. Part of this is the fact that many states do not register political affiliation, and the federal Census does not break down the population by party affiliation. So far, that doesn’t really bother me, except that the reader had better be aware that different polling groups will use different proportions in the way they weight political responses, because even though there is no official and firm balance of Republican-to-Democrat-to-Independent-to-Something Else, polls do indeed weight responses according to party affiliation. What’s worse, some of them will change the proportions from time to time, on no evidence beyond their belief that the mood has changed. This, of course, immediately invalidates the poll as an indicator of growing or lessening strength of support.
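
To see why a shifting party mix matters, here is a small sketch with entirely made-up numbers: the support rates and both party mixes are hypothetical and are not taken from any real poll. The respondents’ answers are held fixed, and only the assumed party proportions change.

```python
# Hypothetical support for Candidate A within each party (invented numbers).
SUPPORT_FOR_A = {"Republican": 0.90, "Democrat": 0.08, "Independent": 0.45}

# Two hypothetical party mixes (percent) a pollster might weight to.
PARTY_MIX_FIRST = {"Republican": 33, "Democrat": 37, "Independent": 30}
PARTY_MIX_SECOND = {"Republican": 30, "Democrat": 40, "Independent": 30}


def topline(support_by_party, party_mix):
    """Overall support when each party is weighted to the given mix."""
    total = sum(party_mix.values())
    weighted = sum(support_by_party[party] * party_mix[party]
                   for party in party_mix)
    return weighted / total


first = topline(SUPPORT_FOR_A, PARTY_MIX_FIRST)
second = topline(SUPPORT_FOR_A, PARTY_MIX_SECOND)
shift = first - second

print(f"First mix:  {first:.3f}")   # about 0.462
print(f"Second mix: {second:.3f}")  # about 0.437
print(f"Apparent change in support: {shift:.3f}")
```

In this sketch Candidate A appears to lose about two and a half points even though not one respondent changed an answer; the whole movement comes from the pollster’s choice of proportions.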

A poll is a useful indicator of trends and of the development of a candidate’s support, provided the standards, methodology, and weighting remain constant. Otherwise, an opinion poll is absolutely worthless. Caveat emptor, and then some.
