Distortions Are Common When Numbers Make News
INDIANAPOLIS - "There are three kinds of lies: lies, damned lies, and statistics," goes the saying that Mark Twain attributed to Benjamin Disraeli.
The 19th-century British statesman was slightly ahead of his time. Readers and television watchers everywhere today are pummeled with news about "rate of inflation" and "incidence of cancer," and surveys that list a "margin of error" or "level of confidence."
Just recently, Indiana Republican gubernatorial hopefuls Rex Early and Stephen Goldsmith squared off on crime statistics in Indianapolis.
Statistics show that Indianapolis is less safe than New York City, said Early, Indiana's former GOP state chairman.
No, they don't, countered Goldsmith, the mayor of Indianapolis and a former county prosecutor. Statistics prove that Indianapolis is one of the safest cities in the nation.
The public can be forgiven for scratching its collective head.
A check with several math experts shows that distortions are common when numbers make the news:
Washington, D.C., for years was pegged as the murder capital of the nation because of its high murder rate. But the murder rate for Washington always was much lower if based on the metropolitan area, not just the city limits.
Household income is going down, reports say, which proves that Americans are becoming poorer. But household size is going down, too. Income may actually be rising on a per capita (per person) basis.
Children of divorce do worse in school than do children from two-parent households. This one's true, but children of divorce are more likely to live in poverty, and poverty may be the important variable here.
The number of people with cancer and the number of deaths caused by cancer overall remain steady, or are going up, in spite of the billions of dollars spent on cancer research and treatment, some researchers have noted. But people live longer today, and cancer most commonly is a disease of old age.
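The household-income item above comes down to simple division. As a rough illustration, the Python sketch below uses invented figures (not actual census data) to show how total household income can fall while income per person rises. The function name and every number here are assumptions made for the example.

```python
def per_capita(household_income: float, household_size: float) -> float:
    """Income per person in a household."""
    return household_income / household_size


# Hypothetical figures, invented for illustration only.
earlier = {"income": 50_000, "size": 4}
later = {"income": 48_000, "size": 3}

for label, h in (("earlier", earlier), ("later", later)):
    share = per_capita(h["income"], h["size"])
    print(f"{label}: household ${h['income']:,}, per person ${share:,.0f}")
```

With these made-up numbers the household total drops while each person's share goes up, which is the pattern the experts describe.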
Statistics that can lead to inferences about populations - opinions of voters, or whether a new drug really is better than an old drug - are based on the theory of probability, explains Bay Chotlos, a math professor at Butler University in Indianapolis.
For example, probability says that if you flip a fair coin many times, it should come up "heads" about half the time and "tails" the other half.
But if "heads" comes up 60 percent of the time after many trials, you have every right to suspect that something other than chance is at work, such as a coin that has been weighted or otherwise fixed.
Similarly, statisticians always look for deviations too large to be explained by chance alone.
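How suspicious is 60 percent heads? That depends on how many flips there were. The Python sketch below is one way to check, assuming a fair coin and independent flips. It adds up the exact chance of seeing at least that many heads. The function name and the flip counts are chosen for illustration and do not come from Chotlos.

```python
from math import comb


def prob_at_least(heads: int, flips: int) -> float:
    """Chance that a fair coin shows at least `heads` heads in `flips` flips."""
    # Count the equally likely head/tail sequences with `heads` or more heads,
    # then divide by the total number of sequences (2 ** flips).
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


for flips in (10, 100, 1000):
    heads = flips * 6 // 10  # 60 percent heads
    chance = prob_at_least(heads, flips)
    print(f"{heads} or more heads in {flips} flips: {chance:.2g}")
```

Six heads in 10 flips happens by luck more than a third of the time, so it proves little. Sixty in 100 happens by luck only about 3 percent of the time, and 600 in 1,000 far less than once in a million tries. That is why "after many trials" matters.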
Many studies seek to link two variables, such as smoking and lung cancer. If an unusually high number of smokers develop lung cancer, for example, scientists may conclude it's not due to chance, but that smoking actually may cause cancer.
Is this valid?
"Statistics don't prove anything," said Chotlos. "They show a relationship exists, but not a cause-and-effect relationship. You've just got to dig deeper."
Nevertheless, most responsible parties agree that smoking does cause lung cancer and other diseases.
But it would be better, says Chotlos, if you could design a carefully controlled study in which you allowed half of a randomly selected group of people to smoke tobacco, but not the other half.
Then, after the passage of time, you'd look for a higher incidence of lung cancer and other diseases in the first group. Chotlos notes that ethically this kind of human experimentation can't be done, and tobacco company executives have seized on this point to say there's no "proof" that smoking is dangerous.
Nadjib Bouzar, a math professor at the University of Indianapolis, cautions that many studies that seek to link two variables may ignore a third, or "lurking," variable. Consider the example of patient deaths at two competing hospitals.
Some people might assume that the hospital with the higher death rate is the worse hospital, but it may be that it treats sicker patients to begin with. The lurking variable in this example would be the health status of the patients at the time of admission, Bouzar noted.
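The hospital example can be made concrete with a small Python sketch. All the patient counts below are invented for illustration, and the "good" and "poor" labels for condition at admission are assumptions. None of it comes from Bouzar or from real hospital records.

```python
# Invented counts, for illustration only: (patients admitted, deaths),
# split by the patients' condition when they were admitted.
hospitals = {
    "A": {"good": (600, 6), "poor": (1400, 56)},
    "B": {"good": (1400, 21), "poor": (600, 30)},
}


def death_rate(groups: list[tuple[int, int]]) -> float:
    """Deaths divided by patients, pooled over the given groups."""
    patients = sum(n for n, _ in groups)
    deaths = sum(d for _, d in groups)
    return deaths / patients


for name, by_status in hospitals.items():
    overall = death_rate(list(by_status.values()))
    good = death_rate([by_status["good"]])
    poor = death_rate([by_status["poor"]])
    print(f"Hospital {name}: overall {overall:.2%}, "
          f"good condition {good:.2%}, poor condition {poor:.2%}")
```

In this made-up case Hospital A has the higher overall death rate, yet it has the lower rate among patients admitted in good condition and also among those admitted in poor condition. Its overall figure looks worse only because it takes in more of the sickest patients.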
Have you ever wondered how pollsters can phone, say, several hundred registered voters and extrapolate from their answers just how the rest of the country is going to vote?
They're working on the principle of "random" samples. This simply means that every potential voter has an equal chance of being polled.
The more people surveyed, of course, the smaller the "margin of error." But it still takes only about 1,500 randomly selected respondents to bring the margin of error down to 2 or 3 percentage points.
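The 1,500 figure can be checked with the standard textbook approximation for a simple random sample. The Python sketch below assumes a 95 percent level of confidence (a multiplier of 1.96), an evenly split question (the worst case) and a population much larger than the sample. These assumptions are not spelled out in the article, and the function name is illustrative.

```python
from math import sqrt


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a simple random sample.

    Assumes a 95 percent confidence level by default (z = 1.96) and a
    50-50 split, which gives the widest margin.
    """
    return z * sqrt(proportion * (1 - proportion) / sample_size)


for n in (400, 1500, 6000):
    print(f"{n} respondents: plus or minus {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, 1,500 respondents give a margin of about 2.5 percentage points. Because the margin shrinks with the square root of the sample size, a poll must question four times as many people to cut it in half.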
Having said that, there are problems with surveys. Most are done by phone, yet some poorer Americans don't have a phone, noted David Moore, a statistics professor at Purdue University. So not everyone has an equal chance of being polled.
"For the public, the most important question to ask is, 'Where did the data come from?'" said Moore. "What you would like to see when you read someone's opinion poll is that the data came from a properly designed, randomized sample and that suitable efforts were made to contact each person in the sample."
Another game played with numbers is based on the premise that "you can't prove a negative," says Victor Cohn, former Washington Post science editor and author of News & Numbers (Iowa State University Press, 1989).
"If you want to say there are no little green people in America, all you can say is there is no evidence for the existence of them at this time," Cohn said.
Similarly, people who are short on evidence that could prove a Gulf War syndrome exists may say, "You can't prove that it doesn't exist," noted Cohn.