The Risky Business of Public Health Research
Copyright © 1995 by Steven J. Milloy. All rights reserved. First edition. Published by the
Cato Institute, 1000 Massachusetts Avenue, N.W., Washington, D.C. 20002.
Library of Congress Catalog Number: 95-72177. International Standard Book
Number: 0-9647463-2-8.
Mining for Statistical Associations
Once you've
collected your data, how do you find the risk that's your ticket to stardom?
There are two tried-and-true techniques virtually guaranteed to turn up
something.
Disease Clusters and the Texas Sharpshooter
One of the best
techniques is called the Texas Sharpshooter method. It goes something like
this: The Texas Sharpshooter sprays the side of an abandoned barn with gunfire.
He then draws a bull's eye target around a cluster of bullet holes that
occurred randomly. He then can say, "See what a good shot I am!"
Basically, you can be your own sharpshooter
if you find a cluster of disease and then shout "Aha!" or
"Eureka!" or something to denote you've discovered the mother lode.
Clusters are easy to find; they're everywhere, in fact. Epidemiologic studies
of hazardous waste sites and electromagnetic fields are famous for clustering
and the Texas Sharpshooter technique. For example, a study of a Woburn, Mass.,
site associated a cluster of 20 childhood leukemia cases with the site. It was
very convincing. It didn't even matter that none of the contaminants at the
site causes leukemia. That's the power of a cluster!
Consider, for example, that one out of every
three people in the United States will develop cancer sometime during their
lifetimes. We call this the background risk or "natural" rate of
cancer. It's yours by virtue of your birth. Now, if you do an analysis of
cancer rates by geographic region or state or county or city or neighborhood,
you will likely find that some areas will have a cancer rate of exactly 1 in 3.
But most areas will have cancer rates that are greater or less than 1 in 3.
Now, a real statistician will look at these
rates and say, "Well, just by chance some areas will have higher cancer
rates and some areas will have lower cancer rates. The differences average out
as the geographic area gets larger. So the differences in rates between areas
likely mean nothing."
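The statistician's point can be checked with a short simulation. This is only a sketch under stated assumptions: every simulated person has the same 1-in-3 background risk, and the area sizes, area counts, and function names (`simulate_area_rates`, `spread`) are hypothetical choices made for illustration, not figures from any study.

```python
import random

def simulate_area_rates(n_areas, people_per_area, background=1 / 3, seed=0):
    """Return the observed cancer rate in each area.

    Every person in every area has the same true risk (`background`), so any
    difference between areas is produced by chance alone.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_areas):
        cases = sum(rng.random() < background for _ in range(people_per_area))
        rates.append(cases / people_per_area)
    return rates

def spread(rates):
    """Gap between the highest and lowest observed rate."""
    return max(rates) - min(rates)

# Hypothetical setup: 50 neighborhoods of 100 people vs. 50 counties of 10,000.
neighborhoods = simulate_area_rates(n_areas=50, people_per_area=100, seed=1)
counties = simulate_area_rates(n_areas=50, people_per_area=10_000, seed=2)

print("highest neighborhood rate:", max(neighborhoods))
print("neighborhood spread:", spread(neighborhoods))
print("county spread:", spread(counties))
```

Even though nothing but chance is at work, some small areas land well above 1 in 3, and the gap between the best and worst areas shrinks as the areas get larger. Picking the highest-rate area after the fact is the bull's eye drawn around the bullet holes.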
That may be, but you can't let that stop
you. You've got to grab those areas with higher cancer rates and insist there's
more to them than chance. Draw a bull's eye around the cluster you want and
take it to the bank.
Data dredging: I know there's an association in here somewhere
Sometimes,
clusters aren't obvious. You've got a river of data and nothing's making a
ripple. What do you do? Well, what do you do when you're looking for something
lost in a river? Simple, you get some dredging equipment and comb the river. By
turning up everything, you hope you'll turn up what you want. Or if "you
can't always get what you want...you just might find you get what you
need." (Just kidding!) So what do you do when you're looking for something
lost in a river of data? Data dredge!
Conceptually, data dredging is like the
Texas Sharpshooter technique except clusters are harder to find. You have to
analyze your data forwards and backwards, from the top, bottom, and sides, from
the inside out, and from the outside in. You slice it, dice it, and pick it
apart any way you can to find an artifact (I mean risk) worth all this trouble.
All you need is a computer and a good
statistical analysis program that can go through your data and look at every
possible association. The computer does all the work — you get all the credit.
All you have to do is pick the association you think makes your case and write
it up. Let's look at a recent example.
A case-control study looked at risk factors
for childhood leukemia, including environmental chemicals, electric and
magnetic fields, past medical history, parental smoking and drug use, even
dietary intake of certain food items. For "dietary intake of certain food
items" alone, the study analyzed nine different foods: breakfast meats,
hot dogs, luncheon meats, hamburgers, charbroiled meats, oranges and
orange juice, grapefruits and grapefruit juice, apple juice and cola drinks.
Obviously, right from the start, the
researchers had no idea what they were looking for; they were simply on a
fishing expedition. Amazingly enough, they caught a big one!
In examining the myriad possible
statistical associations, the study identified associations between a number of
exposures and leukemia. These included breastfeeding, use of indoor pesticides,
children's use of hair dryers, children's use of black-and-white television
sets, incense use, father's occupation, mother's exposure to spray paints
during pregnancy, other chemical exposures and home electrical wiring
configurations. The association that received the most attention, however, was
the one between hot dogs (eating more than 12 dogs a month, that is) and
leukemia.
For this association, epidemiologists found
a relative risk of 9.5, indicating, in their study, that children consuming
more than 12 hot dogs per month were 9.5 times more likely to develop leukemia
than children who consumed no hot dogs. The authors determined this association
was biologically plausible because processed meats contain nitrites which may
be precursors of other chemical compounds that have been associated with
causing leukemia in rats and mice.
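For readers who want to see where a number like 9.5 comes from: in a case-control design, the relative risk is usually estimated by an odds ratio from a two-by-two table. The sketch below assumes that approach and uses Woolf's logit method for the interval. The counts are hypothetical, chosen only so the arithmetic lands on 9.5; they are not the study's data, and the function names are illustrative.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

def woolf_interval(exposed_cases, unexposed_cases, exposed_controls,
                   unexposed_controls, z=1.96):
    """Approximate 95 percent confidence interval for the odds ratio (Woolf's method)."""
    estimate = odds_ratio(exposed_cases, unexposed_cases,
                          exposed_controls, unexposed_controls)
    # The standard error of the log odds ratio is driven by the smallest cells.
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return (math.exp(math.log(estimate) - z * se),
            math.exp(math.log(estimate) + z * se))

# HYPOTHETICAL counts, not taken from the study discussed in the text.
counts = dict(exposed_cases=4, unexposed_cases=40,
              exposed_controls=2, unexposed_controls=190)
estimate = odds_ratio(**counts)
low, high = woolf_interval(**counts)
print("odds ratio:", estimate)
print("approximate 95 percent interval:", low, high)
```

With only a handful of exposed children in the table, a large ratio is easy to produce, and the interval around it is correspondingly wide.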
The researchers concluded their study
"suggests" that diet is important to leukemia risk and that reduced
consumption of hot dogs could reduce leukemia risk. A great result from a
fishing expedition.
My only criticism is that the authors included
in their writeup enough information for the careful reader to discern the study
failed to come up with associations between other types of processed meats
(including ham, bacon, sausage and luncheon meats) and leukemia. Given that
these foods also contain nitrites and, therefore, should also be associated
with leukemia risk, the authors should have omitted this information from their
report. It only detracts from their conclusions about hot dogs.
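Why a fishing expedition almost always catches something can be sketched with a small simulation. The assumptions are stated loudly: none of the simulated exposures has any real effect, cases and controls are drawn from the same exposure prevalence, and a plain two-proportion z-test at the conventional 95 percent level stands in for whatever tests a real study would run. The function name `dredge` and all parameter values are hypothetical.

```python
import math
import random

def dredge(n_exposures, n_cases, n_controls, prevalence=0.3, z_crit=1.96, seed=0):
    """Test many exposures that have NO true effect; return the 'significant' ones.

    Each hit is a tuple (exposure index, z statistic).
    """
    rng = random.Random(seed)
    hits = []
    for exposure in range(n_exposures):
        # Cases and controls share the same exposure prevalence by construction.
        exposed_cases = sum(rng.random() < prevalence for _ in range(n_cases))
        exposed_controls = sum(rng.random() < prevalence for _ in range(n_controls))
        p_cases = exposed_cases / n_cases
        p_controls = exposed_controls / n_controls
        pooled = (exposed_cases + exposed_controls) / (n_cases + n_controls)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
        z = (p_cases - p_controls) / se if se > 0 else 0.0
        if abs(z) > z_crit:
            hits.append((exposure, z))
    return hits

# Hypothetical fishing expedition: 200 candidate exposures, 200 cases, 200 controls.
hits = dredge(n_exposures=200, n_cases=200, n_controls=200)
print("associations found among 200 null exposures:", len(hits))
```

At a 95 percent level, roughly one null exposure in twenty is expected to clear the bar by chance, so the more exposures you test, the more "associations" you can choose from.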
What if you don't
have the time or the money or the inclination to do your own epidemiologic
study? What if others have already published epidemiologic studies on your risk
but they didn't find anything convincing? Or some found something while others
didn't? Well, just be very creative.
You could take the existing studies, assume
that they are similar enough to be combined and, voila!, you have an
entirely new study. This technique is called meta-analysis. The best way to
demonstrate the power of meta-analysis is to show you the greatest masterpiece,
the Mona Lisa, of all meta-analyses: the Environmental Protection Agency's risk
assessment on environmental tobacco smoke (ETS). There simply is no better
example of this technique at work.
At the time the ETS risk assessment was
conducted, there were 30 published (and who knows how many unpublished)
epidemiologic studies on ETS conducted in a number of countries. Of the 30
published studies, eight reported statistically significant associations
between exposure to ETS and lung cancer; 22 other studies reported either no
association or no statistically significant association. Of the 11 studies that
examined U.S. populations, only one reported a statistically significant
association.
Realizing the difficulty of credibly associating
ETS with lung cancer based on conflicting studies, the ever-resourceful EPA
chose meta-analysis. Using this technique, EPA combined the 11 U.S. ETS
epidemiologic studies and came up with a relative risk of 1.19 that was
statistically significant at a 90 percent confidence level. (Note: Even though
their results weren't statistically significant at a 95 percent level they were
resourceful enough to claim statistical significance at a lower level. Another
clutch decision!) With this "statistically significant" relative
risk, EPA went on to estimate that 3,000 lung cancer deaths can be attributed to
ETS every year.
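How pooling and the choice of confidence level interact can be sketched in a few lines. This uses fixed-effect inverse-variance pooling of log relative risks, a common textbook approach that is assumed here for illustration and is not claimed to be EPA's actual procedure. The four studies below are invented inputs, not the ETS studies; they were picked so the pooled estimate comes out near 1.19.

```python
import math
from statistics import NormalDist

def pool_fixed_effect(log_rrs, std_errs):
    """Inverse-variance weighted average of log relative risks."""
    weights = [1 / se ** 2 for se in std_errs]
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

def confidence_interval(log_rr, se, level):
    """Two-sided interval on the relative-risk scale for a given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)

# HYPOTHETICAL studies: log relative risks and their standard errors.
log_rrs = [0.05, 0.30, 0.10, 0.25]
std_errs = [0.20, 0.20, 0.20, 0.20]

pooled_log_rr, pooled_se = pool_fixed_effect(log_rrs, std_errs)
ci90 = confidence_interval(pooled_log_rr, pooled_se, level=0.90)
ci95 = confidence_interval(pooled_log_rr, pooled_se, level=0.95)

print("pooled relative risk:", math.exp(pooled_log_rr))
print("90 percent interval:", ci90)
print("95 percent interval:", ci95)
```

In this made-up example no single study is significant on its own, the pooled 95 percent interval still includes 1.0, and the pooled 90 percent interval does not. Whether the result counts as "statistically significant" depends entirely on which level you decide to report.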
What's so amazing about all this? Well, EPA
did such a good job picking a target for its risk assessment and meta-analysis
that the intrinsic characteristics of the target itself were strong enough to
overcome the scientific deficiency of the meta-analysis.
ETS was a classic target. The risk was
unprovable (any risk would be too small to find, a fact borne out when 10 out
of 11 U.S. studies turned up nothing). ETS is a common exposure. The
cause-and-effect relationship in question is intuitive. The tobacco industry is
easy to pick on. ETS is an involuntary risk. And, for non-smokers, there's no
personal sacrifice involved in forcing others to quit. The technical
deficiencies, while numerous and significant, were no match for these intrinsic
characteristics.
Now remember, meta-analysis depends on the
assumption that the studies are similar enough to be combined. Yet mixing the
different ETS studies is like mixing apples and oranges. You see, none of the
ETS studies contains real exposure information. All the "exposure"
data was derived from elderly women being prodded to remember their husbands'
smoking habits of decades earlier (like the diesel exhaust studies). Or it
came from the memories of other relatives.
None of this exposure data was ever
validated or verified for accuracy. The clincher, however, is that each ETS
study asked different types of study populations different questions about
different time frames. To combine these studies is truly the
epidemiologic personification of the data processing acronym GIGO (garbage in,
garbage out).
But, in the
end, you've got to give credit where credit is due. EPA picked the right target
— and hit the bull's eye. The rest is risk assessment history. Maybe this is
really a lesson in picking a good target.
Chapter 7
Haven't got time
to do your own soup-to-nuts risk assessment? Then "instant" risk is
for you. No fuss, no muss and guaranteed results. The classic example of this
is risk assessment for ionizing radiation.
Everyone is exposed to ionizing radiation
every day. It's unavoidable and natural. The two main sources of ionizing
radiation are the earth and space. Soils and rock contain naturally occurring
radioactive elements that either give off radiation or emit radioactive
particles. Space is continually bombarding us with cosmic rays. You would not
consider either of these to be dangerous because they occur naturally. Even if
you lived the idyllic lifestyle in the Garden of Eden, you would still be
exposed to ionizing radiation from these sources.
Some human populations have had very, very,
very high exposures to ionizing radiation. Survivors of atomic bomb explosions.
Uranium miners. Women who, in the 1920s, painted watch dials and instrument
panels with radium paint and licked their brushes to get better points. Studies
have shown a generally accepted association between these very, very, very high
radiation exposures and cancer.
Notwithstanding what we know about high
levels of ionizing radiation, there is not a generally accepted association
between lower levels of ionizing radiation from manmade sources (like medical
X-rays) or environmental levels of ionizing radiation from naturally occurring
sources (like radon in the home).
Now ordinarily, you might conduct a
case-control epidemiologic study to try to identify such an association and
many folks have. But you don't need to. Just base your study on those of
the atomic bomb survivors, underground uranium miners and radium watch dial
painters, and you've got instant risk. How? Why?
Years ago, some genius came up with the
theory that if something (say radiation) can be harmful at very high exposure
levels, in the absence of knowledge to the contrary, it should be assumed it is
harmful at any exposure level. This theory is known in risk assessment circles
as the linear nonthreshold model.
Using a graph similar to that above, all you
need to do is measure or estimate the exposures to your population, find that
exposure level on the graph and follow it over to a risk level. What could be
easier? Just make believe that getting a medical X-ray is like surviving an
atomic bomb explosion. Or that playing ping-pong in your basement rec room is
like working in an underground uranium mine! Sounds silly, you say? Don't
worry; this is one of the most commonly accepted tenets in the public health
community.
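The graph-reading step amounts to drawing a straight line from the high-exposure data point down to zero. A minimal sketch, assuming a single reference point and arbitrary dose units; the numbers are hypothetical and the threshold variant is included only for contrast, not as anyone's endorsed model.

```python
def linear_nonthreshold_risk(dose, reference_dose, reference_excess_risk):
    """Extrapolate excess risk along a straight line through the origin."""
    slope = reference_excess_risk / reference_dose
    return slope * dose

def threshold_risk(dose, threshold, reference_dose, reference_excess_risk):
    """Contrast case: assume no excess risk at or below a threshold dose."""
    if dose <= threshold:
        return 0.0
    slope = reference_excess_risk / (reference_dose - threshold)
    return slope * (dose - threshold)

# HYPOTHETICAL reference point: 10 percent excess risk at 1,000 dose units.
low_dose = 1.0
lnt = linear_nonthreshold_risk(low_dose, reference_dose=1000.0,
                               reference_excess_risk=0.10)
with_threshold = threshold_risk(low_dose, threshold=100.0, reference_dose=1000.0,
                                reference_excess_risk=0.10)

print("linear nonthreshold excess risk at low dose:", lnt)
print("threshold-model excess risk at low dose:", with_threshold)
```

Under the linear nonthreshold assumption any dose above zero yields some risk, so the answer is never "none"; under a threshold assumption the same low dose yields exactly zero. The data at high exposures cannot tell you which line to draw.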
You'll need to be prepared for real scientists
who might say the linear nonthreshold model flies in the face of everything we
know about risks from low levels of exposures. For example, studies of the
atomic bomb survivors report an increased incidence of cancer only at the very
highest exposures. Among those survivors with less than the highest levels of
exposures, a decreased incidence of cancer (as compared to the general
population) was observed.
Epidemiologic studies of workers show what
is called the "healthy worker effect." That means despite being
exposed to comparatively more "risks" on the job, workers are
typically healthier than nonworkers. Finally, vaccines (like those for polio,
measles, mumps, diphtheria and the like) intentionally expose humans to low
levels of toxins but keep individuals healthy, not sick.
But, as I said earlier, the linear
nonthreshold model is a public health mantra. It's not open to criticism.
A final word about the "instant
risk" technique. It can save you lots of headaches. Consider the following
story.
Not long ago, the National Cancer Institute
conducted a very large and well-designed study to look at risk factors for lung
cancer, including radon in the home. NCI's study failed to find an association
between radon in the home and lung cancer. But at the same time, the
Environmental Protection Agency was spending $20 million a year on its own
radon program. When NCI published its results, EPA got upset.
Study results threatening the existence of
the $20 million radon program won't win friends or influence people in
the program. They immediately screamed, "Fix this or else!" To atone
for its sin, NCI repudiated its own epidemiologic study and published a new
study applying the linear nonthreshold model to the underground uranium miner
data. That produced an instantly acceptable risk assessment. And NCI and the
EPA radon program were on speaking terms again.
The moral of the story? If you go nonlinear,
you will be straightened out by your friends — or else!
You've calculated
your relative risk and you've made it statistically significant. Is that
enough? Can you just write up your results, get them published and start
filling out those federal grant applications?
You can, but you haven't yet maximized your
chances for success. There's one last thing to do and it's easy as pie. You
simply take the innocuous relative risk number and "morph" it into a
public health crisis.
You need to calculate a risk estimate for
some population, preferably a large population or, better yet, all 250 million
Americans. If you can figure the number of cancer cases or premature deaths
associated with your risk, you're sure to get instant national attention. But
how do you do this? Simple. Tell your statisticians you want to calculate an
attributable risk. They know how.
Attributable risk is intended to indicate
what percentage of deaths in a population are caused by a risk. For example,
saying that "16 percent of all deaths are due to being overweight" is
an attributable risk. You've attributed 16 percent of all deaths to obesity.
All you need to do then is figure out how many deaths there are annually (about
2.2 million in the U.S., according to 1991 statistics), then multiply the
number of annual deaths by the attributable risk (16 percent). Voila! A
public health crisis is born!
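The multiplication described above, written out. The 2.2 million deaths and the 16 percent figure come from the text. The second function is an assumption added for illustration: Levin's formula, one common textbook way to turn a relative risk and an exposure prevalence into an attributable fraction. The 50 percent prevalence fed into it is hypothetical.

```python
def attributable_deaths(total_deaths, attributable_fraction):
    """Deaths 'due to' a risk: total deaths times the attributable fraction."""
    return total_deaths * attributable_fraction

def levin_fraction(prevalence, relative_risk):
    """Population attributable fraction from exposure prevalence and relative risk."""
    excess = prevalence * (relative_risk - 1)
    return excess / (1 + excess)

# Figures from the text: about 2.2 million annual U.S. deaths, 16 percent attributed.
obesity_deaths = attributable_deaths(2_200_000, 0.16)
print("deaths attributed to being overweight:", round(obesity_deaths))

# HYPOTHETICAL inputs: a relative risk of 1.19 and 50 percent of people exposed.
fraction = levin_fraction(prevalence=0.5, relative_risk=1.19)
print("attributable fraction:", fraction)
```

Note that the calculation treats the statistical association as if it were cause and effect. Nothing in the arithmetic checks whether that is true.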
Risk | Attributed deaths
Obesity | 350,000 from all causes (Source: derived from 1995 Harvard University Study)
Smoking | 390,000 from all causes (Source: U.S. Surgeon General)
Radon | 40,000 from lung cancer (Source: U.S. EPA)
Chlorinated tap water | 10,000 from bladder & rectal cancer (Source: Morris et al 1992)
Environmental tobacco smoke | 3,000 from lung cancer (Source: U.S. EPA)
Now your statisticians (if they are
competent and conscientious) should ask if you really want to calculate an
attributable risk. This query will be based on the following warning that appears
on the package of every statistical analysis program:
STATISTICIAN'S
WARNING: ATTRIBUTABLE RISK MAY NOT BE SCIENTIFICALLY JUSTIFIABLE. IT IS
CALCULATED FROM VERY UNCERTAIN STATISTICAL ASSOCIATIONS. THESE ASSOCIATIONS MAY
NOT REFLECT TRUE BIOLOGICAL CAUSE-AND-EFFECT. AT BEST, A STATISTICAL
ASSOCIATION IS A REPRESENTATION OF WHAT WAS OBSERVED IN A PARTICULAR POPULATION
STUDIED AND IS NOT APPLICABLE TO OTHER POPULATIONS NOT STUDIED.
You, of course, should ignore this warning.
source: http://www.junkscience.com/news/sws