Authors: Dan Fagin
He was right to be pessimistic. It took Finette’s research team five months to count HPRT mutations in white blood cells from the forty-nine Toms River case siblings and the forty-three control children from out of town. In early 2001, he reported his results to the lawyers: There was no appreciable difference in mutation frequency between the two groups.
The experiment, Finette thought, had been crippled by the cascade of assumptions built into it, especially the long gap between the years of peak pollution and the collection of the blood samples in 2000. If there had ever been a spike in mutation frequency in the Toms River children—and Finette still believed there had been—the study was too tardy to detect it. As usual, science had arrived too late to make a difference in Toms River.
Finette had other ideas for Toms River research that he would pursue for years to come, returning over and over to the plastic tubules in his lab’s freezer, like a pilgrim in search of enlightenment. He would go beyond merely counting mutations and instead look to see if specific genetic changes were present in cells of local children but not out-of-town controls. His ideas would take many years to test—far too long to influence the outcome of the legal case—and there was no good reason to think they would ultimately bear fruit. Molecular epidemiology, in its simplest form, was premised on the idea that diseases could be predictably associated with single, specific genetic variations, but most cancers did not play by those rules. Still, the families and their lawyers wanted Finette to keep going. They were interested in more than just leverage in the upcoming settlement negotiations. If there was any chance to someday learn something new about the cause of the cluster, they wanted to pursue it. So Finette and a dwindling team of assistants would keep working, quietly, as the Toms River drama built to a climax. And then they would keep working for years afterward, searching for faint clues in a dark sea of genetic data.
The families had put their faith in three kinds of studies, and now it was clear that two of them, Finette’s blood work and the National Toxicology Program’s rat study of SAN trimer, would take many more years to complete and were so weighed down by scientific complexity that they were unlikely to end in a meaningful result. Realistically, there was just one hope left.
At first, Jerry Fagliano could not tell whether there was a discernible message buried within the stacks of computer printouts crowding his office. It had taken five years to collect all of the information he had sought about children and chemical contamination in Toms River. By the beginning of 2001, he had everything he needed to see whether kids diagnosed with cancer were truly more likely to have been exposed to pollution than healthy children. But there were so many ways to cut the data that it was hard to avoid getting lost in the blizzard of numbers. Prenatal exposure or postnatal? Parkway well water or Holly Street? Chemical plant or nuclear plant? Interview study or birth record study? Infants or schoolchildren? Boys or girls? Which years? Which cancers?
With so many potential associations to analyze and so few cancer cases (just sixty-three, between the two studies), the data was unlikely to sort itself neatly. Instead, it was likely that some apparent links between exposure and illness would turn up for no reason other than chance, while others would stay forever hidden. This was a perpetual danger of small-number epidemiology: Even an association that passed a one-in-twenty test of statistical significance might still be a fluke. Fagliano knew that finding one or two isolated links between exposure and disease would not be scientifically convincing; there would need to be a pattern of associations, a pattern consistent with a prior hypothesis about what might have caused the cluster. On the other hand, associations between exposure and disease that did not pass a statistical significance test could not necessarily be excluded as suspects. The numbers were too small to be definitive about anything.
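The danger of a fluke can be put in rough numbers. What follows is a minimal Python sketch, assuming that each analysis is an independent test at the conventional one-in-twenty (0.05) threshold. Fagliano's overlapping subgroup analyses were not independent, so this illustrates the general problem rather than reproducing any figure from his work; the function name is hypothetical.

```python
# Chance of at least one spurious "significant" result when running many
# independent tests at the one-in-twenty (alpha = 0.05) threshold.
# Assumption: tests are independent and no real effect exists. Real subgroup
# analyses overlap, so this is an illustration, not a figure from the study.


def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent null tests
    crosses the alpha threshold purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 40):
        chance = prob_at_least_one_false_positive(k)
        print(f"{k:>3} tests -> {chance:.0%} chance of at least one fluke")
```

Under that simplifying assumption, twenty independent tests carry roughly a 64 percent chance of at least one chance "finding," which is why one or two isolated associations could not carry much weight on their own.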
Still, there was something that caught Fagliano’s attention almost immediately. A surprisingly high number of women who had been heavy consumers of Parkway well water while pregnant had children who developed cancer. No matter how he analyzed the data, children who were exposed to Parkway water after their birth did not face a large extra risk, but those who were exposed prenatally did. Similarly, mothers of case children were more likely to have drunk Parkway water than mothers of healthy control children. This observation was consistent with what Fagliano already knew from the interview results: There was a dose-response relationship between how much tap water women remembered drinking during their pregnancies and their risk of having a child with cancer. In other words, more glasses correlated with higher risk. That interview data was probably influenced by recall bias; mothers who had undergone the trauma of having an afflicted child were more likely to remember drinking a lot of water during pregnancy. But now the much more objective water dispersion computer model, which did not rely on anyone’s fuzzy or wishful memories, was backing those subjective interview results.
At a staff meeting in Trenton in early 2001, Fagliano and his collaborators at the health department looked at all of the water data collectively for the first time and were startled by the apparent clarity of the results. Instead of the inconsistent findings they expected from a study of such a small population, the Parkway results seemed to tell a coherent story. “That was a ‘wow’ moment,” Fagliano remembered. “When we looked at the wells that were not contaminated, we didn’t see any differences between cases and controls, but for the Parkway wells a strong association was there. It was pretty dramatic.”
What was especially impressive about the Parkway well data was that there was an internal logic to the results. Fagliano had hypothesized that young children of women who had drunk a lot of Parkway water while pregnant would be at greatest risk, and now he found that the more tightly he focused his analysis on those children, the greater the calculated risk, as expressed by a standard statistical measure called an adjusted odds ratio.
For instance, the cancer odds ratio for children in the interview study whose mothers, while pregnant, drank water that was mostly from the Parkway wells was 1.68, which meant that the odds that a child with cancer had been highly exposed prenatally to Parkway water were 68 percent greater than the odds for a healthy child of the same age and sex.
That was not a huge amount of extra risk, but what caught Fagliano’s attention was that the odds ratios kept rising as he zeroed in on those subgroups that logically would be at greater risk if prenatal exposure to Parkway water really were triggering cancer.
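The arithmetic behind a phrase like "68 percent greater" is simple for an unadjusted odds ratio computed from a two-by-two case-control table. The Python sketch below uses hypothetical counts chosen only so that the result comes out to 1.68. Fagliano's published numbers were adjusted odds ratios from statistical models, which this crude calculation does not reproduce, and the function names are illustrative.

```python
# Crude (unadjusted) odds ratio from a 2x2 case-control table.
# The counts used below are hypothetical, picked only to yield 1.68;
# they are not Fagliano's data, and his figures were adjusted odds ratios.


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    case_odds = exposed_cases / unexposed_cases
    control_odds = exposed_controls / unexposed_controls
    return case_odds / control_odds


def percent_greater_odds(or_value: float) -> float:
    """Express an odds ratio as 'percent greater odds' (1.68 -> 68)."""
    return (or_value - 1.0) * 100.0


if __name__ == "__main__":
    or_value = odds_ratio(21, 25, 50, 100)  # hypothetical counts
    print(f"OR = {or_value:.2f}, i.e. {percent_greater_odds(or_value):.0f}% greater odds")
    # prints: OR = 1.68, i.e. 68% greater odds
```

Here the cases' odds of exposure (21 to 25, or 0.84) are divided by the controls' odds (50 to 100, or 0.5), giving 1.68.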
For example, when Fagliano looked only at prenatally exposed children who were diagnosed with cancer before age five, the odds ratio jumped to 2.51. That made sense, since cancers in older children were less likely to have been caused by exposures during pregnancy. The odds ratio jumped again to 3.01 when he added two other filters by taking into account how much water each woman remembered drinking during pregnancy and by counting only Parkway water consumed after 1981 in the “exposed” category (1982 was the year he assumed Union Carbide waste first reached the Parkway wells). This meant that the odds that young Toms River children with cancer had been prenatally exposed to post-1981 Parkway water were about three times the odds for healthy local children of the same age and sex. Finally, when he cut the data even finer by zeroing in on types of cancer known to have environmental triggers and also by separating the sexes, the associations grew still stronger: For girls diagnosed with leukemia or nervous system cancers before age five, the odds ratio was 4.60. For boys, it was 1.64. (There was no apparent explanation for the difference between girls and boys.)
Other permutations of the data yielded even higher odds ratios. For example, when Fagliano changed his assumption of when the Parkway pollution began from 1982 to 1984 and looked at prenatally exposed girls who were diagnosed with leukemia before age twenty, the odds ratio soared to 14.70. In other words, for the thirteen girls with leukemia born in Toms River between 1984 and 1996, the odds that they had been highly exposed prenatally to Parkway water were almost fifteen times the odds for healthy Toms River girls of the same ages.
Those were scary numbers, and they posed a dilemma very similar to the one Fagliano and Michael Berry had faced back in 1995 when Berry first identified the cluster in Toms River. The dilemma was this: Risk numbers were high, but so was the uncertainty. As in Berry’s 1995 analysis, each adjusted odds ratio in Fagliano’s studies came with a 95 percent confidence interval; the wider the interval, the more unreliable the result. Almost all of Fagliano’s confidence intervals were exasperatingly wide, just as Berry’s were. The problem was well illustrated by one of Fagliano’s most striking findings, concerning children in the interview study who were exposed prenatally to Parkway water between 1984 and 1996 and diagnosed with leukemia or nervous system cancer before age five. The results table looked like this:
According to the table, the odds that Toms River children with leukemia and nervous system cancers had been highly exposed prenatally to Parkway water were three times greater than for similar but cancer-free children. But because there were so few highly exposed children in the analysis (just six case children and eleven controls), random variability could be heavily influencing the results. The most Fagliano could say with confidence was that the true odds ratio probably lay somewhere between 0.78 and 11.60. That is what a 95 percent confidence interval means: If the study could be repeated many times, about nineteen of every twenty intervals calculated this way would contain the true value. Not only was that confidence interval very wide, it also dipped below 1.00, which meant that there was a small but noteworthy chance that mothers who drank a lot of Parkway water while pregnant might actually be reducing their odds of having a child with cancer. It was another unhelpful snowy-or-sunny forecast, just as in Berry’s 1995 cluster study. In fact, because the interval dipped below 1.00, it did not meet the traditional definition of statistical significance, despite the high odds ratio.
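A common way to compute an approximate 95 percent confidence interval for a crude odds ratio is Woolf's method, which works on the logarithmic scale. The Python sketch below is an illustration under stated assumptions, not Fagliano's calculation: only the six exposed cases and eleven exposed controls come from the text, the unexposed counts are hypothetical, and his adjusted model produced the 0.78 to 11.60 interval, which this crude formula will not reproduce. The point is how sample size drives the width of the interval.

```python
import math

# Crude odds ratio with an approximate 95% confidence interval (Woolf's log method).
# Assumption: the unexposed counts (14 cases, 77 controls) are hypothetical;
# only the 6 exposed cases and 11 exposed controls appear in the text.


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Return (odds ratio, lower bound, upper bound).

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls (all must be > 0).
    """
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper


def is_significant(lower: float, upper: float) -> bool:
    """Conventional 0.05-level significance: the interval excludes 1.00."""
    return lower > 1.0 or upper < 1.0


if __name__ == "__main__":
    for label, table in (("small study", (6, 14, 11, 77)),
                         ("ten times larger", (60, 140, 110, 770))):
        or_value, lower, upper = odds_ratio_ci(*table)
        print(f"{label}: OR {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
              f"significant: {is_significant(lower, upper)}")
```

Both made-up tables give an odds ratio of exactly 3.0, but the small one yields an interval of roughly 0.95 to 9.4, straddling 1.00, while the tenfold table narrows to roughly 2.1 to 4.3 and clears the significance bar.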
Almost all of the Parkway analyses that had so intrigued Fagliano and his colleagues shared this same problem: The lower bounds of their confidence intervals dipped below 1.00. There were only a few exceptions, most notably when they looked specifically at leukemia risk in prenatally exposed girls in the interview study who were diagnosed before age twenty, getting these results:
Even here, with a very high odds ratio and a confidence interval that lay entirely above 1.00, there was a great deal of uncertainty, as the interval’s sky-high upper bound of 31.70 made plain. There were only eight highly exposed girls in this analysis, which meant that adding or subtracting just one case would drastically alter the results. Furthermore, Fagliano’s other study, the birth record study, did not yield odds ratios for Parkway exposure as high as the interview study did. Nor did the birth record study results form a distinct pattern of rising odds ratios as Fagliano honed the data for the hypothesized greatest risks.
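The fragility that comes with tiny counts can be seen by reclassifying a single child. This Python sketch uses a hypothetical table (not Fagliano's data) and a crude odds ratio rather than his adjusted one; it only shows how far the estimate swings when one case or one control moves between the exposed and unexposed groups.

```python
# How much a crude odds ratio moves when a single child is reclassified.
# All counts are hypothetical; the table layout is
# (exposed cases, unexposed cases, exposed controls, unexposed controls).


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


BASELINE = (6, 7, 2, 40)  # hypothetical table with very few exposed children

SCENARIOS = {
    "baseline": BASELINE,
    "one exposed case reclassified as unexposed": (5, 8, 2, 40),
    "one unexposed control reclassified as exposed": (6, 7, 3, 39),
    "one exposed control reclassified as unexposed": (6, 7, 1, 41),
}

if __name__ == "__main__":
    for label, table in SCENARIOS.items():
        print(f"{label:<48} OR = {odds_ratio(*table):.2f}")
```

In this made-up table, moving just one child shifts the odds ratio from about 17 to anywhere between about 11 and 35.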