“And will you see it?” is a refrain running all through Walter of Henley's works. Each problem was approached through experiment and scrupulous accounting. Plant two fields together; watch through the year; keep track of the overall costs “and you shall find that I say truth.”
Francis Bacon continued this same experimental tradition, testing to see how well seeds germinated in separate, revolting concoctions; the seeds soaked in urine showed a marked advantage over the untreated, or those soaked in wine. Nowadays, we could make an educated assumption about the roles of urea and nitrogen; but Bacon's experiment made it possible to decide what to do even without the fundamental knowledge.
Eighteenth-century agronomy was approaching ever closer to what we would now call the scientific method. Arthur Young's Course of Experimental Agriculture appeared in 1770: not only did he insist on split-field trials of any new technique or treatment, but he said that those trials should be repeated in several different fields to exclude the effects of variation in soil fertility or drainage. He measured value down to the farthing and tested it by real sales on the same day in the same market. Most of all, he deplored hypothesis: “adopting a favorite notion, and forming experiments with an eye to confirm it.” Each step toward discipline made agronomy more scientific: a modern researcher would find very little recognizable in a medical laboratory of 1820, but he could walk onto an experimental farm of the same date and feel entirely at home.
Whenever a collective experiment is being planned; whenever researchers are collating and preparing data; whenever government agencies, pharmaceutical companies, or hospital authorities decide a result is “statistically significant,” there stands the blinking, bearded, pipe-smoking spirit of Ronald Aylmer Fisher.
Fisher combined great abilities with great hatreds, collegial warmth with an ungovernable temper, broad interests with painstaking precision. He had such weak eyesight that his schoolmasters arranged for him to do as little reading and writing as possible: he learned mathematics not from blackboards and textbooks, but from conversation and the development of a precise imagination. This gave him an uncanny talent for inner visualization: the shape of a scatter of points in eight dimensions was as intuitively clear to him as if they had been in two. He studied mathematics at Cambridge, but always with an eye to its applications in astronomy, biology, and genetics.
Academic feuds, like the wars of hill tribes, are as tiresome as they are endless, except when they spur some unexpected creation: an epic poem, a theory. In 1917, Karl Pearson published, without prior warning, a paper criticizing Fisher's emerging ideas on likelihood, claiming that they were essentially the same as Laplace's inverse probability. The prickly Fisher felt snubbed and deliberately misunderstood. When, two years later, Pearson offered Fisher a position as his assistant, he spurned it and went off to be statistician at Rothamsted Agricultural Experimental Station. The student of genetic variation, heir to the statistical tradition of Galton, had come back to the land.
He found a spread of rolling, well-tended fields; at their center, a cluster of sturdy brick buildings; and within one of them, a room filled with leather-bound data: ninety years of daily rainfall, temperature, soil conditions, fertilizer application, and crop yields. The proprietors of Rothamsted, a family enriched by the invention of artificial guano, had understood the importance of raw numbers in the study of variation.
Fisher plunged into this multidimensional world, where every factor was at once separate and correlated, and set it running in that mental theater where eight dimensions seemed like two. He learned how to strip out variables one after the other (cycles of weather, exhaustion of land, regression to the mean, annual rainfall), filtering out the noise so that only the signal of interest remained. He was even able to isolate what had been until then an inexplicable phenomenon and determine its cause: there had been a deterioration of yield beginning in 1876, accelerating in 1880, suddenly improving in 1901, dropping off thereafter. Why? Because the Education Acts of 1876 and 1880 made attendance at school compulsory, so the little boys who had previously earned their pocket money by weeding disappeared; then, in 1901, the vigorous master of a local girls' school thought weeding would be a healthy outdoor activity for his charges, but they soon disagreed.
His progress in the analysis of real data prompted Fisher to look at how the experiments themselves were set up. There had been, since the days of Young, constant debate about the layout of field tests. Let's say you want to test your superphosphate fertilizer (a polite term for bird droppings) against a control. You might think that setting out your field in alternate strips, A|B|A|B, would be a reasonable proposal, and that if your B strips outgrew your A strips you could confidently recommend superphosphate to your friends. But what if there was a natural gradient in fertility across the whole field from right to left? Each of your B strips would be just that bit more fertile than its neighboring A; if you added up the total yields, you might be seeing an effect that wasn't there. The recognized solution to this problem was the “Latin square,” an ingenious anagram that allowed many small areas of different treatment to be grown evenly across a field in such a way that each treatment appeared once in every row and every column, and no two adjacent plots received the same treatment.
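The strip problem and the Latin square remedy can both be put in a few lines. What follows is a minimal sketch, not a reconstruction of any historical layout: the yields, the size of the gradient, and the function names `strip_totals` and `latin_square` are all assumptions made for illustration, and the square is built by the simplest (cyclic) recipe. For convenience the fertility here rises from left to right, so that each B strip sits on slightly better ground than the A beside it.

```python
def strip_totals(n_strips=8, gradient=1.0, base=100.0):
    """Total yield of alternating A|B strips when the treatment does nothing.

    Yield depends only on position: `base` plus `gradient` per strip.
    All numbers are hypothetical.
    """
    totals = {"A": 0.0, "B": 0.0}
    for position in range(n_strips):
        label = "A" if position % 2 == 0 else "B"
        totals[label] += base + gradient * position
    return totals


def latin_square(treatments):
    """Cyclic Latin square: row r is the treatment list rotated by r places,
    so each treatment occurs exactly once in every row and every column."""
    n = len(treatments)
    return [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]


totals = strip_totals()
square = latin_square(["A", "B", "C", "D"])

print(totals)              # B comes out ahead although nothing was applied
for row in square:
    print(" ".join(row))
```

With these made-up numbers the B strips out-yield the A strips purely because of where they stand, which is exactly the effect that wasn't there.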
In Fisher's view, however, any repeated system, no matter how balanced and ingenious, introduced an element of bias that would make it difficult to separate out natural variation from the final data; and without the natural variation, there would be nothing with which to compare the effects of treatment. What, on the other hand, was the only confounder that was easily washed out of results? Error, thanks to the error curve. And how could you make sure that the extra variation expressed error and nothing else? Randomize. Fisher suggested, and science took up, the rule that the only way to assure an unbiased distribution of treatments to subjects is to flip a coin or roll a die.
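The coin flip can be sketched directly. This is an illustration under stated assumptions, not Fisher's own procedure: the eight-plot field, the linear fertility gradient, the seed, and the helper names `randomize_plots` and `mean_gap` are all hypothetical. The treatment again does nothing, so any gap between B and A is produced by position alone.

```python
import random


def randomize_plots(treatments, replicates, rng):
    """Give each treatment `replicates` plots, then shuffle the order."""
    layout = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(layout)
    return layout


def mean_gap(layout, gradient=1.0, base=100.0):
    """Mean yield of B minus mean yield of A when yield depends only on
    plot position (a hypothetical fertility gradient, no real effect)."""
    yields = {"A": [], "B": []}
    for position, label in enumerate(layout):
        yields[label].append(base + gradient * position)
    return sum(yields["B"]) / len(yields["B"]) - sum(yields["A"]) / len(yields["A"])


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layout = randomize_plots(["A", "B"], replicates=4, rng=rng)

# The systematic layout is biased in the same direction every time.
systematic_gap = mean_gap(list("ABABABAB"))

# Random layouts err in both directions and average out near zero.
gaps = [mean_gap(randomize_plots(["A", "B"], 4, rng)) for _ in range(2000)]
average_gap = sum(gaps) / len(gaps)

print(layout)
print(systematic_gap, round(average_gap, 3))
```

Any single random layout may still favor one treatment, but the favor is now error of the ordinary kind, as likely to fall one way as the other, rather than a bias built into the design.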
Fisher's rule was tested in a remarkable nonexperiment by the New Zealander A. W. Hudson, who planted potatoes and then applied six entirely imaginary “treatments” in random or systematic patterns. Although nothing had been done to the potatoes, there was less variation in those “treated” systematically than in the random. Even when the only variation is natural, a regular system of observation can introduce a spurious appearance of order. Fisher was vindicated.
In his new kingdom of small, randomized plots, Fisher saw the opportunity to conduct several experiments at the same time: to investigate variation from different directions. If you study superphosphate as against no treatment, you have only one comparison; but if you study superphosphate, urea, superphosphate plus urea, and no treatment, you have two ways of looking at each treatment: compared with no treatment and compared with a combination, from which you can subtract the effect of the other treatment. This “analysis of variance” was one of Fisher's great gifts to science: he provided the mathematics to design experiments that could answer several inquiries simultaneously. Similar techniques allowed the experimenter to isolate and adjust for the unavoidable natural variations in subjects, like age, sex, or weight in a clinical trial.
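The two ways of looking at each treatment can be written out as arithmetic on the four plots. This sketch covers only the comparison of means in a two-by-two design, not the full analysis of variance with its partition of sums of squares, and the yields in `cell_means` are invented for the purpose.

```python
# Hypothetical mean yields, keyed by (superphosphate applied, urea applied).
cell_means = {
    (False, False): 20.0,  # no treatment
    (True, False): 26.0,   # superphosphate only
    (False, True): 24.0,   # urea only
    (True, True): 32.0,    # superphosphate plus urea
}


def factorial_effects(y):
    """Effect of each treatment, measured both against no treatment and
    against the combination with the other treatment's effect removed."""
    super_alone = y[(True, False)] - y[(False, False)]
    super_with_urea = y[(True, True)] - y[(False, True)]
    urea_alone = y[(False, True)] - y[(False, False)]
    urea_with_super = y[(True, True)] - y[(True, False)]
    return {
        "superphosphate": (super_alone + super_with_urea) / 2,
        "urea": (urea_alone + urea_with_super) / 2,
        # How much more superphosphate does when urea is also present.
        "interaction": super_with_urea - super_alone,
    }


effects = factorial_effects(cell_means)
print(effects)
```

Four plots thus yield two estimates of each treatment's effect, plus a measure of whether the treatments help or hinder each other, where two separate experiments would have given one estimate apiece.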
Small blocks are small samples, and Fisher could not have laid the foundations for the modern scientific method had it not been for the prior work of someone professionally tied to small samples: W. S. Gosset, who wrote under the characteristically modest pseudonym “Student.” Gosset worked for the Guinness brewery, an enterprise critically dependent on knowing the qualities of barley from each field it had contracted for. Guinness' agent might wander through, pulling an ear here and there, but there had to be some way of knowing how this small sample translated into the quality of the whole crop.
The mighty Karl Pearson had never bothered with small samples: he had his factory, cranking out thousands of observations and bending the great curves to fit them. He saw no real distinction between sample and population. Student, though, worked alone, and if he tested the whole of every field, all the pubs in Poolbeg Street would run dry. So he developed “Student's t-test,” describing how far a sample could be expected to differ in mean value and spread from the whole population, using only the number of observations in the sample. This tool was all Fisher needed. Starting with Student's t-test, adding randomization of the initial setup, and subjecting his results to analysis of variance produced what science had long been waiting for: a method for understanding the conjoined effects of multiple causes, gauging not just if something produced an effect, but how large that effect was. All modern applied sciences, from physics to psychology, use terms like “populations” and “variance” because they learned their statistics from Fisher, a geneticist.
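Student's statistic itself is short enough to compute by hand. The sketch below is the usual one-sample form, with a sample of five invented measurements and a hypothesized population mean of 10.0, both assumptions for illustration. It stops at the statistic and its degrees of freedom; turning those into a probability needs the t distribution, which the Python standard library does not supply.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Student's t: how many standard errors the sample mean lies from
    the hypothesized population mean, with n - 1 degrees of freedom."""
    n = len(sample)
    mean = statistics.mean(sample)
    spread = statistics.stdev(sample)      # sample standard deviation (n - 1 divisor)
    standard_error = spread / math.sqrt(n)
    return (mean - hypothesized_mean) / standard_error, n - 1


sample = [10.2, 9.8, 10.5, 10.1, 9.9]      # hypothetical measurements
t_stat, degrees_of_freedom = one_sample_t(sample, 10.0)
print(round(t_stat, 3), degrees_of_freedom)
```

The number of observations enters twice: it shrinks the standard error, and it fixes the degrees of freedom that decide how wide the tails of the reference curve must be.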
Fisher was a tough man, and he presumed toughness in the researcher. His method requires that we start with the null hypothesis, that the observed difference is due only to chance: in the policeman's habitual phrase, “there's nothing to see here.” We then choose a measure, a statistic, and determine how it would be distributed if the null hypothesis were true. We define what might be an interesting value for this statistic (what we can call an “effect”) and we determine the probability of seeing this value, or one more extreme, if the null hypothesis were true. This probability, called a p-value, is the measure of statistical significance. So if, say, Doc Waughoo's Seminole Fever Juice reduces patients' fever by five degrees when variation in temperature without treatment is three, the probability of seeing so large an effect if chance alone were at work, the p-value, would be small, suggesting there's more at work here than just alcohol and red food coloring.
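The Fever Juice figures can be run through this recipe, though only by adding assumptions the text does not make: that “variation of three” means a standard deviation of three degrees, that untreated temperature changes follow the normal error curve centered on zero, and that a single five-degree drop is being judged. The function name `upper_tail_p` is likewise hypothetical.

```python
import math


def upper_tail_p(effect, sd):
    """Probability, under a normal null distribution with mean 0 and the
    given standard deviation, of an effect at least this large."""
    z = effect / sd
    # Upper tail of the standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))


p_value = upper_tail_p(effect=5.0, sd=3.0)
print(round(p_value, 4))
```

On those assumptions the value comes out a little under one in twenty. Averaging over many patients would shrink the standard error and make the same five-degree effect far less plausible as a coincidence.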
For Fisher, reaching the measure of significance was the end of the line of inference. The number told you “either the null hypothesis is false and there is a real cause to this effect, or a measurably unusual coincidence has occurred.” That's all: either this man is dead or my watch has stopped.
Away from the pure atmosphere of Rothamsted, however, Fisher's methods revealed the problems of scaling up from barley to people. From the beginning, the two great issues were sampling and ethics, two problems that became one in the Lanarkshire milk experiment of 1930. Lanark, you will remember, had been the county with the puniest chest measurements, and things were not much better now. A few well-nourished farm children shared their schoolrooms with the underfed sons and daughters of coal miners and unemployed wool spinners.
Twenty thousand children took part in the experiment: 5,000 were to receive three-quarters of a pint of raw milk every day; 5,000 an equivalent amount of pasteurized milk; 10,000 no milk at all. The choice of treatment was random, but teachers could make substitutions if it looked as though too many well- or ill-nourished children were included in any one group. Evidently, the teachers did exactly what you or I would do, since the final results showed that the “no milk” children averaged three months superior in weight and four months in height to the “milk” children. So either milk actually stunts your growth or a measurable coincidence has occurred, or . . .
A more successful test was the use of streptomycin against tuberculosis, conducted by Austin Bradford Hill in 1948. This was a completely randomized trial: patients of each sex were allocated “bed rest plus streptomycin” or “bed rest alone” on the basis only of chance. Neither the patients, their doctors, nor the coordinator of the experiment knew who had been put in which group. The results were impressive: only 4 of the 55 treated patients died, compared with 14 of the 52 untreated. This vindicated not just streptomycin but the method of trial.
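How unusual a coincidence would those figures be? One way to ask is with Fisher's own exact test for a two-by-two table, sketched here from the counts above. Nothing in the account says this was the calculation used in 1948, so treat the choice of test, the one-sided reading, and the function names as assumptions. The null hypothesis is that the 18 deaths would have occurred whatever the treatment, and chance alone decided how many fell among the 55 treated patients.

```python
from math import comb


def hypergeom_pmf(k, total, deaths, treated):
    """Chance that exactly k of the deaths fall in the treated group when
    `treated` patients are picked at random from `total`."""
    return comb(deaths, k) * comb(total - deaths, treated - k) / comb(total, treated)


def one_sided_p(observed, total, deaths, treated):
    """Probability of `observed` or fewer deaths among the treated."""
    return sum(hypergeom_pmf(k, total, deaths, treated) for k in range(observed + 1))


treated, untreated = 55, 52
deaths_treated, deaths_untreated = 4, 14
total = treated + untreated
deaths = deaths_treated + deaths_untreated

p_value = one_sided_p(deaths_treated, total, deaths, treated)
print(p_value)
```

The test needs no error curve at all: because the allocation was random, the random allocation itself supplies the distribution against which the result is judged.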
Why did Hill stick to the rules when the Lanarkshire teachers bent them? Some say it was because there were very limited supplies of streptomycin, by no means enough to give to all patients, so why not turn necessity into scientific opportunity? Others might feel that the experience of war had made death in the name of a greater good more bearable as an idea. One could also say that the new drugs offered a challenge that medicine had to accept. A jug of milk may be a help to the wretched, but the prospect of knocking out the great infections, those dreadful harvesters with so many lives already in their sacks: well, that made the few more who died as controls unwitting heroes of our time.
Hill is famous for his later demonstration of the relation between smoking and lung cancer. Fisher never accepted these results, and not simply because he enjoyed his pipe. He didn't believe that the correlation shown proved causation, and he didn't like the use of available rather than random samples. Fisher always demanded rigor, but science wanted to use his techniques, not be bound by his strictures.