Knocking on Heaven's Door

Nonetheless, Nate told me that forecasters do make one other type of prediction. Many of them do metaforecasting—predicting what people will try to predict.

CHAPTER TWELVE

MEASUREMENT AND UNCERTAINTY

Familiarity and comfort with statistics and probability help when evaluating scientific measurements, not to mention many of the difficult issues of today’s complex world. I was reminded of the virtue of probabilistic reasoning when, a few years back, a friend was frustrated by my “I don’t know” response to his question about whether or not I planned to attend an event the following evening. Fortunately for me, he was a gambler and mathematically inclined. So instead of exasperatingly insisting on a definite reply, he asked me to tell him the odds. To my surprise, I found that question a lot simpler to deal with. Even though the probability estimate I gave him was only a rough guess, it more closely reflected my competing considerations and uncertainties than a definite yes or no reply would have done. In the end, it felt like a more honest response.

Since then I’ve tried this probabilistic approach on friends and colleagues when they didn’t think they could answer a question. I’ve found that most people, scientists or not, have strong but not irrevocable opinions that they frequently feel more comfortable expressing probabilistically. Someone might not know if he wants to go to the baseball game on the Thursday three weeks from now. But if he knows that he likes baseball and doesn’t think he has any work trips coming up, yet hesitates because it’s during the week, he might agree that he is 80 percent likely to go, even if he can’t give a definite yes. Although just an estimate, this probability, even one he makes up on the spot, reflects his true expectation more accurately than a flat yes or no would.

In our conversation about science and how scientists operate, the screenwriter and director Mark Vicente remarked that he was struck by how scientists hesitate to make the kind of definite, unqualified statements most other people do. Scientists aren’t necessarily always the most articulate, but they aim to state precisely what they do and don’t know or understand, at least when speaking about their field of expertise. So they rarely just say yes or no, since such an answer doesn’t accurately reflect the full range of possibilities. Instead, they speak in terms of probabilities or qualified statements. Ironically, this difference in language frequently leads people to misinterpret or underplay scientists’ claims. Despite the improved precision that scientists aim for, nonexperts don’t necessarily know how to weigh their statements, since anyone other than a scientist who had as much evidence supporting a thesis wouldn’t hesitate to state it more definitely. But scientists’ lack of 100 percent certainty doesn’t reflect an absence of knowledge. It’s simply a consequence of the uncertainties intrinsic to any measurement, a topic we’ll now explore. Probabilistic thinking helps clarify the meaning of data and facts, and allows for better-informed decisions. In this chapter, we’ll reflect on what measurements tell us and explore why probabilistic statements more accurately reflect the state of knowledge, scientific or otherwise, at any given time.

SCIENTIFIC UNCERTAINTY

Harvard recently completed a curricular review to try to determine the essential elements of a liberal education. One of the categories the faculty considered and discussed as part of a science requirement was “empirical reasoning.” The teaching proposal suggested the university’s purpose should be to “teach how to gather and assess empirical data, weigh evidence, understand estimates of probabilities, draw inferences from the data when available [so far, so good], and also to recognize when an issue cannot be settled on the basis of the available evidence.”

The proposed wording of the teaching requirement (later clarified) was well intentioned, but it betrayed a fundamental misunderstanding of how measurements work. Science generally settles issues with some degree of probability. Of course we can achieve high confidence in a particular idea or observation and use science to make sound judgments. But only infrequently can anyone absolutely settle an issue, scientific or otherwise, on the basis of evidence. We can collect enough data to trust causal relationships and even to make incredibly precise predictions, but we can generally do so only probabilistically. As Chapter 1 discussed, uncertainty, however small, allows for the potential existence of interesting new phenomena that remain to be discovered. Rarely is anything 100 percent certain, and no theory or hypothesis is guaranteed to apply under conditions where tests have not yet been performed.

Phenomena can only ever be demonstrated with a certain degree of precision in a set domain of validity where they can be tested. Measurements always have some probabilistic component. Many scientific measurements rely on the assumption that an underlying reality exists that we can uncover with sufficiently precise and accurate measurements. We use measurements to find this underlying reality as well as we can (or as well as necessary for our purposes). This then permits statements such as that an interval centered on the average of a collection of measurements contains the true value with 95 percent probability. In that case, we might colloquially say we are 95 percent confident. Such probabilities tell us the reliability of any particular measurement and the full range of possibilities and implications. You can’t fully understand a measurement without knowing and evaluating its associated uncertainties.
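One way to make the 95 percent statement concrete is to simulate it: repeat a whole experiment many times and count how often the interval captures the true value. The sketch below is illustrative only. It assumes Gaussian measurement noise of known width, uses the standard 1.96-standard-error half-width, and the true value, noise level, and sample sizes are invented for the example.

```python
import math
import random
import statistics

def interval_95(measurements, sigma):
    """Return a 95 percent interval for the true value, assuming Gaussian noise of known width sigma."""
    centre = statistics.fmean(measurements)
    half_width = 1.96 * sigma / math.sqrt(len(measurements))
    return centre - half_width, centre + half_width

def coverage(true_value=10.0, sigma=0.5, n=20, experiments=2000, seed=1):
    """Repeat an entire experiment many times; return the fraction of intervals containing true_value.

    All default parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        data = [rng.gauss(true_value, sigma) for _ in range(n)]
        low, high = interval_95(data, sigma)
        if low <= true_value <= high:
            hits += 1
    return hits / experiments

print(f"fraction of intervals containing the true value: {coverage():.3f}")
```

The printed fraction should land near 0.95. In this frequency sense, the 95 percent describes how often the procedure captures the true value across many repetitions of the experiment.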

One source of uncertainty is the absence of infinitely precise measuring instruments. Such a precise measurement would require a device calibrated to an infinite number of decimal places, so that the measured value would carry an infinite number of carefully measured digits after the decimal point. Experimenters can’t make such measurements. They can only calibrate their tools to make them as accurate as possible with available technology, just as the astronomer Tycho Brahe did so expertly more than four centuries ago. Increasingly advanced technology results in increasingly precise measuring devices. Even so, despite the many advances that have occurred over time, measurements will never achieve infinite accuracy. Some systematic uncertainty, characteristic of the measuring device itself, will always remain.

Uncertainty doesn’t mean that scientists treat all options or statements equally (though news reports frequently make this mistake). Only rarely are probabilities 50 percent. But it does mean that scientists (or anyone aiming for complete accuracy) will state what has been measured and what it implies in probabilistic terms, even when those probabilities are very high.

When scientists and wordsmiths are extremely careful, they use the words precision and accuracy differently. An apparatus is precise if, when you repeat a measurement of a single quantity, the values you record won’t differ from each other very much. Precision is a measure of the degree of variability. If the result of repeating a measurement doesn’t vary a lot, the measurements are precise. Because more precisely measured values span a smaller range, the average value will converge more rapidly if you make repeated measurements.

Accuracy, on the other hand, tells you how close your average measurement is to the correct result. In other words, it tells whether there is bias in a measuring apparatus. Technically speaking, an intrinsic error in your measuring apparatus doesn’t reduce its precision (you would make the same mistake every time), though it would certainly reduce your accuracy. Systematic uncertainty refers to the irreducible lack of accuracy that is intrinsic to the measuring devices themselves.
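The distinction can be seen by simulating two instruments measuring the same quantity. The sketch below is a hypothetical illustration: one instrument has a small random scatter but a fixed offset (precise but biased), the other has no offset but a large scatter (accurate but imprecise). The true length, offsets, and noise levels are all made up, and the noise is assumed to be Gaussian.

```python
import random
import statistics

TRUE_LENGTH = 100.0  # hypothetical "correct" value, in millimetres

def readings(bias, noise, n=10_000, seed=0):
    """Simulate n readings from an instrument with a fixed offset (bias) and random scatter (noise)."""
    rng = random.Random(seed)
    return [TRUE_LENGTH + bias + rng.gauss(0.0, noise) for _ in range(n)]

def summarize(values):
    """Return (bias, spread): how far the average sits from the truth, and the standard deviation."""
    return statistics.fmean(values) - TRUE_LENGTH, statistics.stdev(values)

precise_but_biased = readings(bias=0.3, noise=0.05)
accurate_but_imprecise = readings(bias=0.0, noise=1.0)

for name, data in [("precise, biased", precise_but_biased),
                   ("accurate, imprecise", accurate_but_imprecise)]:
    b, s = summarize(data)
    print(f"{name:>20}: bias {b:+.3f} mm, spread {s:.3f} mm")
```

Repeating the measurements tightens the average for both instruments, but only the second one's average closes in on the true length; the first keeps the same offset no matter how many readings it takes.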

Nonetheless, in many situations, even if you could construct a perfect measuring instrument, you would still need to make many measurements to get a correct result. That is because the other source of uncertainty is statistical, which means that measurements usually need to be repeated many times before you can trust the result. Even an accurate apparatus won’t necessarily give the right value for any particular measurement, but the average will converge to the right answer. Systematic uncertainties control the accuracy of a measurement, while statistical uncertainty affects its precision. Good scientific studies take both into account, and measurements are done as carefully as possible on as large a sample as is feasible. Ideally, you want your measurements to be both accurate and precise, so that the expected absolute error is small and you can trust the values you find. This means you want them to fall within as narrow a range as possible (precision) and to converge to the correct number (accuracy).

One familiar (and important) example where we can consider these notions is tests of drug efficacy. Doctors often don’t say, or perhaps don’t know, the relevant statistics. Have you ever been frustrated by being told, “Sometimes this medicine works; sometimes it doesn’t”? Quite a bit of useful information is suppressed in this statement, which gives no idea of how often the drug works or how similar the population it was tested on is to you. This makes it very difficult to decide what to do. A more useful statement would tell us the fraction of times a drug or procedure has worked on patients of similar age and fitness level. Even when doctors themselves don’t understand statistics, they can almost certainly provide some data or information.

In fairness, the heterogeneity of the population, with different individuals responding to drugs in different ways, makes determining how a medicine will work a complicated question. So let’s first consider a simpler case in which we can test on a single individual. Let’s use as an example the procedure for testing whether or not aspirin helps relieve your headache.

The way to figure this out seems pretty easy: take an aspirin and see if it works. But it’s a little more complicated than that. Even if you get better, how do you know it was the aspirin that helped? To ascertain whether or not it really worked—that is, whether your headache was less painful or went away faster than without the drug—you would have to be able to compare how you feel with and without the drug. However, since you either took aspirin or you didn’t, a single measurement isn’t enough to tell you the answer you want.

The way to tell is to do the test many times. Each time you have a headache, flip a coin to decide whether or not to take an aspirin, and record the result. After you do this enough times, you can average over all the different types of headaches you had and the varying circumstances in which you had them (maybe they go away faster when you’re not so sleepy) and use your statistics to find the right result. Presumably there is no bias in your measurement, since you flipped a coin to decide and the population sample was just yourself, so with enough self-imposed tests your result will converge to the correct answer.
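The coin-flip procedure can be sketched as a simulation. Everything in it is hypothetical: the baseline headache length, the extra hours when you are sleepy, the random day-to-day variation, and the assumed one-hour benefit from aspirin are invented numbers. The point is only that, because the coin decides independently of sleepiness, comparing the two groups' average durations recovers the assumed effect.

```python
import random
import statistics

def headache_trial(days=2000, aspirin_effect=-1.0, seed=7):
    """Hypothetical self-experiment: estimated change in headache duration (hours) from aspirin.

    aspirin_effect is the assumed true effect; the function tries to recover it from the data.
    """
    rng = random.Random(seed)
    with_aspirin, without = [], []
    for _ in range(days):
        sleepy = rng.random() < 0.4        # a nuisance factor that also lengthens headaches
        took_aspirin = rng.random() < 0.5  # the coin flip decides, independent of sleepiness
        duration = 3.0 + (1.5 if sleepy else 0.0) + rng.gauss(0.0, 1.0)
        if took_aspirin:
            duration += aspirin_effect
            with_aspirin.append(duration)
        else:
            without.append(duration)
    # Difference in average duration: negative means aspirin shortened headaches.
    return statistics.fmean(with_aspirin) - statistics.fmean(without)

print(f"estimated effect of aspirin: {headache_trial():+.2f} hours")
```

Sleepiness still adds variation to individual headaches, but since it is equally likely on aspirin and no-aspirin days, it averages out rather than biasing the comparison.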

It would be nice to always be able to learn whether drugs worked with such a simple procedure. However, most drugs treat illnesses more serious than headaches, perhaps even ones that lead to death. And many drugs have long-term effects, so you couldn’t do repeated short-term trials on a single individual even if you wanted to.

So usually when biologists or doctors test how well a drug works, they don’t simply study a single individual, even though, for scientific purposes at least, they would prefer to do so. They then have to contend with the fact that people respond differently to the same drug. Any medicine produces a range of results, even when tested on a population with the same degree of severity of a disease. So the best scientists can do in most cases is to design studies for a population as similar as possible to whichever individual they are deciding whether to treat with the drug. In reality, however, most doctors don’t design the studies themselves, so similarity to their patient is hard for them to guarantee.

Doctors might instead try to use pre-existing studies in which no one ran a carefully designed trial and the results were based simply on observations of existing populations, such as the members of an HMO. They would then face the challenge of interpreting the results correctly. With such studies, it can be difficult to ensure that the relevant measurement establishes causality and not just association or correlation. For example, someone might mistakenly conclude that yellow fingers cause lung cancer after noticing that many lung cancer patients have yellow fingers.

That’s why scientists prefer studies in which treatments or exposures are randomly assigned. For example, a study in which people take a drug based on a coin toss will be less dependent on the population sample since whether or not any patient receives treatment depends only on the random outcome of a coin flip. Similarly, a randomized study could in principle teach about the relationships among smoking, lung cancer, and yellow fingers. If you were to randomly assign members of a group to either smoke or refrain from smoking, you would determine that smoking was at least one underlying factor responsible for both yellow fingers and lung cancer in the patients you observed, whether or not one was the cause of the other. Of course, this particular study would be unethical.
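The yellow-fingers example can be explored safely in a simulation, where the unethical randomized study costs nothing. In this hypothetical model, smoking alone raises the chance of both yellow fingers and lung cancer, and neither of those causes the other. The rates are invented for illustration. The observational analysis still finds cancer far more common among people with yellow fingers, while the randomized version, in which a coin flip assigns smoking, shows smoking raising both outcomes.

```python
import random

# Hypothetical rates, chosen only for illustration.
P_YELLOW = {True: 0.70, False: 0.05}  # chance of yellow fingers, by smoking status
P_CANCER = {True: 0.15, False: 0.01}  # chance of lung cancer, by smoking status

def person(smokes, rng):
    """Yellow fingers and cancer each depend on smoking only; neither causes the other."""
    return rng.random() < P_YELLOW[smokes], rng.random() < P_CANCER[smokes]

def rate(flags):
    """Fraction of True values in a list of booleans."""
    return sum(flags) / len(flags)

def observational(n=100_000, seed=3):
    """People choose whether to smoke; we only see fingers and diagnoses."""
    rng = random.Random(seed)
    people = [person(rng.random() < 0.3, rng) for _ in range(n)]
    cancer_if_yellow = rate([c for y, c in people if y])
    cancer_if_not = rate([c for y, c in people if not y])
    return cancer_if_yellow, cancer_if_not

def randomized(n=100_000, seed=4):
    """A coin flip assigns smoking (a thought experiment only: the real study would be unethical)."""
    rng = random.Random(seed)
    arms = {True: [], False: []}
    for _ in range(n):
        smokes = rng.random() < 0.5
        arms[smokes].append(person(smokes, rng))
    # For each arm: (rate of yellow fingers, rate of cancer)
    return {arm: (rate([y for y, _ in group]), rate([c for _, c in group]))
            for arm, group in arms.items()}

yellow, plain = observational()
print(f"observational: cancer rate {yellow:.3f} with yellow fingers vs {plain:.3f} without")
for arm, (y, c) in randomized().items():
    label = "assigned to smoke" if arm else "assigned not to smoke"
    print(f"{label:>22}: yellow fingers {y:.3f}, cancer {c:.3f}")
```

The association in the observational data is real but not causal; only the random assignment of the underlying factor reveals which variable drives both outcomes.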

Whenever possible, scientists aim to simplify their systems as much as possible so as to isolate the specific phenomena they want to study. The choice of a well-defined population sample and an appropriate control group is essential to both the precision and accuracy of the result. With something as complicated as the effect of a drug on human biology, many factors enter simultaneously. The relevant question then becomes: how reliable do the results need to be?

THE OBJECTIVE OF MEASUREMENTS

Measurements are never perfect. With scientific research—as with any decision—we have to determine an acceptable level of uncertainty. This allows us to move forward. For example, if you are taking a drug you hope will mitigate your nagging headache, you might be satisfied to try it even if it significantly helps the general population only 75 percent of the time (as long as the side effects are minimal). On the other hand, if a change in diet will reduce your already low likelihood of heart disease by a mere two percent of your existing risk, decreasing it from five percent to 4.9 percent, for example, that might not worry you enough to convince you to forgo your favorite Boston cream pie.
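The diet example mixes two ways of describing risk: a relative reduction (two percent of the existing risk) and the absolute risk that results. A short calculation, using only the numbers from the paragraph above, makes the distinction explicit. The function name is hypothetical.

```python
def risk_after_reduction(baseline, relative_reduction):
    """Apply a relative reduction (a fraction of the existing risk) and return the new absolute risk."""
    return baseline * (1.0 - relative_reduction)

baseline = 0.05                                  # five percent risk, as in the text
new_risk = risk_after_reduction(baseline, 0.02)  # a two percent relative reduction
print(f"{baseline:.1%} -> {new_risk:.1%}, an absolute drop of {baseline - new_risk:.1%}")
```

A two percent relative change corresponds here to an absolute change of only 0.1 percentage point, which is why it may not be enough to give up the pie.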
