Author: Ian Ayres
The Super Crunching revolution is the rise of data-driven decision making. It's about letting your choices be guided by the statistical predictions of regressions and randomized trials. That's really what the EBM crowd wants. Most physicians (like just about every other decision maker we have and will encounter) still cling to the idea that diagnosis is an art where their expertise and intuition are paramount. But to a Super Cruncher, diagnosis is merely another species of prediction.
CHAPTER 5
Experts Versus Equations
The previous chapters have been awash with Super Crunching predictions. Marketing crunchers predict what products you will want to buy; randomized studies predict how you'll respond to a prescription drug (or a website or a government policy); eHarmony predicts who you'll want to marry.
So who's more accurate, Super Crunchers or traditional experts? It turns out this is a question that researchers have been asking for decades. The intuitivists and clinicians almost universally argue that the variables underlying their own decision making can't be quantified and reduced to a non-discretionary algorithm. Yet even if they're right, it is possible to test independently whether decision rules based on statistical prediction outperform the decisions of traditional experts who base their decisions on experience and intuition. In other words, Super Crunching can be used to adjudicate whether experts can in fact outpredict the equations generated by regressions or randomized experiments. We can step back and use Super Crunching to test its own strength.
This is just the thought that occurred to Ted Ruger, a law professor at the University of Pennsylvania, as he sat in a seminar back in 2001 listening to two political scientists, Andrew Martin and Kevin Quinn. The pair were presenting a technical Super Crunching paper claiming that, by using just a few variables concerning the politics of a case, they could predict how Supreme Court justices would vote.
Ted wasn't buying it. Ted doesn't look anything like your usual anemic academic. He has a strapping athletic build with a square chin and rugged good looks (think of a young Robert Redford with dark brown hair). As he sat in that seminar room, he didn't like the way these political scientists were describing their results. “They actually used the nomenclature of prediction,” he told me. “I am sitting in the audience as somewhat of a skeptic.” He didn't like the fact that all the paper had done was try to predict the past. “Like a lot of legal or political science research,” he said, “it was retrospective in nature.”
So after the seminar he went up to them with a suggestion. “In some sense, the genesis of this project was my talking to them afterwards and saying, well why don't we run the test forward?” And as they talked, they decided to run a horse race, to create “a friendly interdisciplinary competition” to compare the accuracy of two different ways to predict the outcome of Supreme Court cases. In one corner stood the Super Crunching predictions of the political scientists and in the other stood the opinions of eighty-three legal experts. Their assignment was to predict in advance the votes of the individual justices for every case that was argued in the Supreme Court's 2002 term. The experts were true legal luminaries, a mixture of law professors, practitioners, and pundits (collectively thirty-eight had clerked for a Supreme Court justice, thirty-three held chaired professorships, and five were current or former law school deans). While the Super Crunching algorithm made predictions for all the justices' votes in all the cases, the experts were called upon just to predict the votes for cases in their area of expertise.
Ted didn't think it was really a fair fight. The political scientists' model took into account only six factors: (1) the circuit court of origin; (2) the issue area of the case; (3) the type of petitioner (e.g., the United States, an employer, etc.); (4) the type of respondent; (5) the ideological direction (liberal or conservative) of the lower court ruling; and (6) whether the petitioner argued that a law or practice is unconstitutional. “My initial sense,” he said, “was that their model was too reductionist to capture the nuances of the decision making and thus legal experts could do better.” After all, detailed knowledge of the law and past precedent should count for something.
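To make the six factors concrete, a case as the model sees it can be pictured as a simple record. The field names and example values below are my own hypothetical encoding, not the coding scheme Martin and Quinn actually used:

```python
# Hypothetical encoding of the six case-level factors the model considered.
# Field names and category values are illustrative only, not the actual
# Martin-Quinn coding scheme.
case = {
    "circuit_of_origin": "9th",          # (1) circuit court the case came from
    "issue_area": "civil rights",        # (2) issue area of the case
    "petitioner_type": "employer",       # (3) type of petitioner
    "respondent_type": "United States",  # (4) type of respondent
    "lower_court_direction": "liberal",  # (5) ideological direction of the ruling below
    "constitutional_challenge": True,    # (6) petitioner claims a law/practice is unconstitutional
}
```

Note that everything about the case, from the parties to the legal question itself, is boiled down to these six fields; no precedent, briefs, or oral argument enters the model.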
This simple test implicates some of the most basic questions of what law is. Justice Oliver Wendell Holmes created the idea of legal realism by announcing, “The life of the law has not been logic; it has been experience.” For Holmes, the law was nothing more than “a prediction of what judges in fact will do.” Holmes rejected the view of Harvard's dean (and the champion of the Socratic method for legal education) Christopher Columbus Langdell that “law is a science, and that all the available materials of that science are contained in printed books.” Holmes felt that accurate prediction had a “good deal more to do” with “the felt necessities of the time, the prevalent moral and political theories, intuitions of public policy, avowed or unconscious, even the prejudices which judges share with their fellow-men.”
The dominant statistical model of political science is Holmesian in that it places almost exclusive emphasis on the judge's prejudices, his or her personal ideological views. Political scientists often assumed these political ideologies to be fixed and neatly arrayed along a single numeric spectrum from liberal to conservative. The decision trees produced by this kind of Super Crunching algorithm are anything but nuanced. Using historical data on 628 cases previously decided by these nine justices, Martin and Quinn first looked to see when the six factors predicted that the decision would be a unanimous affirmance or reversal. Then, they used the same historic cases to find the flowchart (a conditional combination of factors) that best predicted the votes of the individual justices in non-unanimous cases. For example, consider the following flowchart that was used to forecast Justice Sandra Day O'Connor's votes in the actual study:
[Flowchart omitted: the decision tree used to forecast Justice O'Connor's votes.] SOURCE: Andrew D. Martin et al., “Competing Approaches to Predicting Supreme Court Decision Making,” 2 Perspectives on Politics 763 (2004).
This predictive flowchart is incredibly crude. The first decision point predicts that O'Connor would vote to reverse whenever the lower court decision was coded as being “liberal.” Hence, in Grutter v. Bollinger, the 2002-term case challenging the constitutionality of Michigan Law School's affirmative action policy, the model erroneously forecasted that O'Connor would vote to reverse simply because the lower court's decision was liberal (in upholding the law school's affirmative action policy). With regard to “conservative” lower court decisions, the flowchart is slightly more complicated, conditioning the prediction on the circuit court origin, the type of respondent, and the subject area of the case. Still, this statistical prediction completely ignores the specific issues in the case and the past precedent of the Court. Surely legal experts with a wealth of knowledge about the specific issue could do better.
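To see just how crude the rule is, here is the O'Connor flowchart sketched as code. The first branch is exactly as described above; the conservative-side branches are hypothetical stand-ins, since the chapter names the conditioning factors (circuit of origin, respondent type, issue area) but not the actual thresholds:

```python
def predict_oconnor(case):
    """Sketch of the O'Connor decision tree described in the text.

    The 'liberal' branch matches the text: any liberal lower-court
    decision yields a predicted reversal. The 'conservative' branches
    are illustrative placeholders for the real tree's conditions on
    circuit of origin, respondent type, and issue area.
    """
    if case["lower_court_direction"] == "liberal":
        return "reverse"  # first decision point in the actual flowchart
    # Hypothetical stand-in branches for "conservative" lower-court rulings:
    if case["circuit_of_origin"] in {"2nd", "3rd", "DC"}:  # illustrative set
        return "affirm"
    if case["respondent_type"] == "United States":         # illustrative condition
        return "affirm"
    return "reverse"

# The Grutter v. Bollinger example: the lower court's "liberal" ruling
# upholding affirmative action leads the model to (wrongly) predict reversal.
grutter = {
    "lower_court_direction": "liberal",
    "circuit_of_origin": "6th",
    "respondent_type": "university",
}
print(predict_oconnor(grutter))  # prints "reverse"
```

The point of the sketch is how little the rule consults: a handful of coarse labels, evaluated top to bottom, with no reference to the legal merits at all.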
Notice in the statistical model that humans are still necessary to code the case. A kind of expertise is essential to say whether the lower court decision was “liberal” or “conservative.” The study shows how statistical prediction can be made compatible with and dependent upon subjective judgment. There is nothing that stops statistical decision rules from depending on subjective opinions of experts or clinicians. A rule can ask whether a nurse believes a patient looks “hinky.” Still, this is a very different kind of expertise. Instead of calling on the expert to make an ultimate prediction, the expert is asked to opine on the existence or absence of a particular feature. The human expert might have some say in the matter, but the Super Crunching equation limits and channels this discretion.
Ted's simple idea of “running the test forward” set the stage for a dramatic test that many insiders watched with interest as it played out during the course of the Court's term. Both the computer's and the experts' predictions were posted publicly on a website before each decision was announced, so people could watch the results come in as opinion after opinion was handed down.
The experts lost. For every argued case during the 2002 term, the model predicted 75 percent of the Court's affirm/reverse results correctly, while the legal experts collectively got only 59.1 percent right. Super Crunching was particularly effective at predicting the crucial swing votes of Justices O'Connor and Kennedy. The model predicted Justice O'Connor's vote correctly 70 percent of the time while the experts' success rate was only 61 percent.
How can it be that an incredibly stripped-down statistical model outpredicted not just lawyers, but experts in the field who had access to detailed information about the cases? Is this result just some statistical anomaly? Does it have something to do with idiosyncrasies or the arrogance of the legal profession? These are the central questions of this chapter. The short answer is that Ted's test is representative of a much wider phenomenon. For decades, social scientists have been comparing the predictive accuracies of Super Crunchers and traditional experts. In study after study, there is a strong tendency for the Super Crunchers to come out on top.
Meehl's “Disturbing Little Book”
Way back in 1954, Paul Meehl wrote a book called Clinical Versus Statistical Prediction.
This slim volume created a storm of controversy among psychologists because it reported the results of about twenty empirical studies that compared how well “clinical” experts could predict relative to simple statistical models. The studies concerned a diverse set of predictions, such as how patients with schizophrenia would respond to electroshock therapy or how prisoners would respond to parole. Meehl's startling finding was that none of the studies suggested that experts could outpredict statistical equations.
Paul Meehl was the perfect character to start this debate. He was a towering figure in psychology who eventually became president of the American Psychological Association. He's famous for helping to develop the MMPI (the Minnesota Multiphasic Personality Inventory), which to this day is one of the most frequently used personality tests in mental health. What really qualified Meehl to lead the man-versus-machine debate was that he cared passionately about both sides. Meehl was an experimental psychologist who thought there was value to clinical thinking. He was driven to write his book by the personal conflict between his subjective certainty that clinical experience conferred expertise, and “the disappointing findings on the reliability and validity of diagnostic judgments and prognostications by purported experts.”
Because of his book's findings, some people inferred that he was an inveterate number cruncher. In his autobiography, Meehl tells about a party after a seminar where a group of experimental psychologists privately toasted him for giving “the clinicians a good beating.” Yet they were shocked to learn that he valued psychoanalysis and even had a painting of Freud in his office. Meehl believed the studies that showed that statisticians could make better predictions about many issues, but he also pointed to the interpretation of dreams in psychoanalysis as a “striking example of an inferential process difficult to actuarialize and objectify.” Meehl writes:
I had not then completed a full-scale analysis but I [told them I] had some 85 couch hours with a Vienna-trained analyst, and my own therapeutic mode was strongly psychodynamic…. The glowing warmth of the gathering cooled noticeably. A well-known experimental psychologist became suddenly hostile. He glared at me and said, “Now, come on, Meehl, how could anybody like you, with your scientific training at Minnesota, running rats and knowing math, and giving a bang-up talk like you just gave, how could you think there is anything to that Freudian dream shit?”
Meehl continued to inhabit this schizophrenic space for fifty more years. His initial study, which he playfully referred to as “my disturbing little book,” was only an opening salvo in what would become not only a personal lifelong passion, but also a veritable industry of man-versus-machine studies by others.
Researchers have now completed dozens upon dozens of studies comparing the success rates of statistical and expert approaches to decision making. The studies have analyzed the relative ability of Super Crunchers and experts in predicting everything from marital satisfaction and academic success to business failures and lie detection. As long as you have a large enough dataset, almost any decision can be crunched. Studies have suggested that number crunchers using statistical databases can even outperform humans in guessing someone's sexual orientation or in constructing a satisfying crossword puzzle.