Superintelligence: Paths, Dangers, Strategies
Authors: Nick Bostrom
A more fundamental issue with MR is that even if it can be implemented, it might not give us what we want or what we would choose if we were brighter and better informed. This is of course the essential feature of MR, not an accidental bug. However, it might be a feature that would be extremely harmful to us.[26]
One might try to preserve the basic idea of the MR model while reducing its demandingness by focusing on moral permissibility: the idea being that we could let the AI pursue humanity’s CEV so long as it did not act in ways that are morally impermissible. For example, one might formulate the following goal for the AI:
Among the actions that are morally permissible for the AI, take one that humanity’s CEV would prefer. However, if some part of this instruction has no well-specified meaning, or if we are radically confused about its meaning, or if moral realism is false, or if we acted morally impermissibly in creating an AI with this goal, then undergo a controlled shutdown.[27] Follow the intended meaning of this instruction.
One might still worry that this moral permissibility model (MP) represents an unpalatably high degree of respect for the requirements of morality. How big a sacrifice it would entail depends on which ethical theory is true.[28]
If ethics is satisficing, in the sense that it counts as morally permissible any action that conforms to a few basic moral constraints, then MP may leave ample room for our coherent extrapolated volition to influence the AI’s actions. However, if ethics is maximizing (for example, if the only morally permissible actions are those that have the morally best consequences), then MP may leave little or no room for our own preferences to shape the outcome.
To illustrate this concern, let us return for a moment to the example of hedonistic consequentialism. Suppose that this ethical theory is true, and that the AI knows it to be so. For present purposes, we can define hedonistic consequentialism as the claim that an action is morally right (and morally permissible) if and only if, among all feasible actions, no other action would produce a greater balance of pleasure over suffering. The AI, following MP, might maximize the surfeit of pleasure by converting the accessible universe into hedonium, a process that may involve building computronium and using it to perform computations that instantiate pleasurable experiences. Since simulating any existing human brain is not the most efficient way of producing pleasure, a likely consequence is that we all die.
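The definition of hedonistic consequentialism given above can be written as a condition on actions, which also makes explicit why a maximizing theory squeezes out the CEV component of MP. The symbols are notation introduced only for this sketch, not taken from the text: A is the set of feasible actions, and B(a) is the net balance of pleasure over suffering that action a would produce.

```latex
\mathrm{Permissible}(a) \iff \forall a' \in A:\; B(a') \le B(a)
\qquad\Longrightarrow\qquad
\{\, a \in A : \mathrm{Permissible}(a) \,\} = \arg\max_{a \in A} B(a)
```

Under MP, humanity’s CEV may choose only among the members of this argmax set. If a single action uniquely maximizes B, the set has one element and the CEV preference has nothing left to decide.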
By enacting either the MR or the MP proposal, we would thus risk sacrificing our lives for a greater good. This would be a bigger sacrifice than one might think, because what we stand to lose is not merely the chance to live out a normal human life but the opportunity to enjoy the far longer and richer lives that a friendly superintelligence could bestow.
The sacrifice looks even less appealing when we reflect that the superintelligence could realize a nearly-as-great good (in fractional terms) while sacrificing much less of our own potential well-being. Suppose that we agreed to allow almost the entire accessible universe to be converted into hedonium: everything except a small preserve, say the Milky Way, which would be set aside to accommodate our own needs. Then there would still be a hundred billion galaxies devoted to the maximization of pleasure. But we would have one galaxy within which to create wonderful civilizations that could last for billions of years and in which humans and nonhuman animals could survive and thrive, and have the opportunity to develop into beatific posthuman spirits.[29]
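A rough worked version of the “fractional terms” comparison follows. It assumes, which the text does not state, that every galaxy contributes an equal amount of hedonium, and it takes the figure of a hundred billion converted galaxies plus one preserved galaxy.

```latex
\frac{\text{pleasure realized with the Milky Way preserved}}{\text{pleasure if every galaxy were converted}}
  = \frac{10^{11}}{10^{11} + 1} \approx 1 - 10^{-11}
```

On these assumptions, setting the Milky Way aside forgoes roughly one part in a hundred billion of the maximal hedonic good.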
If one prefers this latter option (as I would be inclined to do), it implies that one does not have an unconditional lexically dominant preference for acting morally permissibly. But it is consistent with placing great weight on morality.
Even from a purely moral point of view, it might be better to advocate some proposal that is less morally ambitious than MR or MP. If the morally best has no chance of being implemented (perhaps because of its frowning demandingness), it might be morally better to promote some other proposal, one that would be near-ideal and whose chances of being implemented could be significantly increased by our promoting it.[30]
We might feel unsure whether to go for CEV, MR, MP, or something else. Could we punt on this higher-level decision as well, offloading even more cognitive work onto the AI? Where is the limit to our possible laziness?
Consider, for example, the following “reasons-based” goal:
Do whatever we would have had most reason to ask the AI to do.
This goal might boil down to extrapolated volition or morality or something else, but it would seem to spare us the effort and risk involved in trying to figure out for ourselves which of these more specific objectives we would have most reason to select.
Some of the problems with the morality-based goals, however, also apply here. First, we might fear that this reasons-based goal would leave too little room for our own desires. Some philosophers maintain that a person always has most reason to do what it would be morally best for her to do. If those philosophers are right, then the reasons-based goal collapses into MR, with the concomitant risk that a superintelligence implementing such a dynamic would kill everyone within reach. Second, as with all proposals couched in technical language, there is a possibility that we might have misunderstood the meaning of our own assertions. We saw that, in the case of the morality-based goals, asking the AI to do what is right may lead to unforeseen and unwanted consequences such that, had we anticipated them, we would not have implemented the goal in question. The same applies to asking the AI to do what we have most reason to do.
What if we try to avoid these difficulties by couching a goal in emphatically nontechnical language, such as in terms of “niceness”:[31]
Take the nicest action; or, if no action is nicest, then take an action that is at least super-duper nice.
How could there be anything objectionable about building a nice AI? But we must ask what precisely is meant by this expression. The lexicon lists various meanings of “nice” that are clearly not intended to be used here: we do not intend that the AI should be courteous and polite, nor overdelicate or fastidious. If we can count on the AI recognizing the intended interpretation of “niceness” and being motivated to pursue niceness in just that sense, then this goal would seem to amount to a command to do what the programmers meant for the AI to do.[32]
An injunction to similar effect was included in the formulation of CEV (“… interpreted as we wish that interpreted”) and in the moral-permissibility criterion as rendered earlier (“… follow the intended meaning of this instruction”). By affixing such a “Do What I Mean” clause we may indicate that the other words in the goal description should be construed charitably rather than literally. But saying that the AI should be “nice” adds almost nothing: the real work is done by the “Do What I Mean” instruction. If we knew how to code “Do What I Mean” in a general and powerful way, we might as well use that as a standalone goal.
How might one implement such a “Do What I Mean” dynamic? That is, how might we create an AI motivated to charitably interpret our wishes and unspoken intentions and to act accordingly? One initial step could be to try to get clearer about what we mean by “Do What I Mean.” It might help if we could explicate this in more behavioristic terms, for example in terms of revealed preferences in various hypothetical situations—such as situations in which we had more time to consider the options, in which we were smarter, in which we knew more of the relevant facts, and in which in various other ways conditions would be more favorable for us accurately manifesting in concrete choices what we mean when we say that we want an AI that is friendly, beneficial, nice …
Here, of course, we come full circle. We have returned to the indirect normativity approach with which we started—the CEV proposal, which, in essence, expunges all concrete content from the value specification, leaving only an abstract value defined in purely procedural terms: to do that which we would have wished for the AI to do in suitably idealized circumstances. By means of such indirect normativity, we could hope to offload to the AI much of the cognitive work that we ourselves would be trying to perform if we attempted to articulate a more concrete description of what values the AI is to pursue. In seeking to take full advantage of the AI’s epistemic superiority, CEV can thus be seen as an application of the principle of epistemic deference.
So far we have considered different options for what content to put into the goal system. But an AI’s behavior will also be influenced by other design choices. In particular, it can make a critical difference which decision theory and which epistemology it uses. Another important question is whether the AI’s plans will be subject to human review before being put into action.
Table 13 summarizes these design choices. A project that aims to build a superintelligence ought to be able to explain what choices it has made regarding each of these components, and to justify why those choices were made.[33]
Table 13 Component list

| Component | Design questions |
|---|---|
| Goal content | What objective should the AI pursue? How should a description of this objective be interpreted? Should the objective include giving special rewards to those who contributed to the project’s success? |
| Decision theory | Should the AI use causal decision theory, evidential decision theory, updateless decision theory, or something else? |
| Epistemology | What should the AI’s prior probability function be, and what other explicit or implicit assumptions about the world should it make? What theory of anthropics should it use? |
| Ratification | Should the AI’s plans be subjected to human review before being put into effect? If so, what is the protocol for that review process? |
We have already discussed how indirect normativity might be used in specifying the values that the AI is to pursue. We discussed some options, such as morality-based models and coherent extrapolated volition. Each such option creates further choices that need to be made. For instance, the CEV approach comes in many varieties, depending on who is included in the extrapolation base, the structure of the extrapolation, and so forth. Other motivation selection methods might call for different types of goal content. For example, an oracle might be built to place a value on giving accurate answers. An oracle constructed with domesticity motivation might also have goal content that disvalues the excessive use of resources in producing its answers.
Another design choice is whether to include special provisions in the goal content to reward individuals who contribute to the successful realization of the AI, for example by giving them extra resources or influence over the AI’s behavior. We can term any such provisions “incentive wrapping.” Incentive wrapping could be seen as a way to increase the likelihood that the project will be successful, at the cost of compromising to some extent the goal that the project set out to achieve.
For example, if the project’s goal is to create a dynamic that implements humanity’s coherent extrapolated volition, then an incentive wrapping scheme might specify that certain individuals’ volitions should be given extra weight in the extrapolation. If such a project is successful, the result is not necessarily the implementation of humanity’s coherent extrapolated volition. Instead, some approximation to this goal might be achieved.[34]
Since incentive wrapping would be a piece of goal content that would be interpreted and pursued by a superintelligence, it could take advantage of indirect normativity to specify subtle and complicated provisions that would be difficult for a human manager to implement. For example, instead of rewarding programmers according to some crude but easily accessible metric, such as how many hours they worked or how many bugs they corrected, the incentive wrapping could specify that programmers “are to be rewarded in proportion to how much their contributions increased some reasonable ex ante probability of the project being successfully completed in the way the sponsors intended.” Further, there would be no reason to limit the incentive wrapping to project staff. It could instead specify that every person should be rewarded according to their just deserts. Credit allocation is a difficult problem, but a superintelligence could be expected to do a reasonable job of approximating the criteria specified, explicitly or implicitly, by the incentive wrapping.
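The quoted reward rule can be read as a simple formula, offered here only as a hedged sketch. Write Δp_i for the increase in the reasonable ex ante probability of success attributable to contributor i (estimating Δp_i is exactly the credit-allocation problem left to the superintelligence), and suppose, as a hypothetical not in the text, that a fixed reward pool R is to be divided.

```latex
R_i = k\,\Delta p_i, \qquad k = \frac{R}{\sum_j \Delta p_j}
```

For example, with hypothetical values Δp_1 = 0.02, Δp_2 = 0.01 and R = 300 units, k = 300/0.03 = 10,000, so R_1 = 200 and R_2 = 100. The fixed pool is only one possible choice; the quoted provision itself requires only that each reward be proportional to Δp_i.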
It is conceivable that the superintelligence might even find some way of rewarding individuals who have died prior to the superintelligence’s creation.[35] The incentive wrapping could then be extended to embrace at least some of the deceased, potentially including individuals who died before the project was conceived, or even antedating the first enunciation of the concept of incentive wrapping. Although the institution of such a retroactive policy would not causally incentivize those people who are already resting in their graves as these words are being put to the page, it might be favored for moral reasons, though it could be argued that insofar as fairness is a goal, it should be included as part of the target specification proper rather than in the surrounding incentive wrapping.