Understanding Computation (50 page)

Read Understanding Computation Online

Authors: Tom Stuart

Tags: #COMPUTERS / Programming / General

In particular, it’s impossible to remove features (e.g.,
while
loops) from a programming language in a way that prevents us from writing
nonhalting programs while keeping the language powerful enough to be universal. If removing
a particular feature makes it impossible to write a program that loops forever, it must also
have made it impossible to implement
#evaluate
.

Languages that have been carefully designed to ensure that their
programs must always halt are
called
total programming languages
,
as opposed to the more conventional
partial programming
languages
whose
programs sometimes halt with an answer and sometimes
don’t. Total programming languages are still very powerful and capable
of expressing many useful computations, but one thing they can’t do is
interpret themselves.

Note

That’s surprising, since the equivalent of
#evaluate
for a total programming language
must by definition always halt, yet it still can’t be implemented in
that language—if it could, we’d be able to use the
does_it_say_no.rb
technique to make it loop
forever.

This gives us our first tantalizing glimpse of an impossible
program: we can’t write an interpreter for a total programming
language in itself, even though there’s a perfectly respectable,
guaranteed-to-halt algorithm for interpreting it. In fact it’s so
respectable that we
could
write it in another,
more sophisticated total programming language, but that new total
language wouldn’t be able to implement its own interpreter
either.

An interesting curiosity, but total languages are designed to
have artificial limits; we were looking for things that
no
computer or programming language could do.
We’d better keep going.

Programs Can Refer to Themselves

The self-referential trick used by
does_it_say_no.rb
hinges on
our ability to construct a program that can read its own source code, but
perhaps it seems a bit like cheating to assume that this will always be possible. In our
example, the program received its own source as an explicit input, thanks to functionality
provided by the surrounding environment (i.e., the shell); if that hadn’t been an option, it
could also have read the data directly from disk with
File.read(__FILE__)
, taking advantage of Ruby’s filesystem API and the special
__FILE__
constant that always contains the name of the
current file.

But we were supposed to be making a general argument that depended only on the
universality of Ruby, not on the capabilities of the operating system or the
File
class. What about compiled languages like Java and C that
may not have access to their source at runtime? What about JavaScript programs that get
loaded into memory over a network connection, and may not be stored locally on a filesystem
at all? What about self-contained universal systems like Turing machines and the lambda
calculus, where the notions of “filesystem” and “standard input” don’t exist?

Fortunately, the
does_it_say_no.rb
argument can
withstand this objection, because having a program read its own source from standard input
is just a convenient shorthand for something that all universal systems can do, again
regardless of their environment or other features. This is a consequence of a fundamental
mathematical result called
Kleene’s second recursion theorem
, which
guarantees that any program can be converted into an equivalent one that is able to
calculate its own source code. The recursion theorem provides reassurance that our shorthand
was justified: we could have replaced the line
program =
$stdin.read
with some code to generate the source of
does_it_say_no.rb
and assign it to
program
without having to do any I/O at all.

Let’s see how to do the conversion on a simple Ruby program. Take
this one, for example:

x
=
1
y
=
2
puts
x
+
y

We want to transform it into a program that looks something like
this…

program
=
'

'
x
=
1
y
=
2
puts
x
+
y

…where
program
is assigned a
string containing the source of the complete program. But what should
the value of
program
be?

A naïve approach is to try to concoct a simple string literal that can be assigned to
program
, but that quickly gets us into trouble, because
the literal would be part of the program’s source code and would therefore have to appear
somewhere inside itself. That would require
program
to
begin with the string
'program ='
followed by the value
of
program
, which would be the string
'program ='
again followed by the value of
program
, and so on forever:

program
=
%q{program = %q{program = %q{program = %q{program = %q{program = %q{

}}}}}}
x
=
1
y
=
2
puts
x
+
y
Note

Ruby’s
%q
syntax allows us to quote noninterpolated
strings with a pair of delimiter characters, in this case curly brackets, instead of
single quotes. The advantage is that the string literal can contain unescaped instances of
the delimiters as long as they’re correctly balanced:

>>
puts
%q{Curly brackets look like { and }.}
Curly brackets look like { and }.
=> nil
>>
puts
%q{An unbalanced curly bracket like }
is
a
problem
.
}
SyntaxError: syntax error, unexpected tIDENTIFIER, expecting end-of-input

Using
%q
instead of single
quotes helps to avoid character-escaping headaches in strings that
contain their own delimiters:

program
=
'program = \'program = \\\'program = \\\\\\\'

\\\\\\\'\\\'\''

The way out of this infinitely deep hole is to take advantage of
the fact that a value used by a program doesn’t
have
to appear literally in its source code; it can
also be computed dynamically from other data. This means we can
construct the converted program in three parts:

  1. Assign a string literal to a variable (call it
    data
    ).

  2. Use that string to compute the current program’s source code and assign it to
    program
    .

  3. Do whatever other work the program is supposed to do (i.e., the code we originally
    started with).

So the structure of the program will be more like this:

data
=
'

'
program
=

x
=
1
y
=
2
puts
x
+
y

That sounds plausible as a general strategy, but it’s a bit light
on specific details. How do we know what string to actually assign to
data
in part A, and how do we use it
in part B to compute
program
? Here’s
one solution:

  • In part A, create a string literal that contains the source code of parts B and C,
    and assign that string to
    data
    . This string won’t
    need to “contain itself,” because it’s not the source of the full program, only the
    section of the program that comes after part A.

  • In part B, first compute a string that contains the source code of part A. We can do
    that because part A mostly consists of a big string literal whose value is available as
    data
    , so we just need to prefix
    data
    ’s value with
    'data
    ='
    to recreate part A’s source. Then simply concatenate the result with
    data
    to get the source of the entire program (since
    data
    contains the source of parts B and C) and
    assign it to
    program
    .

This plan still sounds circular—part A produces the source of part
B, and part B produces the source of part A—but it narrowly avoids an
infinite regress by ensuring that part B just
computes
the source of part A without having to
literally contain it.

We can start making progress by filling in the bits we already
know about. We have most of the source of parts B and C already, so we
can partially complete the value of
data
:

data
=
%q{
program =

x = 1
y = 2
puts x + y
}
program
=

x
=
1
y
=
2
puts
x
+
y
Note

data
needs to contain newline
characters. By representing these as actual newlines inside an
uninterpolated string literal, rather than as interpolated
\n
escape sequences, we are able to include
the source of parts B and C verbatim without any special encoding or
escaping.
[
71
]
This straightforward copy-and-paste makes the source of
part A easier to compute.

We also know that the source of part A is just the string
'data
= %q{

}'
with the value of
data
filling the gap between the curly braces, so we can partially complete the
value of
program
too:

data
=
%q{
program =

x = 1
y = 2
puts x + y
}
program
=
"data = %q{
#{
data
}
}"
+

x
=
1
y
=
2
puts
x
+
y

Now all that’s missing from
program
is the source code of parts B and C,
which is exactly what
data
contains,
so we can append the value of
data
to
program
to finish it off:

data
=
%q{
program =

x = 1
y = 2
puts x + y
}
program
=
"data = %q{
#{
data
}
}"
+
data
x
=
1
y
=
2
puts
x
+
y

Finally, we can go back and fix up the value of
data
to reflect what part B actually looks
like:

data
=
%q{
program =
"data = %q{#{data}}" + data
x = 1
y = 2
puts x + y
}
program
=
"data = %q{
#{
data
}
}"
+
data
x
=
1
y
=
2
puts
x
+
y

And that’s it! This program does the same thing as the original, but now it has an extra
local variable containing its own source code—an admittedly hollow victory, since it doesn’t
actually
do
anything with that variable. So what if we convert a
program that expects a local variable called
program
and
does something with it? Take the classic example:

puts
program

This is a program that is trying to print its
own source code,
[
72
]
but it’s obviously going to fail in that form, because
program
is an undefined variable. If we run it through the self-referencing
transformation we get this result:

data
=
%q{
program = "data = %q{#{data}}" + data
puts program
}
program
=
"data = %q{
#{
data
}
}"
+
data
puts
program

That’s a bit more interesting. Let’s see what this code does on
the console:

>>
data
=
%q{
program = "data = %q{#{data}}" + data
puts program
}
=> "\nprogram = \"data = %q{\#{data}}\" + data\nputs program\n"
>>
program
=
"data = %q{
#{
data
}
}"
+
data
=> "data = %q{\nprogram = \"data = %q{\#{data}}\" + data\nputs program\n}\n
program = \"data = %q{\#{data}}\" + data\nputs program\n"
>>
puts
program
data = %q{
program = "data = %q{#{data}}" + data
puts program
}
program = "data = %q{#{data}}" + data
puts program
=> nil

Sure enough, the line
puts
program
really does print out the source code of the whole
program.

Hopefully it’s clear that this transformation doesn’t depend on
any special properties of the program itself, so it will work for any
Ruby program, and the use of
$stdin.read
or
File.read(__FILE__)
to read a program’s own
source can therefore always be eliminated.
[
73
]
It also doesn’t depend on any special properties of Ruby
itself—just the ability to compute new values from old ones, like any
other universal system—which implies that any Turing machine can be
adapted to refer to its own encoding, any lambda calculus expression can
be extended to contain a lambda-calculus representation of its own
syntax, and
so on.

Other books

Immaculate by Katelyn Detweiler
Awakening The Warriors by S E Gilchrist
Landry's Law by Kelsey Roberts
The Garden of My Imaan by Farhana Zia
Captive by Gale Stanley
Beneath the Burn by Godwin, Pam
Suspicious Ways by Lexxie Couper
Disconnected by Jennifer Weiner