The Information (56 page)

Read The Information Online

Authors: James Gleick

Tags: #Non-Fiction

 

Yet one still tries to take their measure.

How much information
 …?

When an object (a number or a bitstream or a dynamical system) can be expressed a different way in fewer bits, it is compressible. A frugal telegraph operator prefers to send the compressed version. Because the spirit of frugal telegraph operators kept the lights on at Bell Labs, it was natural for Claude Shannon to explore data compression, both theory and practice. Compression was fundamental to his vision: his war work on cryptography analyzed the disguising of information at one end and the recovery of the information at the other; data compression likewise encodes the information, with a different motivation—the efficient use of bandwidth. Satellite television channels, pocket music players, efficient cameras and telephones and countless other modern appurtenances depend on coding algorithms to compress numbers—sequences of bits—and those algorithms trace their lineage to Shannon’s original 1948 paper.

The first of these, now called Shannon-Fano coding, came from his colleague Robert M. Fano. It began with the simple idea of assigning short codes to frequent symbols, as in Morse code. They knew their method was not optimal, however: it could not be relied on to produce the shortest possible messages. Within three years it was surpassed by work of a graduate student of Fano’s at MIT, David Huffman. In the decades since, versions of the Huffman coding algorithm have squeezed many, many bytes.

Ray Solomonoff, a child of Russian immigrants who studied at the University of Chicago, encountered Shannon’s work in the early
1950s and began thinking about what he called the Information Packing Problem: how much information could one “pack” into a given number of bits, or conversely, given some information, how could one pack it into the fewest possible bits.

He had majored in physics, studied mathematical biology and probability and logic on the side, and gotten to know Marvin Minsky and John McCarthy, pioneers in what would soon be called artificial intelligence. Then he read Noam Chomsky’s offbeat and original paper “Three Models for the Description of Language,”

applying the new information-theoretic ideas to the formalization of structure in language. All this was bouncing around in Solomonoff’s mind; he was not sure where it led, but he found himself focusing on the problem of
induction
. How do people create theories to account for their experience of the world? They have to make generalizations, find patterns in data that are always influenced by randomness and noise. Could one enable a machine to do that? In other words, could a computer be made to learn from experience?

He worked out an elaborate answer and published it in 1964. It was idiosyncratic, and hardly anyone noticed until the 1970s, when both Chaitin and Kolmogorov discovered that Solomonoff had anticipated the essential features of what by then was called algorithmic information theory. In effect, Solomonoff, too, had been figuring out how a computer might look at sequences of data—number sequences or bit strings—and measure their randomness and their hidden patterns. When humans or computers learn from experience, they are using induction: recognizing regularities amid irregular streams of information. From this point of view, the laws of science represent data compression in action. A theoretical physicist acts like a very clever coding algorithm. “The laws of science that have been discovered can be viewed as summaries of large amounts of empirical data about the universe,”

wrote Solomonoff. “In the present context, each such law can be transformed into a method of compactly coding the empirical data that gave rise to that law.” A good scientific theory is economical. This was yet another way of saying so.

Solomonoff, Kolmogorov, and Chaitin tackled three different problems and came up with the same answer. Solomonoff was interested in inductive inference: given a sequence of observations, how can one make the best predictions about what will come next? Kolmogorov was looking for a mathematical definition of randomness: what does it mean to say that one sequence is more random than another, when they have the same probability of emerging from a series of coin flips? And Chaitin was trying to find a deep path into Gödel incompleteness by way of Turing and Shannon—as he said later, “putting Shannon’s information theory and Turing’s computability theory into a cocktail shaker and shaking vigorously.”

They all arrived at minimal program size. And they all ended up talking about complexity.

The following bitstream (or number) is not very complex, because it is rational:

D: 14
285
714
285
714
285
714
285
714
285
714
285
714
285
714

Other books

The Fledge Effect by R.J. Henry
Vuelo final by Follett Ken
Revenge by Lisa Jackson
Mad Season by Katia Wildermann
Royal's Bride by Kat Martin
Diario De Martín Lobo by Martín Lobo
New York to Dallas by J. D. Robb