The Violinist's Thumb: And Other Lost Tales of Love, War, and Genius, as Written by Our Genetic Code

Reading a full genomic history, however, requires more dexterity than reading other texts. Reading DNA requires both left-to-right and right-to-left reading—boustrophedon reading.
Otherwise scientists miss crucial palindromes and semordnilaps, phrases that read the same forward and backward (and vice versa).

One of the world’s oldest known palindromes is an amazing up-down-and-sideways square carved into walls at Pompeii and other places:

S-A-T-O-R

A-R-E-P-O

T-E-N-E-T

O-P-E-R-A

R-O-T-A-S

At just two millennia old, however, sator… rotas falls orders of magnitude short of the age of the truly ancient palindromes in DNA. DNA has even invented two kinds of palindromes. There’s the traditional, sex-at-noon-taxes type—GATTACATTAG. But because of A-T and C-G base pairing, DNA sports another, subtler type that reads forward down one strand and backward across the other. Consider the string CTAGCTAG, then imagine what bases must appear on the other strand, GATCGATC. They’re perfect palindromes.
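The two kinds of palindrome can be told apart with a few lines of Python. This is only an illustrative sketch: the function names are made up for the example, and it assumes a strand written with the four letters A, C, G, and T.

```python
# Illustrative sketch; the names below are hypothetical, not from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A pairs with T, C pairs with G


def is_text_palindrome(seq: str) -> bool:
    """Traditional type: one strand reads the same forward and backward."""
    return seq == seq[::-1]


def reverse_complement(seq: str) -> str:
    """The partner strand, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def is_strand_palindrome(seq: str) -> bool:
    """Subtler type: forward on one strand equals backward on the other."""
    return seq == reverse_complement(seq)


print(is_text_palindrome("GATTACATTAG"))  # True
print("CTAGCTAG".translate(COMPLEMENT))   # GATCGATC (the other strand)
print(is_strand_palindrome("CTAGCTAG"))   # True
```

Note that the two tests are independent: CTAGCTAG fails the traditional test, and GATTACATTAG fails the two-strand test.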

Harmless as it seems, this second type of palindrome would send frissons of fear through any microbe. Long ago, many microbes evolved special proteins (called “restriction enzymes”) that can snip clean through DNA, like wire cutters. And for whatever reason, these enzymes can cut DNA only along stretches that are highly symmetrical, like palindromes. Cutting DNA has some useful purposes, like clearing out bases damaged by radiation or relieving tension in knotted DNA. But naughty microbes mostly used these proteins to play Hatfields versus McCoys and shred each other’s genetic material. As a result microbes have learned the hard way to avoid even modest palindromes.

Not that we higher creatures tolerate many palindromes, either. Consider CTAGCTAG and GATCGATC again. Notice that the beginning half of either palindromic segment could base-pair with the second half of itself: the first letter with the last (C… G), the second with the penult (T… A), and so on. But for these internal bonds to form, the DNA strand on one side would have to disengage from the other and buckle upward, leaving a bump. This structure, called a “hairpin,” can form along any DNA palindrome of decent length because of its inherent symmetry. As you might expect, hairpins can destroy DNA as surely as knots, and for the same reason—they derail cellular machinery.
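The first-with-last, second-with-penult pairing can be listed explicitly. The sketch below is a toy model under a loud assumption: it treats a strand as able to fold if every outer pair of letters is complementary, and it ignores the unpaired loop and minimum length that a real hairpin needs. The function names are hypothetical.

```python
# Toy model of hairpin pairing; names and the folding rule are assumptions.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hairpin_pairs(seq: str) -> list[tuple[str, str]]:
    """Pair the first letter with the last, the second with the penult, etc."""
    half = len(seq) // 2
    return [(seq[i], seq[-1 - i]) for i in range(half)]


def can_fold_on_itself(seq: str) -> bool:
    """True if every outer pair of letters is a valid A-T or C-G base pair."""
    return all(PAIR[a] == b for a, b in hairpin_pairs(seq))


print(hairpin_pairs("CTAGCTAG"))       # [('C', 'G'), ('T', 'A'), ('A', 'T'), ('G', 'C')]
print(can_fold_on_itself("CTAGCTAG"))  # True
print(can_fold_on_itself("GATCGATC"))  # True
```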

Palindromes can arise in DNA in two ways. The shortish DNA palindromes that cause hairpins arose randomly, when A’s, C’s, G’s, and T’s just happened to arrange themselves symmetrically. Longer palindromes litter our chromosomes as well, and many of those—especially those that wreak havoc on the runt Y chromosome—probably arose through a specific two-step process. For various reasons, chromosomes sometimes accidentally duplicate chunks of DNA, then paste the second copy somewhere down the line. Chromosomes can also (sometimes after double-strand breaks) flip a chunk of DNA by 180 degrees and reattach it ass-backwards. In tandem, a duplication and inversion create a palindrome.
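The duplicate-then-invert route can be mimicked on a toy strand. In this sketch the chunk "GATTC" is an arbitrary example, the inverted copy is pasted immediately after the original (real palindromes usually have a spacer between the arms), and "flipping by 180 degrees" is modeled as taking the reverse complement, since a flipped segment is read off the opposite strand.

```python
# Toy illustration; the chunk and the function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(chunk: str) -> str:
    """Flip a chunk 180 degrees: the same DNA, read from the other strand."""
    return chunk.translate(COMPLEMENT)[::-1]


def duplicate_and_invert(chunk: str) -> str:
    """Step 1: copy the chunk. Step 2: invert the copy and paste it alongside."""
    copy = chunk          # duplication
    return chunk + invert(copy)


result = duplicate_and_invert("GATTC")
print(result)                    # GATTCGAATC
print(result == invert(result))  # True: forward on one strand, backward on the other
```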

Most chromosomes, though, discourage long palindromes or at least discourage the inversions that create them. Inversions can break up or disable genes, leaving the chromosome ineffective. Inversions can also hurt a chromosome’s chances of crossing over—a huge loss. Crossing over (when twin chromosomes cross arms and exchange segments) allows chromosomes to swap genes and acquire better versions, or versions that work better together and make the chromosome more fit. Equally important, chromosomes take advantage of crossing over to perform quality-control checks: they can line up side by side, eyeball each other up and down, and overwrite mutated genes with nonmutated genes. But a chromosome will cross over only with a partner that looks similar. If the partner looks suspiciously different, the chromosome fears picking up malignant DNA and refuses to swap. Inversions look dang suspicious, and in these circumstances, chromosomes with palindromes get shunned.

Y once displayed this intolerance for palindromes. Way back when, before mammals split from reptiles, X and Y were twin chromosomes and crossed over frequently. Then, 300 million years ago, a gene on Y mutated and became a master switch that causes testes to develop. (Before this, sex was probably determined by the temperature at which Mom incubated her eggs, the same nongenetic system that determines pink or blue in turtles and crocodiles.) Because of this change, Y became the “male” chromosome and, through various processes, accumulated other manly genes, mostly for sperm production. As a consequence, X and Y began to look dissimilar and shied away from crossing over. Y didn’t want to risk its genes being overwritten by shrewish X, while X didn’t want to acquire Y’s meathead genes, which might harm XX females.

After crossing over slowed down, Y grew more tolerant about inversions, small and large. In fact Y has undergone four massive inversions in its history, truly huge flips of DNA. Each one created many cool palindromes—one spans three million letters—but each one made crossing over with X progressively harder. This wouldn’t be a huge deal except, again, crossing over allows chromosomes to overwrite malignant mutations. Xs could keep doing this in XX females, but when Y lost its partner, malignant mutations started to accumulate. And every time one appeared, cells had no choice but to chop Y down and excise the mutated DNA. The results were not pretty. Once a large chromosome, Y has lost all but two dozen of its original fourteen hundred genes.
At that rate, biologists once assumed that Ys were goners. They seemed destined to keep picking up dysfunctional mutations and getting shorter and shorter, until evolution did away with Ys entirely—and perhaps did away with males to boot.

Palindromes, however, may have pardoned Y. Hairpins in a DNA strand are bad, but if Y folds itself into a giant hairpin, it can bring any two of its palindromes—which are the same genes, one running forward, one backward—into contact. This allows Y to check for mutations and overwrite them. It’s like writing down “A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal: Panama!” on a piece of paper, folding the paper over, and correcting any discrepancies letter by letter—something that happens six hundred times in every newborn male. Folding over also allows Ys to make up for their lack of a sex-chromosome partner and “recombine” with themselves, swapping genes at one point along their lengths for genes at another.

This palindromic fix is ingenious. Too clever, in fact, by half. The system Y uses to compare palindromes regrettably doesn’t “know” which palindrome has mutated and which hasn’t; it just knows there’s a difference. So not infrequently, Y overwrites a good gene with a bad one. The self-recombination also tends to—whoops—accidentally delete the DNA between the palindromes. These mistakes rarely kill a man, but can render his sperm impotent. Overall the Y chromosome would disappear if it couldn’t correct mutations like this; but the very thing that allows it to, its palindromes, can unman it.

Both the linguistic and mathematical properties of DNA contribute to its ultimate purpose: managing data. Cells store, call up, and transmit messages through DNA and RNA, and scientists routinely speak of nucleic acids encoding and processing information, as if genetics were a branch of cryptography or computer science.

As a matter of fact, modern cryptography has some roots in genetics. After studying at Cornell University, a young geneticist named William Friedman joined an eccentric scientific think tank in rural Illinois in 1915. (It boasted a Dutch windmill, a pet bear named Hamlet, and a lighthouse, despite being 750 miles from the coast.) As Friedman’s first assignment, his boss asked him to study the effects of moonlight on wheat genes. But Friedman’s statistical background soon got him drawn into another of his boss’s lunatic projects—proving that Francis Bacon not only wrote Shakespeare’s plays but left clues throughout the First Folio that trumpeted his authorship. (The clues involved changing the shapes of certain letters.) Although enthused—he’d loved code breaking ever since he’d read Edgar Allan Poe’s “The Gold-Bug” as a child—Friedman determined the supposed references to Bacon were bunkum. Someone could use the same deciphering schemes, he noted, to “prove” that Teddy Roosevelt wrote Julius Caesar.
Nevertheless Friedman had envisioned genetics as biological code breaking, and after his taste of real code breaking, he took a job in cryptography with the U.S. government. Building on the statistical expertise he’d gained in genetics, he soon cracked the secret telegrams that broke the Teapot Dome bribery scandal open in 1923. In the early 1940s he began deciphering Japanese diplomatic codes, including a dozen infamous cables, intercepted on December 6, 1941, from Japan to its embassy in Washington, D.C., that foreshadowed imminent threat.

Friedman had abandoned genetics because genetics in the first decades of the century (at least on farms) involved too much sitting around and waiting for dumb beasts to breed; it was more animal husbandry than data analysis. Had he been born a generation or two later, Friedman would have seen things differently. By the 1950s biologists regularly referred to A-C-G-T base pairs as biological “bits” and to genetics as a “code” to crack. Genetics became data analysis, and continued to develop along those lines thanks in part to the work of a younger contemporary of Friedman, an engineer whose work encompassed both cryptography and genetics, Claude Shannon.

Scientists routinely cite Shannon’s thesis at MIT, written in 1937 when he was twenty-one years old, as the most important master’s thesis ever. In it Shannon outlined a method to combine electronic circuits and elementary logic to do mathematical operations. As a result, he could now design circuits to perform complex calculations—the basis of all digital circuitry. A decade later, Shannon wrote a paper on using digital circuits to encode messages and transmit them more efficiently. It’s only barely hyperbole to say that these two discoveries created modern digital communications from scratch.

Amid these seminal discoveries, Shannon indulged his other interests. At the office he loved juggling, and riding unicycles, and juggling while riding unicycles down the hall. At home he tinkered endlessly with junk in his basement; his lifetime inventions include rocket-powered Frisbees, motorized pogo sticks, machines to solve Rubik’s Cubes, a mechanical mouse (named Theseus) to solve mazes, a program (named THROBAC) to calculate in Roman numerals, and a cigarette pack–sized “wearable computer” to rip off casinos at roulette.

Shannon also pursued genetics in his Ph.D. thesis in 1940. At the time, biologists were firming up the connection between genes and natural selection, but the heavy statistics involved frightened many. Though he later admitted he knew squat about genetics, Shannon dived right in and tried to do for genetics what he’d done for electronic circuits: reduce the complexities into simple algebra, so that, given any input (genes in a population), anyone could quickly calculate the output (what genes would thrive or disappear). Shannon spent all of a few months on the paper and, after earning his Ph.D., was seduced by electronics and never got back around to genetics. It didn’t matter. His new work became the basis of information theory, a field so widely applicable that it wound its way back to genetics without him.

With information theory, Shannon determined how to transmit messages with as few mistakes as possible—a goal biologists have since realized is equivalent to designing the best genetic code for minimizing mistakes in a cell. Biologists also adopted Shannon’s work on efficiency and redundancy in languages. English, Shannon once calculated, was at least 50 percent redundant. (A pulp novel he investigated, by Raymond Chandler, approached 75 percent.) Biologists studied efficiency too because, per natural selection, efficient creatures should be fitter creatures. The less redundancy in DNA, they reasoned, the more information cells could store, and the faster they could process it, a big advantage. But as the Tie Club knew, DNA is sub-suboptimal in this regard. Up to six A-C-G-T triplets code for just one amino acid: totally superfluous redundancy. If cells economized and used fewer triplets per amino acid, they could incorporate more than just the canonical twenty, which would open up new realms of molecular evolution. Scientists have in fact shown that, if coached, cells in the lab can use fifty amino acids.
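The triplet redundancy can be counted directly from the standard genetic code. The sketch below builds the 64-entry codon table from the conventional compact layout (bases in the order T, C, A, G, with "*" marking a stop signal) and tallies how many triplets map to each amino acid. The final figure is a crude estimate only: it assumes all 21 meanings (twenty amino acids plus stop) are used equally often, which real genomes do not do, and it applies the usual redundancy formula of one minus the ratio of information needed to information available.

```python
from collections import Counter
from itertools import product
from math import log2

# Standard genetic code in the conventional compact layout (base order T, C, A, G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each triplet (TTT, TTC, TTA, ...) to its one-letter amino acid code.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many triplets spell each amino acid (or stop signal)?
counts = Counter(CODON_TABLE.values())
amino_acids = sorted(aa for aa in counts if aa != "*")

print(len(CODON_TABLE))     # 64 possible triplets
print(len(amino_acids))     # 20 canonical amino acids
print(counts["*"])          # 3 stop triplets
print(max(counts.values())) # 6: leucine (L), serine (S), and arginine (R)

# Crude estimate, assuming uniform usage of the 21 meanings (an assumption).
bits_available = log2(len(CODON_TABLE))   # 6 bits per triplet
bits_needed = log2(len(amino_acids) + 1)  # about 4.39 bits for 21 meanings
redundancy = 1 - bits_needed / bits_available
print(f"{redundancy:.0%} of each triplet's capacity is spare under this assumption")
```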
