Authors: George M. Church
So what have we learned in this discussion of the mirror world, the genetic code, and the generation of diversity? Basically, the following lessons: (1) It is indeed possible to create mirror amino acids and mirror proteins with predictable (small- and large-scale) properties. (2) With the addition of more work and more parts, we can create a fully mirror biological world. (3) These, along with new amino acids and backbones, can then give us access to an entire new world of exotic biomaterials, pharmaceuticals, chemicals, and who knows what else.
As we continue our crusade for increased diversity, we must now pose the essential question: How fast and how diverse can evolution be made to go? We can increase diversity and replexity by adding genes and by adding needed polymer types, including but not limited to mirror versions of standard chemical reaction types and totally new reaction types, but how does evolution scale up to make the truly marvelous functional diversity in the world? The answer lies in the key parameters involved in maximizing the rate of evolutionary change: population and subpopulation sizes, mutation rate, time, selection and replication rates, recombination (the rearrangement of genetic material), multicellularity, and macro-evolution (evolutionary change that occurs at or above the level of the species).
Worldwide we have a steady state of 10²⁷ organisms and a mutation rate of about 10⁻⁷ per base pair per cell division. The Cambrian explosion happened in the brief period from 580 to 510 million years ago, when the rate of evolution is said to have accelerated by a factor of ten, as seen in the number of species coming and going and as defined by fossil morphologies. The rearrangement of genetic material (recombination) occurs at a rate of about once or a few times per chromosome in a variety of organisms. On a lab scale we are typically limited to 10¹⁰ organisms and less than a year for selection, so we are at a 10¹⁷-fold and a 7×10⁸-fold disadvantage, respectively, compared to what occurs in nature. So if we want to evolve in the lab as fast as nature did during the Cambrian, or even faster, then we have to make up for these disadvantages with higher mutation, selection, and recombination rates.
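To make the scale of that handicap concrete, here is a minimal back-of-the-envelope sketch in Python. It uses only the round numbers above (10²⁷ organisms versus 10¹⁰, and a roughly 70-million-year Cambrian window versus somewhere between a month and a year of lab selection); the figures are illustrative, not measurements.

```python
# Back-of-the-envelope comparison of nature's resources vs. a lab experiment.
# All numbers are the round figures quoted in the text, not measurements.

cells_in_nature = 1e27            # steady-state organisms worldwide
cells_in_lab = 1e10               # a large laboratory culture
population_handicap = cells_in_nature / cells_in_lab    # ~1e17-fold

cambrian_years = 580e6 - 510e6    # ~7e7 years for the Cambrian explosion
for lab_years in (1.0, 1.0 / 12): # a year of selection, or only about a month
    time_handicap = cambrian_years / lab_years
    print(f"lab selection of {lab_years:.2f} yr -> time handicap ~{time_handicap:.0e}")

print(f"population handicap ~{population_handicap:.0e}")
```

Depending on how much of a year the lab selection actually takes, the time handicap falls between roughly 10⁷-fold and 10⁹-fold, while the population handicap stays near 10¹⁷-fold.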
The fastest lab replication times for a free-living cell are held by E. coli and Vibrio, which clock in at about eleven minutes and eighteen minutes per replication, respectively, speeds that are probably limited by the rate at which a ribosome can make another ribosome. In principle, even though a bacterial virus or phage might take sixteen minutes to complete an infection cycle, the burst size (in terms of the number of virus [phage] babies released when the host cell bursts) is 128, and so the doubling time of the virus is 16/7 ≈ 2.3 minutes (since 2⁷ = 128).
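As a quick check of that arithmetic, here is a short Python sketch using only the burst size and cycle time quoted above:

```python
import math

# One 16-minute infection cycle releases a burst of 128 phage, i.e. 2**7,
# so the population doubles seven times per cycle.
cycle_minutes = 16
burst_size = 128

doublings_per_cycle = math.log2(burst_size)                   # 7.0
effective_doubling_minutes = cycle_minutes / doublings_per_cycle
print(f"effective doubling time: {effective_doubling_minutes:.1f} minutes")  # ~2.3
```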
Other replication speed demons are flies (midges and fruit flies), thanks to their boom-and-bust lifestyle. Flies crawl around starving. Then suddenly a piece of fruit appears, and the flies that manage to convert that piece of fruit into fly eggs fastest win. The best flies lay an egg every forty minutes. But even more impressive, some of their genomes, indeed whole nuclei (which look a bit like cells surrounded by a nuclear membrane), can divide in six minutes, beating even E. coli, but only by cheating, since whole nuclei depend on prefabricated ribosomes lurking in the vast resources of the newly fertilized fly egg.
The limiting factor on mutation rate is the finite size of the population in question and the deadly consequences of mutations hitting positions in the genome that are essential for life. Some viruses are highly mutable; for example, lentiviruses such as HIV have mutation rates as high as 0.1 percent per replication cycle. This is possible because their small genome of 9,000 base pairs will carry, on average, only one (or a few) serious changes per replication, and some copies will have none. In addition, sharing of genomic material can occur between two adjacent viral genomes that are dysfunctional due to different mutations. In contrast, with 300,000 base pairs that matter per E. coli genome, and probably three times that for humans, the number of errors per base pair per division must be close to one per million (and can get better than one per billion).
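The underlying trade-off can be written as a simple expectation: the number of hits to essential bases per replication is roughly the per-base mutation rate times the number of bases that matter, and that product has to stay near or below one. The Python sketch below uses illustrative rates of my own choosing (the real values vary widely); only the shape of the argument comes from the text.

```python
import math

# Expected hits to essential bases per replication = per-base mutation rate x
# number of bases that matter. The rates below are illustrative assumptions,
# chosen to show the shape of the trade-off, not measured values.
cases = {
    "HIV-like lentivirus": (1e-4, 9_000),    # high mutation rate, tiny genome
    "E. coli":             (1e-6, 300_000),  # ~300,000 base pairs that matter
    "human (rough guess)": (1e-6, 900_000),  # roughly three times the E. coli figure
}

for name, (rate_per_base, essential_bases) in cases.items():
    expected_hits = rate_per_base * essential_bases
    p_unscathed = math.exp(-expected_hits)   # Poisson chance a copy has zero hits
    print(f"{name:20s} expected hits: {expected_hits:4.1f}   "
          f"chance of an unscathed copy: {p_unscathed:.0%}")
```

With these assumed rates, each lineage averages roughly one or fewer serious changes per copy, which is exactly the regime the paragraph describes.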
How far could we push this if we could mutate only the nonessential or, better yet, the most likely to be useful bases, in our quest to turbo-charge the rate of evolution of organisms? If we had forty sites in the genome for which we would like to try out two possible variants in all possible combinations, then that would require a population of at least 2⁴⁰ ≈ 10¹² (a trillion) cells just to hold all of the combinations at once. Those cells would all fit in a liter (about a quart). This would correspond to a mutation rate of forty genetic changes (rather than one) per genome per cell division. We could get away with fewer by spreading them over time or if the selection is additive. (“Additive” means that each change has some advantage and the order of the changes doesn't matter much.) So this additive scenario provides an alternative to exploring all 2⁴⁰ combinations at once. If we don't need to explore every combination exhaustively but want the highest mutation rate, that rate could be (theoretically) millions per cell per generation, depending on the efficiency of synthesizing and/or editing genomes (described below).
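Here is the same library arithmetic as a short Python sketch; the cell density of a saturated culture (about 10¹² cells per liter) is my assumption, not a figure from the text.

```python
# Forty sites with two variants each: how big is the complete library,
# and how much culture would hold one copy of every combination?
sites = 40
variants_per_site = 2
library_size = variants_per_site ** sites        # 2**40, about 1.1e12 genomes

cells_per_liter = 1e12                           # assumed density of a dense E. coli culture
liters_needed = library_size / cells_per_liter
print(f"library size: {library_size:.2e} distinct genomes")
print(f"volume to hold one copy of each: ~{liters_needed:.1f} liter(s)")
```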
My gut feeling (by no means proven) is that, despite limitations of space and time, we humans can suddenly start to evolve thousands of times faster than during the impressive Cambrian era, and that we can direct this diversity toward our material needs instead of letting it occur randomly.
The Real Point of Reading Genomes: Comparative and Synthetic Genomics
All these changes and innovations lie in the future. In the nearer term we will reap the benefits not only by manipulating genetic codes and mutating genomes but by reading them.
The object of the Human Genome Project (unsung predecessor to the well-known Personal Genome Project) was not, ironically, to read a real genome. Its goal (and final result) was to sequence a composite genome of several individuals, a veritable MixMaster blend of humanity. Furthermore, the genome that was actually sequenced was riddled with hundreds of gaps. While it was a historic milestone of science, it was nevertheless mostly symbolic, like the moon landing, and had relatively little value in practical, personal, or medical terms.
But supposing that we had sequenced a number of whole, intact, and genuine human genomes, the real point in reading them would be to compare them against each other and to mine biological widgets from them: genetic sequences that perform specific, known, and useful functions. Discovering such sequences would extend our ability to change ourselves and the world because, essentially, we could copy the relevant genes and paste them into our own genomes, thereby acquiring those same useful functions or capacities.
A functional stretch of DNA defines a biological object that does something, and not always something good. There are genes that give us diseases and genes that protect us from diseases. Others give us special talents, control our height and weight, and so forth, with new gene discoveries being made all the time. In general, the more highly conserved (unchanged) an RNA or protein sequence is, the more valuable the functions it encodes, and the farther back in time it goes. The most highly conserved sequences of all are the components of protein synthesis: the ribosomal RNAs and the tRNAs that transport amino acids from the cytoplasm of a cell to the ribosome, which then strings the amino acids together into proteins. Even though these structures are hard to change evolutionarily, they can be changed via genome engineering in order to make multivirus-resistant organisms (Chapter 5), and, by using mirror-image amino acids, they can be changed to make multi-enzyme-resistant biology (Chapter 1).
Today it is possible to read genetic sequences directly into computers, where we can store, copy, and alter them and finally insert them back into living cells. We can experiment with those cells and let them compete among themselves to evolve into useful cellular factories. This is a way of placing biology, and evolution, under human direction and control. The J. Craig Venter Institute (JCVI) spent $40 million constructing the first tiny synthetic bacterial genome in 2010 without spelling out the reasons for doing so. So let's identify those reasons now, beginning with the reasons for making smaller viral genomes.
The first synthetic genome was made by Blight, Rice, and colleagues in 2000. They did this with little fanfare, basically burying the achievement in footnote 9 of their paper in Science (“cDNAs spanning 600 to 750 bases in length were assembled in a stepwise PCR assay with 10 to 12 gel-purified oligonucleotides [60 to 80 nucleotides (nt)] with unique complementary overlaps of 16 nt”).
The authors had in fact synthesized the hepatitis C virus (HCV). This virus, which affects 170 million people, is the leading cause of liver transplantation. Its genome is about 9,600 bases long. The synthesis allowed researchers to make rapid changes to the genome and to discover which of those changes improved or hurt their ability to grow the various strains in vitro (outside of the human body), which was a big deal at the time.
In 2002 Cello, Paul, and Wimmer synthesized the second genome, that of poliovirus. This feat received more press coverage even though the genome was smaller (7,500 bases) and the amount of damage done to people per year was considerably less than that done by HCV (polio is nearly extinct). The heightened awareness was due in part to the authors' moving their achievement into the title: “Generation of Infectious Virus in the Absence of Natural Template.” One rationale for the synthesis was to develop safer attenuated (weakened) vaccine strains.
In 2003 Hamilton Smith and coworkers synthesized the third genome, that of the phiX174 virus, in order to improve the speed of genome assembly from oligonucleotides. This exercise received even more attention, despite the fact that the genome was still smaller (5,386 bases) and didn't impact human health at all.
In 2008 the causative agent of SARS (severe acute respiratory syndrome), a coronavirus, was made from scratch in order to gain access to the virus when the original researchers refused to share samples of it, perhaps under the mistaken impression that synthesizing the 30,000-base-pair genome would pose an insurmountable barrier to their competitors. The researchers created the synthetic SARS virus anyway, in order to investigate how the virus evolved, establish where it came from, and develop vaccines and other treatments for the disease it caused.
It turned out, however, that the synthetic versions of the SARS virus didn't work due to published sequencing errors. When the errors were discovered and corrected, the synthetic viral genome did work, and was infective both in cultured cells and in mice.
So, in summary, the reasons for synthesizing virus genomes are to better understand where they came from and how they evolved, and to assist in the development of drugs, vaccines, and associated therapeutics. Why did the JCVI spend $40 million making a copy of a tiny bacterial genome? Some of the expense is attributable to the cost of research, which is full of dead ends and expensive discoveries. The other factor was that the core technologies used were first-generation technologies for reading and writing DNA, provided in the form of 1,000-base-pair chunks by commercial vendors at about $0.50 per base pair, plus similar costs for (semiautomated) assembly. So to redo a 1-million-base-pair genome would cost $1 million, while synthesizing a useful industrial microbe like E. coli would cost $12 million. But using second-generation technologies for genome engineering (see below), not just one but a billion such genomes can be made for less than $9,000. This is done by making many combinations of DNA snippets harvested from inexpensive yet complex DNA chips.
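The cost comparison works out roughly as follows. This is a sketch: the per-base price and the $9,000 pool figure come from the text, while treating assembly as simply doubling the raw synthesis price is my simplification.

```python
# First generation: ~$0.50 per base pair for synthesized 1,000 bp chunks,
# plus a comparable cost for (semi)automated assembly, i.e. roughly $1/bp.
first_gen_cost_per_bp = 0.50 + 0.50

genome_bp = 1_000_000                            # a 1-million-base-pair genome
print(f"first generation, 1 Mbp genome: ${genome_bp * first_gen_cost_per_bp:,.0f}")

# Second generation: one inexpensive pool of chip-synthesized DNA snippets,
# combined combinatorially, can seed on the order of a billion genome variants.
pool_cost = 9_000
variant_genomes = 1e9
print(f"second generation: ~${pool_cost / variant_genomes:.2e} per variant genome")
```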
Most fundamentally, the object of synthesizing genomes is to create new organisms that we can experiment with and optimize for various narrow and targeted purposes, from the creation of new drugs and vaccines to biofuels, chemicals, and new materials.
In the quest to engineer genomes into existence, my lab mates and I have developed a technique called multiplex automated genome engineering (MAGE). The kernel of the technique is the idea of multiplexing, a term derived from communications theory and practice. It refers to the simultaneous transmission of several messages over a single communication channel, as for example through an optical fiber. In the context of molecular genetics, multiplexing refers to the process of inserting several small pieces of synthetic DNA into a genome at multiple sites, simultaneously. Doing this would make it possible to introduce as many as 10 million genetic modifications into a genome within a reasonable time period.
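To give a feel for how multiplexed cycles pile up edits, here is a toy Python model of the process; the number of targeted sites and the per-cycle, per-site conversion efficiency are assumptions chosen for illustration, not the published MAGE figures.

```python
# Toy model of multiplexed genome editing: each cycle targets many sites at
# once, and any given site is converted with some per-cycle efficiency.
# Both parameters below are assumed for illustration.
targeted_sites = 100
per_site_efficiency = 0.3      # chance a given site is edited in one cycle
cycles = 10

p_site_edited = 1 - (1 - per_site_efficiency) ** cycles
expected_edits_per_genome = targeted_sites * p_site_edited
print(f"after {cycles} cycles: ~{expected_edits_per_genome:.0f} of "
      f"{targeted_sites} targeted sites edited per genome, on average")

# Because each cell in a large population carries its own random subset of
# edits, the population as a whole samples an enormous combinatorial space.
```

The point of the model is simply that repeated, simultaneous targeting of many sites compounds quickly: a modest per-site efficiency, applied over a handful of cycles across a large population, yields the kind of massive, combinatorial genome diversification that one-at-a-time editing could never reach.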