Authors: George M. Church
This brings us to the sixth industrial revolution: the information-genomics revolution. Like the others, this revolution has crucial quantitative measures (probability, information, and complexity) as well as possibly crucial emerging measures of life, evolution, and intelligence. Theories of probability began as ruminations on how to win at games of chance, as illustrated by the Liber de ludo aleae (Book on Games of Chance), written by Gerolamo Cardano in 1526. Information and communication theory builds on probability theory. The basic concepts were first clearly articulated by Claude Shannon in his paper “A Mathematical Theory of Communication” (1948). One of Shannon's goals was to quantify the amount of information lost to static and other influences in phone line signals. He called this measure of uncertainty “information entropy,” by analogy with entropy in thermodynamics, where the term refers to the irreversible loss of heat energy.
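Shannon's measure has a compact form. For a source whose symbols occur with probabilities p_i, the entropy is H = -Σ p_i log2 p_i, expressed in bits per symbol. The short Python sketch below is an illustration, not code from Shannon's paper; it assumes that symbol probabilities can be estimated from the frequencies observed in a single string, and the function name `shannon_entropy` is hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(message: str) -> float:
    """Return the Shannon entropy of `message` in bits per symbol.

    Probabilities are estimated from how often each symbol appears
    in the string (an assumption made for this illustration).
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1/p), equivalent to -sum(p * log2(p))
    return sum((n / total) * log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy("AAAAAAAA"))  # 0.0: a single symbol, no uncertainty
    print(shannon_entropy("ACGTACGT"))  # 2.0: four equally likely bases
    print(shannon_entropy("0101"))      # 1.0: two equally likely bits
```

A string made of one repeated letter carries no uncertainty (0 bits per symbol), while a string in which all four nucleotides are equally frequent reaches the four-letter maximum of 2 bits per symbol.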
John von Neumann, the mathematician and sometime wit, liked Shannon's term, telling him: “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.” (This was a concept with legs, as the idea was later exported to the business world as “corporate entropy”: energy lost to bureaucracy and red tape.)
This sixth revolution is allowing us to understand, generalize, and make connections to the previous ones. For example, the precise sequences of DNA, RNA, and proteins discussed in Chapter 1 have a great deal in common with the strings of zeros and ones of the digital computing revolution.
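One way to see the parallel is that each of the four nucleotides can be given its own two-bit code, so any DNA sequence can be written as a string of zeros and ones and recovered without loss. The Python sketch below assumes an arbitrary mapping (A=00, C=01, G=10, T=11) and uses hypothetical function names; any one-to-one assignment of codes to bases would work equally well.

```python
# Hypothetical, arbitrary mapping: any one-to-one assignment of
# four two-bit codes to the four bases would work equally well.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(seq: str) -> str:
    """Encode a DNA sequence as a binary string, two bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def bits_to_dna(bits: str) -> str:
    """Decode a binary string produced by dna_to_bits back into bases."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    encoded = dna_to_bits("GATTACA")
    print(encoded)               # 10001111000100
    print(bits_to_dna(encoded))  # GATTACA
```

Two bits per base is the shortest fixed-length code that can distinguish four letters, and the round trip shows that no information is lost in either direction.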
Some unwelcome consequences of the sixth revolution include computer viruses, identity theft, privacy invasion, cyberwar, and bioterror. The potential scale of cyberwar became evident recently in the extent to which the Iranian computers controlling isotope separation could be compromised by individuals who had no physical access to the devices (possibly Israeli sympathizers). The cost to society of hacking, Trojan horses, and computer viruses and worms is about $50 billion per year. Bioterrorism is currently small to nonexistent, but the stakes are astronomically high.
The original goal of the Personal Genome Project (PGP) was to sequence the genomes of 100,000 volunteers at no cost to them, and to publish the results on the Internet along with each individual's personal data, even down to their picture. Of course, any such plan immediately raises privacy issues. The PGP couldn't promise to keep any of this information private, since the whole point of the exercise was to create an open access source of genetic and personal information and to disseminate it as widely as possible.
The solution was to accept into the program only people who, like myself, consider that the benefits to society outweigh the risks, and who also regard privacy as a highly overrated asset. But then the question was: how could we guarantee that we'd recruit only such people? The answer was to formulate a list of eligibility criteria, create consent forms, and devise an online entrance exam composed of a few dozen questions that would measure each volunteer's understanding and acceptance of the totally free and open-access nature of the enterprise. A potential recruit would have to answer each and every question correctly, the equivalent of scoring 100 percent on an examination. The candidate could take the test as many times as necessary, but a perfect score would be required for enrollment in the program.
In addition to passing the entrance exam, participants would be required to sign two consent forms. The first, a five-page mini consent form, would outline the program and its requirements and constitute a basic eligibility screening. The second, a more elaborate sixteen-page full consent form, would describe the program, the candidate's participation, and the public release of the data generated, in great and heroic detail.
All this was acceptable to Harvard's Institutional Review Board (IRB), the body responsible for approving, monitoring, and reviewing medical research or experimentation on humans. In August 2005, the IRB gave us approval for a pilot program. I became the first candidate.
Today, anyone can go to PersonalGenomes.org and view my public profile, which includes vital signs (my height, weight, and blood pressure), allergies (none), medications (lovastatin, coenzyme Q, multivitamins, calcium, etc.), medical history (narcolepsy and squamous cell carcinoma, among other fun things), race (white), traits (male, blood type O+, green eyes, etc.), facial photographs (suitable for framing), DNA data sets, and type of tissue samples taken (lymphoblasts and fibroblasts), plus date collected, storage location, and accession number. All of this information is followed, furthermore, by a universal waiver, stating in part: “To the extent possible under law, PersonalGenomes.org has waived all copyright and related or neighboring rights to Personal Genome Project Participant Genetic and Trait Dataset.”
My tissue samples were taken in 2005 and 2006. My lab has developed, or advised on the development of, most of the current thirty-six commercial next-generation sequencing technologies, and we test these technologies as they mature. The first set of samples was sequenced at Complete Genomics in Mountain View, California.
The results were underwhelming. My genome should have shown alleles for narcolepsy, dyslexia, high cholesterol, cardiac arrhythmia (and maybe musical arrhythmia as well!), squamous cell carcinoma, and plantar fasciitis. But it didn't (yet). Sequencing a genome is one thing, but interpreting and understanding it (making sense out of the practically endless and visually meaningless strings of the nucleotide letters A, T, C, and G) is quite another. Doing so requires software that translates those otherwise baffling chains of letters into usable, practical information. The process of developing such software is still in its early stages. In December 2010 the University of California, Berkeley, hosted a competition for genome interpretation programs. It was called the inaugural Critical Assessment of Genome Interpretation (CAGI) competition and attracted more than one hundred entrants. The very existence of this competition shows that we have a long way to go before a genome sequenced is a genome understood.
The Personal Genome Project was formally opened to the general public on DNA Day of 2009, April 25, the anniversary (recognized by the U.S. Congress) of the day in 1953 that Watson and Crick's paper describing DNA's structure was published in Nature. The first ten participants, myself and nine others, became known as the PGP-10. The Harvard IRB initially wanted the first ten candidates to have at least a master's degree in genetics, but the board later dropped this requirement as impractical for subsequent scaling up. In the end, the PGP-10 included Esther Dyson (PGP-3), who describes herself as “a longtime catalyst of start-ups in information technology,” Steven Pinker (PGP-6), a Harvard psychologist, and Misha Angrist (PGP-4), an assistant professor at Duke, the only one of the PGP-10 who has a PhD in genetics.
In 2009 Pinker wrote a first-person account of his participation in the project for the New York Times Magazine, “My Genome, My Self.” The piece described a decision that every potential PGP participant had to face: whether or not to learn that they might be carrying the gene for an incurable disease such as Huntington's or Alzheimer's. Pinker chose not to learn whether he had a variant of the APOE gene that would predispose him to Alzheimer's disease. (This is known as “redacting” the information in question.)
He did learn that he had one copy of a gene for familial dysautonomia, an incurable disorder of the nervous system with unpleasant consequences, including premature death. “A well-meaning colleague tried to console me,” Pinker wrote, “but I was pleased to gain the knowledge.”
His other genomic discoveries included good news and bad news. The good news was that he had a lower-than-average chance of getting prostate cancer before age 80. The bad news was that he had a slightly elevated chance of developing type 2 diabetes. (His genome also indicated a risk of baldness, even though he had great hair.)
Pinker's results show that genetics is not always destiny. Some genes are deterministic: if you have the gene for Huntington's disease and you live long enough, you will sooner or later develop it. More often, though, the influence of genes on traits is probabilistic, stacking the odds in one direction or another without completely predetermining the outcome. Better yet, since many traits have a strong environmental component, we can take action to influence or defeat our genetic destiny.
Another member of the PGP-10 also wrote about the journey through his personal genomic universe. This was Misha Angrist (PGP-4), who published the story in his book Here Is a Human Being: At the Dawn of Personal Genomics (2010). His data had been interpreted by Trait-o-matic, open-source software developed at the Church lab to find and classify the ways in which genetic sequence variations manifest themselves in the human body.
When Angrist logged on to the website holding this personal genomic information, he too discovered a modest number of unimpressive genetic data points about himself. Those data points, however, are now part of a growing, openly accessible set of genotype-phenotype correlations, a sort of medical analogue to the World Wide Web.
There is a difference worth noting between the Personal Genome Project and the private companies that offer over-the-counter genome analysis for the masses. The genomic information provided by the Personal Genome Project is useful both to the individual sequenced and to the wider community of researchers, with the dual goals of transforming the practice and delivery of medicine and of understanding how genomes give rise to living beings and how they influence the manifold processes of life.
Commercial genome sequencing firms such as Knome, by contrast, hold their data privately, and for good reason: the information they acquire could be put to a number of unpleasant uses. For example, it could be used to infer paternity or to affect one's employment, insurability, or even love life. Ironically, the information often goes unused by the very person it was intended for. A study conducted by Scripps Health, of La Jolla, California, and published in the New England Journal of Medicine in 2011 reported that of 2,037 people whose genomes were analyzed by the private firm Navigenics, most had made no changes to their diet or exercise habits when they were interviewed three months after receiving their test results, even when those results showed a definite need for such changes. (Still, 27 percent of the participants who shared their results with their physicians did make some lifestyle changes.)
In the end, just as individuals respond differently to drugs and to pathogens, they also respond differently to information.