Ancient DNA: Methods and Protocols (44 page)

cause them to miss alignments when the number of allowed edits is set too low. Green et al. (8) presented an aligner with sensitivity comparable to MegaBLAST (79, 80) that incorporates base misincorporation patterns typical of aDNA extracts.
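The Python sketch below shows why a fixed edit limit loses damaged reads. It estimates the probability that a read carries more mismatches than an aligner allows, under these assumptions:

- Per-base mismatch probabilities are independent.
- Divergence and sequencing error together add a uniform rate.
- A hypothetical damage term is highest at the 5' end and decays exponentially along the read.

The parameter values and the function names `per_position_mismatch_rates` and `prob_exceeds` are illustrative. They are not taken from any aligner or from the method of Green et al. (8).

```python
import math


def per_position_mismatch_rates(read_len, divergence=0.01, damage_5p=0.3, decay=0.5):
    """Hypothetical per-base probability that a read mismatches the reference.

    Combines a uniform divergence/sequencing-error rate with a damage term that is
    largest at the 5' end (only; for simplicity) and decays exponentially along the
    read. `damage_5p` is treated as already averaged over base composition; all
    defaults are illustrative, not estimates from real data.
    """
    rates = []
    for i in range(read_len):
        damage = damage_5p * math.exp(-decay * i)
        # Probability that at least one of the two independent events occurs.
        rates.append(1 - (1 - divergence) * (1 - damage))
    return rates


def prob_exceeds(rates, max_edits):
    """P(mismatches > max_edits) for independent per-base rates (Poisson-binomial)."""
    dist = [1.0]  # dist[k] = probability of k mismatches among positions seen so far
    for p in rates:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this position matches
            new[k + 1] += q * p    # this position mismatches
        dist = new
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - sum(dist[: max_edits + 1]))


if __name__ == "__main__":
    for damage in (0.0, 0.3):
        rates = per_position_mismatch_rates(40, divergence=0.01, damage_5p=damage)
        for k in (1, 2, 4):
            print(f"damage_5p={damage:.1f} max_edits={k}: "
                  f"P(read exceeds limit) = {prob_exceeds(rates, k):.3f}")
```

The sketch assumes that a read exceeding the limit is simply not reported. This is a simplification, since real aligners differ in how they score and filter hits. Comparing the two damage settings shows that the same edit limit rejects a larger share of reads once terminal damage is included.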

12. If nonidentical sequences originating from different DNA molecules are clustered together, a consensus approach will average them. This may result in incorrect haplotype calls and low quality scores at sites where variation is present. Thus, a consensus approach should only be applied if it is very unlikely that two different template molecules are clustered together. For aDNA samples with a few million endogenous molecules, large megabase-sized genomes, and random fragment ends, the assumption that PCR duplicates are the only source of clustered sequences is probably valid. Large amounts of endogenous DNA, small genomes, or protocols that generate nonrandom fragment ends (such as the use of restriction enzymes or multiplex PCR) may, however, conflict with this assumption.
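The trade-off described in this note can be made concrete with a small Python sketch. It does three things:

1. Groups aligned reads by strand, start, and end coordinates.
2. Collapses each group with a column-wise majority vote.
3. Gives a rough birthday-problem estimate of how many pairs of distinct molecules would share coordinates by chance.

The read tuples, the function names, and the assumption of uniformly distributed start positions and fragment lengths are hypothetical simplifications. This is not the consensus procedure used in this chapter.

```python
from collections import Counter, defaultdict


def cluster_by_coordinates(reads):
    """Group aligned reads sharing strand, start and end (candidate PCR duplicates).

    `reads` is a list of (strand, start, end, sequence) tuples, a hypothetical
    minimal record rather than a real BAM/SAM interface.
    """
    clusters = defaultdict(list)
    for strand, start, end, seq in reads:
        clusters[(strand, start, end)].append(seq)
    return clusters


def majority_consensus(seqs):
    """Column-wise majority vote over equal-length sequences.

    Returns the consensus string and, per column, the fraction of reads that
    agree with the called base (a crude stand-in for a quality score).
    Ties are resolved in favour of the base seen first.
    """
    consensus, support = [], []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        support.append(count / len(column))
    return "".join(consensus), support


def expected_false_merges(n_molecules, genome_size, n_lengths):
    """Expected number of distinct-molecule pairs that share coordinates by chance.

    Assumes start positions uniform over both strands of the genome and fragment
    lengths spread uniformly over `n_lengths` values (simplifying assumptions).
    """
    n_classes = 2 * genome_size * n_lengths
    n_pairs = n_molecules * (n_molecules - 1) / 2
    return n_pairs / n_classes


if __name__ == "__main__":
    toy_reads = [
        ("+", 100, 109, "ACGTTACGTA"),
        ("+", 100, 109, "ACGTTACGTA"),
        ("+", 100, 109, "ACGTCACGTA"),  # a second template molecule, one difference
        ("+", 100, 109, "ACGTCACGTA"),
    ]
    for key, seqs in cluster_by_coordinates(toy_reads).items():
        cons, support = majority_consensus(seqs)
        print(key, cons, f"lowest support = {min(support):.2f}")

    # Hypothetical numbers: 5 million endogenous molecules, 100 distinct lengths.
    for genome_size in (3_000_000_000, 16_569):
        merges = expected_false_merges(5_000_000, genome_size, n_lengths=100)
        print(f"genome {genome_size:>13,} bp: ~{merges:,.0f} chance pairs")
```

In the toy cluster, two template molecules that differ at one position produce a consensus with only 50% support at that site. The tied base there is chosen arbitrarily, which is the kind of low-confidence call this note warns about. With the example numbers, chance coordinate matches behave very differently by genome size:

- For a 3 Gb genome, they are a tiny fraction of the five million molecules.
- For a 16.6 kb mitochondrial genome, they are comparable to the number of molecules itself, so clustering by coordinates would routinely merge distinct molecules.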

Acknowledgments

I thank all current and previous members of the Department of Evolutionary Genetics at the Max Planck Institute for Evolutionary Anthropology, and particularly members of the aDNA and sequencing group, for interesting discussions and useful insights as well as for providing their sequencing data for analysis (especially Knut Finstermeier for providing the example data set). I also thank Knut Finstermeier and Beth Shapiro for critical reading and revisions. This work was supported by a grant from the Max Planck Society.

M. Kircher

23 Analysis of High-Throughput Ancient DNA Sequencing Data

References

1. Margulies M et al (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057):376–380
2. Bentley DR et al (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456(7218):53–59
3. Shendure J et al (2005) Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309(5741):1728–1732
4. Harris TD et al (2008) Single-molecule DNA sequencing of a viral genome. Science 320(5872):106–109
5. Drmanac R et al (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961):78–81
6. Korlach J et al (2008) Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci U S A 105(4):1176–1181
7. Miller W et al (2008) Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456(7220):387–390
8. Green RE et al (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–722
9. Rasmussen M et al (2010) Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463(7282):757–762
10. Krause J et al (2006) Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature 439(7077):724–727
11. Krause J et al (2010) The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature 464(7290):894–897
12. Briggs AW et al (2009) Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325(5938):318–321
13. Burbano HA et al (2010) Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328(5979):723–725
14. Poinar HN et al (2006) Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311(5759):392–394
15. Green RE et al (2008) A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134(3):416–426
16. Gilbert MT et al (2008) Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes. Proc Natl Acad Sci U S A 105(24):8327–8332
17. Briggs AW et al (2007) Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A 104(37):14616–14621
18. Heyn P et al (2010) Road blocks on paleogenomes—polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Res 38(16):e161
19. Hofreiter M et al (2001) DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res 29(23):4793–4799
20. Kircher M, Kelso J (2010) High-throughput DNA sequencing—concepts and limitations. Bioessays 32(6):524–536
21. Shendure J, Ji H (2008) Next-generation DNA sequencing. Nat Biotechnol 26(10):1135–1145
22. Reich D et al (2010) Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468(7327):1053–1060
23. Prüfer K et al (2010) Computational challenges in the analysis of ancient DNA. Genome Biol 11(5):R47
24. Dohm JC et al (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36(16):e105
25. Lassmann T, Hayashizaki Y, Daub CO (2009) TagDust—a program to eliminate artifacts from next generation sequencing data. Bioinformatics 25(21):2839–2840
26. Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Paabo S (2009) Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res 38(6):e87
27. Krause J et al (2010) A complete mtDNA genome of an early modern human from Kostenki, Russia. Curr Biol 20(3):231–236
28. Quinlan AR et al (2008) Pyrobayes: an improved base caller for SNP discovery in pyrosequences. Nat Methods 5(2):179–181
29. Erlich Y et al (2008) AltaCyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods 5(8):679–682
30. Kao WC, Stevens K, Song YS (2009) BayesCall: a model-based basecalling algorithm for high-throughput short-read sequencing. Genome Res 19(10):1884–1895
31. Kircher M, Stenzel U, Kelso J (2009) Improved base calling for the Illumina Genome Analyzer using machine learning strategies. Genome Biol 10(8):R83
32. Whiteford N et al (2009) Swift: primary data analysis for the Illumina Solexa sequencing platform. Bioinformatics 25(17):2194–2199
33. Noer GJ (1998) Cygwin: A free win32 porting layer for UNIX Applications. In: 2nd USENIX NT Symposium, Seattle, WA
34. Stajich JE et al (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res 12(10):1611
35. Cock PJA et al (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11):1422
36. Mason CE et al (2010) Standardizing the next generation of bioinformatics software development with BioHDF (HDF5). Adv Exp Med Biol 680:693–700
37. Chang F et al (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst (TOCS) 26(2):1–26
38. Venner J (2009) Pro Hadoop. In: Moodie M (ed) Apress. Springer, New York
39. Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010(6):pdb.prot5448. doi: 10.1101/pdb.prot5448
40. Meyer M, Stenzel U, Hofreiter M (2008) Parallel tagged sequencing on the 454 platform. Nat Protoc 3(2):267–278
41. Illumina Inc. (2008) Multiplexed sequencing with the Illumina Genome Analyzer System [PDF] [cited; 770-2008-011]. Available from: http://www.illumina.com/Documents/products/datasheets/datasheet_sequencing_multiplex.pdf
42. Stiller M et al (2009) Direct multiplex sequencing (DMPS)—a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA. Genome Res 19(10):1843–1848
43. Paabo S, Irwin DM, Wilson AC (1990) DNA damage promotes jumping between templates during enzymatic amplification. J Biol Chem 265(8):4718–4721
44. Lahr DJ, Katz LA (2009) Reducing the impact of PCR-mediated recombination in molecular evolution and environmental studies using a new-generation high-fidelity DNA polymerase. Biotechniques 47(4):857–866
45. Meyerhans A, Vartanian JP, Wain-Hobson S (1990) DNA recombination during PCR. Nucleic Acids Res 18(7):1687–1691
46. Odelberg SJ et al (1995) Template-switching during DNA synthesis by Thermus aquaticus DNA polymerase I. Nucleic Acids Res 23(11):2049–2057
47. Mamanova L et al (2010) Target-enrichment strategies for next-generation sequencing. Nat Methods 7(2):111–118
48. R Development Core Team (2010) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
49. Ewing B, Green P (1998) Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8(3):186–194
50. Dolan PC, Denver DR (2008) TileQC: a system for tile-based quality control of Solexa data. BMC Bioinformatics 9:250
51. Andrews S (2010) FastQC: a quality control tool for high throughput sequence data
52. McKenna A et al (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–1303
53. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760
54. Palmer LE et al (2010) Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinformatics 11:33
55. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5):821–829
56. Birol I et al (2009) De novo transcriptome assembly with ABySS. Bioinformatics 25(21):2872–2877
57. Chaisson MJ, Brinza D, Pevzner PA (2009) De novo fragment assembly with short mate-paired reads: does the read length matter? Genome Res 19(2):336–346
58. Jeck WR et al (2007) Extending assembly of short DNA sequences to handle error. Bioinformatics 23(21):2942–2944
59. Li H et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079
60. Creighton CJ, Reid JG, Gunaratne PH (2009) Expression profiling of microRNAs by deep sequencing. Brief Bioinform 10(5):490–497
61. Green RE et al (2009) The Neandertal genome and ancient DNA authenticity. EMBO J 28(17):2494–2502
62. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460–2461
63. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13):1658–1659
64. Niu B et al (2010) Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11:187
65. Blanca J, Chevreux B (2010) sff_extract. http://bioinf.comav.upv.es/sff_extract/index
66. Langmead B et al (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25
67. Applied Biosystems (2008) A theoretical
68. Altschul SF et al (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
69. Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome Res 12(4):656–664
70. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22(22):4673–4680
71. Notredame C, Higgins DG, Heringa J (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1):205–217
72. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797
73. Trapnell C, Salzberg SL (2009) How to map billions of short reads onto genomes. Nat Biotechnol 27(5):455–457
74. Li R et al (2008) SOAP: short oligonucleotide alignment program. Bioinformatics 24(5):713–714
75. Smith AD, Xuan Z, Zhang MQ (2008) Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 9:128
76. Li R et al (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966–1967
77. Zhang Z et al (2000) A greedy algorithm for
