Ancient DNA: Methods and Protocols (36 page)

This workflow is often hindered by the co-amplification of contaminating DNA molecules.

Within the last decade, considerable technical advances have made it possible to sequence entire ancient genomes (1–3). As many of these approaches require only a single DNA library to be prepared prior to sequencing, the amount of work required to generate large amounts of ancient DNA sequence data is reduced considerably. However, whole-genome sequencing remains prohibitively expensive for most labs and provides genetic information for only a single individual. For population studies, a major advance has been to couple the targeted analysis of specific loci with new high-throughput sequencing technology. DNA hybridization capture makes it possible to target specific genomic regions from ancient DNA libraries. These regions can range in length from a few hundred base pairs (bp) to megabases (Mb).

Here, I describe the use of the hybridization capture protocol described in Chapter 21 (Horn, this volume) to isolate mitochondrial control region data from a sample of subfossil Eurasian beaver (Castor fiber) remains. Extant beavers are known to have pronounced phylogeographic structure (4). The goal of the project was to determine whether these phylogeographic patterns were already present in ancient beavers.

2. Methods

 

2.1. DNA Extraction and PCR Screening

I extracted DNA from 103 C. fiber bone and tooth samples ranging in age from 400 to approximately 45,000 years old, as described previously (5). I amplified two fragments of the mitochondrial control region, each around 90 bp in length including primers, to assess DNA preservation. Negative controls from the extractions did not yield PCR products of the expected size and therefore did not show signs of contaminating DNA. The 70 extracts that yielded at least one of the two PCR products were enriched for a 495-bp stretch of the mitochondrial control region using hybridization capture.

2.2. Library Preparation and Quantification

I prepared barcoded genomic libraries from 70 ancient DNA extracts and 8 negative controls for sequencing on the Illumina GAII (6). Four negative controls were included at the time of DNA extraction and another four at library preparation, both of which I performed in the clean room. I quantified the sequencing libraries using quantitative PCR (qPCR). Most samples yielded between 3 × 10^6 and 1 × 10^8 copies per microliter (cp/μL), whereas negative controls yielded around 1 × 10^5 cp/μL. The qPCR products were visualized on agarose gels to identify those libraries that contained inserts and those that contained only adapter dimers formed during library preparation, as may occur when the amount of extracted DNA is insufficient for library preparation (6, 7). The latter was the case for all negative controls. Libraries prepared from beaver samples that contained insufficient DNA were not processed further. Negative controls were carried through enrichment and sequencing in order to evaluate false assignment rates of barcodes and potential contamination at low levels (7).

22 Case Study: Enrichment of Ancient Mitochondrial DNA…

2.3. Generation of Bait

I amplified a roughly 650-bp stretch of the beaver mitochondrial control region, which was designed to overlap the 495-bp target fragment on both ends, using biotin-11-dUTP (5 μM final concentration) as described in Chapter 21. Including the flanking region in the bait molecules ensures that the entire 495-bp target region is captured efficiently. I used the purified amplification product as bait for hybridization capture, which I performed in 96-well plates.

2.4. Serial Hybridization Capture

I performed two serial hybridization captures of the genomic libraries using biotinylated bait as described in Chapter 21. The hybridization mixture contained about 17 ng of bait DNA and 170 ng of library in each well of a 96-well plate. For each hybridization capture, I placed the plate containing the hybridization mixture in a thermal cycler, heated it to 95°C for 5 min, and then cooled it to 65°C at a rate of 0.1°C/s, followed by incubation at 65°C for 24 h. I then immobilized the hybridized DNA using Dynabeads (Invitrogen) as described in Chapter 21.

After the first hybridization capture, I amplified the libraries using the Phusion® High Fidelity PCR master mix (Finnzymes) (6). I cleaned the reactions using the AMPure XP kit (Agencourt) and used them in a second round of hybridization capture.

Performing capture twice increases the yield of mitochondrial control region molecules for sequencing. I then amplified and cleaned the resulting libraries using the Phusion® High Fidelity PCR master mix (Finnzymes) and the AMPure XP kit.

I then quantified the eluates containing the sequencing libraries enriched for mitochondrial control region DNA using a NanoDrop spectrophotometer. This information was used to pool the libraries (both samples and negative controls) in equimolar amounts for sequencing.
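The arithmetic behind equimolar pooling can be sketched as follows. This is a minimal illustration and not the procedure used in this chapter: it assumes double-stranded DNA with an average mass of about 660 g/mol per base pair, and the function names, the mean fragment length per library, and the target amount per library are hypothetical inputs chosen for the example.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA (assumption)


def molarity_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a mass concentration (ng/uL) into a molar concentration (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_length_bp)


def pooling_volumes(libraries, fmol_per_library):
    """Return the volume (uL) of each library that contains the same molar amount.

    libraries maps a library name to (concentration in ng/uL, mean length in bp).
    """
    volumes = {}
    for name, (conc, length) in libraries.items():
        # 1 nM equals 1 fmol/uL, so volume = amount (fmol) / concentration (nM)
        volumes[name] = fmol_per_library / molarity_nM(conc, length)
    return volumes
```

For example, `pooling_volumes({"sample_A": (6.6, 100), "blank_1": (3.3, 100)}, 10)` asks for twice the volume of the more dilute library, so that both contribute the same number of molecules to the pool.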

2.5. Sequence Analyses

Illumina base calling was performed using the software IBIS (8) and sequencing reads were sorted according to their corresponding barcode sequence as described in Chapter 23 (6). I then mapped the sequencing reads to a control region sequence of C. fiber using the software bwa v0.5.5 (9). Reads were discarded unless they had a minimum mapping quality of 20 and a minimum length of 30 bp. Samples and negative controls for which fewer than 5% of reads mapped to the target region were discarded from further analysis.

S. Horn
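The read-level and library-level thresholds given in this section (mapping quality of at least 20, read length of at least 30 bp, at least 5% of reads on target) could be expressed as in the sketch below. It is an illustration only: the `Read` record and function names are hypothetical, no alignment file is parsed, and it assumes that the 5% criterion is evaluated on all mapped reads before the quality and length filters are applied, which the text does not specify.

```python
from dataclasses import dataclass

MIN_MAPQ = 20                  # minimum mapping quality, from the text
MIN_LENGTH = 30                # minimum read length in bp, from the text
MIN_ON_TARGET_FRACTION = 0.05  # libraries below 5% on-target reads are dropped


@dataclass
class Read:
    """Hypothetical minimal record for one sequencing read."""
    sequence: str
    mapped: bool
    mapq: int = 0


def passes_read_filter(read):
    """Keep a read only if it mapped with sufficient quality and length."""
    return read.mapped and read.mapq >= MIN_MAPQ and len(read.sequence) >= MIN_LENGTH


def filter_library(reads):
    """Return (kept_reads, on_target_fraction, library_passes) for one barcode."""
    if not reads:
        return [], 0.0, False
    # Assumption: the on-target fraction counts every mapped read.
    fraction = sum(1 for r in reads if r.mapped) / len(reads)
    kept = [r for r in reads if passes_read_filter(r)]
    return kept, fraction, fraction >= MIN_ON_TARGET_FRACTION
```

Keeping the thresholds as named constants makes it easy to see how sensitive the number of surviving libraries is to each cutoff.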

Since I used PCR amplification of the library several times during the experiment, the same starting molecule may have been sequenced multiple times. For this data set, each read that mapped to the target locus was sequenced around 100 times on average. I therefore applied an additional filter in which high-frequency reads that start and end at the same position were stored only once (see Chapter 23). Low-frequency reads were discarded prior to the generation of the consensus sequence because they often differed from a high-frequency read only by short indels and thus likely resulted from polymerase slippage. This was achieved by requiring that each high-frequency read was observed at least 10 times. If more than 20 high-frequency reads were present, they were used to create contigs of the target sequence. Finally, only contigs that covered more than 95% of the target were used for the generation of consensus sequences.
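The duplicate-collapsing logic can be sketched as follows, using the thresholds stated in the text (at least 10 observations per read, more than 20 distinct high-frequency reads, more than 95% coverage of the 495-bp target). The sketch is a simplification under stated assumptions, not the pipeline from Chapter 23: reads are represented only by hypothetical 0-based, half-open `(start, end)` mapping coordinates, so reads sharing coordinates are treated as copies of one molecule without comparing their sequences, and no contigs are actually assembled.

```python
from collections import Counter

MIN_COPIES = 10        # each high-frequency read must be seen at least 10 times
MIN_UNIQUE_READS = 20  # more than 20 distinct high-frequency reads are required
MIN_COVERAGE = 0.95    # retained reads must cover more than 95% of the target
TARGET_LENGTH = 495    # length of the target region in bp


def high_frequency_reads(alignments):
    """Collapse reads with identical (start, end) and drop rare ones."""
    counts = Counter(alignments)
    return sorted(pos for pos, n in counts.items() if n >= MIN_COPIES)


def covered_fraction(intervals, target_length=TARGET_LENGTH):
    """Fraction of target positions covered by at least one interval."""
    covered = [False] * target_length
    for start, end in intervals:
        for i in range(max(start, 0), min(end, target_length)):
            covered[i] = True
    return sum(covered) / target_length


def library_yields_consensus(alignments):
    """Apply the read-count and coverage criteria to one library."""
    unique = high_frequency_reads(alignments)
    if len(unique) <= MIN_UNIQUE_READS:
        return False
    return covered_fraction(unique) > MIN_COVERAGE
```

A library with many reads can still fail here if its high-frequency reads cluster in one part of the target, which is why coverage is checked separately from read count.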

2.6. Phylogenetic Analyses

I aligned the consensus sequences to Castor fiber control region sequences from GenBank using ClustalW as implemented in BioEdit (10). I constructed a preliminary genealogy in MEGA4 (11) using the neighbor-joining algorithm with the Kimura 2-parameter evolutionary model, the pairwise deletion criterion, and 1,000 bootstrap replicates.
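For readers unfamiliar with the distance measure, the Kimura 2-parameter distance with pairwise deletion can be sketched as below. This is an independent illustration of the standard formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions; it is not the MEGA4 implementation, and the function name is hypothetical. Pairwise deletion is interpreted here as skipping, for each pair of sequences, any alignment column in which either sequence has a gap or an ambiguous base.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion: ignore gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A matrix of such pairwise distances is the input to the neighbor-joining algorithm; bootstrap support is obtained by repeating the analysis on resampled alignment columns.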

3. Results

 

3.1. Sequence Analyses

The Illumina run yielded sequence data for all of the barcodes used, including those that were used for libraries created from negative controls. The raw number of sequencing reads per barcode reflects the relative pooling of all libraries, which may be influenced by pipetting and quantification errors. Therefore, the success of the enrichment and sequencing needs to be evaluated based on the percentage of reads that map to the target genomic region for each library. On average, around 24% (range 0.1–62.7%) of the reads from enriched beaver sequencing libraries mapped to the reference sequence. Six out of eight negative controls yielded fewer than 3% (0.7–2.8%) of reads mapping to the target. Two negative controls yielded 12.2 and 16.8% of reads mapping to the target, respectively; both of these had been included at the later step of library preparation.

Based on the counts for an unused index, the false assignment rate of indexes was estimated to be 1 in 6,400, similar to previously reported values ranging between 1 in 1,000 and 1 in 10,000 (7).
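The text does not state how the false assignment rate was computed. One plausible reading, assumed here purely for illustration, is the number of reads carrying the unused index divided by the total number of reads in the run. The function names and the read counts in the usage note are hypothetical and are not data from this study.

```python
def false_assignment_rate(unused_index_reads, total_reads):
    """Fraction of reads that carry an index not used in the run (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return unused_index_reads / total_reads


def as_one_in_n(rate):
    """Express a rate as the N in '1 in N', rounded to the nearest integer."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return round(1 / rate)
```

With made-up counts of 1,000 reads on the unused index out of 6,400,000 reads in total, `as_one_in_n(false_assignment_rate(1000, 6_400_000))` gives 6400, that is, a rate of 1 in 6,400.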

After processing the sequencing reads through the quality control filters described above, only sequencing libraries prepared from beaver samples remained for the construction of consensus sequences. Out of 70 ancient Castor fiber samples processed through enrichment and sequencing, 33 provided sufficient high-quality sequence data to reconstruct their mitochondrial control region sequences.

4. Discussion

 

DNA hybridization capture can be an efficient method for the targeted enrichment of many samples in parallel. To ensure that endogenous DNA survives in a sample prior to processing using this approach, it is recommended to screen the samples using PCR.

While none of the negative controls met the applied quality control filter criteria, neither did about half of the beaver samples (only 33 of 70 passed), suggesting that the pre-screening process employed in the first stages of this experiment was not sufficiently strict. In the cases of the less well-preserved specimens, conventional PCR may have succeeded in amplifying the target region despite the survival of only a few starting template molecules. Quantitative PCR may be used to improve the efficiency of the initial screening by discriminating well-preserved samples from poor samples. After sequencing, it is important to apply further quality filters to the data produced. The quality and length filters applied during mapping are useful to select sequences that originate from endogenous target DNA. Filters to identify and account for high-frequency reads are also useful to identify “real” sequences and generate the consensus sequence, in particular when the experimental setup includes amplification steps.

Even when applying these stringent filter criteria, considerable challenges remain in the analysis of high-throughput sequencing data. Very deep sequencing, such as the 100× coverage obtained on average here, may be more sensitive to recovering contaminating DNA molecules (12). This may explain why two out of eight negative control sequencing libraries initially (prior to applying quality control filters) showed more than 10% of sequencing reads mapping to the targeted genomic region.

In the classic approaches of targeted aDNA research, as soon as negative controls prove to be PCR negative, they are excluded from downstream analyses such as cloning and sequencing. Here, negative controls were carried throughout the entire experiment including sequencing. Quantitative PCR results suggest that the negative control sequencing libraries contained very low copy numbers of sequence, and agarose gel analyses suggested they were insert-free. However, the deep sequencing results showed low levels of sequence in two of the negative control libraries. As it is unclear when this contamination was introduced to the negative controls, this underscores the importance of using extreme care when handling all samples simultaneously in a 96-well plate for enrichment and amplification.

In addition to sequences in the negative controls, sequences were observed that mapped to the target region but carried an unused barcode. This most likely reflects sequencing error (13). The quality control filter that selects for multiply-amplified molecules may help to alleviate this problem.

4.1. Phylogeography

A preliminary phylogenetic tree comprising a subset of the Castor fiber control region sequences obtained in this experiment is shown in Fig. 1. The results support the previously recognized western and eastern groups of Castor fiber (4), providing further evidence for the authenticity of the sequences. DNA capture by hybridization

[Fig. 1 tree image not reproduced. Clade labels: C. fiber Eastern clade, C. fiber Western clade; outgroup: C. canadensis (gi 251826448, gi 62287778). Tip labels: tu1–tu4 (gi 54303862–54303865), po1–po2 (gi 54303860–54303861), in1–in3 (gi 54303866–54303868), bi1–bi3 (gi 54303857–54303859), fi1 (gi 68271291), al1–al2 (gi 68271289–68271290), ga1 (gi 68271292), Ivanovskoe-4760-2481 (HQ880655), Ivanovskoe-4760-662 (HQ880656), Ivanovskoe-4760-2647 (HQ880654), Lednicki-46-96 (HQ880653), Gluchowo-B91 (HQ880652), North-Sea-1751 (HQ880651), North-Sea-1259 (HQ880649), North-Sea-1257 (HQ880650). Scale bar: 0.02.]

Fig. 1. Genealogy of mitochondrial control region sequences of ancient and extant Eurasian beaver, Castor fiber. Ancient beavers from Europe clustered into two groups: western beavers and eastern beavers. Accession numbers are given in brackets. The tree depicted is a neighbor-joining tree based on a 495-bp alignment (including gaps) rooted with sequences of the North American beaver Castor canadensis. Bootstrap support values are shown at the nodes.
