Performance comparison of benchtop high-throughput
sequencing platforms
Nicholas J Loman1, Raju V Misra2, Timothy J Dallman2, Chrystala Constantinidou1, Saheer E Gharbia2,
John Wain2,3 & Mark J Pallen1
Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).
Over the past decade and a half, genome sequencing has transformed almost every corner of the biomedical sciences, including the study of bacterial pathogens. In the last five years, high-throughput (or 'next-generation') sequencing technologies have delivered a step change in our ability to sequence genomes, whether human or bacterial. Since arriving in the marketplace, these technologies have undergone sustained technical improvement, which, twinned with lively competition between alternative platforms, has placed genome sequencing in a state of permanent revolution.
Although high-throughput sequencing has seen extensive use in bacteriology, for example, in the genomic epidemiology of bacterial pathogens, until recently sequencing platforms were tailored chiefly toward large-scale applications, focused on the race to the '$1,000 human genome', with footprints, workflows, reagent costs and run times poorly matched to the needs of small laboratories studying small genomes. However, three different benchtop high-throughput sequencing instruments are currently available, all capable of sequencing bacterial genomes in a matter of days.
The 454 GS Junior from Roche was released in early 2010 and is a smaller, lower-throughput version of the 454 GS FLX machine, exploiting similar emulsion PCR and pyrosequencing approaches, but with lower set-up and running costs. The Ion Torrent Personal Genome Machine (PGM) was launched in early 2011 (ref.). Like the 454 GS Junior, this technology exploits emulsion PCR. It also incorporates a sequencing-by-synthesis approach, but uses native dNTP chemistry and relies on a modified silicon chip to detect hydrogen ions released during base incorporation by DNA polymerase (making it the first 'post-light' sequencing instrument). The Illumina MiSeq was announced in January 2011 and began to ship to customers in the fourth quarter of 2011. The MiSeq is based on the existing Solexa sequencing-by-synthesis chemistry but has dramatically reduced run times compared to the Illumina HiSeq (fastest run 4 h versus 1.5 d for 36-cycle sequencing or 16 h versus 8.5 d for 200-cycle sequencing), made possible by factors including a smaller flow cell and reduced imaging time.
We wished to compare the performance of these three sequencing platforms by analyzing data with commonly used assembly and analysis pipelines. We therefore benchmarked these platforms by using them to sequence the genome of an isolate from the recent outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which struck Germany between May and July 2011. This outbreak was responsible for >4,000 infections and more than 40 deaths. Previous whole-genome sequencing efforts applied to isolates from the outbreak yielded novel diagnostic reagents and provided important clues as to the nature, origins and evolution of the outbreak strain. These efforts also demonstrated the utility of an 'open-source' approach to outbreak genomics that included rapid sequencing, a liberal approach to data release and use of crowdsourcing. Although all infections during the outbreak were acquired in Germany, travelers took their infections back to other countries in North America and Europe, including the United Kingdom. Here, we have focused on a single E. coli isolate of serotype O104 from the United Kingdom, which was epidemiologically linked to the German outbreak.
1Centre for Systems Biology, University of Birmingham, Birmingham, UK. 2Health Protection Agency, London, UK. 3School of Medicine, University of East Anglia, Norwich, UK. Correspondence should be addressed to M.J.P. ([email protected]) or J.W. ([email protected]).
Received 19 December 2011; accepted 30 March 2012; published online 22 April 2012; corrected online 23 April 2012 (details online); corrected after print 7 June
VOLUME 30 NUMBER 5 MAY 2012 nature biotechnology
Creation of reference assembly
To permit comparisons of benchtop sequencing data, we generated a reference assembly for E. coli O104:H4 strain 280 (UK Health Protection Agency's materials identifier H112160280) using established high-throughput sequencing platforms. This isolate was recovered from a female traveler returning from Germany who had
developed hemolytic uremic syndrome and thrombotic thrombocytopenic purpura. The isolate was confirmed as typical of the outbreak strain (ST678, stx-2 positive and intimin negative).
We used the Roche 454 GS FLX+ system to generate very long fragment reads (modal read length, 812 bases; maximum read length, 1,170 bases). Additionally, Roche 454 GS FLX was used to sequence an 8-kb insert paired-end library using Titanium chemistry. The reads were assembled into contigs, which were scaffolded to produce a draft reference assembly. Mean coverage depth for the assembly was 32-fold.
The use of abundant long reads and long-insert, paired-end information resulted in a very high-quality draft genome assembly consisting of three scaffolds. Of the bases in the assembly, 99.42% are Q64 bases (representing accuracy of one miscall around every 2.5 M bases), 99.54% are Q40 (one miscall every 10,000 bases) or higher. Bases with a quality score <40 were masked with a lower-case letter and excluded from further analysis. The largest scaffold corresponded to the chromosome (5,340,015 bp), the two smaller scaffolds corresponded to two large plasmids (pESBL and pAA). The 1.5-kb plasmid sequence was present in a single contig. Although each scaffold represented a single circular replicon, 153 gaps remained within the scaffolds. These gaps represent repetitive regions longer than the mean read length and shorter than the paired-end insert library, which cannot be resolved by this sequencing strategy.
Table 1 Price comparison of benchtop instruments and sequencing runs
Table body not recovered. Surviving column, Minimum throughput (read length), in source order: 35 Mb (400 bases); 10 Mb (100 bases); 100 Mb (100 bases), footnote d; 1,000 Mb (100 bases); 1,500 Mb (2 × 150 bases).
Note pricing may vary between countries and/or sales territories. Instrument prices do not include service contracts. Sample prices do not include the cost of generating the initial fragmented genomic DNA library with adaptors (an additional cost of between $75−200 depending on method used). Cost per megabase assumes one sample and one .
a: Ion Torrent PGM pricing from Invitrogen US territory website (http://www.invitrogen.com/, accessed 21 February 2012). b: Price includes Ion Torrent PGM, server, OneTouch and OneTouch ES sample automation systems. c: Ion Torrent PGM prices include chip and sample preparation kit. d: Configuration used in this study.
Characteristics of reads from benchtop sequencers
Genome depth, evenness of coverage, read length and read quality are the four major factors that determine the ability to reconstruct genome sequences from sequence data. There were large differences in the number, predicted quality and length of reads obtained from the three platforms. The 454 GS Junior produced the longest reads, with a mean length of 522 bases, but had the lowest throughput of the three instruments (70–71 megabases). Ion Torrent PGM runs generated over four times the throughput of 454 GS Junior but generated the shortest reads (mean 121 bases). The MiSeq run produced the greatest throughput (1.6 gigabases) with reads slightly longer than those by Ion Torrent PGM, permitting the multiplexing of seven E. coli strains on a single run when targeting 40-fold coverage of each genome. MiSeq reads were paired-end, that is, fragments were sequenced in both directions. Across the reference chromosome, coverage was generally even for all technologies. However, in the MiSeq data we saw a peak associated with the Shiga toxin–producing phage. A similar, but smaller, peak was detectable in the Ion Torrent PGM data (Supplementary Fig. 1). These peaks may be explained by the occurrence of phage lysis in the cultures used to prepare the DNA for sequencing. Differences in relative coverage levels were also seen in the pESBL and pAA plasmids between instruments, possibly due to the use of different DNA shearing techniques in library preparation.
Because each manufacturer uses a unique software implementation to generate base-quality score predictions, direct comparison of these scores between platforms is difficult. We recalibrated quality scores for each instrument by first aligning reads to the reference genome. By observing the counts of matched and mismatched bases in each aligned read, a new quality score can be calculated, called alignment quality. We used a scoring system, previously published, which takes into account substitutions, insertions and deletions. Mismatches resulting in deletions are assigned randomly to the position of one of the adjacent bases in the read. Alignment quality scores measured in this way generally had good agreement with predicted scores, with the Ion Torrent PGM generally underestimating quality scores and the other instruments slightly overestimating them (Supplementary Fig. 2). The MiSeq produced the highest quality reads, owing to a low substitution error rate (0.1 substitutions per 100 bases) and the near absence of indel errors compared to the other platforms. The Ion Torrent PGM showed a steadily decreasing accuracy across the read to the 100th base. However, soft clipping of low-quality read ends by the BWA alignment software serves to make the accuracy appear to increase after this point as mismatches are not counted in soft-clipped (unaligned) parts of the read.
Comparison of the frequency of indels through alignment to the reference demonstrated that Ion Torrent PGM reads had 1.5 indels per 100 bases (1.72 indels per read). The 454 GS Junior had
0.38 indels per 100 bases (1.74 indels per read). In contrast, indels were detected very infrequently in MiSeq data with <0.001 indels per 100 bases. These results were confirmed by alignment to two other reference genomes sequenced with other sequencing technologies (Supplementary Tables 1–3). As with 454 sequencing, the major source of indels in Ion Torrent PGM data are runs of identical bases (homopolymers). Comparison of homopolymer accuracy between Ion Torrent PGM and 454 GS Junior demonstrated that Ion Torrent PGM was less accurate when calling homopolymers of any length (Supplementary Fig. 3). The dominant source of error was deletions, with accuracy rates as low as 60% for homopolymers 6 bases
Table 2 Run and alignment metrics for benchtop sequencers
Table body not recovered. Surviving column headings: Alignment coverage; Mean read length; Reads aligned (%). Surviving row labels: 454 GS Junior (1); 454 GS Junior (2); Ion Torrent PGM (1); MiSeq (1) demultiplexed strain 280.
Metrics for each sequencing run are shown as well as results of alignment against the reference sequence. Depth of coverage for the chromosome and two large plasmids (pESBL and pAA) are shown with the percentage of reads that align. For the MiSeq run, the sequence metrics are shown for the entire run as well as the results of de-multiplexing E. coli O104:H4 strain 280. Alignment statistics for the entire run are not shown as two strains sequenced were of E. coli isolates unrelated to the outbreak strain.
Figure 1 Evaluation of read length and quality from benchtop sequencers. Data sets shown: 454 GS Junior (1+2), Ion Torrent PGM (1+2) and MiSeq (strain 280). (a) Box plots generated by the qrqc software package showing the predicted per-base quality score for combined sequencing runs for each benchtop instrument at each read position. Gray shaded bands indicate the 10% and 90% quantiles, orange shaded bands indicate the lower and upper quartiles, the blue dot is the median. A purple smooth curve is also fitted. Quality scores are given as Phred-scaled quality values where $Q = -10 \log_{10} P$ ($P$ is the probability of the base call being incorrect). (b) Histograms showing read lengths produced by each instrument. (c) Comparison of the predicted and measured accuracy for each benchtop sequencer. Predicted accuracy is determined by multiplying the number of alignments of bases of each quality score by the probability of an incorrect base call ($P = 10^{-Q/10}$). The sum of these values is divided by the number of aligned bases to give a measurement of accuracy. (d) The percentage of reads aligned at each read position.
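As a rough illustration of the Phred relationship in the Figure 1 legend and of the predicted-versus-measured comparison in panel c, the arithmetic can be sketched in Python as below. This is a minimal sketch, not the authors' analysis scripts: the function names and any counts passed in are hypothetical, and the measured score here treats every mismatched aligned base equally, whereas the study used a previously published scoring system that also handles insertions and deletions.

```python
import math

def phred_to_error_prob(q):
    """Probability that a base call is wrong for Phred quality q."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

def predicted_error_rate(aligned_bases_by_quality):
    """Expected errors per aligned base, from {quality: aligned base count}."""
    total = sum(aligned_bases_by_quality.values())
    expected_errors = sum(
        count * phred_to_error_prob(q)
        for q, count in aligned_bases_by_quality.items()
    )
    return expected_errors / total

def measured_quality(matched, mismatched):
    """Empirical Phred-scaled quality from alignment counts (simplified)."""
    if mismatched == 0:
        return math.inf  # no observed errors, so the score is unbounded
    return error_prob_to_phred(mismatched / (matched + mismatched))
```

Under this definition Q40 corresponds to an error probability of 1 in 10,000 and Q64 to roughly 1 in 2.5 million, which matches the figures quoted for the reference assembly.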
Comparison of de novo assemblies
The use of high-throughput sequencing for the discovery of differences in gene content and arrangement relies on the generation of accurate de novo assemblies. We compared draft, de novo assemblies from benchtop instruments using a variety of metrics. Assembly metrics such as total assembly size and N50 (a statistic for describing the distribution of contig lengths in an assembly) (ref. 5) give a guide to assembly completeness or fragmentation but not accuracy. An ideal assembly produces a single accurate contig for each replicon, but this is rarely possible owing to the presence of long repeat sequences. When comparing de novo assemblies produced by benchtop sequencers, we saw two major groupings of assembly quality. Heavily fragmented assemblies were obtained with Ion Torrent PGM data (single runs or combined), 454 GS Junior (single runs) and MiSeq (Supplementary Table 4). Less fragmented assemblies were obtained when reads from two 454 GS Junior runs were combined to increase depth of coverage (98 contigs versus 150 contigs using the assembler program MIRA) and when paired-end information was used to scaffold contigs generated from the MiSeq data (200 scaffolds versus 311 contigs using CLC Assembly Cell). Scaffolds produced by assembly of Illumina MiSeq paired-end data gave output containing runs of ambiguous 'N' bases between 1 and 352 bases in length (81 such runs in CLC Assembly Cell output, 153 runs in Velvet output).
The number of contigs that can be mapped unambiguously to the reference gives a measure of reference genome coverage. Differences in reference genome coverage were seen when comparing assemblies from each platform (Supplementary Table 4). None of the assemblies generated aligned unambiguously to 100% of the reference. Contigs obtained from the 454 GS Junior data aligned to the largest proportion of the reference, with 3.72% of the reference unmapped, compared to 4.6% for Ion Torrent PGM and 3.95% for MiSeq. The MIRA assembler produced the assemblies with the highest coverage of the reference genome for each data type.
The Ion Torrent PGM assemblies had large numbers of gaps, compared to assemblies obtained from 454 GS Junior and MiSeq data. Increasing sequence coverage by combining assemblies from the two Ion Torrent PGM runs reduced the number of gaps in the assembly. With the MIRA assembly, the combined Ion Torrent PGM run showed 38% fewer gaps than Ion Torrent PGM run number two alone. However, many miscalls in long homopolymeric tracts remained, so that in assemblies produced from combining both Ion Torrent PGM data sets, large numbers of contigs were disrupted either by contig breaks or apparent frameshifts. Of the 2,017 gaps seen in the combined Ion Torrent PGM assembly produced by the Newbler assembler (1,811 gaps for MIRA), around a third to a quarter were due to gaps associated with ends of contig or unmapped sequence, the rest being associated with homopolymeric tracts. Although the likelihood of a gap increases with the length of the homopolymer, the number of very short homopolymers (2–3 residues) resulting in assembly gaps was significantly higher for this platform than for the 454 GS Junior. Manual inspection of assembly alignments revealed that many of the indels associated with short homopolymeric tracts demonstrated strand bias, with the correct call predominantly associated with either forward or reverse reads and the erroneous sequences associated with the opposite strand (Supplementary Fig. 4).
Figure 2 N50 contig sizes from assemblies generated from sequence data for each sequencing platform. A selection of popular genome assemblers have been used. The N50 contig size is calculated using the total genome length of the E. coli strain 280 reference sequence, rather than the sum total of contig lengths.
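The Figure 2 legend notes that N50 is computed against the reference genome length rather than the summed contig lengths. A minimal Python sketch of that calculation follows. The function name `n50` and its `genome_length` parameter are hypothetical choices for illustration, and returning 0 when the contigs cover less than half of the genome is an assumption of this sketch rather than something stated in the paper.

```python
def n50(contig_lengths, genome_length=None):
    """Length of the contig at which the running total, taken over contigs
    sorted longest first, reaches half of the target size.

    The target is genome_length when given, otherwise the summed contig length.
    """
    if genome_length is not None:
        target = genome_length
    else:
        target = sum(contig_lengths)
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if 2 * running_total >= target:
            return length
    return 0  # assumption: contigs never reach half of the target size
```

Using the reference length as the target keeps values comparable between assemblies of different total size, since an assembly that is missing sequence cannot raise its own N50 by having a smaller denominator.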
Although problems with homopolymers are known to result from flow cell–based chemistries, it is unclear why this strand bias should occur with Ion Torrent technology. However, scrutiny of other public data sets from this instrument suggests it is a pervasive problem.
Benchtop assemblies and public health microbiology
A key test for a genome-sequencing technology is whether it can deliver trustworthy new insights into the biology of the organism under scrutiny. We therefore evaluated how de novo assemblies generated from data from each platform performed in reporting features of biological interest in the outbreak strain. For some features, all analyses did well; for example, all documented the presence and accurate full-length sequence of the genes encoding the Shiga toxin type-2 subunits. However, at the other extreme, in some instances, all instruments did badly; for instance, in all assemblies the two larger plasmids were broken into multiple contigs, which could not be readily assigned to chromosome or plasmid without alignment to the reference genome.
We used 31 protein sequences linked to pathogen biology as queries in translated BLAST searches of the assemblies obtained from the benchtop sequencing platforms. No assembly contained a full set of full-length sequences. The best MiSeq assembly captured 29/31 full-length sequences, the best 454 GS Junior assembly found 26 and the best Ion Torrent PGM assembly found 23. Perhaps the most challenging targets in the survey were the four serine protease autotransporters encoded in the genome of the outbreak strain. None of the platforms managed to recover all four genes as full-length fragments. This is because these genes contain multiple domains and some domains exist as multiple copies in the genome, which are assembled into repeat consensus contigs that cannot be unambiguously placed in the genome. Notably, the choice of assembler affected the ability to detect certain genes; for example, CLC Assembly Cell and MIRA run with Illumina MiSeq were able to reconstruct all four of the aggregative adhesion fimbrial genes tested, whereas Velvet could reconstruct only two.
Integration of whole-genome sequencing into existing practice in a public health laboratory requires backwards compatibility
with existing typing methods. We therefore attempted to generate multi-locus sequence typing (MLST) profiles from each assembly. An accurate MLST profile was generated for the outbreak strain by all assemblies using MiSeq data. However, some 454 GS Junior and Ion Torrent PGM assemblies generated indel errors in at least one housekeeping gene.
In our evaluation, all three benchtop sequencing platforms generated useful draft genome sequences of the German E. coli outbreak strain. All could be judged suitable for bacterial genome sequencing, in producing assemblies that mapped to 95% or more of the reference genome and recovered the vast majority of coding sequences. As expected, no instrument could generate completely accurate one-contig-per-replicon assemblies that might equate to a finished genome. Thus, for each technology there is a trade-off between advantages and disadvantages. In our survey, the MiSeq generated the highest throughput per run and lowest error rate of the instruments, without significant indel errors and with the lowest rate of substitution errors (although accuracy does drop off toward the ends of reads). However, the MiSeq delivered shorter read lengths than the 454 GS Junior, probably a significant factor in the lower quality assemblies produced from MiSeq data. Even with paired-end sequencing, the single scaffold assemblies from the MiSeq are interrupted by unfillable gaps, representing difficult-to-resolve repeats. The MiSeq was the longest-running instrument, with paired-end, 150-base sequencing on a pre-release instrument taking >27 h (60 Mb/h).
The 454 GS Junior delivered the longest read length but the lowest throughput (8 Mb/h during a 9-h run) and suffered from errors in homopolymeric tracts, even when assembled at high coverage. Each Ion Torrent PGM run produced the shortest reads and the worst performance with homopolymers. However, it delivered the fastest throughput (80–100 Mb/h) and shortest run time (3 h). This platform has also shown the greatest improvement in performance in recent months. An assembly for the outbreak strain generated in May 2011 from data from the original Ion Torrent 314 chip contained >3,000 contigs whereas, in this study, data from the recently available 316 chip were assembled into <400 contigs.
Speed, set-up, running costs and simplicity of workflow are also important factors when comparing these platforms. The Ion Torrent PGM is the lowest-price instrument. The cost per base of generating sequence data appears to be an order of magnitude higher for the 454 GS Junior than for the other two platforms. The MiSeq workflow has the fewest manual steps as template amplification is done directly on the instrument without manual intervention, in contrast to the Ion Torrent PGM and 454 GS Junior, which require preparation of amplified sequence libraries through emulsion PCR and enrichment stages off the instrument. The Ion Torrent PGM is notable for offering three differently priced sequencing-chip reagents, which gives flexibility when designing experiments, as a choice can be made based on the throughput required. Since this study was carried out, a paired-end protocol for the Ion Torrent PGM has been announced, similar to that for the MiSeq, which requires a second sequencing reaction to be done immediately after the first, which also has the effect of doubling the run-time.
One important conclusion from this evaluation is that saying that one has "sequenced a bacterial genome" means different things on different benchtop sequencing platforms. Potential users of these technologies need to be sensitive to these differences, particularly when comparing or combining data generated on different platforms. It is also important to ask (i) to what extent errors can be corrected by comparison to reference data, (ii) when it is safe to use a mapping approach that makes assumptions about the resemblance of a novel sequence to an existing reference sequence and (iii) how much one should have to rely on human insight rather than automated analyses and pipelines. In this study, we set a tough test by evaluating algorithmically generated de novo assemblies. However, during the real-world test case of the German E. coli outbreak, even the Ion Torrent platform, using the 314 chip with its low throughput and high error rate, delivered useful insights into the biology and evolution of the outbreak strain; for example, a homopolymer error in an MLST profile was easily corrected by manual comparison to database sequences. We are thus confident that benchtop high-throughput sequencing platforms are poised to make a decisive impact on diagnostic and public health microbiology in the near future.
Figure 3 An analysis of gaps when aligning draft de novo assemblies to the reference genome. (a, top panel) The number of gaps that are not associated with homopolymeric tracts, for example, contig breaks, misassemblies or missing sequence. (bottom panel) The number of gaps that are associated with homopolymeric tracts for each draft assembly. (b) The length of erroneously called homopolymeric tracts for each 454 GS Junior and Ion Torrent PGM assembly. Assemblies shown: 454 GS Junior (1), 454 GS Junior (2), 454 GS Junior (1+2), Ion Torrent PGM (1), Ion Torrent PGM (2), Ion Torrent PGM (1+2) and MiSeq (scaffolds). Axis label: Homopolymer length.
Table 3 Full-length identical matches of clinically important proteins against draft assemblies
Table body not recovered. Surviving column headings: resistance; MLST genes. Surviving row labels: MiSeq (scaffolds); Ion Torrent (1+2).
Protein coding sequences were searched against draft assemblies for each benchtop instrument using translated BLAST (tblastn, part of the BLAST 2.2.22 package). The results show the number of matches that are identical to the sequence in the reference assembly. For MLST sequences, the nucleotide sequences and nucleotide BLAST (blastn) were used. A summary of BLAST results can be found in Supplementary Table 5. SPATEs, serine protease autotransporters.
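The Table 3 legend counts only BLAST matches that are identical to the reference sequence over its full length. One way to express that filter is sketched below in Python. The `Hit` record and its field names are hypothetical: they assume hits have already been parsed from BLAST output into query coordinates, percent identity and gap counts, and the exact criteria used in the study may differ.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query_id: str
    query_length: int        # length of the query protein or gene
    query_start: int         # 1-based start of the aligned query region
    query_end: int           # 1-based end of the aligned query region
    percent_identity: float
    gaps: int

def is_full_length_identical(hit):
    """True when the alignment spans the whole query with no mismatch or gap."""
    return (
        hit.query_start == 1
        and hit.query_end == hit.query_length
        and hit.percent_identity >= 100.0
        and hit.gaps == 0
    )

def count_recovered(hits, query_ids):
    """Number of distinct queries with at least one full-length identical hit."""
    recovered = {h.query_id for h in hits if is_full_length_identical(h)}
    return len(recovered & set(query_ids))
```

A truncated or frameshifted gene, such as one broken by a homopolymer indel, fails this filter even when most of its sequence is present, which is why the counts are a strict measure of assembly accuracy.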
Methods and any associated references are available in the online version of the paper.
Accession codes. 454 sequences have been deposited into the Short Read Archive under study number SRA048574, with run accessions SRR388806 (454 GS Junior run 1), SRR388807 (454 GS Junior run 2), SRR388808 (454 FLX+) and SRR388809 (454 Titanium 8 kb paired-end). Ion Torrent PGM sequences have been deposited under study number SRA048511, with accessions SRR389193 (Ion Torrent PGM run 1), SRR389194 (Ion Torrent PGM run 2). The multiplexed MiSeq reads have been deposited under study number SRA048664. Assembly files and analysis scripts have been uploaded to a public GitHub repository.
Note: Supplementary information is available on the website.
We gratefully acknowledge the blogging community for helpful discussion in the comments section of our blog, and in particular to B. Chevreux, J. Johnson, K. Robison and L. Nederbragt. We are grateful to C. Hercus at Novocraft for help with the Novoalign software and to A. Darling for help with Mauve Assembly Metrics. We thank Roche Diagnostics, UK, for 454 GS FLX+ and 454 FLX paired-end sequencing, technical support and helpful discussion. We thank Life Technologies for early access to 316 chips and instrument fluidics upgrade. We thank G. Smith and Illumina UK for early access to the MiSeq platform and public release of E. coli outbreak-strain data. We thank the three anonymous reviewers for their many helpful suggestions for improving the manuscript. The xBASE facility and N.J.L. are funded by BBSRC grant BBE0111791.
N.J.L., J.W., S.E.G. and M.J.P. conceived the experiments; J.W. and S.G. supplied the strains; N.J.L., R.V.M. and T.J.D. carried out the bioinformatics analysis; C.C. performed the Ion Torrent sequencing; and S.E.G. and R.V.M. performed the 454 GS Junior sequencing. N.J.L. and M.J.P. wrote the manuscript. All authors commented on the manuscript.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper.
Reprints and permissions information is available online.
1. Pallen, M.J., Nelson, K. & Preston, G.M. Bacterial Pathogenomics (ASM Press).
2. Metzker, M. Sequencing technologies - the next generation. Nat. Rev. Genet. 11, 31–46 (2010).
3. Glenn, T. Field guide to next-generation DNA sequencers. Mol. Ecol. Resour. 11, 759–769 (2011).
4. Pallen, M., Loman, N. & Penn, C. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr. Opin. Microbiol. 13, 625–631 (2010).
5. Rothberg, J. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352 (2011).
6. Bentley, D. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
7. Frank, C. et al. Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N. Engl. J. Med. 365, 1771–1780 (2011).
8. Brzuszkiewicz, E. et al. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: entero-aggregative-haemorrhagic Escherichia coli (EAHEC). Arch. Microbiol. 193, 883–891 (2011).
9. Mellmann, A. et al. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE 6, e22751 (2011).
10. Rohde, H. et al. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N. Engl. J. Med. 365, 718–724 (2011).
11. Rasko, D. et al. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N. Engl. J. Med. 365, 709–717 (2011).
12. Grad, Y. et al. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proc. Natl. Acad. Sci. USA 109, 3065–3070 (2012).
13. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
14. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
15. Kingsford, C., Schatz, M. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).
16. Buffalo, V. qrqc: Quick Read Quality Control. R package version 1.9.1 (2012).
Collection of isolates. E. coli strain 280 was grown according to the protocol described. To generate enough DNA for sequencing, the isolate was grown on multiple occasions.

Sequencing workflow. A general, simplified workflow for library preparation, amplification and sequencing is shown in Supplementary Figure 5 with approximate timings for each stage. These stages comprise library preparation from genomic DNA, amplification and sequencing. Library preparation steps are similar for each instrument, involving extraction and purification of genomic DNA, fragmentation through either enzymatic or physical means, fragment size selection and ligation of sequencing adaptors.

Ion Torrent sequencing. Ion Torrent sequencing was performed at the University of Birmingham according to the Ion Torrent protocol (Life Technologies). Total DNA from E. coli O104:H4 280 was isolated. Ten micrograms of this DNA was fragmented with a Bioruptor instrument (Diagenode, Liège, Belgium) using the protocol recommended by Life Technologies. A broad profile of fragment sizes (75–500 bp, peak at 255 bp) were obtained that were end-repaired, ligated with Ion Torrent A and P1 adaptors and size selected using E-Gel EX 2% Gel (Invitrogen, Carlsbad, CA) for 150- to 250-bp fragments. The size-selected fragments were amplified and DNA was purified with Agencourt AMPure XP beads (Beckman Coulter Genomics, High Wycombe, UK). The median fragment size of the final library was 200 bp (assessed by a BioAnalyzer High Sensitivity LabChip, Agilent). Library was diluted to 40 pM and two emulsion PCR reactions were set up at two templates per sphere. Sequencing primer and polymerase were added to the final enriched spheres before loading onto the 316 chip. Two 316 chips were run in total. Base calls were generated using version 1.5 of the Ion Torrent software suite and for further analysis, the resulting flowgram files (assembly) or FASTQ files (alignment) were used.

454 GS Junior sequencing. 454 GS Junior sequencing was carried out on an instrument at the Health Protection Agency, Colindale, UK. E. coli O104:H4 280 DNA was prepared following the Roche Rapid Library protocol (Roche, Welwyn Garden City, UK), whereby 5 ng/µl was taken from each sample and libraries prepared. Briefly, samples were subjected to the following key steps: DNA fragmentation by nebulization, fragment end-repair, AMPure XP bead preparation (Amersham International, Buckinghamshire, UK), adaptor ligation, small fragment removal, quality assessment using the Agilent 2100 Bioanalyzer, library quantification and finally preparation of working aliquots at a final concentration of 1 × 10^7 molecules (500 ng total). Emulsion PCR, enrichment and 454 GS Junior sequencing were carried out per manufacturer's protocol. The resulting flowgram files were used for downstream analysis.

454 GS FLX+ and 454 GS FLX 8-kb titanium sequencing. 454 GS FLX 8-kb titanium paired-end and 454 FLX+ (long read) library construction and sequencing was performed at Roche Diagnostics (Burgess Hill, UK) according to their standard protocols.

Illumina MiSeq sequencing. Illumina MiSeq sequencing was done at Illumina UK, Little Chesterford, UK, on a pre-release, prototype MiSeq instrument. The seven E. coli samples were quantified with a Qubit High Sensitivity kit and the total amount of DNA for each sample varied between 523 ng and 954 ng. Samples were sheared with a Covaris S2 instrument followed by end repair, A-tailing and the ligation of TruSeq adaptors containing indexes. Samples were run on a 2% agarose gel (2 samples per gel) and DNA was size selected at 600–700 bp. Ten cycles of PCR were carried out and samples run out on a second 2% agarose gel (two samples per gel). Samples were excised from the gel and quantified with a Qubit high-sensitivity kit. Libraries were diluted to 2 nM in EB plus 0.1% Tween and a pool containing an equimolar concentration of each library was prepared. MiSeq instrument was prepared following routine procedures. Briefly, a standard MiSeq flow cell was inserted into the flow-cell chamber. Next, the DNA sample containing the pool of seven E. coli libraries was diluted to 6.2 pmol and pipetted into the sample well on the MiSeq Consumable Cartridge before loading in the chiller section of the MiSeq instrument. A sample sheet was prepared on the MiSeq instrument to provide run details. The run was initiated for 2 × 150 bases of SBS sequencing, including on-board clustering and paired-end preparation, the sequencing of the seven barcode indices and analysis. On the completion of the run, data were base called and demultiplexed on the instrument (provided as Illumina FASTQ files, Phred+64 encoding). FASTQ format files in Illumina 1.5 format were considered for downstream analysis. Although MiSeq produces reads of fixed lengths, tails of these reads may be designated as uncallable as indicated by the read segment quality control indicator, noted by a quality score of two ('B'). In these cases these low-quality tails were trimmed and not used for further analysis.

Bioinformatics

Construction of reference assembly. A high-quality reference sequence for E. coli strain 280 was constructed by assembling 454 FLX+ long read data and 454 Titanium paired-end data (8-kb insert) using Newbler 2.6. Newbler was run with parameters -scaffold -tr -cpu 8 -siom 28 -rip. The resulting scaffolds were used for further analysis. Newbler masks certain bases in the assembly regarded as uncertain by assigning it a lower-case nucleotide. These masked bases correspond with bases with a low-quality score. In bacterial genomes these bases are seen predominantly in consensus contigs resulting from long repeat regions, long homopolymeric tracts and contig ends. The resulting assembly was annotated using the automated xBASE annotation pipeline, which uses Glimmer for coding sequence prediction and tRNAscan-SE and RNAmmer for stable RNA prediction.

De novo assembly of individual strains. Assemblies were generated from data generated by each of the benchtop sequencing platforms separately. All data were assembled by MIRA 3.4.0 using default parameters in genome,denovo,accurate mode and the appropriate setting for each instrument type (454,iontor,solexa). Ion Torrent and 454 GS Junior data were additionally assembled with Newbler 2.6 with default parameters. Illumina MiSeq data were additionally assembled using Velvet and CLC Assembly Cell (both de Bruijn graph assemblers). Velvet was run using a k-mer value of 55 and exp_cov and cov_cutoff set to auto. The program was run again with -scaffolding off to generate a separate assembly without scaffolds. CLC Assembly Cell version 4.0.6 beta was run with default parameters.

De novo assemblies were compared for chromosomal coverage and broken genes, among other items using Mauve (mauve_snapshot_2011-08-19) and the Mauve Assembly Metrics package. Assemblies were manually examined using the Tablet viewer. Assembly gaps were inspected using a custom script extract_hp.py, which uses as input gaps reported by Mauve Assembly Metrics. Gaps in the whole-genome alignment that are associated with homopolymeric tracts in the reference (of length two or more) were categorized as homopolymer gaps; other gaps were categorized as assembly gaps. Gaps in the reference sequence were not counted.

Read mapping. For substitution and indel detection, reads from each platform were aligned to the reference assembly using the bwasw module of BWA (version 0.5.9rc1). The reference genome was indexed with bwa index -a is. The bwasw module was run with default parameters (gap open penalty 5, gap extension penalty 2) using FASTQ files as input. Output BAM files were post-processed using the calmd module of SAMtools, which adds MD tags to each alignment. The MD tag describes the positions of base substitutions. Reads that align to masked bases in the reference genome were excluded from analysis. Read accuracy was determined by a custom Python script (calculate_accuracy.py, available in the GitHub repository) that uses the pysam module to read the BAM alignment. The calculate_accuracy script counts mismatches using a published method which counts mismatches resulting from substitutions, insertions and deletions. In the case of deletions, mismatches are assigned to one of the adjacent bases in the read at random. Reads were additionally mapped against E. coli strain c236-11 (PacBio and Illumina sequenced) and E. coli strain 55989 (Sanger sequenced).

For generation of homopolymer accuracy plots, reads for each of the benchtop sequencing platforms were mapped to the reference assembly using Novoalign (version V2.07.13, Novocraft, Malaysia, registered version). Gap penalties were adjusted with parameters as recommended by the documentation -g 20 -x 5. Novoalign was set to align its maximum supported read length of 300 using -n 300. Homopolymeric tract statistics were enabled using the --hpstats option.
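The tail trimming described for the MiSeq reads (Phred+64 FASTQ, in which a quality score of two is written as 'B') can be illustrated with a minimal Python sketch. The function names and the example read are hypothetical and are not taken from the authors' pipeline; the sketch only assumes that the uncallable segment is the trailing run of 'B' characters in the quality string.

```python
def phred64_scores(qual):
    """Decode a Phred+64 quality string into integer scores."""
    return [ord(c) - 64 for c in qual]


def trim_b_tail(seq, qual, indicator="B"):
    """Drop the trailing run of indicator qualities ('B' = score 2 in Phred+64).

    Only the tail is removed; 'B' characters inside the read are kept.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    keep = len(qual.rstrip(indicator))
    return seq[:keep], qual[:keep]


# Hypothetical 10-base read whose last four bases are flagged as uncallable.
seq, qual = trim_b_tail("ACGTACGTAC", "hhhhgfBBBB")
print(seq, qual)             # ACGTAC hhhhgf
print(phred64_scores("Bh"))  # [2, 40]
```

A read made up entirely of 'B' qualities is reduced to an empty sequence, so a downstream step would need to discard zero-length reads.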
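The split of alignment gaps into homopolymer gaps and assembly gaps can be sketched as follows. This is not the extract_hp.py script itself. The text does not state exactly how a gap is judged to be "associated with" a homopolymeric tract, so the sketch assumes, purely for illustration, that a gap is a homopolymer gap when its reference coordinate falls inside a run of two or more identical bases. The function names and the toy reference are hypothetical.

```python
def homopolymer_run(ref, pos):
    """Return (start, end) of the run of identical bases containing ref[pos].

    Coordinates are 0-based and end is exclusive. Case is ignored so that
    lower-case (masked) bases are compared like upper-case ones.
    """
    base = ref[pos].upper()
    start = pos
    while start > 0 and ref[start - 1].upper() == base:
        start -= 1
    end = pos + 1
    while end < len(ref) and ref[end].upper() == base:
        end += 1
    return start, end


def classify_gap(ref, pos, min_run=2):
    """Label a gap by whether its reference position lies in a tract >= min_run."""
    start, end = homopolymer_run(ref, pos)
    return "homopolymer" if end - start >= min_run else "assembly"


ref = "ACGGGTAC"  # toy reference with a GGG tract at positions 2-4
print(classify_gap(ref, 3))  # homopolymer
print(classify_gap(ref, 6))  # assembly
```

With a threshold of two, most positions in a real genome that sit next to an identical base count as homopolymeric, so the choice of association rule has a large effect on the resulting counts.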
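To make the mismatch counting in the read-mapping step concrete, the sketch below tallies substitutions, insertions and deletions for a single alignment from its CIGAR string and MD tag. It is an illustration under stated assumptions, not the calculate_accuracy.py script: it parses the two strings directly instead of using pysam, it does not reproduce the random assignment of deletions to adjacent read bases, and the function name and example alignment are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tag tokens: a match count, a deletion (^ followed by bases), or one substituted base.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_errors(cigar, md):
    """Count substitutions, inserted bases and deleted bases for one alignment."""
    insertions = deletions = aligned = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            insertions += n
            aligned += n      # inserted bases are present in the read
        elif op == "D":
            deletions += n    # deleted bases are absent from the read
        elif op in "M=X":
            aligned += n
    # A lone letter in the MD tag is a substituted reference base.
    substitutions = sum(1 for _, _, sub in MD_RE.findall(md) if sub)
    return {
        "substitutions": substitutions,
        "insertions": insertions,
        "deletions": deletions,
        "aligned_read_bases": aligned,
    }


# Hypothetical alignment: one substitution, a 1-base insertion, a 2-base deletion.
print(count_errors("10M1I5M2D6M", "10A4^AC6"))
```

Summing the three error counts and dividing by the aligned read bases gives one possible per-read error rate; the exact denominator used in the paper is not stated here, so that choice is left to the reader.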
17. Chattaway, M., Dallman, T., Okeke, I. & Wain, J. Enteroaggregative E. coli O104 from an outbreak of HUS in Germany 2011, could it happen again? J. Infect. Dev. Ctries. 5, 425–436 (2011).
18. Chaudhuri, R. et al. xBASE2: a comprehensive resource for comparative bacterial genomics. Nucleic Acids Res. 36, D543–546 (2008).
19. Delcher, A., Bratke, K., Powers, E. & Salzberg, S. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679 (2007).
20. Lowe, T. & Eddy, S. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
21. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
22. Darling, A., Tritt, A., Eisen, J. & Facciotti, M. Mauve assembly metrics. Bioinformatics 27, 2756–2757 (2011).
23. Milne, I. et al. Tablet–next generation sequence assembly visualization. Bioinformatics 26, 401–402 (2010).
24. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
25. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5, e1000344 (2009).
Erratum: Reinventing clinical trials
Malorye Allison
Nat. Biotechnol. 30, 41–49 (2012); published online 9 January 2012; corrected after print 11 May 2012
In the version of the article originally published, the ExoInTouch product being used by Pfizer in its virtual trial for overactive bladder (OAB)
is eDiary, not Recruit, which allows patients to report through mobile phone or internet portals. The Recruit technology is being used in other
studies. The text references to Recruit have been replaced with an explanation of eDiary. Instead of "new technology to recruit patients faster and
in a more standardized fashion," the text now reads, "new technology to allow home-based clinical trial data reporting." Instead of "'Recruit' text
messaging technology in a pilot study" for Detrol, the text now reads, "'eDiary' tool in a Phase 4 trial, called Research on Electronic Monitoring of
OAB Treatment Experience." Additional explanation has been added, including "Patients can respond to simple questionnaires (Fig. 3) via their
mobile phones or home computers. If they delay in responding, a reminder can be sent." And for space reasons, other text relating to Recruit, "The
tool is integrated with Pfizer's volunteer database and allows immediate text message–based communication and assessment of a subject's
suitability within 5–10 min" and "It can also be used to send protocol-specific messages to patients already enrolled in trials" was deleted. In addition,
it should have been noted that Eric Westin, who was interviewed while senior director of Lilly Oncology, had left the company. The errors have
been corrected in the HTML and PDF versions of the article.
Erratum: Parallel genome universes
Tom Misteli
Nat. Biotechnol. 30, 55–56 (2012); published online 9 January 2012; corrected after print 7 June 2012
In the version of this article initially published, the volume number and year of reference 2 should have been 30 and 2012, and not 29 and 2011,
respectively. The errors have been corrected in the HTML and PDF versions of the article.
Erratum: BASF moves GM crop research to US
Lucas Laursen
Nat. Biotechnol. 30, 204 (2012); published online 7 March 2012; corrected after print 7 June 2012
In the version of this article initially published, BASF's Amflora, a genetically modified potato for industrial use, was mistakenly said to be blight
resistant when it is not. The error has been corrected in the HTML and PDF versions of the article.
Erratum: In Their Words
Nat. Biotechnol. 30, 203 (2012); published online 7 March 2012; corrected after print 7 June 2012
In the version of this article initially published online, Craig Thompson was incorrectly identified as the president of Rockefeller University. He is
the president of Memorial Sloan-Kettering Cancer Center in New York. The error has been corrected for the PDF version of this article.
Corrigendum: Performance comparison of whole-genome sequencing
platforms
Hugo Y K Lam, Michael J Clark, Rui Chen, Rong Chen, Georges Natsoulis, Maeve O'Huallachain, Frederick E Dewey, Lukas Habegger,
Euan A Ashley, Mark B Gerstein, Atul J Butte, Hanlee P Ji & Michael Snyder
Nat. Biotechnol. 30, 78–82 (2012); published online 18 December 2011; corrected after print 7 June 2012
In the version of this article initially published, the accession code to obtain raw sequence data was given as SRA045736.2; the correct code is
SRA045736. The error has been corrected in the HTML and PDF versions of the article.
Corrigendum: Performance comparison of benchtop high-throughput
sequencing platforms
Nicholas J Loman, Raju V Misra, Timothy J Dallman, Chrystala Constantinidou, Saheer E Gharbia, John Wain & Mark J Pallen
Nat. Biotechnol. 30, 434–439 (2012); published online 22 April 2012; corrected online 23 April 2012; corrected after print 7 June 2012
In the version of this article initially published online, in the Online Methods "Ion Torrent Sequencing" section, the sentence beginning with
"Ten milligrams of this DNA was fragmented with a Bioruptor instrument…." should have read "Ten micrograms…." and in the "454 GS Junior
sequencing" section, "(500 total)" should have read "(500 ng total)." The errors have been corrected in the PDF and HTML versions of this article.
volume 30 number 6 June 2012 nature biotechnology
Source: http://rp-www.cs.usyd.edu.au/~mcharles/teaching/info5010/student_resources/nbt.2198.pdf
Kawsar, et al / Journal of SUB 4(2): 89-102, 2013 Phosphatidylcholine: A Review Md. Hassan Kawsar1, Md. Firoz Khan2 and Md. Akbar Hossain3 Abstract: In recent years Phosphatidylcholine has greatly impacted the drug delivery technology. The very first and most important advantage of phospholipid based vesicular system is the compatibility of phospholipids with membrane of human either internal membrane or skin (external membrane). For a drug to be absorbed and distributed into organs and tissues and eliminated from the body, it must pass through one or more biological membrane(s)/ barrier(s) at various locations. Such a movement of drug across the membrane is called drug transport. For the drugs to be delivered to the body should cross the membranous barrier, either it would be from oral route or topical/transdermal route. Therefore the phospholipid based carrier systems are of considerable interest in this era. A number of drug delivery systems are based entirely on Phosphatidylcholine such as Liposomes, Ethosomes, Phytosomes, Transferosomes and Nanocochelates.
Consulting in Spring has arrived in the Niki Borchardt also joined We are also co-hosting a South-east and after the good the team in August, working golf day with Limestone BORDERTOWN rains across the region in the as a receptionist in Keith. Coast Agri-Links on 15th 8752 2330 last month, everyone seems Unfortunately Niki can no