
Performance comparison of benchtop high-throughput sequencing platforms

Nicholas J Loman1, Raju V Misra2, Timothy J Dallman2, Chrystala Constantinidou1, Saheer E Gharbia2, John Wain2,3 & Mark J Pallen1
Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).

Over the past decade and a half, genome sequencing has transformed almost every corner of the biomedical sciences, including the study of bacterial pathogens. In the last five years, high-throughput (or 'next-generation') sequencing technologies have delivered a step change in our ability to sequence genomes, whether human or bacterial. Since arriving in the marketplace, these technologies have undergone sustained technical improvement, which, twinned with lively competition between alternative platforms, has placed genome sequencing in a state of permanent revolution.

Although high-throughput sequencing has seen extensive use in bacteriology, for example, in the genomic epidemiology of bacterial pathogens, until recently sequencing platforms were tailored chiefly toward large-scale applications, focused on the race to the '$1,000 human genome', with footprints, workflows, reagent costs and run times poorly matched to the needs of small laboratories studying small genomes. However, three different benchtop high-throughput sequencing instruments are currently available, all capable of sequencing bacterial genomes in a matter of days.

The 454 GS Junior from Roche was released in early 2010 and is a smaller, lower-throughput version of the 454 GS FLX machine, exploiting similar emulsion PCR and pyrosequencing approaches, but with lower set-up and running costs. The Ion Torrent Personal Genome Machine (PGM) was launched in early 2011. Like the 454 GS Junior, this technology exploits emulsion PCR. It also incorporates a sequencing-by-synthesis approach, but uses native dNTP chemistry and relies on a modified silicon chip to detect hydrogen ions released during base incorporation by DNA polymerase (making it the first 'post-light' sequencing instrument). The Illumina MiSeq was announced in January 2011 and began to ship to customers in the fourth quarter of 2011. The MiSeq is based on the existing Solexa sequencing-by-synthesis chemistry but has dramatically reduced run times compared to the Illumina HiSeq (fastest run 4 h versus 1.5 d for 36-cycle sequencing or 16 h versus 8.5 d for 200-cycle sequencing), made possible by changes including a smaller flow cell and reduced imaging time.

We wished to compare the performance of these three sequencing platforms by analyzing data with commonly used assembly and analysis pipelines. We therefore benchmarked these platforms by using them to sequence the genome of an isolate from the recent outbreak of food-borne illness caused by Shiga-toxin-producing E. coli O104:H4, which struck Germany between May and July 2011. This outbreak was responsible for >4,000 infections and more than 40 deaths. Previous whole-genome sequencing efforts applied to isolates from the outbreak yielded novel diagnostic reagents and provided important clues as to the nature, origins and evolution of the outbreak strain. These efforts also demonstrated the utility of an 'open-source' approach to outbreak genomics that included rapid sequencing, a liberal approach to data release and use of crowdsourcing. Although all infections during the outbreak were acquired in Germany, travelers took their infections back to other countries in North America and Europe, including the United Kingdom. Here, we have focused on a single E. coli isolate of serotype O104 from the United Kingdom, which was epidemiologically linked to the German outbreak.

All rights reserved.
1Centre for Systems Biology, University of Birmingham, Birmingham, UK. 2Health Protection Agency, London, UK. 3School of Medicine, University of East Anglia, Norwich, UK. Correspondence should be addressed to M.J.P. or J.W.

Received 19 December 2011; accepted 30 March 2012; published online 22 April 2012; corrected online 23 April 2012 (details online); corrected after print 7 June

nature biotechnology VOLUME 30 NUMBER 5 MAY 2012

Creation of reference assembly
To permit comparisons of benchtop sequencing data, we generated a reference assembly for E. coli O104:H4 strain 280 (UK Health Protection Agency's materials identifier H112160280) using established high-throughput sequencing platforms. This isolate was recovered from a female traveler returning from Germany who had developed hemolytic uremic syndrome and thrombotic thrombocytopenic purpura. The isolate was confirmed as typical of the outbreak strain (ST678, stx-2 positive and intimin negative).

We used the Roche 454 GS FLX+ system to generate very long fragment reads (modal read length, 812 bases; maximum read length, 1,170 bases). Additionally, Roche 454 GS FLX was used to sequence an 8-kb insert paired-end library using Titanium chemistry. The reads were assembled into contigs, which were scaffolded to produce a draft reference assembly. Mean coverage depth for the assembly was 32-fold.

The use of abundant long reads and long-insert, paired-end information resulted in a very high-quality draft genome assembly consisting of three scaffolds. Of the bases in the assembly, 99.42% are Q64 bases (representing accuracy of one miscall around every 2.5 M bases) and 99.54% are Q40 (one miscall every 10,000 bases) or higher. Bases with a quality score <40 were masked with a lower-case letter and excluded from further analysis. The largest scaffold corresponded to the chromosome (5,340,015 bp); the two smaller scaffolds corresponded to two large plasmids (pESBL and pAA). The 1.5-kb plasmid sequence was present in a single contig. Although each scaffold represented a single circular replicon, 153 gaps remained within the scaffolds. These gaps represent repetitive regions longer than the mean read length and shorter than the paired-end insert library, which cannot be resolved by this sequencing strategy.

Characteristics of reads from benchtop sequencers
Genome depth, evenness of coverage, read length and read quality are the four major factors that determine the ability to reconstruct genome sequences from sequence data. There were large differences in the number, predicted quality and length of reads obtained from the three platforms. The 454 GS Junior produced the longest reads, with a mean length of 522 bases, but had the lowest throughput of the three instruments (70–71 megabases). Ion Torrent PGM runs generated over four times the throughput of 454 GS Junior but generated the shortest reads (mean 121 bases). The MiSeq run produced the greatest throughput (1.6 gigabases) with reads slightly longer than those by Ion Torrent PGM, permitting the multiplexing of seven E. coli strains on a single run when targeting 40-fold coverage of each genome. MiSeq reads were paired-end, that is, fragments were sequenced in both directions. Across the reference chromosome, coverage was generally even for all technologies. However, in the MiSeq data we saw a peak associated with the Shiga toxin–producing phage. A similar, but smaller, peak was detectable in the Ion Torrent PGM data (Supplementary Fig. 1). These peaks may be explained by the occurrence of phage lysis in the cultures used to prepare the DNA for sequencing. Differences in relative coverage levels were also seen in the pESBL and pAA plasmids between instruments, possibly due to the use of different DNA shearing techniques in library preparation.

Because each manufacturer uses a unique software implementation to generate base-quality score predictions, direct comparison of these scores between platforms is difficult. We recalibrated quality scores for each instrument by first aligning reads to the reference genome. By observing the counts of matched and mismatched bases in each aligned read, a new quality score can be calculated, called alignment quality. We used a scoring system, previously published, which takes into account substitutions, insertions and deletions. Mismatches resulting in deletions are assigned randomly to the position of one of the adjacent bases in the read. Alignment quality scores measured in this way generally had good agreement with predicted scores, with the Ion Torrent PGM generally underestimating quality scores and the other instruments slightly overestimating them (Supplementary Fig. 2). The MiSeq produced the highest quality reads, owing to a low substitution error rate (0.1 substitutions per 100 bases) and the near absence of indel errors compared to the other platforms. The Ion Torrent PGM showed a steadily decreasing accuracy across the read to the 100th base. However, soft clipping of low-quality read ends by the BWA alignment software serves to make the accuracy appear to increase after this point, as mismatches are not counted in soft-clipped (unaligned) parts of the read.

Comparison of the frequency of indels through alignment to the reference demonstrated that Ion Torrent PGM reads had 1.5 indels per 100 bases (1.72 indels per read).

Table 1 Price comparison of benchtop instruments and sequencing runs
Minimum throughput (read length): 35 Mb (400 bases); 10 Mb (100 bases); 100 Mb(d) (100 bases); 1,000 Mb (100 bases); 1,500 Mb (2 × 150 bases).
Note: pricing may vary between countries and/or sales territories. Instrument prices do not include service contracts. Sample prices do not include the cost of generating the initial fragmented genomic DNA library with adaptors (an additional cost of between $75 and $200 depending on method used). Cost per megabase assumes one sample and one [...].
(a) Ion Torrent PGM pricing from Invitrogen US territory website (http://www.invitrogen.com/, accessed 21 February 2012). (b) Price includes Ion Torrent PGM, server, OneTouch and OneTouch ES sample automation systems. (c) Ion Torrent PGM prices include chip and sample preparation kit. (d) Configuration used in this study.

Table 2 Run and alignment metrics for benchtop sequencers
Metrics: alignment coverage; mean read length; reads aligned (%). Runs: 454 GS Junior (1); 454 GS Junior (2); Ion Torrent PGM (1); MiSeq (1); demultiplexed strain 280.
Metrics for each sequencing run are shown as well as results of alignment against the reference sequence. Depth of coverage for the chromosome and two large plasmids (pESBL and pAA) are shown with the percentage of reads that align. For the MiSeq run, the sequence metrics are shown for the entire run as well as the results of de-multiplexing E. coli O104:H4 strain 280. Alignment statistics for the entire run are not shown as two strains sequenced were of E. coli isolates unrelated to the outbreak strain.
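The relationship between run yield, depth of coverage and multiplexing capacity can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the yields are the rounded figures quoted in the text, and the genome size uses only the 5,340,015-bp chromosome (ignoring the plasmids is a simplifying assumption).

```python
# Rough coverage arithmetic; all function names here are illustrative.
CHROMOSOME_MB = 5.34  # reference chromosome, 5,340,015 bp (plasmids ignored: an assumption)


def expected_depth(run_yield_mb, genome_size_mb=CHROMOSOME_MB):
    """Mean fold-coverage if every sequenced base mapped uniformly to the genome."""
    return run_yield_mb / genome_size_mb


def max_multiplexed_genomes(run_yield_mb, target_depth, genome_size_mb=CHROMOSOME_MB):
    """Largest whole number of genomes one run can cover at target_depth."""
    return int(run_yield_mb // (genome_size_mb * target_depth))


if __name__ == "__main__":
    print(f"454 GS Junior, 70 Mb run: about {expected_depth(70):.0f}-fold")
    print(f"MiSeq, 1,600 Mb run at 40-fold: {max_multiplexed_genomes(1600, 40)} genomes")
```

Under these assumptions a 1.6-gigabase run supports seven genomes at 40-fold coverage, in line with the multiplexing figure quoted in the text. Observed depth will be lower wherever reads fail to align or coverage is uneven.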
Figure 1 Evaluation of read length and quality from benchtop sequencers. Data sets shown: 454 GS Junior (1+2), Ion Torrent PGM (1+2) and MiSeq (strain 280). (a) Box plots, generated by the qrqc software package, showing the predicted per-base quality score (Phred-scaled) at each read position for the combined sequencing runs of each benchtop instrument. Gray shaded bands indicate the 10% and 90% quantiles, orange shaded bands indicate the lower and upper quartiles, and the blue dot is the median. A smoothed curve is shown in purple. Quality scores are given as Phred-scaled quality values, where Q = −10 log10 P (P is the probability of the base call being incorrect). (b) Histograms showing read lengths produced by each instrument. (c) Comparison of the predicted and measured accuracy for each benchtop sequencer. Predicted accuracy is determined by multiplying the number of aligned bases of each quality score by the probability of an incorrect call, 10^(−Q/10). The sum of these values is divided by the number of aligned bases to give a measurement of accuracy. (d) The percentage of reads aligned at each read position.
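The Phred relationship and the predicted-accuracy calculation described in this caption can be sketched as follows. This is a hedged illustration rather than the scoring system used in the study: the function names and the example counts are invented, and the measured error rate here simply counts mismatches, whereas the published scheme also handles insertions and deletions.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled quality for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def predicted_error_rate(aligned_bases_by_q):
    """Expected errors per aligned base, given {quality score: aligned base count}."""
    total = sum(aligned_bases_by_q.values())
    expected_errors = sum(n * phred_to_error_prob(q) for q, n in aligned_bases_by_q.items())
    return expected_errors / total


def measured_error_rate(matches, mismatches):
    """Observed errors per aligned base from alignment counts (simplified)."""
    return mismatches / (matches + mismatches)


if __name__ == "__main__":
    counts = {20: 1_000, 30: 9_000}  # made-up example counts
    predicted = predicted_error_rate(counts)
    measured = measured_error_rate(matches=9_980, mismatches=20)
    print(f"predicted Q{error_prob_to_phred(predicted):.1f}, "
          f"measured Q{error_prob_to_phred(measured):.1f}")
```

On this scale Q40 corresponds to one expected miscall per 10,000 bases and Q64 to roughly one per 2.5 million, which matches the figures quoted for the reference assembly.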
The 454 GS Junior had 0.38 indels per 100 bases (1.74 indels per read). In contrast, indels were detected very infrequently in MiSeq data, with <0.001 indels per 100 bases. These results were confirmed by alignment to two other reference genomes sequenced with other sequencing technologies (Supplementary Tables 1–3). As with 454 sequencing, the major source of indels in Ion Torrent PGM data is runs of identical bases (homopolymers). Comparison of homopolymer accuracy between Ion Torrent PGM and 454 GS Junior demonstrated that Ion Torrent PGM was less accurate when calling homopolymers of any length (Supplementary Fig. 3). The dominant source of error was deletions, with accuracy rates as low as 60% for homopolymers 6 bases [...]

Comparison of de novo assemblies
The use of high-throughput sequencing for the discovery of differences in gene content and arrangement relies on the generation of accurate de novo assemblies. We compared draft, de novo assemblies from benchtop instruments using a variety of metrics. Assembly metrics such as total assembly size and N50 (a statistic for describing the distribution of contig lengths in an assembly) (ref. 5) give a guide to assembly completeness or fragmentation but not accuracy. An ideal assembly produces a single accurate contig for each replicon, but this is rarely possible owing to the presence of long repeat sequences. When comparing de novo assemblies produced by benchtop sequencers, we saw two major groupings of assembly quality. Heavily fragmented assemblies were obtained with Ion Torrent PGM data (single runs or combined), 454 GS Junior (single runs) and MiSeq (Supplementary Table 4). Less fragmented assemblies were obtained when reads from two 454 GS Junior runs were combined to increase depth of coverage (98 contigs versus 150 contigs using the assembler program MIRA) and when paired-end information was used to scaffold contigs generated from the MiSeq data (200 scaffolds versus 311 contigs using CLC Assembly Cell). Scaffolds produced by assembly of Illumina MiSeq paired-end data gave output containing runs of ambiguous 'N' bases between 1 and 352 bases in length (81 such runs in CLC Assembly Cell output, 153 runs in Velvet output).

The number of contigs that can be mapped unambiguously to the reference gives a measure of reference genome coverage. Differences in reference genome coverage were seen when comparing assemblies from each platform (Supplementary Table 4). None of the assemblies generated aligned unambiguously to 100% of the reference. Contigs obtained from the 454 GS Junior data aligned to the largest proportion of the reference, with 3.72% of the reference unmapped, compared to 4.6% for Ion Torrent PGM and 3.95% for MiSeq. The MIRA assembler produced the assemblies with the highest coverage of the reference genome for each data type.

The Ion Torrent PGM assemblies had large numbers of gaps compared to assemblies obtained from 454 GS Junior and MiSeq data. Increasing sequence coverage by combining assemblies from the two Ion Torrent PGM runs reduced the number of gaps in the assembly. With the MIRA assembly, the combined Ion Torrent PGM run showed 38% fewer gaps than Ion Torrent PGM run number two alone. However, many miscalls in long homopolymeric tracts remained, so that in assemblies produced from combining both Ion Torrent PGM data sets, large numbers of contigs were disrupted either by contig breaks or apparent frameshifts. Of the 2,017 gaps seen in the combined Ion Torrent PGM assembly produced by the Newbler assembler (1,811 gaps for MIRA), between a quarter and a third were associated with contig ends or unmapped sequence, the rest being associated with homopolymeric tracts. Although the likelihood of a gap increases with the length of the homopolymer, the number of very short homopolymers (2–3 residues) resulting in assembly gaps was significantly higher for this platform than for the 454 GS Junior. Manual inspection of assembly alignments revealed that many of the indels associated with short homopolymeric tracts demonstrated strand bias, with the correct call predominantly associated with either forward or reverse reads and the erroneous sequences associated with the opposite strand (Supplementary Fig. 4).

Figure 2 N50 contig sizes from assemblies generated from sequence data for each sequencing platform. A selection of popular genome assemblers has been used. The N50 contig size is calculated using the total genome length of the E. coli strain 280 reference sequence, rather than the sum total of contig lengths.
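The N50 variant described in the Figure 2 caption, where the halfway point is taken from the reference genome length instead of the summed contig lengths, can be sketched as below. The function name and the toy contig lengths are illustrative assumptions; this is not the script used to produce the figure.

```python
def n50(contig_lengths, genome_length=None):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the target length.

    If genome_length is given, the target is the reference genome length;
    otherwise it is the sum of the contig lengths (the conventional N50).
    Returns 0 if the contigs never reach half of the target.
    """
    target = genome_length if genome_length is not None else sum(contig_lengths)
    half = target / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]  # toy lengths, not real data
    print(n50(contigs))                     # conventional N50
    print(n50(contigs, genome_length=500))  # reference-based N50
```

Using the reference length penalizes assemblies that are missing sequence: the same contigs give a smaller value when they sum to less than the genome, which makes values comparable across assemblies of different total size.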
Although problems with homopolymers are known to result from flow cell–based chemistries, it is unclear why this strand bias should occur with Ion Torrent technology. However, scrutiny of other public data sets from this instrument suggests it is a pervasive problem.

Figure 3 An analysis of gaps when aligning draft de novo assemblies to the reference genome. (a, top panel) The number of gaps that are not associated with homopolymeric tracts, for example, contig breaks, misassemblies or missing sequence. (bottom panel) The number of gaps that are associated with homopolymeric tracts for each draft assembly. (b) The length of erroneously called homopolymeric tracts for each 454 GS Junior and Ion Torrent PGM assembly.

Benchtop assemblies and public health microbiology
A key test for a genome-sequencing technology is whether it can deliver trustworthy new insights into the biology of the organism under scrutiny. We therefore evaluated how de novo assemblies generated from data from each platform performed in reporting features of biological interest in the outbreak strain. For some features, all analyses did well; for example, all documented the presence and accurate full-length sequence of the genes encoding the Shiga toxin type-2 subunits. However, at the other extreme, in some instances, all instruments did badly; for instance, in all assemblies the two larger plasmids were broken into multiple contigs, which could not be readily assigned to chromosome or plasmid without alignment to the reference genome.

We used 31 protein sequences linked to pathogen biology as queries in translated BLAST searches of the assemblies obtained from the benchtop sequencing platforms. No assembly contained a full set of full-length sequences. The best MiSeq assembly captured 29/31 full-length sequences, the best 454 GS Junior assembly found 26 and the best Ion Torrent PGM assembly found 23. Perhaps the most challenging targets in the survey were the four serine protease autotransporters encoded in the genome of the outbreak strain. None of the platforms managed to recover all four genes as full-length fragments. This is because these genes contain multiple domains and some domains exist as multiple copies in the genome, which are assembled into repeat consensus contigs that cannot be unambiguously placed in the genome. Notably, the choice of assembler affected the ability to detect certain genes; for example, CLC Assembly Cell and MIRA run with Illumina MiSeq data were able to reconstruct all four of the aggregative adhesion fimbrial genes tested, whereas Velvet could reconstruct only two.

Table 3 Full-length identical matches of clinically important proteins against draft assemblies
Protein coding sequences were searched against draft assemblies for each benchtop instrument using translated BLAST (tblastn, part of the BLAST 2.2.22 package). The results show the number of matches that are identical to the sequence in the reference assembly. For MLST sequences, the nucleotide sequences and nucleotide BLAST (blastn) were used. A summary of BLAST results can be found in Supplementary Table 5. SPATEs, serine protease autotransporters.

Integration of whole-genome sequencing into existing practice in a public health laboratory requires backwards compatibility with existing typing methods. We therefore attempted to generate multi-locus sequence typing (MLST) profiles from each assembly. An accurate MLST profile was generated for the outbreak strain by all assemblies using MiSeq data. However, some 454 GS Junior and Ion Torrent PGM assemblies generated indel errors in at least one housekeeping gene.

In our evaluation, all three benchtop sequencing platforms generated useful draft genome sequences of the German E. coli outbreak strain. All could be judged suitable for bacterial genome sequencing, in producing assemblies that mapped to 95% or more of the reference genome and recovered the vast majority of coding sequences. As expected, no instrument could generate completely accurate one-contig-per-replicon assemblies that might equate to a finished genome. Thus, for each technology there is a trade-off between advantages and disadvantages. In our survey, the MiSeq generated the highest throughput per run and lowest error rate of the instruments, with no significant indel errors and the lowest rate of substitution errors (although accuracy does drop off toward the ends of reads). However, the MiSeq delivered shorter read lengths than the 454 GS Junior, probably a significant factor in the lower quality assemblies produced from MiSeq data. Even with paired-end sequencing, the single scaffold assemblies from the MiSeq are interrupted by unfillable gaps, representing difficult-to-resolve repeats. The MiSeq was the longest-running instrument, with paired-end, 150-base sequencing on a pre-release instrument taking >27 h (60 Mb/h).

The 454 GS Junior delivered the longest read length but the lowest throughput (8 Mb/h during a 9-h run) and suffered from errors in homopolymeric tracts, even when assembled at high coverage. Each Ion Torrent PGM run produced the shortest reads and the worst performance with homopolymers. However, it delivered the fastest throughput (80–100 Mb/h) and shortest run time (3 h). This platform has also shown the greatest improvement in performance in recent months. An assembly for the outbreak strain generated in May 2011 from data from the original Ion Torrent 314 chip contained >3,000 contigs whereas, in this study, data from the recently available 316 chip were assembled into <400 contigs.

Speed, set-up, running costs and simplicity of workflow are also important factors when comparing these platforms. The Ion Torrent PGM is the lowest-price instrument. The cost per base of generating sequence data appears to be an order of magnitude higher for the 454 GS Junior than for the other two platforms. The MiSeq workflow has the fewest manual steps, as template amplification is done directly on the instrument without manual intervention, in contrast to the Ion Torrent PGM and 454 GS Junior, which require preparation of amplified sequence libraries through emulsion PCR and enrichment stages off the instrument. The Ion Torrent PGM is notable for offering three differently priced sequencing-chip reagents, which gives flexibility when designing experiments, as a choice can be made based on the throughput required. Since this study was carried out, a paired-end protocol for the Ion Torrent PGM has been announced, similar to that for the MiSeq, which requires a second sequencing reaction to be done immediately after the first and so also doubles the run time.

One important conclusion from this evaluation is that saying that one has "sequenced a bacterial genome" means different things on different benchtop sequencing platforms. Potential users of these technologies need to be sensitive to these differences, particularly when comparing or combining data generated on different platforms. It is also important to ask (i) to what extent errors can be corrected by comparison to reference data, (ii) when it is safe to use a mapping approach that makes assumptions about the resemblance of a novel sequence to an existing reference sequence and (iii) how much one should have to rely on human insight rather than automated analyses and pipelines. In this study, we set a tough test by evaluating algorithmically generated de novo assemblies. However, during the real-world test case of the German E. coli outbreak, even the Ion Torrent platform, using the 314 chip with its low throughput and high error rate, delivered useful insights into the biology and evolution of the outbreak strain; for example, a homopolymer error in an MLST profile was easily corrected by manual comparison to database sequences. We are thus confident that benchtop high-throughput sequencing platforms are poised to make a decisive impact on diagnostic and public health microbiology in the near future.
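The hourly throughput figures quoted in the Discussion follow from dividing run yield by run time. The short sketch below reproduces that arithmetic for the two instruments whose yield and run time are both stated in the text; the function name and the dictionary layout are illustrative, and because the MiSeq run time is given as ">27 h", its result is an upper bound.

```python
def throughput_mb_per_hour(run_yield_mb, run_hours):
    """Average sequencing output per hour of instrument run time."""
    return run_yield_mb / run_hours


# (yield in Mb, run time in h), rounded values taken from the text
RUNS = {
    "MiSeq": (1600, 27),
    "454 GS Junior": (70, 9),
}

if __name__ == "__main__":
    for instrument, (yield_mb, hours) in RUNS.items():
        rate = throughput_mb_per_hour(yield_mb, hours)
        print(f"{instrument}: about {rate:.0f} Mb/h")
```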
Methods and any associated references are available in the online version of the paper.

Accession codes. 454 sequences have been deposited into the Short Read Archive under study number SRA048574, with run accessions SRR388806 (454 GS Junior run 1), SRR388807 (454 GS Junior run 2), SRR388808 (454 FLX+) and SRR388809 (454 Titanium 8 kb paired-end). Ion Torrent PGM sequences have been deposited under study number SRA048511, with accessions SRR389193 (Ion Torrent PGM run 1) and SRR389194 (Ion Torrent PGM run 2). The multiplexed MiSeq reads have been deposited under study number SRA048664. Assembly files and analysis scripts have been uploaded to a public GitHub repository.

Note: Supplementary information is available on the website.

ACKNOWLEDGMENTS
We gratefully acknowledge the blogging community for helpful discussion in the comments section of our blog, and in particular B. Chevreux, J. Johnson, K. Robison and L. Nederbragt. We are grateful to C. Hercus at Novocraft for help with the Novoalign software and to A. Darling for help with Mauve Assembly Metrics. We thank Roche Diagnostics, UK, for 454 GS FLX+ and 454 FLX paired-end sequencing, technical support and helpful discussion. We thank Life Technologies for early access to 316 chips and instrument fluidics upgrade. We thank G. Smith and Illumina UK for early access to the MiSeq platform and public release of E. coli outbreak-strain data. We thank the three anonymous reviewers for their many helpful suggestions for improving the manuscript. The xBASE facility and N.J.L. are funded by BBSRC grant BBE0111791.

AUTHOR CONTRIBUTIONS
N.J.L., J.W., S.E.G. and M.J.P. conceived the experiments; J.W. and S.G. supplied the strains; N.J.L., R.V.M. and T.J.D. carried out the bioinformatics analysis; C.C. performed the Ion Torrent sequencing; and S.E.G. and R.V.M. performed the 454 GS Junior sequencing. N.J.L. and M.J.P. wrote the manuscript. All authors commented on the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper.

Reprints and permissions information is available online.

1. Pallen, M.J., Nelson, K. & Preston, G.M. Bacterial Pathogenomics (ASM Press).
2. Metzker, M. Sequencing technologies: the next generation. Nat. Rev. Genet. 11, 31–46 (2010).
3. Glenn, T. Field guide to next-generation DNA sequencers. Mol. Ecol. Resour. 11, 759–769 (2011).
4. Pallen, M., Loman, N. & Penn, C. High-throughput sequencing and clinical microbiology: progress, opportunities and challenges. Curr. Opin. Microbiol. 13, 625–631 (2010).
5. Rothberg, J. et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352 (2011).
6. Bentley, D. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
7. Frank, C. et al. Epidemic profile of Shiga-toxin-producing Escherichia coli O104:H4 outbreak in Germany. N. Engl. J. Med. 365, 1771–1780 (2011).
8. Brzuszkiewicz, E. et al. Genome sequence analyses of two isolates from the recent Escherichia coli outbreak in Germany reveal the emergence of a new pathotype: entero-aggregative-haemorrhagic Escherichia coli (EAHEC). Arch. Microbiol. 193, 883–891 (2011).
9. Mellmann, A. et al. Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS ONE 6, e22751 (2011).
10. Rohde, H. et al. Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N. Engl. J. Med. 365, 718–724 (2011).
11. Rasko, D. et al. Origins of the E. coli strain causing an outbreak of hemolytic-uremic syndrome in Germany. N. Engl. J. Med. 365, 709–717 (2011).
12. Grad, Y. et al. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proc. Natl. Acad. Sci. USA 109, 3065–3070 (2012).
13. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
14. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
15. Kingsford, C., Schatz, M. & Pop, M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11, 21 (2010).
16. Buffalo, V. qrqc: Quick Read Quality Control. R package version 1.9.1 (2012).
Collection of isolates. E. coli strain 280 was grown according to the protocol described previously. To generate enough DNA for sequencing, the isolate was grown on multiple occasions.

Sequencing workflow. A general, simplified workflow for library preparation, amplification and sequencing is shown in Supplementary Figure 5 with approximate timings for each stage. These stages comprise library preparation from genomic DNA, amplification and sequencing. Library preparation steps are similar for each instrument, involving extraction and purification of genomic DNA, fragmentation through either enzymatic or physical means, fragment size selection and ligation of sequencing adaptors.

Ion Torrent sequencing. Ion Torrent sequencing was performed at the University of Birmingham according to the Ion Torrent protocol (Life Technologies). Total DNA from E. coli O104:H4 280 was isolated. Ten micrograms of this DNA was fragmented with a Bioruptor instrument (Diagenode, Liège, Belgium) using the protocol recommended by Life Technologies. A broad profile of fragment sizes (75–500 bp, peak at 255 bp) was obtained; these fragments were end-repaired, ligated with Ion Torrent A and P1 adaptors and size selected using E-Gel EX 2% Gel (Invitrogen, Carlsbad, CA) for 150- to 250-bp fragments. The size-selected fragments were amplified and DNA was purified with Agencourt AMPure XP beads (Beckman Coulter Genomics, High Wycombe, UK). The median fragment size of the final library was 200 bp (assessed by a BioAnalyzer High Sensitivity LabChip, Agilent). The library was diluted to 40 pM and two emulsion PCR reactions were set up at two templates per sphere. Sequencing primer and polymerase were added to the final enriched spheres before loading onto the 316 chip. Two 316 chips were run in total. Base calls were generated using version 1.5 of the Ion Torrent software suite and, for further analysis, the resulting flowgram files (assembly) or FASTQ files (alignment) were used.

454 GS Junior sequencing. 454 GS Junior sequencing was carried out on an instrument at the Health Protection Agency, Colindale, UK. E. coli O104:H4 280 DNA was prepared following the Roche Rapid Library protocol (Roche, Welwyn Garden City, UK), whereby 5 ng/µl was taken from each sample and libraries prepared. Briefly, samples were subjected to the following key steps: DNA fragmentation by nebulization, fragment end-repair, AMPure XP bead preparation (Amersham International, Buckinghamshire, UK), adaptor ligation, small fragment removal, quality assessment using the Agilent 2100 Bioanalyzer, library quantification and finally preparation of working aliquots at a final concentration of 1 × 10^7 molecules (500 ng total). Emulsion PCR, enrichment and 454 GS Junior sequencing were carried out per manufacturer's protocols. The resulting flowgram files were used for downstream analysis.

454 GS FLX+ and 454 GS FLX 8-kb titanium sequencing. 454 GS FLX 8-kb titanium paired-end and 454 FLX+ (long read) library construction and sequencing were performed at Roche Diagnostics (Burgess Hill, UK) according to their standard protocols.

Illumina MiSeq sequencing. Illumina MiSeq sequencing was done at Illumina UK, Little Chesterford, UK, on a pre-release, prototype MiSeq instrument. The seven E. coli samples were quantified with a Qubit High Sensitivity kit and the total amount of DNA for each sample varied between 523 ng and 954 ng. Samples were sheared with a Covaris S2 instrument followed by end repair, A-tailing and the ligation of TruSeq adaptors containing indexes. Samples were run on a 2% agarose gel (two samples per gel) and DNA was size selected at 600–700 bp. Ten cycles of PCR were carried out and samples run out on a second 2% agarose gel (two samples per gel). Samples were excised from the gel and quantified with a Qubit high-sensitivity kit. Libraries were diluted to 2 nM in EB plus 0.1% Tween and a pool containing an equimolar concentration of each library was prepared. The MiSeq instrument was prepared following routine procedures. Briefly, a standard MiSeq flow cell was inserted into the flow-cell chamber. Next, the DNA sample containing the pool of seven E. coli libraries was diluted to 6.2 pmol and pipetted into the sample well on the MiSeq Consumable Cartridge before loading in the chiller section of the MiSeq instrument. A sample sheet was prepared on the MiSeq instrument to provide run details. The run was initiated for 2 × 150 bases of SBS sequencing, including on-board clustering and paired-end preparation, the sequencing of the seven barcode indices and analysis. On the completion of the run, data were base called and demultiplexed on the instrument (provided as Illumina FASTQ files, Phred+64 encoding). FASTQ format files in Illumina 1.5 format were considered for downstream analysis. Although MiSeq produces reads of fixed lengths, tails of these reads may be designated as uncallable as indicated by the read segment quality control indicator, noted by a quality score of two ('B'). In these cases the low-quality tails were trimmed and not used for further analysis.

Bioinformatics

Construction of reference assembly. A high-quality reference sequence for E. coli strain 280 was constructed by assembling 454 FLX+ long read data and 454 Titanium paired-end data (8-kb insert) using Newbler 2.6. Newbler was run with parameters -scaffold -tr -cpu 8 -siom 28 -rip. The resulting scaffolds were used for further analysis. Newbler masks bases in the assembly that it regards as uncertain by assigning them lower-case nucleotides. These masked bases correspond with bases with a low-quality score. In bacterial genomes these bases are seen predominantly in consensus contigs resulting from long repeat regions, long homopolymeric tracts and contig ends. The resulting assembly was annotated using the automated xBASE annotation pipeline, which uses Glimmer for coding sequence prediction and tRNAScan-SE and RNAmmer for stable RNA prediction.

De novo assembly of individual strains. Assemblies were generated from data produced by each of the benchtop sequencing platforms separately. All data were assembled by MIRA 3.4.0 using default parameters in genome,denovo,accurate mode and the appropriate setting for each instrument type (454,iontor,solexa). Ion Torrent and 454 GS Junior data were additionally assembled with Newbler 2.6 with default parameters. Illumina MiSeq data were additionally assembled using Velvet and CLC Assembly Cell (both de Bruijn graph assemblers). Velvet was run using a k-mer value of 55 and exp_cov and cov_cutoff set to auto. The program was run again with -scaffolding off to generate a separate assembly without scaffolds. CLC Assembly Cell version 4.0.6 beta was run with default parameters. De novo assemblies were compared for chromosomal coverage and broken genes, among other items, using Mauve (mauve_snapshot_2011-08-19) and the Mauve Assembly Metrics package. Assemblies were manually examined using the Tablet viewer. Assembly gaps were inspected using a custom script, extract_hp.py, which uses as input gaps reported by Mauve Assembly Metrics. Gaps in the whole-genome alignment that are associated with homopolymeric tracts in the reference (of length two or more) were categorized as homopolymer gaps; other gaps were categorized as assembly gaps. Gaps in the reference sequence were not counted.

Read mapping. For substitution and indel detection, reads from each platform were aligned to the reference assembly using the bwasw module of BWA (version 0.5.9rc1). The reference genome was indexed with bwa index -a is. The bwasw module was run with default parameters (gap open penalty 5, gap extension penalty 2) using FASTQ files as input. Output BAM files were post-processed using the calmd module of SAMtools, which adds MD tags to each alignment. The MD tag describes the positions of base substitutions. Reads that align to masked bases in the reference genome were excluded from analysis. Read accuracy was determined by a custom Python script (calculate_accuracy.py, available in the GitHub repository) that uses the pysam module to read the BAM alignment. The calculate_accuracy script counts mismatches using a published method which counts mismatches resulting from substitutions, insertions and deletions. In the case of deletions, mismatches are assigned to one of the adjacent bases in the read at random. Reads were additionally mapped against E. coli strain c236-11 (PacBio and Illumina sequenced) and E. coli strain 55989 (Sanger sequenced).

For generation of homopolymer accuracy plots, reads for each of the benchtop sequencing platforms were mapped to the reference assembly using Novoalign (version V2.07.13, Novocraft, Malaysia, registered version). Gap penalties were adjusted with parameters as recommended by the documentation -g 20 -x 5. Novoalign was set to align its maximum supported read length of 300 using -n 300. Homopolymeric tract statistics were enabled using the –hpstats option.
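
The mismatch counting described under Read mapping can be illustrated with a minimal sketch. It is not the authors' calculate_accuracy.py: it does not use pysam, it works on one read's CIGAR string and MD tag taken as plain text, and it reports a single per-read accuracy rather than assigning each deletion to an adjacent read base at random. The function names count_errors and read_accuracy are hypothetical, as is the choice to divide by the number of aligned read bases.

```python
import re

# CIGAR operations as defined in the SAM format (length followed by op code).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# MD tag tokens: a run of matches, a deletion (^ + reference bases),
# or a single mismatched reference base.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_errors(cigar, md):
    """Return (aligned_bases, substitutions, inserted_bases, deleted_bases).

    Hypothetical helper: insertions and deletions are taken from the CIGAR
    string, substitutions from the MD tag (as added by `samtools calmd`).
    """
    aligned = insertions = deletions = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":      # read bases aligned to reference bases
            aligned += n
        elif op == "I":      # bases present in the read only
            insertions += n
        elif op == "D":      # bases present in the reference only
            deletions += n
        # Soft/hard clips, skips and padding are ignored in this sketch.
    # A bare letter in MD is a substitution; letters after '^' are deletions,
    # which were already counted from the CIGAR string.
    substitutions = sum(1 for _num, _del, sub in MD_RE.findall(md) if sub)
    return aligned, substitutions, insertions, deletions


def read_accuracy(cigar, md):
    """Per-read accuracy: 1 - errors / aligned read bases (an assumed definition)."""
    aligned, subs, ins, dels = count_errors(cigar, md)
    read_bases = aligned + ins
    return 1.0 - (subs + ins + dels) / read_bases


if __name__ == "__main__":
    # One substitution, one inserted base and a two-base deletion.
    print(count_errors("10M1I5M2D6M", "10A4^AC6"))
    print(round(read_accuracy("10M1I5M2D6M", "10A4^AC6"), 4))
```

In practice the CIGAR string and MD tag would be read from the BAM file produced by BWA and SAMtools calmd; they are passed in as strings here so the example has no external dependencies.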
17. Chattaway, M., Dallman, T., Okeke, I. & Wain, J. Enteroaggregative E. coli O104 from an outbreak of HUS in Germany 2011, could it happen again? J. Infect. Dev. Ctries. 5, 425–436 (2011).
18. Chaudhuri, R. et al. xBASE2: a comprehensive resource for comparative bacterial genomics. Nucleic Acids Res. 36, D543–546 (2008).
19. Delcher, A., Bratke, K., Powers, E. & Salzberg, S. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679 (2007).
20. Lowe, T. & Eddy, S. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.
21. Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
22. Darling, A., Tritt, A., Eisen, J. & Facciotti, M. Mauve assembly metrics. Bioinformatics 27, 2756–2757 (2011).
23. Milne, I. et al. Tablet–next generation sequence assembly visualization. Bioinformatics 26, 401–402 (2010).
24. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
25. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 5, e1000344 (2009).
Corrigendum: Performance comparison of benchtop high-throughput sequencing platforms
Nicholas J Loman, Raju V Misra, Timothy J Dallman, Chrystala Constantinidou, Saheer E Gharbia, John Wain & Mark J Pallen
Nat. Biotechnol. 30, 434–439 (2012); published online 22 April 2012; corrected online 23 April 2012; corrected after print 7 June 2012
In the version of this article initially published online, in the Online Methods "Ion Torrent Sequencing" section, the sentence beginning with "Ten milligrams of this DNA was fragmented with a Bioruptor instrument…." should have read "Ten micrograms…." and in the "454 GS Junior sequencing" section, "(500 total)" should have read "(500 ng total)." The errors have been corrected in the PDF and HTML versions of this article.
volume 30 number 6 June 2012 nature biotechnology

Source: http://rp-www.cs.usyd.edu.au/~mcharles/teaching/info5010/student_resources/nbt.2198.pdf
