Dominating the Next Generation Sequencing data analysis challenge
We have overcome the challenge of analyzing Next Generation Sequencing
(NGS) data faster than it is produced by implementing a SIMD-accelerated
assembly algorithm in our NGS solution, CLC Genomics Workbench, a
cross-platform desktop application with a graphical user interface.
CLC Genomics Workbench, for analyzing and visualizing Next Generation
Sequencing data, incorporates cutting-edge technology and algorithms,
while also supporting and integrating with the rest of your typical NGS
workflow.
Genomics
Whole genome re-sequencing and targeted re-sequencing of genomes of any size and type, from bacteria and viruses to humans
De novo assembly of an unlimited number of reads, for genomes up to human size
SNP detection, DIP detection, and identification of genomic rearrangements
Visualization and interactive graphical manipulation of results
Transcriptomics
Digital Gene Expression based on RNA-Seq, including a wide range of downstream gene expression analyses
Discovery of novel transcripts/exons
Ability to work with Expression Arrays and RNA-Seq results at the same time, enabling comparison of results
Interactive views of assemblies and derived gene expression data
Graph and table of background distribution and false discovery rate
Peak table and annotations
Cross-platform
CLC Genomics Workbench is available for Windows, Mac OS X, and Linux
platforms, and includes all features of CLC Main Workbench for carrying
out a wide range of downstream analyses.
Benchmarking
In benchmark tests we have assembled half a million 454 reads against
the full E. coli reference genome in around 2 minutes on a two-core
laptop with 2 GB of RAM. This speed-up, based on integrated SIMD
high-performance computing technology, grows further on computers with
more CPU cores and RAM.
Ruiqiang Li, Head of the Bioinformatics Division at BGI:
"We have chosen CLC Genomics Workbench as our platform for analyzing
Next Generation Sequencing data after testing several commercial
solutions, because it’s simply in a league of its own when it comes to
flexibility and the way the Next Generation Sequencing tools can be
used together with our own algorithms."
A few benchmarks: E. coli
454: Read mapping and visualization of 439,000 reads to E. coli (5 megabases) on a USD 1,500, 2 GB, dual-core, 2.13 GHz, 32-bit laptop computer: 2 minutes
Illumina Genome Analyzer: Read mapping and visualization of 2 x 2.7 = 5.4 million paired-end reads (1 lane) to E. coli (5 megabases) on a 32 GB, 8-core, 2.5 GHz, 64-bit desktop computer: 3 minutes
A few benchmarks: Human
Illumina Genome Analyzer: Read mapping and visualization of 2 x 43 million = 86 million paired-end reads to human (3 gigabases) on a 32 GB, 8-core, 2.5 GHz, 64-bit desktop computer (ungapped alignment): 4 hours
Illumina Genome Analyzer: Read mapping and visualization of 2 x 43 million = 86 million paired-end reads to human (3 gigabases) on a 32 GB, 8-core, 2.5 GHz, 64-bit desktop computer (gapped alignment)
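The benchmark figures can be put on a common scale by converting them into reads processed per second. The short Python sketch below does this using the read counts and times quoted in the benchmarks. The times are rounded, so the rates are only rough estimates. The gapped human alignment is left out because no time is given for it. The function name `reads_per_second` is our own.

```python
# Approximate throughput derived from the benchmark figures quoted on this page.
# The quoted times are rounded, so the resulting rates are rough estimates only.


def reads_per_second(reads: int, minutes: float) -> float:
    """Average number of reads processed per second."""
    return reads / (minutes * 60)


benchmarks = [
    # (description, number of reads, time in minutes)
    ("454, E. coli, laptop", 439_000, 2),
    ("Illumina GA, E. coli, 8-core desktop", 2 * 2_700_000, 3),
    ("Illumina GA, human, ungapped, 8-core desktop", 2 * 43_000_000, 4 * 60),
]

if __name__ == "__main__":
    for name, reads, minutes in benchmarks:
        print(f"{name}: about {reads_per_second(reads, minutes):,.0f} reads/s")
```

These work out to roughly 3,700, 30,000 and 6,000 reads per second. The runs use different hardware, read types and reference sizes, so the rates describe each run and are not a like-for-like comparison.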
454, SOLiD, Illumina Genome Analyzer - no problem!
We support all the major Next Generation Sequencing platforms (SOLiD,
454 and Illumina Genome Analyzer) and, of course, Sanger sequencing as
well. We work closely with all the instrument vendors to ensure full
integration as development continues.
CLC Genomics Workbench includes all features of CLC Main Workbench plus
the following additional functionality:
Genomics
Figure 1: Assembly view, zoomed to 100%, together with a conflict table
view. A row in the table has been selected, and the associated conflict
position is automatically highlighted in the assembly view.
De novo assembly of mixed datasets (e.g. 454 and Illumina Genome Analyzer)
Contig report with statistics and graphs for contigs, including N75,
N50 and N25 statistics, coverage distribution and contig size
distribution.
Interactive, zoomable viewing of genome assemblies, including
sequencing reads, quality data, and reference sequences, with the
viewers fully integrated with the downstream analyses.
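The contig report mentioned above lists N75, N50 and N25 statistics, but the page does not define them. Under the common definition, Nx is the contig length L such that contigs of length L or longer together make up at least x% of the total assembled length. Below is a minimal Python sketch of that definition. The function name `nx` is our own, and the Workbench's report may use slightly different conventions, for example a strict versus non-strict threshold.

```python
from typing import Sequence


def nx(contig_lengths: Sequence[int], x: float) -> int:
    """Return the Nx statistic of an assembly (e.g. x=50 gives N50).

    Contigs are sorted from longest to shortest and their lengths summed;
    Nx is the length of the contig at which the running sum first reaches
    x percent of the total assembled length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    threshold = sum(contig_lengths) * x / 100
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length
    # Only reachable through floating-point rounding of the threshold.
    return min(contig_lengths)


if __name__ == "__main__":
    contigs = [100, 80, 70, 50, 30, 20, 10]
    for x in (25, 50, 75):
        print(f"N{x} = {nx(contigs, x)}")
```

For the lengths [100, 80, 70, 50, 30, 20, 10] (total 360), this gives N25 = 100, N50 = 80 and N75 = 50. A larger x always yields an equal or smaller value, so N75 ≤ N50 ≤ N25.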
Figure 2: A table view of an expression sample generated from a
sequence file of NGS mRNA reads. The table gives gene expression levels
as RPKM values (reads per kilobase of exon model per million mapped
reads), along with statistics on the read counts, exons and
transcripts. For each gene, the assembly result can be opened to
examine the reads for that gene. (CLC Genomics Workbench v2.9 beta)
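Figure 2 reports expression as RPKM. Under the standard definition, RPKM = C / ((L / 1,000) × (N / 1,000,000)), where:

- C is the number of reads mapped to a gene's exons
- L is the length of the gene's exon model in bases
- N is the total number of mapped reads in the sample

The page does not give the Workbench's exact formula, so the Python sketch below assumes the standard definition. The function and parameter names are our own.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    gene_reads: reads mapped to the gene's exons
    exon_length_bp: total length of the gene's exon model, in bases
    total_mapped_reads: all reads mapped in the sample
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    kilobases = exon_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return gene_reads / (kilobases * millions_of_reads)
```

For example, 500 reads on a 2,000-base exon model in a sample with 10 million mapped reads gives `rpkm(500, 2000, 10_000_000) == 25.0`. Dividing by both length and sequencing depth is what makes values comparable between genes of different lengths and between samples of different sizes.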
RNA-Seq outputs and can use unique and total gene/exon reads as well as
median coverage as measures of expression.
Small RNA analysis: adapter trimming, counting of tags, annotation using
miRBase and other resources, visualization of miRNA variants, and
expression analysis.
Facility for annotating sequences from GFF or GTF files (as used by
Ensembl and the UCSC Genome Browser), useful for annotating reference
genomes before assembly.
CLC Genomics Workbench is fully integrated with CLC Assembly Cell,
our command line solution for super fast assembly of Next Generation
Sequencing data.
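The small RNA analysis listed above begins with adapter trimming and tag counting. The page does not describe how the Workbench performs either step. The Python sketch below illustrates the general idea under several assumptions:

- A known 3' adapter is matched exactly, with no mismatches tolerated.
- A partial adapter of at least 3 bases at the read end is also trimmed.
- Tags shorter than 15 bases are discarded.

The adapter sequence, thresholds and function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical 3' adapter, for illustration only; use the adapter of your library prep kit.
EXAMPLE_ADAPTER = "TGGAATTCTCGG"


def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter from a read using exact matching only.

    A full adapter occurrence is cut together with everything after it.
    Otherwise, if the read ends with the first k bases of the adapter
    (min_overlap <= k < len(adapter)), those k bases are removed.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Try the longest partial overlap first so the most adapter sequence is removed.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def count_tags(reads: Iterable[str], adapter: str, min_length: int = 15) -> Counter:
    """Trim each read and count identical tags, discarding tags shorter than min_length."""
    counts: Counter = Counter()
    for read in reads:
        tag = trim_adapter(read, adapter)
        if len(tag) >= min_length:
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    tag = "TAGCTTATCAGACTGATGTTGA"
    reads = [tag + EXAMPLE_ADAPTER, tag + EXAMPLE_ADAPTER[:6], "ACGT" + EXAMPLE_ADAPTER]
    print(count_tags(reads, EXAMPLE_ADAPTER))  # Counter({'TAGCTTATCAGACTGATGTTGA': 2})
```

The `min_overlap` setting is a trade-off. A smaller value trims short adapter remnants more aggressively. However, it also increases the chance of clipping genuine bases that happen to match the start of the adapter.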
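GTF files, as distributed by Ensembl and the UCSC Genome Browser, have nine tab-separated columns:

1. Sequence name
2. Source
3. Feature
4. Start
5. End
6. Score
7. Strand
8. Frame
9. Attributes, written as `key "value";` pairs

The Python sketch below parses a single GTF line to show the structure that annotation import relies on. It is an illustration of the file format, not the Workbench's importer; `GtfRecord` and `parse_gtf_line` are names of our own. It only handles GTF-style quoted attributes. GFF3 uses `key=value` attribute syntax, which would need a different parser.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# GTF attribute pairs look like: gene_id "g1"; transcript_id "g1.t1";
_ATTRIBUTE = re.compile(r'([^\s;"]+)\s+"([^"]*)"')


@dataclass
class GtfRecord:
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_gtf_line(line: str) -> Optional[GtfRecord]:
    """Parse one GTF line; return None for blank lines and comment lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = columns
    attributes = {key: value for key, value in _ATTRIBUTE.findall(attr_text)}
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, attributes)
```

As a usage example, the hypothetical line `chr1<TAB>example<TAB>exon<TAB>100<TAB>250<TAB>.<TAB>+<TAB>.<TAB>gene_id "gene1"; transcript_id "gene1.t1";` parses to an exon spanning bases 100 to 250 on the plus strand, with `gene_id` and `transcript_id` available in `attributes`.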
More features
CLC Genomics Workbench includes all features from CLC Main Workbench.
System requirements for all workbenches from CLC bio
Mac OS X 10.4 or later (including Intel-based Macs)
Windows 2000, Windows XP, Windows Vista, or Windows 7
Linux: Red Hat or SuSE
256 MB RAM required
2 GB RAM recommended
1024 x 768 display recommended
Additional requirements for CLC Protein Workbench, CLC Main Workbench and CLC Genomics Workbench
3D viewing requires an OpenGL 3D graphics driver (included with almost all graphics cards)
Special requirements for CLC Genomics Workbench
Intel or AMD CPU required
64-bit computer and operating system recommended for more than 2 GB of RAM
Small data sets: assembly and analysis of genomes up to 50 megabases and up to 10 million reads
2 GB RAM required
4 GB RAM recommended
Medium data sets: assembly and analysis of larger genomes and up to 100 million reads
8 GB RAM required
16 GB RAM recommended
Large data sets: assembly and analysis of larger genomes and more than 100 million reads
16 GB RAM required
32 GB RAM recommended
Special requirements for de novo assembly:
De novo assembly may need more memory than stated above; this depends
on both the number of reads and the size and complexity of the genome.
See our white paper on de novo assembly for examples of the memory
usage of various data sets.
System requirements for CLC License Server
Mac OS X 10.4 or Mac OS X 10.5
Windows XP, Windows Vista, Windows 7 or Server 2003
Linux: Red Hat, Fedora Core or SuSE
Java
The CLC Workbenches are built using Java technology. If you are a
Windows or Linux user, the CLC Workbench includes the JRE (Java Runtime
Environment) needed to run it. This JRE will not interfere with
existing JREs on your computer and will only be used to run the CLC
Workbench.
Mac OS X includes a Java Runtime Environment by default.
Mac and the Mac logo are trademarks of
Apple Computer, Inc., registered in the U.S. and other countries.
Microsoft, Windows and the Windows logo are either registered trademarks
or trademarks of Microsoft Corporation in the United States and other
countries.