• Home
  • News
  • Tutorials
  • Analysis
  • About
  • Contact

TechEnablement

Education, Planning, Analysis, Code

  • CUDA
    • News
    • Tutorials
    • CUDA Study Guide
  • OpenACC
    • News
    • Tutorials
    • OpenACC Study Guide
  • Xeon Phi
    • News
    • Tutorials
    • Intel Xeon Phi Study Guide
  • OpenCL
    • News
    • Tutorials
    • OpenCL Study Guide
  • Web/Cloud
    • News
    • Tutorials
You are here: Home / CUDA / Metagenomic Sequence Clustering using CUDA-enabled GPUs

Metagenomic Sequence Clustering using CUDA-enabled GPUs

August 15, 2014 by admin Leave a Comment

There have been huge advances in DNA sequencing technologies in recent years; e.g. Illumina has just announced the HiSeq X system which can sequence human genomes at a cost of only $1000 per genome. Besides population-scale human genome sequencing another important application of sequencing technologies is environmental sequencing (so called metagenomics). Metagenomic studies usually involve the sequencing of a sample of cells collected from their environment (such as soil, marine water, or human gut). Analyzing metagenomic sequencing data is computational challenging because of the diverse mix of sequenced genomes and the large data size.

CRiSPy (Computing Richness in Pyrosequencing Datasets) is a software tool developed by PhD student Nguyen Thuy Diem from Nanyang Technological University (Singapore) and Prof. Bertil Schmidt from the University of Mainz (Germany) that deals with estimating the genomic richness and variation within a microbial sequencing dataset. More specifically, CRiSPy is a de-novo (i.e. reference database independent) clustering technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). The processing pipeline for computing an OTU clustering is based on hierarchical clustering and is shown in Figure 1. CRiSPy is freely available at https://github.com/ngthuydiem/crispy.

figure1cropped-fs8-0

The processing pipeline of CRiSPy (rectangular boxes represent computation and oval boxes denote data). The dashed boxes belong to the preprocessing step using third-party error correction and chimera removal tools.

CRiSPy-CUDA aims at producing high-quality clustering results at a reasonably low computational cost. In particular, we tackle two computational issues of the OTU hierarchical clustering approach:

  1. High computational complexity (i.e. the compute-intensive sequence alignment operation for building a pair-wise distance matrix)
  2. High memory complexity (i.e. the quadratic memory requirement of the hierarchical clustering procedure)

The first issue is addressed by using CUDA-enabled GPUs for parallelizing the two-staged pairwise distance computation (i.e. k-mer distances first (as a computationally cheaper filter) and then the more accurate genetic distances for all pairs with a low k-mer distance). Taking full advantage of fast CUDA shared memory, we are able to achieve speedups of around 100x for the genetic distance computation (based on dynamic programming) and around 50x for the k-mer distance computation on a GeForce GTX 580 compared to a single-core CPU code.

The second issue is addressed by the design of a memory-efficient sparse matrix hierarchical clustering algorithm. Even though we use a sparse representation of the calculated distance matrices, typical sizes of large matrices can still exceed the RAM capacity of a standard workstation. Thus, we have developed a fast and memory-efficient hierarchical clustering algorithm for sparse matrices called SparseHC. SparseHC scans a sorted sparse distance matrix chunk-by-chunk and a dendrogram is built by merging cluster pairs as soon as the distance between them is determined to be the smallest among all remaining cluster pairs. The main insight used is that it is unnecessary to wait for the completion of all cluster pair distance computations for finding the cluster pair with the smallest distance. Partial information can be used to determine lower bounds on cluster pair distances that are used for cluster distance comparison. In summary, SparseHC is an online hierarchical clustering tool which can perform accurate single-, complete- and average-linkage hierarchical clustering with linear empirical memory complexity, making it particularly useful to cluster large datasets using computers with limited memory resources (see Table 1). SparseHC is available at https://bitbucket.com/ngthuydiem/sparsehc and described in more detail here: http://www.sciencedirect.com/science/article/pii/S1877050914001781.

Number of data points Matrix size (in GB) Memory usage (in MB) Memory efficiency of SparseHC
Single Complete Average Single Complete Average
50,000 14 7.0 30.5 44.9 2,055 469 318
100,000 56 12.2 60.0 90.2 4,673 954 635
150,000 126 17.7 89.6 143.4 7,272 1,437 897
200,000 224 22.9 119.4 198.9 10,013 1,917 1,151

Table 1: The memory efficiency of SparseHC, presented by the ratio matrix size to memory usage.

Another important feature of CRiSPy is the usage of a dynamic method to automatically determine the distance threshold based on a single change point detection method to obtain the natural grouping of a dataset. Due to this dynamic dendrogram cutting, CRiSPy produces better grouping results than most existing OTU binning tools including QIIME, CD-HIT-OTU, UPARSE, ESPRIT-Tree and ESPRIT in terms of the adjusted rand index (ARI) score (a random labeling in which the predicted grouping is very different from the actual grouping results in an ARI score close to 0.0 while a perfect labeling has a score of 1.0). Table 2 shows the results of a benchmark test.

CRiSPy M-Pick ESPRIT ESPRIT-Tree QIIME UPARSE CD-HIT-OTU
ARI-score 0.86 0.42 0.57 0.53 0.57 0.57 0.55
Runtime (in sec) 4 61 1,956 31 3 33 2
Memory usage (in MB) 47 2,330 76 252 370 11 50

Table 2: Benchmark of all OTU clustering tools using 10 randomly-generated datasets with 5,000 16S rRNA reads from 100 species. Means from 10 runs are reported. M-pick and CRiSPy are dynamic grouping tools. ESPRIT and ESPRIT-Tree are static hierarchical clustering tools. QIIME (pick_otus v1.8), UPARSE (in USEARCH v7.0) and CD-HIT-OTU (v0.0.2) are static greedy heuristic clustering tools. Experiments are conducted on a 64-bit Linux OS using a PC with a quad-core Intel Xeon W3540 CPU and 8GB RAM. CRiSPy runs on a GeForce GTX 580 installed on the same PC.

CRiSPy is capable of producing high-quality clusters and is scalable to several hundreds of thousands of reads on our workstation with only 8GB of RAM due to its memory and computational efficiency. Although targeting 16S rRNA short reads for OTU binning of metagenomic datasets, the utilized alignment and clustering techniques could potentially be applied to other types of sequence data expressed sequence tags or protein sequences) or to other hierarchical tree construction problems (such as phylogenetic tree or guide tree construction). Furthermore, SparseHC can be a useful tool for a variety of big data application areas that need to perform memory-efficient hierarchical clustering from a (possibly sparse) distance matrix based on single-, complete- or average-linkage schemes.

bertil

Click for more information about Professor Bertil Schmidt

Bertil Schmidt is tenured Full Professor and Chair for Parallel and Distributed Architectures at the University of Mainz, Germany. Prior to that he was a faculty member at Nanyang Technological University (Singapore) and at University of New South Wales (UNSW). His research group has designed a variety of algorithms and tools for Bioinformatics mainly focusing on the analysis of large-scale datasets. For his research work, he has received a CUDA Research Centre award, a CUDA Academic Partnership award, a CUDA Professor Partnership award and the Best Paper Award at IEEE ASAP 2009. Furthermore, he serves as the champion for Bioinformatics and Computational Biology on gpucomputing.net.  

Share this:

  • Twitter

Filed Under: CUDA, Featured article, Featured news, News, News Tagged With: CUDA, HPC, NVIDIA

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Tell us you were here

Recent Posts

Farewell to a Familiar HPC Friend

May 27, 2020 By Rob Farber Leave a Comment

TechEnablement Blog Sunset or Sunrise?

February 12, 2020 By admin Leave a Comment

The cornerstone is laid – NVIDIA acquires ARM

September 13, 2020 By Rob Farber Leave a Comment

Third-Party Use Cases Illustrate the Success of CPU-based Visualization

April 14, 2018 By admin Leave a Comment

More Tutorials

Learn how to program IBM’s ‘Deep-Learning’ SyNAPSE chip

February 5, 2016 By Rob Farber Leave a Comment

Free Intermediate-Level Deep-Learning Course by Google

January 27, 2016 By Rob Farber Leave a Comment

Intel tutorial shows how to view OpenCL assembly code

January 25, 2016 By Rob Farber Leave a Comment

More Posts from this Category

Top Posts & Pages

  • Intel Broadwell Compute Gen8 GPU Architecture
  • Recovering Speech from a Potato-chip Bag Viewed Through Soundproof Glass - Even With Commodity Cameras!
  • MultiOS Gaming, Media, and OpenCL Using XenGT Virtual Machines On Shared Intel GPUs
  • NVIDIA GTC 2015 keynote - Near-term Roadmap is Deep-Learning
  • LibreOffice OpenCL Acceleration for the Masses - Intel vs. AMD GPU performance

Archives

© 2025 · techenablement.com