A Memory-Access Efficient Parallel Algorithm for Spectrum-Based Short Read Error Correction

Location

Room 2901

Session Format

Paper Presentation

Research Area Topic

Computer Science - Computational Sciences

Co-Presenters and Faculty Mentors or Advisors

Sriram Chockalingam

Srinivas Aluru

Abstract

DNA read error correction improves the quality of results produced by applications in areas such as genomics, metagenomics, and transcriptomics. Using error-corrected reads also reduces the runtime and memory usage of such applications. Sequential error correction tools cannot cope with the billions of reads produced by modern sequencing instruments. A scalable parallel spectrum-based error correction algorithm was previously proposed to address this shortcoming. In this work, we improved the memory-access efficiency of the error correction phase, which is the most time-consuming step of the algorithm, resulting in a speedup of up to 2.4x in the total runtime. This was accomplished by using cache-oblivious and cache-aware search trees, whose data layouts are optimized for memory accesses. We anticipate that even better speedups can be obtained for large datasets, such as those generated for humans and plants.

When developing parallel algorithms, significant emphasis is placed on reducing communication among nodes, because communication is considered expensive. Similar importance needs to be given to optimizing memory accesses. This is especially true on modern systems, in which a memory access is about a few hundred times more expensive than a computation. In this spirit, we adopted memory-access efficient search trees to significantly improve the performance of a previously proposed parallel error correction algorithm. We expect that our work will spur renewed interest in such memory-access efficient data structures and algorithms for various other problems.
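
As a rough illustration of what a memory-access-oriented search tree layout can look like, the sketch below stores a sorted set of keys as an implicit binary search tree in level order (sometimes called an Eytzinger layout), so that the nodes visited first in every search sit next to each other in memory. This is a simpler relative of the cache-aware and cache-oblivious layouts named in the abstract, not the authors' implementation. The function names, the integer encoding of k-mers, and the sample spectrum are all assumptions made for the example.

```python
def build_level_order(sorted_keys):
    """Copy sorted keys into an implicit binary search tree stored in level order."""
    n = len(sorted_keys)
    tree = [None] * (n + 1)  # slot 0 is unused; children of slot i are 2i and 2i+1
    it = iter(sorted_keys)

    def fill(i):
        # An in-order walk of the implicit tree visits slots in sorted key order,
        # so handing out keys during that walk yields a valid search tree.
        if i <= n:
            fill(2 * i)
            tree[i] = next(it)
            fill(2 * i + 1)

    fill(1)
    return tree


def contains(tree, key):
    """Search the level-order tree, moving from slot i to slot 2i or 2i+1."""
    i = 1
    n = len(tree) - 1
    while i <= n:
        if key == tree[i]:
            return True
        # Go right (2i+1) if the key is larger than the node, else left (2i).
        i = 2 * i + (key > tree[i])
    return False


# Hypothetical k-mer spectrum: k-mers encoded as integers, already sorted.
spectrum = [3, 8, 15, 21, 34, 55, 89]
tree = build_level_order(spectrum)
```

With this layout the top levels of the tree occupy one contiguous prefix of the array, whereas a binary search over the sorted array jumps across it from the first probe. Python itself will not show the cache effect; the sketch only shows the layout and the search rule.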

Keywords

DNA read error correction, Parallel algorithm, Memory access efficiency, Cache-aware search tree, Cache-oblivious search tree, Binary search tree

Presentation Type and Release Option

Presentation (Open Access)

Start Date

4-24-2015 4:00 PM

End Date

4-24-2015 5:00 PM

This document is currently not available here.
