A Memory-Access Efficient Parallel Algorithm for Spectrum-Based Short Read Error Correction
Location
Room 2901
Session Format
Paper Presentation
Research Area Topic
Computer Science - Computational Sciences
Co-Presenters and Faculty Mentors or Advisors
Sriram Chockalingam
Srinivas Aluru
Abstract
DNA read error correction improves the quality of results produced by applications in areas such as genomics, metagenomics, and transcriptomics. Using error-corrected reads also reduces the runtime and memory usage of such applications. Sequential error correction tools cannot cope with the billions of reads produced by modern sequencing instruments. A scalable parallel spectrum-based error correction algorithm was previously proposed to address this shortcoming. In this work, we improved the memory-access efficiency of the error correction phase, which is the most time-consuming step of the algorithm, resulting in a speedup of up to 2.4x in total runtime. This was accomplished by using cache-oblivious and cache-aware search trees, whose data layouts are optimized for memory accesses. We anticipate that even greater speedups can be obtained for large datasets, such as those generated for humans and plants.
When developing parallel algorithms, significant emphasis is placed on reducing communication among nodes, because communication is considered expensive. Similar importance should be given to optimizing memory accesses. This matters especially on modern systems, in which a memory access is about a few hundred times more expensive than computation. In this spirit, we adopted memory-access-efficient search trees to significantly improve the performance of a previously proposed parallel error correction algorithm. We expect that our work will spur renewed interest in memory-access-efficient data structures and algorithms for various other problems.
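To illustrate what a cache-oblivious search tree layout can look like, the following is a minimal sketch of the van Emde Boas layout for a complete binary search tree. It is not taken from the presented work: the function names (`veb_order`, `build_veb`, `veb_contains`) and the `pos` lookup table are assumptions made for illustration, and the sketch only handles key counts of the form 2^h - 1. The idea is to split the tree at half its height, store the top subtree contiguously, then store each bottom subtree contiguously, and recurse, so that nodes visited together during a search tend to sit close together in memory.

```python
def veb_order(root, height):
    """Return heap indices (root = 1) of a complete subtree in van Emde Boas order."""
    if height == 1:
        return [root]
    top_h = height // 2          # height of the top subtree
    bot_h = height - top_h       # height of each bottom subtree
    order = veb_order(root, top_h)
    # Roots of the bottom subtrees are the descendants of `root` at depth top_h.
    first = root << top_h
    for sub_root in range(first, first + (1 << top_h)):
        order.extend(veb_order(sub_root, bot_h))
    return order


def build_veb(sorted_keys):
    """Place sorted keys into an array in van Emde Boas order.

    Returns (layout, pos), where pos[node] is the array slot of heap index `node`.
    The pos table is a hypothetical convenience for this sketch only.
    """
    n = len(sorted_keys)
    height = (n + 1).bit_length() - 1
    if n == 0 or (1 << height) - 1 != n:
        raise ValueError("this sketch needs 2**h - 1 keys")
    pos = [0] * (n + 1)
    layout = [None] * n
    for slot, node in enumerate(veb_order(1, height)):
        depth = node.bit_length() - 1
        offset = node - (1 << depth)  # position of the node within its level
        # In-order rank of the node in a complete tree of this height.
        rank = (2 * offset + 1) * (1 << (height - 1 - depth)) - 1
        pos[node] = slot
        layout[slot] = sorted_keys[rank]
    return layout, pos


def veb_contains(layout, pos, key):
    """Standard binary-search-tree descent, reading keys from the vEB array."""
    node = 1
    while node <= len(layout):
        current = layout[pos[node]]
        if key == current:
            return True
        node = 2 * node + (1 if key > current else 0)
    return False


layout, pos = build_veb(list(range(1, 16)))
print(layout)
```

In this sketch the `pos` table itself costs an extra memory access per step, so it is useful only for showing the ordering; a practical implementation would be expected to compute positions arithmetically or store child offsets inside the nodes. A cache-aware layout differs in that it groups nodes into blocks sized to a known cache line or page.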
Keywords
DNA read error correction, Parallel algorithm, Memory access efficiency, Cache-aware search tree, Cache-oblivious search tree, Binary search tree
Presentation Type and Release Option
Presentation (Open Access)
Start Date
4-24-2015 4:00 PM
End Date
4-24-2015 5:00 PM
Recommended Citation
Jammula, Nagakishore, "A Memory-Access Efficient Parallel Algorithm for Spectrum-Based Short Read Error Correction" (2015). GS4 Georgia Southern Student Scholars Symposium. 156.
https://digitalcommons.georgiasouthern.edu/research_symposium/2015/2015/156