Computer Science: Faculty Publications

COGRAM: A Computational Pipeline for Genome Assembly and Reconstruction via Optimized K-mer Sampling and De Bruijn Graph Networks

Document Type

Article

Publication Date

1-26-2026

Publication Title

Social Networks Analysis and Mining - 17th International Conference, ASONAM 2025, Proceedings

DOI

10.1007/978-3-032-13513-1_31

ISBN

9783032135124

Abstract

The accuracy of genome assembly and annotation depends fundamentally on well-chosen parameters and robust computational approaches. Here we introduce COGRAM (Coggins-Ramasamy Genomic Assembly Method), a novel bioinformatics pipeline that enhances genome assembly and reconstruction by optimizing k-mer parameters, leveraging graph theory, and incorporating machine learning techniques. COGRAM first identifies the optimal k-mer length using methods inspired by KMERGENIE and grid search, then randomly samples the genome at that resolution. It then analyzes the k-mer frequency and GC-content distributions across the sampled genome windows. Subsequently, the pipeline constructs a detailed de Bruijn graph from the parsed genomic data. Using this graph, COGRAM trains a network to model genomic structures, with the aim of improving accuracy and scalability. Genome reconstruction is accomplished through cross-validation with a greedy algorithm designed to iteratively refine the quality of the assembly. We demonstrate the effectiveness of COGRAM through benchmark tests on the E. coli genome. The pipeline is a practical tool for genomic projects, with potential for extension to other kinds of projects.
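The early stages named in the abstract (window sampling, k-mer frequency and GC-content profiling, and de Bruijn graph construction) can be illustrated with a minimal Python sketch. This is not the COGRAM implementation: the function names, the uniform sampling with replacement, and the adjacency-list graph representation are all assumptions made for illustration, and the sketch does not cover k-mer length selection, network training, or the greedy reconstruction step.

```python
from collections import Counter, defaultdict
import random


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_content(window):
    """Return the fraction of G and C bases in a window (0.0 if empty)."""
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def sample_windows(seq, window_size, n_windows, seed=0):
    """Draw fixed-size windows at uniformly random start positions.

    Assumption: sampling is uniform and with replacement; the paper's
    actual sampling scheme is not described in the abstract.
    """
    max_start = len(seq) - window_size
    if max_start < 0:
        raise ValueError("window_size is longer than the sequence")
    rng = random.Random(seed)
    starts = [rng.randint(0, max_start) for _ in range(n_windows)]
    return [seq[s:s + window_size] for s in starts]


def de_bruijn_graph(seq, k):
    """Build a de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers; each k-mer contributes one edge from its
    prefix to its suffix, so repeated k-mers yield repeated edges.
    """
    graph = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    genome = "ACGTACGTGGCCATAT"  # toy sequence, not real data
    for window in sample_windows(genome, window_size=8, n_windows=3):
        print(window, gc_content(window), kmer_counts(window, 3).most_common(2))
    print(de_bruijn_graph(genome, 4))
```

In this representation an assembly corresponds to a walk through the graph that uses each edge once, which is the structure that a later reconstruction step, greedy or otherwise, would traverse.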

Copyright

This work is archived and distributed under the repository's Standard Copyright and Reuse License. End users may copy, store, and distribute this work without restriction. For all other uses, permission must be obtained from the copyright owners or their authorized agents.
