NASCUP: Nucleic Acid Sequence Classification by Universal Probability

Authors

Kwon, Sunyoung; Kim, Gyuwan; Lee, Byunghan; Chun, Jongsik; Yoon, Sung Roh; Kim, Young-Han

Issue Date
2021-11
Publisher
Institute of Electrical and Electronics Engineers Inc.
Citation
IEEE Access, Vol.9, pp.162779-162791
Abstract
Nucleic acid sequence classification is a fundamental task in bioinformatics. As the volume of unlabeled nucleotide sequences grows, classifying them quickly and accurately at large scale has become crucial. In this work, we developed NASCUP, a new classification method that captures the statistical structure of nucleotide sequences using compact context-tree models and universal probability from information theory. A comprehensive experimental study involving nine public databases for functional non-coding RNA, microbial taxonomy, and coding/non-coding RNA classification demonstrates the advantages of NASCUP over widely used alternatives in efficiency, accuracy, and scalability across all datasets considered. NASCUP consistently achieved BLAST-like classification accuracy on several large-scale databases with runtimes reduced by orders of magnitude, and was also applied to other bioinformatics tasks such as outlier detection and synthetic sequence generation.
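
The general idea of classifying a sequence by its probability under per-class context models can be illustrated with a minimal sketch. This is not the NASCUP implementation: it assumes a fixed context depth instead of a context tree, and it substitutes the Krichevsky-Trofimov (add-1/2) estimator as a simple stand-in for the universal probability named in the abstract. The function names, the class labels, and the toy sequences are all hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed nucleotide alphabet


def train_counts(sequences, depth=2):
    """Count how often each symbol follows each length-`depth` context."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(depth, len(seq)):
            counts[seq[i - depth:i]][seq[i]] += 1
    return counts


def log2_prob(seq, counts, depth=2):
    """Log2-probability of `seq` using KT-smoothed counts for each context."""
    total = 0.0
    for i in range(depth, len(seq)):
        ctx = counts.get(seq[i - depth:i], {})
        n_sym = ctx.get(seq[i], 0)
        n_all = sum(ctx.values())
        # KT estimator: (n_sym + 1/2) / (n_all + |alphabet| / 2)
        total += math.log2((n_sym + 0.5) / (n_all + len(ALPHABET) / 2))
    return total


def classify(seq, models, depth=2):
    """Return the label whose model assigns `seq` the highest probability."""
    return max(models, key=lambda label: log2_prob(seq, models[label], depth))


# Toy training data (hypothetical, for illustration only).
models = {
    "at_rich": train_counts(["ATATATATATAT", "TATATATATA"]),
    "gc_rich": train_counts(["GCGCGCGCGC", "CGCGCGCGCG"]),
}
label = classify("ATATAT", models)
```

In this sketch each class needs only a table of context counts, and scoring a query is a single pass over the sequence, which is the kind of cost profile that makes model-based classification cheaper than alignment against every database entry.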
ISSN
2169-3536
URI
https://hdl.handle.net/10371/184333