Detailed Information

NASCUP: Nucleic Acid Sequence Classification by Universal Probability

DC Field | Value | Language
dc.contributor.author | Kwon, Sunyoung | -
dc.contributor.author | Kim, Gyuwan | -
dc.contributor.author | Lee, Byunghan | -
dc.contributor.author | Chun, Jongsik | -
dc.contributor.author | Yoon, Sung Roh | -
dc.contributor.author | Kim, Young-Han | -
dc.date.accessioned | 2022-08-22T09:09:55Z | -
dc.date.available | 2022-08-22T09:09:55Z | -
dc.date.created | 2022-07-08 | -
dc.date.issued | 2021-11 | -
dc.identifier.citation | IEEE Access, Vol.9, pp.162779-162791 | -
dc.identifier.issn | 2169-3536 | -
dc.identifier.uri | https://hdl.handle.net/10371/184333 | -
dc.description.abstract | Nucleic acid sequence classification is a fundamental task in the field of bioinformatics. Due to the increasing amount of unlabeled nucleotide sequences, fast and accurate classification of them on a large scale has become crucial. In this work, we developed NASCUP, a new classification method that captures statistical structures of nucleotide sequences by compact context-tree models and universal probability from information theory. A comprehensive experimental study involving nine public databases for functional non-coding RNA, microbial taxonomy and coding/non-coding RNA classification demonstrates the advantages of NASCUP over widely-used alternatives in efficiency, accuracy, and scalability across all datasets considered. NASCUP achieved BLAST-like classification accuracy consistently for several large-scale databases in orders-of-magnitude reduced runtime, and was applied to other bioinformatics tasks such as outlier detection and synthetic sequence generation. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | NASCUP: Nucleic Acid Sequence Classification by Universal Probability | -
dc.type | Article | -
dc.citation.journaltitle | IEEE Access | -
dc.identifier.wosid | 000730449500001 | -
dc.identifier.scopusid | 2-s2.0-85119427713 | -
dc.citation.endpage | 162791 | -
dc.citation.startpage | 162779 | -
dc.citation.volume | 9 | -
dc.description.isOpenAccess | Y | -
dc.contributor.affiliatedAuthor | Chun, Jongsik | -
dc.contributor.affiliatedAuthor | Yoon, Sung Roh | -
dc.type.docType | Article | -
dc.description.journalClass | 1 | -
dc.subject.keywordPlus | TREE WEIGHTING METHOD | -
dc.subject.keywordPlus | RNA GENE DATABASE | -
dc.subject.keywordPlus | PHYLOGENETIC CLASSIFICATION | -
dc.subject.keywordPlus | PREDICTION | -
dc.subject.keywordPlus | PROTEIN | -
dc.subject.keywordPlus | SEARCH | -
dc.subject.keywordAuthor | Context modeling | -
dc.subject.keywordAuthor | Markov processes | -
dc.subject.keywordAuthor | Hidden Markov models | -
dc.subject.keywordAuthor | Data models | -
dc.subject.keywordAuthor | Maximum likelihood estimation | -
dc.subject.keywordAuthor | Probability | -
dc.subject.keywordAuthor | Databases | -
dc.subject.keywordAuthor | Bioinformatics | -
dc.subject.keywordAuthor | context-tree models | -
dc.subject.keywordAuthor | information theory | -
dc.subject.keywordAuthor | sequence classification | -
dc.subject.keywordAuthor | universal probability | -
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
