Multi-label classification with XGBoost for metabolic pathway prediction

Authors

Joe, Hyunwhan; Kim, Hong-Gee

Issue Date
2024-02-01
Publisher
BMC
Citation
BMC Bioinformatics, Vol.25 no.52
Keywords
Metabolic pathway prediction; BioCyc; XGBoost
Abstract
Background
Metabolic pathway prediction is one possible approach to the systems biology problem of reconstructing an organism's metabolic network from its genome sequence. Recent studies of machine learning-based pathway prediction methods have concluded that these approaches perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, and omitting pruning decreases its performance.

Results
In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance are needed for them to be competitive. Furthermore, we introduce mlXGPR, an XGBoost-based metabolic pathway prediction method built on the multi-label classification pathway prediction framework introduced by mlLGPR. We also improve on this multi-label framework by exploiting correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower-performing classifiers are placed later in the chain, where they can make more use of the correlations between labels. We evaluate mlXGPR with and without classifier chains on single-organism and multi-organism benchmarks. Our results indicate that mlXGPR outperforms previous pathway prediction methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, precision and F1 score on single-organism benchmarks.
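
The chain-ordering idea and the Hamming loss metric mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which score the ranking uses, so the sketch assumes a per-label validation score (for example, F1 from independently trained classifiers), and the function names and pathway labels are hypothetical.

```python
from typing import Dict, List, Sequence


def chain_order(label_scores: Dict[str, float]) -> List[str]:
    """Return labels sorted so that lower-scoring classifiers come later.

    Assumption: label_scores maps each pathway label to a validation score
    of its standalone classifier; the abstract does not name the score.
    """
    # Descending score; ties are broken by label name to keep the order stable.
    return sorted(label_scores, key=lambda label: (-label_scores[label], label))


def chain_features(base_features: Sequence[float],
                   earlier_predictions: Sequence[int]) -> List[float]:
    """Build the input for one link of a classifier chain.

    Each link sees the original features plus the predictions of all
    links placed before it in the chain.
    """
    return list(base_features) + list(earlier_predictions)


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (sample, label) slots that are predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


if __name__ == "__main__":
    # Hypothetical per-label scores for three pathways.
    scores = {"PWY-A": 0.9, "PWY-B": 0.4, "PWY-C": 0.7}
    print(chain_order(scores))
    print(hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 1]]))
```

Under this assumption, the weakest standalone classifier is placed last and therefore receives the predictions of every other label as extra input features.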

Conclusions
The results from our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even outperform PathoLogic with taxonomic pruning.
ISSN
1471-2105
Language
English
URI
https://hdl.handle.net/10371/198976
DOI
https://doi.org/10.1186/s12859-024-05666-0

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
