Unifying the known and unknown microbial coding sequence space

Cited 23 times in Web of Science; cited 24 times in Scopus
Authors

Vanni, Chiara; Schechter, Matthew S.; Acinas, Silvia G.; Barberán, Albert; Buttigieg, Pier Luigi; Casamayor, Emilio O.; Delmont, Tom O.; Duarte, Carlos M.; Eren, A. Murat; Finn, Robert D.; Kottmann, Renzo; Mitchell, Alex; Sanchez, Pablo; Siren, Kimmo; Steinegger, Martin; Glöckner, Frank Oliver; Fernandez-Guerra, Antonio

Issue Date
2022-03
Publisher
eLife Sciences Publications
Citation
eLife, Vol.11, p. e67667
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
ISSN
2050-084X
URI
https://hdl.handle.net/10371/202532
DOI
https://doi.org/10.7554/eLife.67667
Related Researcher

  • College of Natural Sciences
  • School of Biological Sciences
Research Area: Development of algorithms to search, cluster and assemble sequence data; Metagenomic analysis; Pathogen detection in sequencing data

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
