Foldcomp: a library and format for compressing and indexing large protein structure sets

Cited 5 times in Web of Science; cited 7 times in Scopus
Authors

Kim, Hyunbin; Mirdita, Milot; Steinegger, Martin

Issue Date
2023-04
Publisher
Oxford University Press
Citation
Bioinformatics, Vol.39 No.4
Abstract
Highly accurate protein structure predictors have generated hundreds of millions of protein structures; these pose a challenge in terms of storage and processing. Here, we present Foldcomp, a novel lossy structure compression algorithm and indexing system to address this challenge. By using a combination of internal and Cartesian coordinates and a bi-directional NeRF-based strategy, Foldcomp improves the compression ratio by a factor of three compared to the next best method. Its reconstruction error of 0.08 Å is comparable to that of the best lossy compressor. It is five times faster than the next fastest compressor and competes with the fastest decompressors. With its multi-threading implementation and a Python interface that allows for easy database downloads and efficient querying of protein structures by accession, Foldcomp is a powerful tool for managing and analysing large collections of protein structures.
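The abstract refers to internal coordinates and NeRF (Natural Extension Reference Frame) reconstruction. As a rough illustration only, the sketch below uses the standard single-direction NeRF placement step, not Foldcomp's own code or its bi-directional scheme. It converts four Cartesian atom positions into internal coordinates (bond length, bond angle, torsion) and then places the fourth atom back from the first three. The helper names (`internal_coords`, `place_atom`, `quantize_angle`) and the 12-bit torsion quantization are hypothetical choices made for this example. They are not Foldcomp's API or its actual encoding.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def internal_coords(a: Vec, b: Vec, c: Vec, d: Vec) -> Tuple[float, float, float]:
    """Return (bond length c-d, bond angle b-c-d, torsion a-b-c-d); angles in radians."""
    cd = sub(d, c)
    bond = math.sqrt(dot(cd, cd))
    cos_angle = dot(unit(sub(b, c)), unit(cd))
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    # Signed torsion: project a-b and d-c onto the plane perpendicular to b->c.
    b1 = unit(sub(c, b))
    b0 = sub(a, b)
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(cd, scale(b1, dot(cd, b1)))
    torsion = math.atan2(dot(cross(b1, v), w), dot(v, w))
    return bond, angle, torsion


def place_atom(a: Vec, b: Vec, c: Vec,
               bond: float, angle: float, torsion: float) -> Vec:
    """Standard NeRF step: place atom d from a, b, c and d's internal coordinates."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                # completes the local orthonormal frame
    x = -bond * math.cos(angle)
    y = bond * math.sin(angle) * math.cos(torsion)
    z = bond * math.sin(angle) * math.sin(torsion)
    return (c[0] + x * bc[0] + y * m[0] + z * n[0],
            c[1] + x * bc[1] + y * m[1] + z * n[1],
            c[2] + x * bc[2] + y * m[2] + z * n[2])


def quantize_angle(theta: float, bits: int = 12) -> float:
    """Hypothetical lossy step: snap an angle to one of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    code = round((theta + math.pi) / step) % levels
    return code * step - math.pi


# Four atom positions in angstroms; the values are arbitrary, for illustration only.
a, b, c, d = (0.0, 1.4, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.2, 0.8)
bond, angle, torsion = internal_coords(a, b, c, d)
exact = place_atom(a, b, c, bond, angle, torsion)
lossy = place_atom(a, b, c, bond, angle, quantize_angle(torsion))
print(math.dist(exact, d))  # essentially zero (floating-point round-off only)
print(math.dist(lossy, d))  # small error introduced by quantizing the torsion
```

Placing each atom relative to the previous three means that small per-atom errors can accumulate along a long chain. That accumulation is the kind of drift a hybrid scheme, such as storing some Cartesian anchors alongside the internal coordinates, can help bound.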
ISSN
1367-4803
URI
https://hdl.handle.net/10371/202507
DOI
https://doi.org/10.1093/bioinformatics/btad153
Related Researcher

  • College of Natural Sciences
  • School of Biological Sciences
Research Area
Development of algorithms to search, cluster, and assemble sequence data; metagenomic analysis; pathogen detection in sequencing data

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.