AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences

Cited 18 times in Web of Science; cited 17 times in Scopus
Authors

Varadi, Mihaly; Bertoni, Damian; Magana, Paulyna; Paramval, Urmila; Pidruchna, Ivanna; Radhakrishnan, Malarvizhi; Tsenkov, Maxim; Nair, Sreenath; Mirdita, Milot; Yeo, Jingi; Kovalevskiy, Oleg; Tunyasuvunakool, Kathryn; Laydon, Agata; Zidek, Augustin; Tomlinson, Hamish; Hariharan, Dhavanthi; Abrahamson, Josh; Green, Tim; Jumper, John; Birney, Ewan; Steinegger, Martin; Hassabis, Demis; Velankar, Sameer

Issue Date
2023-11
Publisher
Oxford University Press
Citation
Nucleic Acids Research, Vol.52 No.D1, pp.D368-D375
Abstract
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.

The AlphaFold Protein Structure Database (AlphaFold DB) is a massive digital library of predicted protein structures, with over 214 million entries, marking a 500-fold expansion in size since its initial release in 2021. The structures are predicted using Google DeepMind's AlphaFold 2 artificial intelligence (AI) system. Our new report highlights the latest updates we have made to this database. We have added more data on specific organisms and proteins related to global health and expanded to cover almost the complete UniProt database, a primary data resource of protein sequences. We also made it easier for our users to access the data by directly downloading files or using advanced cloud-based tools. Finally, we have also improved how users view and search through these protein structures, making the user experience smoother and more informative.

In short, AlphaFold DB has been growing rapidly and has become more user-friendly and robust to support the broader scientific community.
ISSN
0305-1048
URI
https://hdl.handle.net/10371/202490
DOI
https://doi.org/10.1093/nar/gkad1011
Files in This Item:
There are no files associated with this item.

Related Researcher

  • College of Natural Sciences
  • School of Biological Sciences
Research Area: Development of algorithms to search, cluster and assemble sequence data; metagenomic analysis; pathogen detection in sequencing data

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
