Publications
Detailed Information
Reconstruction of lossless molecular representations from fingerprints
Cited 0 time in
Web of Science
Cited 0 time in Scopus
- Authors
- Issue Date
- 2023-02-23
- Publisher
- BMC
- Citation
- Journal of Cheminformatics, 15(1):26
- Keywords
- Fingerprints ; SMILES ; SELFIES ; Neural Machine Translation
- Abstract
- The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints, extended connectivity, topological torsion, atom pairs, and atomic environments can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
- ISSN
- 1758-2946
- Language
- English
- Files in This Item:
- Appears in Collections:
Item View & Download Count
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.