Reconstruction of lossless molecular representations from fingerprints

Authors

Ucak, Umit V.; Ashyrmamatov, Islambek; Lee, Juyong

Issue Date
2023-02-23
Publisher
BMC
Citation
Journal of Cheminformatics, 15(1):26
Keywords
Fingerprints; SMILES; SELFIES; Neural Machine Translation
Abstract
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring the connectivity information lost during fingerprint transformation with high accuracy. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
ISSN
1758-2946
Language
English
URI
https://hdl.handle.net/10371/192364
DOI
https://doi.org/10.1186/s13321-023-00693-0