Reconstruction of lossless molecular representations from fingerprints

Cited 4 times in Web of Science; cited 5 times in Scopus
Authors

Ucak, Umit V. V.; Ashyrmamatov, Islambek; Lee, Ju Yong

Issue Date
2023-02
Publisher
Chemistry Central
Citation
Journal of Cheminformatics, Vol. 15, No. 1, p. 13321
Abstract
The simplified molecular-input line-entry system (SMILES) is the most prevalent molecular representation used in AI-based chemical applications. However, there are innate limitations associated with the internal structure of SMILES representations. In this context, this study exploits the resolution and robustness of unique molecular representations, i.e., SMILES and SELFIES (SELF-referencIng Embedded strings), reconstructed from a set of structural fingerprints, which are proposed and used herein as vital representational tools for chemical and natural language processing (NLP) applications. This is achieved by restoring, with high accuracy, the connectivity information lost during fingerprint transformation. Notably, the results reveal that reversing the seemingly irreversible molecule-to-fingerprint conversion is feasible. More specifically, four structural fingerprints (extended connectivity, topological torsion, atom pairs, and atomic environments) can be used as inputs and outputs of chemical NLP applications. Therefore, this comprehensive study addresses the major limitation of structural fingerprints that precludes their use in NLP models. Our findings will facilitate the development of text- or fingerprint-based chemoinformatic models for generative and translational tasks.
ISSN
1758-2946
URI
https://hdl.handle.net/10371/201500
DOI
https://doi.org/10.1186/s13321-023-00693-0

Related Researcher

  • Graduate School of Convergence Science & Technology
  • Dept. of Molecular and Biopharmaceutical Sciences
Research Area: AI models for drug discovery, Free energy calculation, Molecular dynamics

