Performance of chemical structure string representations for chemical image recognition using transformers

  • The use of molecular string representations for deep learning in chemistry has increased steadily in recent years. The complexity of existing string representations, and the difficulty of creating meaningful tokens from them, has led to the development of new string representations for chemical structures. This study examines the translation of chemical structure depictions, in the form of bitmap images, into the corresponding molecular string representations. The recently developed DeepSMILES and SELFIES representations are compared with the most commonly used SMILES representation, specifically testing how well transformer models translate image features into each string representation. SMILES exhibits the best overall performance, whereas SELFIES guarantees valid chemical structures. DeepSMILES performs between SMILES and SELFIES, and InChI is not appropriate for the learning task. All investigations were performed using publicly available datasets, and the code used to train and evaluate the models has been made publicly available.
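
The tokenization difficulty mentioned in the abstract can be illustrated with a minimal sketch. It assumes a regex-based, atom-level tokenizer of the kind commonly applied to SMILES strings; the pattern and the function name `tokenize_smiles` are illustrative assumptions and are not taken from the study, whose actual tokenization scheme is not described on this page.

```python
import re
from typing import List

# Illustrative atom-level SMILES pattern (an assumption, not the authors' scheme).
# Order matters: bracket atoms and two-letter halogens must be tried before
# single-letter atoms, otherwise "Cl" would be split into "C" and "l".
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]|%\d{2}|\d|[()=#\-+\\/:~@?>*$.])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so verify the round trip.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("C(Cl)Br"))  # ['C', '(', 'Cl', ')', 'Br']
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Even in this small case, branch parentheses and ring-closure digits become separate tokens that only make sense in matched pairs, which is the kind of pairing DeepSMILES and SELFIES were designed to avoid.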

Metadata
Authors: Kohulan Rajan, Christoph Steinbeck, Achim Zielesny
ISSN: 2635-098X
Parent Title (English): Digital Discovery
Publisher: Royal Society of Chemistry
Place of publication: Cambridge
Document Type: Article
Language: English
Date of Publication (online): 2022/01/15
Date of first Publication: 2022/01/15
Publishing Institution: Westfälische Hochschule Gelsenkirchen Bocholt Recklinghausen
Release Date: 2022/12/22
Volume: 1.2022
Issue: 1
Number of pages: 7
First Page: 84
Last Page: 90
Departments / faculties: Institute / Institut für biologische und chemische Informatik
Licence (German): German copyright law applies (Es gilt das Urheberrechtsgesetz)
