STOUT V2.0: SMILES to IUPAC name conversion using transformer models

Naming chemical compounds systematically is a complex task governed by a set of rules established by the International Union of Pure and Applied Chemistry (IUPAC). These rules are universal and widely accepted by chemists worldwide, but their complexity makes it challenging for individuals to consistently apply them accurately. A translation method can be employed to address this challenge. Accurate translation of chemical compounds from SMILES notation into their corresponding IUPAC names is crucial, as it can significantly streamline the laborious process of naming chemical structures. Here, we present STOUT (SMILES-TO-IUPAC-name translator) V2, which addresses this challenge by introducing a transformer-based model that translates string representations of chemical structures into IUPAC names. Trained on a dataset of nearly 1 billion SMILES strings and their corresponding IUPAC names, STOUT V2 demonstrates exceptional accuracy in generating IUPAC names, even for complex chemical structures. The model’s ability to capture intricate patterns and relationships within chemical structures enables it to generate precise and standardised IUPAC names. While established deterministic algorithms remain the gold standard for systematic chemical naming, our work, enabled by access to OpenEye’s Lexichem software through an academic license, demonstrates the potential of neural approaches to complement existing tools in chemical nomenclature.

Metadata
Authors: Kohulan Rajan, Achim Zielesny, Christoph Steinbeck
DOI: https://doi.org/10.1186/s13321-024-00941-x
Parent Title (English): Journal of Cheminformatics
Document Type: Article
Language: English
Date of Publication (online): 2024/12/27
Date of first Publication: 2024/12/27
Publishing Institution: Westfälische Hochschule Gelsenkirchen Bocholt Recklinghausen
Release Date: 2025/01/23
Tags: Artificial Intelligence; IUPAC names; Machine Learning; SMILES; Transformer
Volume: 2024
Issue: 16:146
Departments / faculties: Institute / Institut für biologische und chemische Informatik
Licence: German copyright law (Urheberrechtsgesetz) applies