The 10 most recently published documents
The DECIMER.ai Project
(2024)
Over the past few decades, the number of publications describing chemical structures and their metadata has increased significantly. Chemists have published most of this information in printed literature as bitmap images, accompanied by other important information in human-readable text, and it has never been retained and preserved in publicly available databases in machine-readable formats. Manually extracting such data from printed literature is error-prone, time-consuming, and tedious. The recognition and translation of images of chemical structures from printed literature into machine-readable format is known as Optical Chemical Structure Recognition (OCSR). In recent years, deep-learning-based OCSR tools have become increasingly popular. While many of these tools claim to be highly accurate, they are either unavailable to the public or proprietary. Meanwhile, the available open-source tools are time-consuming to set up. Furthermore, none of them offers an end-to-end workflow capable of detecting chemical structures, segmenting them, classifying them, and translating them into machine-readable formats.
To address this issue, we present the DECIMER.ai project, an open-source platform that provides an integrated solution for identifying, segmenting, and recognizing chemical structure depictions within the scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which uses a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine that uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as SMILES strings.
A key strength of DECIMER.ai is that its algorithms are data-driven, relying solely on the training data to make accurate predictions without any hand-coded rules or assumptions. By offering this comprehensive, open-source, and transparent pipeline, DECIMER.ai enables automated extraction and representation of chemical data from unstructured publications, facilitating applications in chemoinformatics and drug discovery.
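The three-component workflow described in this abstract can be pictured as a chain of stages. The following is only a minimal sketch of that idea, not the DECIMER.ai API: the `Pipeline` class, its field names, the `Image` type, and the toy stage functions are all hypothetical stand-ins, and the stage order (segment, then classify, then transcribe) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical toy type: a grayscale bitmap as rows of pixel values.
Image = List[List[int]]


@dataclass
class Pipeline:
    """Hypothetical three-stage chain; not the DECIMER.ai interface."""
    segment: Callable[[Image], List[Image]]  # page -> candidate crops
    is_structure: Callable[[Image], bool]    # crop -> keep or discard
    transcribe: Callable[[Image], str]       # crop -> SMILES-like string

    def run(self, page: Image) -> List[str]:
        crops = self.segment(page)
        kept = [crop for crop in crops if self.is_structure(crop)]
        return [self.transcribe(crop) for crop in kept]


# Toy stand-ins for the trained models (assumptions, for illustration only).
def toy_segment(page: Image) -> List[Image]:
    # Treat every row of the page as one candidate crop.
    return [[row] for row in page]


def toy_is_structure(crop: Image) -> bool:
    # Keep a crop if it contains any non-zero pixel.
    return any(pixel for row in crop for pixel in row)


def toy_transcribe(crop: Image) -> str:
    # Emit one carbon atom per non-zero pixel; a real OCSR model goes here.
    return "C" * sum(1 for row in crop for pixel in row if pixel)


demo = Pipeline(toy_segment, toy_is_structure, toy_transcribe)
results = demo.run([[0, 0], [1, 1], [1, 0]])
```

Passing the stages in as plain callables keeps each one replaceable, so a trained model could be swapped in for any toy function without changing `run`.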
Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture
(2024)
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches.
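An encoder-decoder model of this kind typically produces its output string one token at a time. The sketch below illustrates that loop with greedy decoding, which is an assumption here: the abstract does not state which decoding strategy is used. The names `greedy_decode`, `next_token`, `START`, `END`, and the toy decoder are hypothetical and stand in for the trained encoder and decoder.

```python
from typing import Callable, List, Sequence

# Hypothetical special tokens marking sequence boundaries.
START, END = "<start>", "<end>"


def greedy_decode(
    features: Sequence[str],
    next_token: Callable[[Sequence[str], List[str]], str],
    max_len: int = 100,
) -> str:
    """Generate tokens one by one until END or max_len tokens are produced."""
    tokens: List[str] = [START]
    while len(tokens) <= max_len:
        token = next_token(features, tokens)
        if token == END:
            break
        tokens.append(token)
    # Drop the START marker and join the remaining tokens into one string.
    return "".join(tokens[1:])


def toy_next_token(features: Sequence[str], tokens: List[str]) -> str:
    # Toy stand-in for a decoder: the "features" already hold the answer,
    # so it simply reads off the next symbol, then signals END.
    position = len(tokens) - 1
    return features[position] if position < len(features) else END


ethanol = greedy_decode(list("CCO"), toy_next_token)
truncated = greedy_decode(list("CCO"), toy_next_token, max_len=2)
```

The `max_len` cap guards against a decoder that never emits the end token, at the cost of possibly truncating long strings.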