The DECIMER.ai Project
(2024)
Over the past few decades, the number of publications describing chemical structures and their metadata has increased significantly. Chemists have published the majority of this information as bitmap images, along with other important information as human-readable text, in printed literature; it has never been retained and preserved in publicly available databases in machine-readable formats. Manually extracting such data from printed literature is error-prone, time-consuming, and tedious. The recognition and translation of images of chemical structures from printed literature into machine-readable formats is known as Optical Chemical Structure Recognition (OCSR). In recent years, deep-learning-based OCSR tools have become increasingly popular. While many of these tools claim to be highly accurate, they are either unavailable to the public or proprietary. Meanwhile, the available open-source tools are time-consuming to set up. Furthermore, none of these offers an end-to-end workflow capable of detecting chemical structures, segmenting them, classifying them, and translating them into machine-readable formats.
To address this issue, we present the DECIMER.ai project, an open-source platform that provides an integrated solution for identifying, segmenting, and recognizing chemical structure depictions within the scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilizes a Mask R-CNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, which acts as the OCSR engine and uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as SMILES strings.
A key strength of DECIMER.ai is that its algorithms are data-driven, relying solely on the training data to make accurate predictions without any hand-coded rules or assumptions. By offering this comprehensive, open-source, and transparent pipeline, DECIMER.ai enables automated extraction and representation of chemical data from unstructured publications, facilitating applications in chemoinformatics and drug discovery.
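The three-stage workflow described above (segment, classify, translate) can be sketched as a simple chain of functions. This is only an illustration of how the stages fit together: the `Pipeline` class, its field names, and the toy stand-in functions are hypothetical and are not part of the DECIMER.ai code base, where each stage is a trained neural network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical toy image type: a greyscale bitmap as nested sequences.
Image = Sequence[Sequence[int]]


@dataclass
class Pipeline:
    """Hypothetical wiring of the three stages; not the DECIMER.ai API."""

    segment: Callable[[Image], List[Image]]  # page -> candidate crops
    is_structure: Callable[[Image], bool]    # crop -> keep or discard
    to_smiles: Callable[[Image], str]        # crop -> SMILES string

    def run(self, page: Image) -> List[str]:
        """Segment the page, keep structure crops, translate each one."""
        crops = self.segment(page)
        structures = [crop for crop in crops if self.is_structure(crop)]
        return [self.to_smiles(crop) for crop in structures]


# Toy stand-ins so the sketch runs without any trained model.
def toy_segment(page: Image) -> List[Image]:
    # Treat every row of the page as one candidate crop.
    return [[row] for row in page]


def toy_is_structure(crop: Image) -> bool:
    # Keep crops that contain at least one non-zero pixel.
    return any(pixel for row in crop for pixel in row)


def toy_to_smiles(crop: Image) -> str:
    # Emit one carbon atom per non-zero pixel in the first row.
    return "C" * sum(1 for pixel in crop[0] if pixel)


pipeline = Pipeline(toy_segment, toy_is_structure, toy_to_smiles)
result = pipeline.run([[1, 1, 0], [0, 0, 0], [1, 0, 0]])
```

Keeping the stages as interchangeable callables mirrors the modular description in the abstract: the classifier filters out false positives from the segmentation step before the more expensive translation step runs.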
Geometries, stabilities, electronic properties and NMR shielding of cucurbit[6]uril–spermine host-ligand complexes are investigated with DFT calculations and compared to experimental results. Cucurbit[6]uril and spermine can form complexes with two different minimum energy geometries and corresponding characteristic differences in NMR shielding. The energetically preferred complex geometry has perfect inversion symmetry and its proton NMR shielding agrees very well with experimental results. The cucurbit[6]uril host molecule shows a distinct geometrical flexibility in ligand binding which allows an induced fit of the spermine ligand. The energetic barrier for the rotation of spermine in the favourable complex is estimated to be on the order of a few kilocalories per mole.
Steps Towards an Open All-in-one Rich-Client Environment for Particle-Based Mesoscopic Simulation
(2018)
SPICES (Simplified Particle Input ConnEction Specification) is a particle-based molecular structure representation derived from straightforward simplifications of the atom-based SMILES line notation. It aims to ease the tedious and error-prone definition of molecular structures for particle-based mesoscopic simulation techniques such as Dissipative Particle Dynamics by allowing for an interplay of different molecular encoding levels, ranging from topological line notations and corresponding particle-graph visualizations to 3D structures with support for their spatial mapping into a simulation box. An open Java library for SPICES structure handling and mesoscopic simulation support is provided, together with an open Java Graphical User Interface viewer application for visual topological inspection of SPICES definitions.
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of molecular scaffolds, scaffold trees and networks. The new library is based on the Chemistry Development Kit (CDK) and highly customisable through multiple settings, e.g. five different structural framework definitions are available. For display of scaffold hierarchies, the open GraphStream Java library is utilised. Performance snapshots with natural products (NP) from the COCONUT (COlleCtion of Open Natural prodUcTs) database and drug molecules from DrugBank are reported. The generation of a scaffold network from more than 450,000 NP can be achieved within a single day.
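The core idea behind scaffold extraction can be illustrated on a plain molecular graph: terminal atoms are stripped repeatedly until only ring atoms and the linkers between rings remain. The sketch below is a simplified, Murcko-style pruning in Python and is not the Scaffold Generator API; it does not reproduce any specific framework definition of the library, and the helper names `graph_from_bonds` and `prune_to_framework` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def graph_from_bonds(bonds: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency map from (atom, atom) index pairs."""
    graph: Graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def prune_to_framework(adjacency: Graph) -> Graph:
    """Repeatedly remove atoms with at most one neighbour.

    Ring atoms and linker atoms between rings survive; side chains are
    removed, and an acyclic molecule reduces to an empty graph.
    """
    graph = {atom: set(neigh) for atom, neigh in adjacency.items()}
    terminal: List[int] = [a for a, n in graph.items() if len(n) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in graph:  # already removed via an earlier entry
            continue
        for neighbour in graph.pop(atom):
            graph[neighbour].discard(atom)
            if len(graph[neighbour]) <= 1:
                terminal.append(neighbour)
    return graph


# Six-membered ring (atoms 0-5) carrying a two-atom side chain (6, 7).
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
ethyl_ring = graph_from_bonds(ring + [(0, 6), (6, 7)])
framework = prune_to_framework(ethyl_ring)
```

Applying the same pruning step after removing one ring at a time is the usual route from a single scaffold to a scaffold tree or network, although the selection rules for which ring to remove are beyond this sketch.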
The development of deep learning-based optical chemical structure recognition (OCSR) systems has led to a need for datasets of chemical structure depictions. The diversity of the features in the training data is an important factor for the generation of deep learning systems that generalise well and are not overfit to a specific type of input. In the case of chemical structure depictions, these features are defined by the depiction parameters such as bond length, line thickness, label font style and many others. Here we present RanDepict, a toolkit for the creation of diverse sets of chemical structure depictions. The diversity of the image features is generated by making use of all available depiction parameters in the depiction functionalities of the CDK, RDKit, and Indigo. Furthermore, there is the option to enhance and augment the image with features such as curved arrows, chemical labels around the structure, or other kinds of distortions. Using depiction feature fingerprints, RanDepict ensures diversely picked image features. Here, the depiction and augmentation features are summarised in binary vectors and the MaxMin algorithm is used to pick diverse samples out of all valid options. By making all resources described herein publicly available, we hope to contribute to the development of deep learning-based OCSR systems.
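The MaxMin selection mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration and not RanDepict's implementation: the Tanimoto distance between binary vectors and the choice of the first vector as the seed are assumptions, since the abstract does not specify either, and the function names are hypothetical.

```python
from typing import List, Sequence

BitVector = Sequence[int]


def tanimoto_distance(a: BitVector, b: BitVector) -> float:
    """1 - |a AND b| / |a OR b| (assumed metric for binary fingerprints)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def maxmin_pick(vectors: Sequence[BitVector], n_picks: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: each new pick maximises its distance to the nearest
    vector that has already been picked. Returns indices into `vectors`."""
    if not vectors or n_picks <= 0:
        return []
    picked = [first]
    # Distance from every vector to its nearest already-picked vector.
    nearest = [tanimoto_distance(v, vectors[first]) for v in vectors]
    while len(picked) < min(n_picks, len(vectors)):
        candidate = max(
            (i for i in range(len(vectors)) if i not in picked),
            key=lambda i: nearest[i],
        )
        picked.append(candidate)
        for i, v in enumerate(vectors):
            nearest[i] = min(nearest[i], tanimoto_distance(v, vectors[candidate]))
    return picked


# Four toy depiction-feature fingerprints.
fingerprints = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
selection = maxmin_pick(fingerprints, 3)
```

In this toy example the second pick is the fingerprint sharing no bits with the seed, and the near-duplicate of the seed is picked last, which is the behaviour that makes MaxMin useful for spreading samples across the feature space.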
Computational methods for the accurate prediction of protein folding based on amino acid sequences have been researched for decades. The field has been significantly advanced in recent years by deep learning-based approaches, like AlphaFold, RoseTTAFold, or ColabFold. Although these can be used by the scientific community in various, mostly free and open ways, they are not yet widely used by bench scientists in relevant fields such as protein biochemistry or molecular biology, who are often not familiar with software tools such as scripting notebooks, command-line interfaces or cloud computing. In addition, visual inspection functionalities like protein structure displays, structure alignments, and specific protein hotspot analyses are required as a second step to interpret and apply the predicted structures in ongoing research studies.
PySSA (Python rich client for visual protein Sequence to Structure Analysis) is an open Graphical User Interface (GUI) application combining the protein sequence to structure prediction capabilities of ColabFold with the open-source variant of the molecular structure visualisation and analysis system PyMOL to make both available to the scientific end-user. PySSA enables the creation of managed and shareable projects with defined protein structure prediction and corresponding alignment workflows that can be conveniently performed by scientists without specialised computer skills or programming knowledge on their local computers. Thus, PySSA can help make protein structure prediction more accessible for end-users in protein chemistry and molecular biology as well as be used for educational purposes. It is openly available on GitHub, alongside a custom graphical installer executable for the Windows operating system: https://github.com/urban233/PySSA/wiki/Installation-for-Windows-Operating-System.
To demonstrate the capabilities of PySSA, its usage in a protein mutation study on the protein drug Bone Morphogenetic Protein 2 (BMP2) is described: the structure prediction results indicate that the previously reported BMP2-2Hep-7M mutant, which is intended to be less prone to aggregation, does not exhibit significant spatial rearrangements of amino acid residues interacting with the receptor.