From https://github.com/zielesny/MFsim:
MFsim - An open Java all-in-one rich-client simulation environment for mesoscopic simulation
MFsim is an open Java all-in-one rich-client computing environment for mesoscopic simulation with Jdpd as its default simulation kernel for Molecular Fragment Dissipative Particle Dynamics (DPD). The environment integrates and supports the complete preparation-simulation-evaluation triad of a mesoscopic simulation task. Productive highlights are a SPICES molecular structure editor, a PDB-to-SPICES parser for particle-based peptide/protein representations, support for polymer definitions, a compartment editor for complex simulation box start configurations, interactive and flexible simulation box views including analytics, simulation movie generation, and animated diagrams. As an open project, MFsim enables customized extensions for different fields of research.
MFsim uses several open libraries (see MFSimVersionHistory.txt for details and references below) and is published as open source under the GNU General Public License version 3 (see LICENSE).
MFsim has been described in the scientific literature and used for DPD studies (see references below).
From https://github.com/zielesny/Jdpd:
Jdpd - An open Java Simulation Kernel for Molecular Fragment Dissipative Particle Dynamics (DPD)
Jdpd is an open Java simulation kernel for Molecular Fragment Dissipative Particle Dynamics (DPD) with parallelizable force calculation, efficient caching options and fast property calculations. It is characterized by an interface- and factory-pattern-driven design that simplifies code changes and may help to avoid problems of polyglot programming. Detailed input/output communication, parallelization and process control, as well as internal logging capabilities for debugging, are supported. The kernel can be used in different simulation environments, ranging from flexible scripting solutions to fully integrated “all-in-one” simulation systems like MFsim.
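For readers new to DPD, the pairwise force that a kernel such as Jdpd evaluates for every particle pair within the cutoff radius can be sketched in Python. This sketch assumes the standard Groot-Warren form with a conservative, a dissipative and a random term, weight functions w^R(r) = 1 - r/r_c and w^D = (w^R)^2, and the fluctuation-dissipation relation sigma^2 = 2 gamma kBT. It is not Jdpd's Java implementation; the bonded and electrostatic contributions used in Molecular Fragment DPD are omitted, and the function name `dpd_pair_force` and its parameters are hypothetical.

```python
import numpy as np


def dpd_pair_force(r_i, r_j, v_i, v_j, a_ij, gamma, kBT, dt, r_c=1.0, rng=None):
    """Force on particle i due to particle j (hypothetical helper, standard DPD form)."""
    if rng is None:
        rng = np.random.default_rng()
    r_vec = np.asarray(r_i, dtype=float) - np.asarray(r_j, dtype=float)
    r = float(np.linalg.norm(r_vec))
    if r >= r_c or r == 0.0:
        return np.zeros(3)  # no interaction at or beyond the cutoff radius
    e_ij = r_vec / r                    # unit vector pointing from j to i
    w_r = 1.0 - r / r_c                 # linear weight, used for conservative and random terms
    w_d = w_r ** 2                      # dissipative weight w^D = (w^R)^2
    sigma = np.sqrt(2.0 * gamma * kBT)  # fluctuation-dissipation: sigma^2 = 2 gamma kBT
    v_ij = np.asarray(v_i, dtype=float) - np.asarray(v_j, dtype=float)

    f_conservative = a_ij * w_r * e_ij                        # soft repulsion
    f_dissipative = -gamma * w_d * np.dot(e_ij, v_ij) * e_ij  # friction on relative velocity
    theta = rng.standard_normal()                             # Gaussian noise, zero mean, unit variance
    f_random = sigma * w_r * theta / np.sqrt(dt) * e_ij       # thermal kick scaled by 1/sqrt(dt)
    return f_conservative + f_dissipative + f_random


# With gamma = 0 only the conservative term remains: particle i is pushed away from j (-x)
print(dpd_pair_force([0, 0, 0], [0.5, 0, 0], [0, 0, 0], [0, 0, 0],
                     a_ij=25.0, gamma=0.0, kBT=1.0, dt=0.04))
```

In a complete kernel, particle j receives the same force with opposite sign (using the same random number theta for the pair) so that momentum is conserved; the independent pair evaluations are what make the force calculation naturally parallelizable.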
Since version 1.6.1.0, Jdpd is available in a (basic) double-precision version and a (derived) single-precision version (JdpdSP) for all numerical calculations; the single-precision version needs about half the memory of the double-precision version.
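The "about half the memory" figure follows from element sizes: a Java `double` occupies 8 bytes and a `float` 4 bytes. The NumPy sketch below illustrates the same ratio for a hypothetical per-particle array holding position, velocity and force vectors (9 values per particle). This layout is an assumption, not Jdpd's internal data structure, and real memory use also includes non-numeric overhead, which is why the saving is only approximately one half.

```python
import numpy as np

n_particles = 100_000
values_per_particle = 9  # hypothetical layout: position, velocity, force (3 components each)

state_double = np.zeros((n_particles, values_per_particle), dtype=np.float64)
state_single = state_double.astype(np.float32)

print(state_double.nbytes)  # 7200000 bytes (8 bytes per value)
print(state_single.nbytes)  # 3600000 bytes (4 bytes per value)

# The price of the smaller footprint is precision: compare the machine epsilons
print(np.finfo(np.float64).eps)
print(np.finfo(np.float32).eps)
```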
Jdpd uses the Apache Commons Math and Apache Commons RNG libraries and is published as open source under the GNU General Public License version 3. This repository comprises the Java bytecode libraries (including the Apache Commons Math and RNG libraries), the Javadoc HTML documentation and the NetBeans source code packages including unit tests.
Jdpd has been described in the scientific literature (the final manuscript "2018 - van den Broek - Jdpd - Final Manucsript.pdf" is included in the repository) and used for DPD studies (see references below).
See text file JdpdVersionHistory.txt for a version history with more detailed information.
Computational methods for the accurate prediction of protein folding based on amino acid sequences have been researched for decades. The field has been significantly advanced in recent years by deep learning-based approaches, like AlphaFold, RoseTTAFold, or ColabFold. Although these can be used by the scientific community in various, mostly free and open ways, they are not yet widely used by bench scientists in relevant fields such as protein biochemistry or molecular biology, who are often not familiar with software tools such as scripting notebooks, command-line interfaces or cloud computing. In addition, visual inspection functionalities like protein structure displays, structure alignments, and specific protein hotspot analyses are required as a second step to interpret and apply the predicted structures in ongoing research studies.
PySSA (Python rich client for visual protein Sequence to Structure Analysis) is an open Graphical User Interface (GUI) application combining the protein sequence to structure prediction capabilities of ColabFold with the open-source variant of the molecular structure visualisation and analysis system PyMOL to make both available to the scientific end-user. PySSA enables the creation of managed and shareable projects with defined protein structure prediction and corresponding alignment workflows that can be conveniently performed by scientists without specialised computer skills or programming knowledge on their local computers. Thus, PySSA can help make protein structure prediction more accessible for end-users in protein chemistry and molecular biology as well as be used for educational purposes. It is openly available on GitHub, alongside a custom graphical installer executable for the Windows operating system: https://github.com/urban233/PySSA/wiki/Installation-for-Windows-Operating-System.
To demonstrate the capabilities of PySSA, its usage in a protein mutation study on the protein drug Bone Morphogenetic Protein 2 (BMP2) is described: the structure prediction results indicate that the previously reported BMP2-2Hep-7M mutant, which is intended to be less prone to aggregation, does not exhibit significant spatial rearrangements of amino acid residues interacting with the receptor.
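Statements such as "no significant spatial rearrangement" are typically based on a superposition of the compared structures followed by a root-mean-square deviation (RMSD) of matched atoms. PyMOL's `align` command additionally performs a sequence alignment and iterative outlier rejection; the sketch below shows only the core calculation, an optimal rigid superposition by the Kabsch algorithm followed by the RMSD, assuming two already-matched N×3 coordinate arrays (for example the Cα atoms of wild type and mutant). The function name `kabsch_rmsd` is hypothetical, and this is not PySSA's implementation.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched coordinate sets P and Q (N x 3) after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (row vectors, hence R transposed) and average the squared distances
    P_rot = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


# Usage: a rotated and shifted copy of the same coordinates has an RMSD of (numerically) zero
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle),  np.cos(angle), 0.0],
                  [0.0,            0.0,           1.0]])
moved = coords @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(coords, moved))
```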
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force fields using the Tinker molecular modelling package is presented. Starting with non-optimized, chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer-monomer distances, estimation of coordination numbers by molecular dynamics simulations, and the evaluation of differential pair interaction energies. The latter are used to derive Flory-Huggins parameters and isotropic particle-particle repulsions for Dissipative Particle Dynamics (DPD). The computational results for the force fields MM3, MMFF94, OPLS-AA and AMOEBA09 are analyzed with Density Functional Theory (DFT) calculations and DPD simulations for a mixture of the non-ionic polyoxyethylene alkyl ether surfactant C10E4 with water to demonstrate the usefulness of the approach.
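The step from pair interaction energies to DPD repulsions can be illustrated with one commonly used formulation: a Flory-Huggins parameter from coordination-number-weighted pair energies, χ_ij = (z_ij E_ij + z_ji E_ji - z_ii E_ii - z_jj E_jj) / (2RT), followed by a linear Groot-Warren-type mapping a_ij = a_ii + c·χ_ij with a_ii = 25 kBT at reduced density ρ = 3. The slope c = 3.27 is a frequently quoted value (slopes of about 3.5 also appear in the literature), and the averaging and mapping used in the published pipeline may differ. The function names, the units (kJ/mol, K) and the example energies are hypothetical.

```python
R_GAS = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def flory_huggins_chi(e_ij, e_ji, e_ii, e_jj, z_ij, z_ji, z_ii, z_jj, temperature):
    """Flory-Huggins parameter from pair energies (kJ/mol) and coordination numbers (hypothetical helper)."""
    mixed = z_ij * e_ij + z_ji * e_ji  # contributions of unlike contacts
    pure = z_ii * e_ii + z_jj * e_jj   # contributions of like contacts
    return (mixed - pure) / (2.0 * R_GAS * temperature)


def dpd_repulsion(chi, a_ii=25.0, slope=3.27):
    """Linear chi-to-repulsion mapping in kBT units at reduced density 3 (slope is an assumption)."""
    return a_ii + slope * chi


# Hypothetical example: unlike contacts slightly less favourable than the average like contact
chi = flory_huggins_chi(e_ij=-2.0, e_ji=-2.0, e_ii=-3.0, e_jj=-1.5,
                        z_ij=6, z_ji=6, z_ii=6, z_jj=6, temperature=298.15)
a_ij = dpd_repulsion(chi)
print(f"chi = {chi:.3f}, a_ij = {a_ij:.2f} kBT")
```

With equal coordination numbers z, the expression reduces to the familiar χ_ij = z (E_ij - (E_ii + E_jj)/2) / (RT), so a positive χ (and a repulsion above 25 kBT) signals that unlike contacts are energetically disfavoured.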
Advancements in Hand-Drawn Chemical Structure Recognition through an Enhanced DECIMER Architecture
(2024)
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information found in traditional laboratory notebooks or for facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. To evaluate the model's performance, a benchmark was performed using a real-world dataset of hand-drawn chemical structures. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches.
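Because the Transformer decoder emits a SMILES string token by token, the training targets must first be split into tokens and mapped to integer indices. A minimal Python sketch of this preprocessing step follows, using the atom-level regular expression popularized by the Molecular Transformer; DECIMER's own tokenization scheme may differ, and the special tokens `<pad>`, `<start>` and `<end>` as well as the helper names are illustrative assumptions.

```python
import re

# Atom-level SMILES regex: bracket atoms, two-letter halogens, aromatic atoms, bonds, ring closures
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>"]  # hypothetical special tokens


def tokenize_smiles(smiles):
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Guard against characters the pattern does not cover
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable SMILES: {smiles!r}")
    return tokens


def build_vocabulary(smiles_list):
    tokens = sorted({t for s in smiles_list for t in tokenize_smiles(s)})
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + tokens)}


def encode(smiles, vocab, max_len):
    ids = [vocab["<start>"]] + [vocab[t] for t in tokenize_smiles(smiles)] + [vocab["<end>"]]
    if len(ids) > max_len:
        raise ValueError("Sequence longer than max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))  # right-pad to a fixed length


print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
print(tokenize_smiles("[NH4+].[Cl-]"))           # bracket atoms stay single tokens
```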
Inspired by the superhuman performance of deep learning models in playing the game of Go after being presented with virtually unlimited training data, we looked into areas of chemistry where a similar situation could be achieved. Large amounts of training data are still rare in chemistry, so we turned to two areas where realistic training data can be fabricated in large quantities, namely a) the recognition of machine-readable structures from images of chemical diagrams and b) the conversion of IUPAC(-like) names into structures and vice versa. In this talk, we outline the challenges, technical implementation and results of this study.
Optical Chemical Structure Recognition (OCSR): Vast amounts of chemical information remain hidden in the primary literature and have yet to be curated into open-access databases. To automate the extraction of chemical structures from scientific papers, we developed the DECIMER.ai project. This open-source platform provides an integrated solution for identifying, segmenting, and recognising chemical structure depictions in scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilises a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine based on an encoder-decoder model that converts the segmented chemical structure images into machine-readable formats such as SMILES strings.
DECIMER.ai is data-driven, relying solely on the training data to make accurate predictions without hand-coded rules or assumptions. The latest model was trained with 127 million structures and 483 million depictions (four different depictions per structure) on Google TPU-V4 VMs.
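The three-stage architecture amounts to a simple composition: segment a page into candidate regions, keep only the regions classified as chemical structures, and translate each remaining region into SMILES. The Python sketch below captures only this control flow; the stage callables `segment`, `is_structure` and `image_to_smiles` are hypothetical placeholders standing in for DECIMER-Segmentation, DECIMER-Image Classifier and DECIMER-Image Transformer, not their actual APIs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass
class Recognized:
    region: Any  # the cropped structure depiction (e.g. an image array)
    smiles: str  # predicted machine-readable representation


def run_ocsr_pipeline(
    page: Any,
    segment: Callable[[Any], Iterable[Any]],
    is_structure: Callable[[Any], bool],
    image_to_smiles: Callable[[Any], str],
) -> List[Recognized]:
    """Segmentation -> classification -> recognition (hypothetical stage functions)."""
    results = []
    for region in segment(page):      # 1. detect and crop candidate depictions
        if not is_structure(region):  # 2. discard images that are not chemical structures
            continue
        results.append(Recognized(region, image_to_smiles(region)))  # 3. OCSR
    return results


# Toy usage with stand-in stages operating on strings instead of images
fake_page = ["benzene-drawing", "bar-chart", "ethanol-drawing"]
lookup = {"benzene-drawing": "c1ccccc1", "ethanol-drawing": "CCO"}
out = run_ocsr_pipeline(
    fake_page,
    segment=lambda page: page,
    is_structure=lambda region: region.endswith("-drawing"),
    image_to_smiles=lambda region: lookup[region],
)
print([r.smiles for r in out])  # ['c1ccccc1', 'CCO']
```

Passing the stages in as callables keeps each model independently replaceable and testable, which mirrors the modular split into separate segmentation, classification and recognition components.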
Name to Structure Conversion: The conversion of structures to IUPAC(-like) or systematic names has been solved satisfactorily by algorithmic, rule-based methods. This, in turn, gave us the opportunity to generate name-structure training pairs at a very large scale, train a proof-of-concept transformer network and evaluate its performance.
In this work, the largest model was trained using almost one billion SMILES strings. The Lexichem software utility from OpenEye was employed to generate the IUPAC names used in the training process. STOUT V2 was trained on Google TPU-V4 VMs. The model's accuracy was validated through one-to-one string matching, BLEU scores, and Tanimoto similarity calculations. To further verify the model's reliability, every IUPAC name generated by STOUT V2 was analysed for accuracy and retranslated using OPSIN, a widely used open-source software for converting IUPAC names to SMILES. This additional validation step confirmed the high fidelity of STOUT V2's translations.
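Two of the validation measures mentioned above are easy to state precisely. Exact matching counts predictions identical to the reference (usually after canonicalization with a cheminformatics toolkit), and the Tanimoto similarity of two binary fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below represents fingerprints as sets of "on" bit positions; in practice these would be computed from the molecules with a toolkit such as RDKit or the CDK, and the bit sets and strings used here are made up for illustration.

```python
from typing import Iterable, Sequence


def tanimoto(on_bits_a: Iterable[int], on_bits_b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of set-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted strings identical to their (canonical) reference strings."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


print(tanimoto({1, 5, 9, 12}, {1, 5, 9, 40}))  # 3 shared of 5 distinct bits -> 0.6
print(exact_match_rate(["CCO", "c1ccccc1", "CC"], ["CCO", "c1ccccc1", "CCC"]))
```

For a round-trip check like the one described above, the predicted name would first be converted back to a structure (here, with OPSIN) and then compared with the original structure using these measures.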
The DECIMER.ai Project
(2024)
Over the past few decades, the number of publications describing chemical structures and their metadata has increased significantly. Chemists have published the majority of this information as bitmap images, along with other important information as human-readable text, in printed literature, and it has never been retained and preserved in publicly available databases in machine-readable formats. Manually extracting such data from printed literature is error-prone, time-consuming, and tedious. The recognition and translation of images of chemical structures from printed literature into machine-readable formats is known as Optical Chemical Structure Recognition (OCSR). In recent years, deep-learning-based OCSR tools have become increasingly popular. While many of these tools claim to be highly accurate, they are either unavailable to the public or proprietary. Meanwhile, the available open-source tools are time-consuming to set up. Furthermore, none of them offers an end-to-end workflow capable of detecting chemical structures, segmenting them, classifying them, and translating them into machine-readable formats.
To address this issue, we present the DECIMER.ai project, an open-source platform that provides an integrated solution for identifying, segmenting, and recognizing chemical structure depictions within the scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilizes a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine based on an encoder-decoder model that converts the segmented chemical structure images into machine-readable formats such as SMILES strings.
A key strength of DECIMER.ai is that its algorithms are data-driven, relying solely on the training data to make accurate predictions without any hand-coded rules or assumptions. By offering this comprehensive, open-source, and transparent pipeline, DECIMER.ai enables automated extraction and representation of chemical data from unstructured publications, facilitating applications in chemoinformatics and drug discovery.
Advancements in hand-drawn chemical structure recognition through an enhanced DECIMER architecture
(2024)
Accurate recognition of hand-drawn chemical structures is crucial for digitising hand-written chemical information in traditional laboratory notebooks or facilitating stylus-based structure entry on tablets or smartphones. However, the inherent variability in hand-drawn structures poses challenges for existing Optical Chemical Structure Recognition (OCSR) software. To address this, we present an enhanced Deep lEarning for Chemical ImagE Recognition (DECIMER) architecture that leverages a combination of Convolutional Neural Networks (CNNs) and Transformers to improve the recognition of hand-drawn chemical structures. The model incorporates an EfficientNetV2 CNN encoder that extracts features from hand-drawn images, followed by a Transformer decoder that converts the extracted features into Simplified Molecular Input Line Entry System (SMILES) strings. Our models were trained using synthetic hand-drawn images generated by RanDepict, a tool for depicting chemical structures with different style elements. A benchmark was performed using a real-world dataset of hand-drawn chemical structures to evaluate the model's performance. The results indicate that our improved DECIMER architecture exhibits a significantly enhanced recognition accuracy compared to other approaches.