Inspired by the super-human performance of deep learning models in playing the game of Go after being presented with virtually unlimited training data, we looked into areas in chemistry where similar situations could be achieved. Encountering large amounts of training data in chemistry is still rare, so we turned to two areas where realistic training data can be fabricated in large quantities, namely a) the recognition of machine-readable structures from images of chemical diagrams and b) the conversion of IUPAC(-like) names into structures and vice versa. In this talk, we outline the challenges, technical implementation and results of this study.
Optical Chemical Structure Recognition (OCSR): Vast amounts of chemical information remain hidden in the primary literature and have yet to be curated into open-access databases. To automate the process of extracting chemical structures from scientific papers, we developed the DECIMER.ai project. This open-source platform provides an integrated solution for identifying, segmenting, and recognising chemical structure depictions in scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilises a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine that uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as SMILES strings.
DECIMER.ai is data-driven, relying solely on the training data to make accurate predictions without hand-coded rules or assumptions. The latest model was trained with 127 million structures and 483 million depictions (4 different depictions per structure) on Google TPU-V4 VMs.
Name to Structure Conversion: The conversion of structures to IUPAC(-like) or systematic names has been solved satisfactorily by algorithmic, rule-based approaches. This, in turn, gave us the opportunity to generate name-structure training pairs at a very large scale, train a proof-of-concept transformer network on them, and evaluate its performance.
In this work, the largest model was trained using almost one billion SMILES strings. The Lexichem software utility from OpenEye was employed to generate the IUPAC names used in the training process. STOUT V2 was trained on Google TPU-V4 VMs. The model's accuracy was validated through one-to-one string matching, BLEU scores, and Tanimoto similarity calculations. To further verify the model's reliability, every IUPAC name generated by STOUT V2 was analysed for accuracy and retranslated using OPSIN, a widely used open-source software for converting IUPAC names to SMILES. This additional validation step confirmed the high fidelity of STOUT V2's translations.
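Two of the validation measures named above, one-to-one string matching and Tanimoto similarity, can be illustrated with a minimal sketch. It is not the STOUT V2 evaluation code: the function names are hypothetical, and fingerprints are assumed to be plain sets of on-bit indices rather than the output of a particular cheminformatics toolkit.

```python
from typing import Sequence, Set


def exact_match_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions that are string-identical to their reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one pair is required")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprints.

    Fingerprints are given as sets of on-bit indices (an assumption of this
    sketch). Two empty fingerprints are treated as identical by convention.
    """
    union = bits_a | bits_b
    if not union:
        return 1.0
    return len(bits_a & bits_b) / len(union)
```

Exact matching only rewards identical strings, whereas the Tanimoto coefficient gives partial credit when a retranslated structure shares most of its fingerprint bits with the original.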
The DECIMER.ai Project
(2024)
Over the past few decades, the number of publications describing chemical structures and their metadata has increased significantly. Chemists have published the majority of this information as bitmap images, along with other important information as human-readable text, in printed literature, and it has never been retained and preserved in publicly available databases in machine-readable formats. Manually extracting such data from printed literature is error-prone, time-consuming, and tedious. The recognition and translation of images of chemical structures from printed literature into machine-readable format is known as Optical Chemical Structure Recognition (OCSR). In recent years, deep-learning-based OCSR tools have become increasingly popular. While many of these tools claim to be highly accurate, they are either unavailable to the public or proprietary. Meanwhile, the available open-source tools are time-consuming to set up. Furthermore, none of them offers an end-to-end workflow capable of detecting chemical structures, segmenting them, classifying them, and translating them into machine-readable formats.
To address this issue, we present the DECIMER.ai project, an open-source platform that provides an integrated solution for identifying, segmenting, and recognizing chemical structure depictions within the scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilizes a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine that uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as SMILES strings.
A key strength of DECIMER.ai is that its algorithms are data-driven, relying solely on the training data to make accurate predictions without any hand-coded rules or assumptions. By offering this comprehensive, open-source, and transparent pipeline, DECIMER.ai enables automated extraction and representation of chemical data from unstructured publications, facilitating applications in chemoinformatics and drug discovery.
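The three-stage workflow described above (segment, classify, translate) can be sketched as a simple composition of functions. This is an illustration only: `extract_structures` and its three callable parameters are hypothetical placeholders for the trained models, not the DECIMER.ai API, and the stage order is assumed from the order in which the abstract lists the components.

```python
from typing import Callable, Iterable, List, TypeVar

Image = TypeVar("Image")


def extract_structures(
    page: Image,
    segment: Callable[[Image], Iterable[Image]],
    is_structure: Callable[[Image], bool],
    to_smiles: Callable[[Image], str],
) -> List[str]:
    """Run a segment -> classify -> translate pipeline on one page.

    segment:      returns candidate image regions (stand-in for segmentation)
    is_structure: keeps only regions that depict a chemical structure
    to_smiles:    converts a kept region into a SMILES string
    """
    smiles: List[str] = []
    for region in segment(page):
        if is_structure(region):
            smiles.append(to_smiles(region))
    return smiles
```

Passing the stages in as callables keeps the sketch independent of any model library; in practice each callable would wrap one trained network.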
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer-monomer distances, estimation of coordination numbers by molecular dynamics simulations, and the evaluation of differential pair interaction energies. The latter are used to derive Flory-Huggins parameters and isotropic particle-particle repulsions for Dissipative Particle Dynamics (DPD). The computational results for force fields MM3, MMFF94, OPLSAA and AMOEBA09 are analyzed with Density Functional Theory (DFT) calculations and DPD simulations for a mixture of the non-ionic polyoxyethylene alkyl ether surfactant C10E4 with water to demonstrate the usefulness of the approach.
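As a rough illustration of the last step, the sketch below computes a Flory-Huggins parameter from pair interaction energies using the textbook lattice-model form chi = Z * (E_ij - (E_ii + E_jj) / 2) / (R * T). The function names, the use of a single coordination number Z, and the energy unit (kcal/mol) are assumptions of this sketch; the pipeline described above may combine pair-specific coordination numbers differently.

```python
GAS_CONSTANT_KCAL = 1.98720425864083e-3  # gas constant R in kcal/(mol K)


def differential_pair_energy(e_ij: float, e_ii: float, e_jj: float) -> float:
    """Energy of an unlike pair relative to the mean of the two like pairs."""
    return e_ij - 0.5 * (e_ii + e_jj)


def flory_huggins_chi(
    e_ij: float,
    e_ii: float,
    e_jj: float,
    coordination_number: float,
    temperature_k: float = 298.15,
) -> float:
    """Textbook Flory-Huggins parameter from pair energies in kcal/mol.

    A single coordination number is used for all pairs, which is a
    simplification made for this sketch.
    """
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    delta_e = differential_pair_energy(e_ij, e_ii, e_jj)
    return coordination_number * delta_e / (GAS_CONSTANT_KCAL * temperature_k)
```

A positive value indicates that unlike contacts are less favourable than the average of the like contacts, and zero indicates ideal mixing in this model.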
Introduction and research question:
Abusive supervision is associated with deliberate withholding of effort, reduced motivation, increased experience of stress, psychosomatic complaints, and burnout among employees. Given the high prevalence of destructive leadership, the question of which protective resources buffer these relationships remains open.
Theoretical background:
Abusive supervision refers to the extent of a supervisor's hostile verbal and nonverbal behaviours. Based on the job demands-resources model, we assume that personal resources which employees build up during non-work time buffer the negative effect of destructive leadership on employee health. We focus here on generalised self-efficacy, which, in line with social cognitive theory and numerous empirical findings, has proven to be a health-relevant resource in dealing with cross-domain stressors. It should be fostered by mastery experiences during non-work time. Mastery experience in leisure time means having the opportunity to experience competence and proficiency.
Method:
The moderator analysis was tested in a cross-sectional survey of a convenience sample of N = 305 participants. The variables were measured with the Abusive Supervision Scale (Tepper, 2000), the REQ (Sonnentag & Fritz, 2007), and the emotional exhaustion subscale of the MBI (Büssing & Perrar, 1992).
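A moderator analysis of this kind is commonly run as a regression with an interaction term. The sketch below fits outcome = b0 + b1*x + b2*m + b3*x*m on mean-centred predictors by ordinary least squares. It is not the authors' analysis script: the function names are hypothetical, and the mapping of x to abusive supervision, m to a recovery experience, and the outcome to emotional exhaustion is an assumption made for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_moderation(
    predictor: Sequence[float],
    moderator: Sequence[float],
    outcome: Sequence[float],
) -> List[float]:
    """Return [b0, b1, b2, b3] for y = b0 + b1*x + b2*m + b3*x*m.

    x and m are mean-centred before the product term is formed.
    """
    mean_x = sum(predictor) / len(predictor)
    mean_m = sum(moderator) / len(moderator)
    rows = [
        [1.0, x - mean_x, m - mean_m, (x - mean_x) * (m - mean_m)]
        for x, m in zip(predictor, moderator)
    ]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(4)]
    return solve_linear_system(xtx, xty)
```

With exhaustion as the outcome, a negative interaction coefficient b3 would correspond to a buffering effect: the slope of the predictor becomes flatter as the moderator increases. Significance testing of b3 is outside the scope of this sketch.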
Results:
In this study, "mastery experiences" show a buffering effect consistent with the hypothesis, whereas the other recovery strategies that were also tested do not. There is thus a tendency for employees to be able to protect themselves from the health-damaging effects of destructive leadership by learning new skills and building self-efficacy. However, the correlation pattern presumably also points to problematic aspects of this recovery strategy.
Discussion:
As a limitation, it must be noted that we did not explicitly measure the presumed mediating variable, self-efficacy, and that future studies need to replicate the effect in the form of a mediated moderation.