The Unfitted Discontinuous Galerkin Method for Solving the EEG Forward Problem: A Second Order Study
(2016)
A simplified model for spondylodesis, i.e., fixation of vertebrae by osteosynthesis, is developed for virtual magnetic resonance imaging (MRI) examinations to numerically calculate energy absorption. This paper presents results of calculated energy absorption in body tissue surrounding titanium rod implants. In general, each wire or rod behaves like an antenna in electromagnetic fields. The specific absorption rate (SAR) profile depends on the implant size, with SAR hotspots appearing near the rod edges. Depending on the size of the implant fixation, SAR is 62% (small fixation) up to 90.95% (large fixation) higher than without implants. In addition, the local SAR profile displays a local dependency on tissue: SAR is lower between the vertebrae.
Thermal Stress at the Surface of Thick Conductive Plates Induced by Sinusoidal Current Pulses
(2016)
In this experimental work we present a novel electrolyzer system for the production of hydrogen and oxygen at high pressure levels without an additional mechanical compressor. Due to its control strategies, the operating conditions for this electrolyzer can be kept optimal for each load situation of the system. Furthermore, the novel system design allows for dynamic long-term operation as well as for easy maintainability. The device therefore meets the requirements for prospective power-to-gas applications, especially for storing excess energy from renewable sources. A laboratory-scale device has been developed and high-pressure operation was validated. We also studied the long-term stability of the system by applying dynamic load cycles with load changes every 30 seconds. After 80 h of operation, the membrane electrode assembly (MEA) used was investigated by means of SEM, EDX and XRD analysis.
CoCoSpot: Clustering and recognizing botnet command and control channels using traffic analysis
(2017)
Improved Plasma Membrane Models as Test Systems for the Membrane
Disrupting Activity of Kalata B1
(2017)
There is a strongly held belief that if companies can direct their marketing activities to improve customer attitudes and intentions, it will impact on purchase behaviors. Departing from complementary yet sometimes conflicting findings of the current literature, we intend to contribute to the literature by answering two related questions. First, we investigate drivers of loyalty intention over time, and by so doing try to better understand loyalty formation. Second, once we understand loyalty formation, we assess the impact of loyalty on different aspects of purchase behavior, considering temporal effects. Therefore, we develop a consumption-system model which assumes that perceptions, intention, and the impact of perceptions and intention on behavior in one period serve as anchors for the same constructs in a subsequent period, implying a pattern of repeated consumption over time.
Using 3SLS regression analysis, results of a large-scale study using survey data from a sample of 2,478 customers from two points in time and purchase data gathered over a 30-month period suggest interesting findings on the two aforementioned questions:
Considering the first question, we find strong support for customer equity drivers directly influencing loyalty. Moreover, we see evidence for loyalty formation as a consumption-system as equity drivers and loyalty intention of one period are significant predictors of the same constructs in the next period.
Addressing the second research question is less straightforward. We find a significant impact of loyalty intention only for purchase frequency, but not for future sales and average receipt. This suggests that in a retailing context, the amount spent depends to a larger extent on actual needs and not on loyalty intention. Loyalty intention seems to be a more appropriate lead indicator for the frequency of store visits. For most categories, repurchase intention will not necessarily be related to higher sales. On the contrary, higher future sales are more likely to depend on the retailer’s ability to cross- and up-sell to its customers. In all, we need to acknowledge that the strongest predictor of future behavior is, in fact, past behavior.
These results question some of the strongly held beliefs of relationship marketing and its impact on actual behavior. Effects might not be as simple as they appear at first, given the temporal interplay between constructs. Moreover, it seems that inertia is more important than some marketing research tends to acknowledge. We would therefore suggest a more detailed investigation of customers’ initial choice behavior. If, in fact, inertia is the driving force behind purchase behavior, companies need to augment their emphasis on increasing initial customer contact and, accordingly, on initial product trial. This is somewhat counter-intuitive from a relationship marketing perspective, because that stream of research largely suggests the advantage of retaining customers rather than acquiring new ones. While we are not denying the importance of customer retention, it seems that companies are already fairly successful in doing so – the strong inertia effect confirms that. Hence, customer retention might not be the best strategy to differentiate in the market. Perhaps companies can better differentiate by excelling in customer acquisition. This, however, would have a significant impact on how marketing budgets should be spent by companies trying to reach sustained success. It might be time for re-balancing customer acquisition and customer retention.
Global registration of heterogeneous ground and aerial mapping data is a challenging task. This is especially difficult in disaster response scenarios when we have no prior information on the environment and cannot assume the regular order of man-made environments or meaningful semantic cues. In this work we extensively evaluate different approaches to globally register UGV-generated 3D point-cloud data from LiDAR sensors with UAV-generated point-cloud maps from vision sensors. The approaches are realizations of different selections for: a) local features: key-points or segments; b) descriptors: FPFH, SHOT, or ESF; and c) transformation estimations: RANSAC or FGR. Additionally, we compare the results against standard approaches like applying ICP after a good prior transformation has been given. The evaluation criteria include the distance which a UGV needs to travel to successfully localize, the registration error, and the computational cost. In this context, we report our findings on effectively performing the task on two new Search and Rescue datasets. Our results have the potential to help the community make informed decisions when registering point-cloud maps from ground robots to those from aerial robots.
This paper presents a novel approach to build consistent 3D maps for multi-robot cooperation in USAR environments. The sensor streams from unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) are fused into one consistent map. The UAV camera data are used to generate 3D point clouds that are fused with the 3D point clouds generated by a rolling 2D laser scanner on the UGV. The registration method is based on the matching of corresponding planar segments that are extracted from the point clouds. Based on the registration, an approach for a globally optimized localization is presented. Apart from the structural information of the point clouds, it is important to mention that no further information is required for the localization. Two examples show the performance of the overall registration.
Upgrade of Bioreactor System Providing Physiological Stimuli
to Engineered Musculoskeletal Tissues
(2017)
A novel central control interface (CCI) is developed to improve the modular bioreactor system with regard to extensibility and modifiability in Tissue Engineering (TE) applications. This paper presents the results developed in the project with open-source hardware and the graphical programming system LabVIEW. A new platform-independent user interface was also developed to contribute to the new flexibility of the device.
Performance enhancing study for large scale PEM electrolyzer cells based on hydraulic compression
(2017)
Web advertisements are the primary financial source for many online services, but also for cybercriminals. Successful ad campaigns rely on good online profiles of their potential customers. The financial potential of displaying ads has led to the rise of malware that injects or replaces ads on websites, in particular, so-called adware. This development leads to ever further optimized and customized advertising. For these customizations, various tracking methods are used. However, only sparse work has gone into privacy issues emerging from adware. In this paper, we investigate the tracking capabilities and related privacy implications of adware and potentially unwanted programs (PUPs). Therefore, we developed a framework that allows us to analyze any network communication of the Firefox browser at the application level to circumvent encryption like TLS. We use this to dynamically analyze the communication streams of over 16,000 adware and potentially unwanted program samples that tamper with the users' browser session. Our results indicate that roughly 37% of the requests issued by the analyzed samples contain private information and are accordingly able to track users. Additionally, we analyze which tracking techniques and services are used.
Steps Towards an Open All-in-one Rich-Client Environment for Particle-Based Mesoscopic Simulation
(2018)
Neuroscientists want to inspect the data their simulations are producing while these are still running. On the one hand, this will save them time waiting for results and therefore for insight. On the other hand, it will allow for more efficient use of CPU time if the simulations are being run on supercomputers. If they had access to the data being generated, neuroscientists could monitor it and take counter-actions, e.g., parameter adjustments, should the simulation deviate too much from in-vivo observations or get stuck.
As a first step toward this goal, we devise an in situ pipeline tailored to the neuroscientific use case. It is capable of recording and transferring simulation data to an analysis/visualization process, while the simulation is still running. The developed libraries are made publicly available as open source projects. We provide a proof-of-concept integration, coupling the neuronal simulator NEST to basic 2D and 3D visualization.
Opportunities and Challenges in Mixed-Reality for an Inclusive Human-Robot Collaboration Environment
(2018)
This paper presents an approach to enhance robot control using Mixed-Reality. It highlights the opportunities and challenges in the interaction design to achieve a Human-Robot Collaborative environment. In fact, Human-Robot Collaboration is the perfect space for social inclusion. It enables people who suffer from severe physical impairments to interact with the environment by providing them with movement control of an external robotic arm. When discussing robot control, it is important to reduce the visual split that different input and output modalities carry. Therefore, Mixed-Reality is of particular interest when trying to ease communication between humans and robotic systems.
A Robust Interface for Head Motion based Control of a Robot Arm using MARG and Visual Sensors
(2018)
Head-controlled human machine interfaces have gained popularity over the past years, especially for restoring the autonomy of severely disabled people, such as tetraplegics. These interfaces need to be reliable and robust with regard to environmental conditions to guarantee the safety of the user and enable a direct interaction between a human and a machine. This paper presents a hybrid MARG and visual sensor system for head orientation estimation, which is in this case used to teleoperate a robotic arm. The system contains a Magnetic Angular Rate Gravity (MARG) sensor and a Tobii eye tracker 4C. A MARG sensor consists of a tri-axis accelerometer, gyroscope and magnetometer, which together enable a complete measurement of orientation relative to the direction of gravity and the magnetic field of the earth. The tri-axis magnetometer is sensitive to external magnetic fields, which result in incorrect orientation estimates from the sensor fusion process. In this work, the Tobii eye tracker 4C is used to improve head orientation estimation because it also features head tracking, even though it is commonly used for eye tracking. This type of visual sensor does not suffer from magnetic drift. However, it computes orientation data only if a user is detectable. A state machine is presented which enables data fusion of the MARG and visual sensors to improve orientation estimation. The fusion of the orientation data of the MARG and visual sensors enables a robust interface which is immune to external magnetic fields and therefore increases the safety of the human-machine interaction.
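The fusion logic described above can be sketched as a minimal state machine. This is a hypothetical illustration under stated assumptions: the paper's actual states, thresholds, and filter details are not given in the abstract, and all names below are invented for the sketch.

```python
# Minimal sketch of a MARG/visual heading fusion state machine (hypothetical;
# not the paper's implementation). While the visual sensor detects a user,
# its drift-free heading is used and a magnetometer-drift offset is learned;
# when tracking drops out, the MARG heading is corrected by that offset.

def fuse_heading(marg_yaw_deg, visual_yaw_deg, user_detected, state):
    """Return a corrected yaw estimate in degrees.

    state: dict carrying the running magnetometer-drift correction.
    """
    if user_detected:
        # Visual sensor available: trust it and update the drift offset so
        # the MARG estimate stays usable once tracking drops out.
        state["drift_offset"] = visual_yaw_deg - marg_yaw_deg
        return visual_yaw_deg
    # Fall back to the MARG estimate, compensated by the last known offset.
    return marg_yaw_deg + state.get("drift_offset", 0.0)

state = {}
print(fuse_heading(10.0, 12.5, True, state))   # visual available -> 12.5
print(fuse_heading(11.0, None, False, state))  # fallback -> 11.0 + 2.5 = 13.5
```

The offset update acts as a crude complementary filter: the magnetically disturbed MARG heading is continuously re-anchored to the drift-free visual heading whenever the latter is available.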
A compact and efficient PEM electrolyser stack design based on hydraulic single cell compression
(2019)
This technical report is about the architecture and integration of commercial UAVs in Search and Rescue missions. We describe a framework that consists of heterogeneous UAVs, a UAV task planner, a bridge to the UAVs, an intelligent image hub, and a 3D point cloud generator. A first version of the framework was developed and tested in several training missions in the EU project TRADR.
Purpose
Although courage has generally been understood as a powerful virtue, research to establish it as a psychological construct is in its infancy. We examined courage in organizations against the backdrop of positive psychology with a design in the Grounded Theory tradition that connects Positive Organizational Behavior and Positive Organizational Scholarship.
Method
The sample consists of organizations that define courage in their mission statement and organizations without such a definition. It includes employees and executives, exploring workplace courage on the macro as well as the micro level. Eleven organizations and 23 participants contributed to the interview study.
Results
Applying Glaser's theoretical coding, specifically the C-family, we propose that courage arises from a decisional conflict in three major domains: the self, social interaction, and performance. It is located on a continuum between apathy and foolhardiness and can take on reactive, proactive, or autonomous forms. Whether and to what extent courage manifests is a dynamic process contingent upon organizational structure, culture, and communication climate as well as individual cognitive-affective personality systems.
Limitations
The model depicts the complexity of the phenomenon, rather than details of its individual components. It goes beyond pre-defined categories and prevailing definitions.
Implications
Modern organizations are characterized by volatility, uncertainty, complexity, and ambiguity (VUCA).
Courage is crucial in such an environment and can be systematically fostered across the whole human resource management cycle.
Value
The study advances theory building on courage in the workplace and highlights its potential to be measured, developed and managed for more effective work performance.
Purpose
Although the systemic approach to the leadership concept seems to fit well into our modern complex and dynamic work environment, only little research has been conducted to define and assess systemic leadership. In this study we therefore developed and assessed the criterion validity of the multidimensional systemic leadership inventory (SLI, Sülzenbrück & Externbrink, 2017).
Methodology
We conducted two cross-sectional surveys among managers and employees of various organizations (N = 143 and N = 150).
Results
We found a robust five-factor structure of the SLI, comprising systemic thinking, self-knowledge, solution-oriented communication, creating meaning and delegation. Regarding criterion validity, a significant positive correlation of systemic leadership was found with affective commitment, while a significant negative correlation with emotional strain in occupational contexts occurred. These overall positive outcomes for employees were not undermined by negative personality traits of the employee (Machiavellianism), while strong growth need strength further enhanced positive effects on affective commitment.
Limitations
Since all variables were measured as self-reports, common method variance could limit our findings.
Practical Implications
Systemic leadership is a very promising new approach for leaders to ensure committed and less strained employees.
Value
Systemic leadership, especially in terms of a leaders’ understanding of organizational and private systems influencing work behaviour of all members of an organization, is a promising novel leadership model suitable to address challenges of complex and dynamic work environments.
Purpose
So far, there are several approaches to measuring the Dark Triad traits, but all of them are personality questionnaires with at least questionable usability for applied contexts such as Human Resource Management.
The purpose of the study is the development of a structured interview with the aim of measuring the Dark Triad in a rather qualitative way that increases social validity for the respondents.
Design/Methodology/Approach/Intervention
In the present study, 15 executives from the telecommunications industry were interviewed on their personal evaluation of management success and derailment. Afterwards, their personality traits of the Dark Triad were measured with the help of the Short Dark Triad Scale. Subsequently, the data from qualitative and quantitative research were examined for correlations using the mixed-method approach.
Results
The results of the mixed-method approach showed a statistically significant correlation between the Short Dark Triad Scale and the ratings for narcissism, Machiavellianism and subclinical psychopathy in the Dark Triad interview.
Limitations
Replicating the results in a bigger sample and a deeper investigation of the criterion-related validity as well as an integration of multiple raters can provide more confidence in our results.
Research/Practical Implications
Structured interviews allow the measurement of personality traits in a more convenient way especially in personnel selection and development processes. Identifying subclinical traits in leadership candidates can, e.g. prevent management derailment.
Originality/Value
The present study advances the measurement methods of the Dark Triad.
Recommendations for the Development of a Robotic Drinking and Eating Aid - An Ethnographic Study
(2021)
Being able to live independently and self-determined in one’s own home is a crucial factor for human dignity and the preservation of self-worth. For people with severe physical impairments who cannot use their limbs for everyday tasks, living in their own home is only possible with assistance from others. The inability to move arms and hands makes it hard to take care of oneself, e.g. to drink and eat independently. In this paper, we investigate how 15 participants with disabilities consume food and drinks. We report on interviews and participatory observations, and analyze the aids they currently use. Based on our findings, we derive a set of recommendations that supports researchers and practitioners in designing future robotic drinking and eating aids for people with disabilities.
This article introduces two research projects on assistive robotic arms for people with severe body impairments. Both projects aim to develop new control and interaction designs to promote accessibility and better performance for people with functional losses in all four extremities, e.g. due to quadriplegia or multiple sclerosis. The project MobILe concentrates on using a robotic arm as a drinking aid and controlling it with smart glasses, eye tracking and augmented reality. A user-oriented development process with participatory methods was pursued, which brought new knowledge about the life and care situation of the future target group and the requirements a robotic drinking aid needs to meet. As a consequence, the new project DoF-Adaptiv follows an even more participatory approach, including the future target group, their families and professional caregivers in decision-making and development processes from the beginning of the project. DoF-Adaptiv aims to simplify the control modalities of assistive robotic arms to enhance their usability for activities of daily living. To decide on exemplary activities, like eating or opening a door, the future target group, their families and professional caregivers are included in the decision-making process. Furthermore, all relevant stakeholders will be included in the investigation of ethical, legal and social implications as well as the identification of potential risks. This article shows the importance of participatory design for the development and research process in MobILe and DoF-Adaptiv.
Air handling units (AHUs) are designed to guarantee a high indoor air quality at any time and for any outdoor condition throughout the year. To do so, the AHU removes particulate matter like dust or pollen and adapts the thermophysical properties of the air to the desired seasonal indoor comfort conditions. AHUs have a robust design and thus operate for more than fifteen years, sometimes even for decades. An AHU designed today must therefore consider and anticipate the change of user needs as well as of outdoor air conditions for the next twenty years. To anticipate the outdoor air conditions of coming decades, scientific models exist which allow the design of peak performance and capacities of the air treatment components. It is most likely that the ongoing climate change will lead to higher temperatures as well as higher humidity, while the comfort zone of human beings will remain at today’s values. Next to the impact of global warming, with an average rise of the mean air temperature, local effects will influence the operation of AHUs. One effect investigated here is the steep temperature increase in city centres, known as urban heat islands. Heating and cooling capacities as well as water consumption for humidification are investigated for a reference AHU at fifteen regional locations in Germany. These regions represent all climate zones within the country. Additionally, the urban heat island effect was investigated for Berlin Alexanderplatz compared to a nearby rural area. The AHU was chosen to operate in an intensive care unit of a hospital. This set-up leads to 24/7 operation with 8760 hours per year. The article presents the modelling of current and future weather data as well as the unit set-up. The calculated hourly performance and capacity parameters for current (reference year 2012) and future weather data (reference year 2045) yield the energy consumption and peak loads of the unit for heating, cooling and humidification. The results are displayed as relative comparisons of each performance value.
Various aqueous citrate electrolyte compositions for Ni-Mo electrodeposition are explored in order to deposit Ni-Mo alloys with a Mo-content ranging from 40 wt% to 65 wt% and to find an alloy composition with superior catalytic activity towards the hydrogen evolution reaction (HER). The depositions were performed on copper substrates mounted onto a rotating disc electrode (RDE) and were investigated via scanning electron microscopy (SEM), X-ray fluorescence (XRF) and X-ray diffraction (XRD) methods as well as linear sweep voltammetry (LSV) and impedance spectroscopy. Kinetic parameters were calculated via Tafel analysis. Partial deposition current densities and current efficiencies were determined by correlating XRF measurements with gravimetric results. The variation of the electrolyte composition and deposition parameters enabled the deposition of alloys with a Mo-content over the range of 40-65 wt%. An increase in Mo-content in the deposited alloys was recorded with an increase in rotation speed of the RDE. The current efficiency of the deposition was on the order of <1%, which is characteristic of the deposition of alloys with high Mo-content. The calculated kinetic parameters were used to determine the Mo-content with the highest catalytic activity for use in the HER.
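The Tafel analysis mentioned above can be sketched as a least-squares fit of the linear region of an LSV curve, eta = a + b·log10(j), yielding the Tafel slope b and the exchange current density j0. The data below are synthetic, and the helper name is hypothetical; the paper's actual fitting procedure is not given in the abstract.

```python
import math

# Hedged sketch of a Tafel analysis: fit eta = a + b*log10(j) in the
# kinetically controlled region of a polarization curve. Slope b is the
# Tafel slope (V/decade); j0 follows from eta = 0 -> log10(j0) = -a/b.

def tafel_fit(current_density, overpotential):
    """Least-squares fit of overpotential vs. log10(current density)."""
    x = [math.log10(j) for j in current_density]
    y = overpotential
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    j0 = 10 ** (-a / b)  # exchange current density
    return b, j0

# Synthetic HER-like data generated with b = 0.120 V/dec, j0 = 1e-4 A/cm^2
j = [1e-3, 1e-2, 1e-1]
eta = [0.120 * math.log10(ji / 1e-4) for ji in j]
b, j0 = tafel_fit(j, eta)
print(round(b, 3), f"{j0:.1e}")
```

Because the synthetic points lie exactly on a Tafel line, the fit recovers b = 0.12 V/decade and j0 = 1e-4 A/cm² up to floating-point noise; with real LSV data the region of the curve used for the fit must be chosen carefully.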
Measurement studies are essential for research and industry alike to better understand the Web’s inner workings and help quantify specific phenomena. Performing such studies is demanding due to the dynamic nature and size of the Web. An experiment’s careful design and setup are complex, and many factors might affect the results. However, while several works have independently observed differences in the outcome of an experiment (e.g., the number of observed trackers) based on the measurement setup, it is unclear what causes such deviations. This work investigates the reasons for these differences by visiting 1.7M webpages with five different measurement setups. Based on this, we build ‘dependency trees’ for each page and cross-compare the nodes in the trees. The results show that the measured trees differ considerably, that the cause of differences can be attributed to specific nodes, and that even identical measurement setups can produce different results.
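The cross-comparison step can be illustrated with a small sketch. The paper's actual tree format and comparison metric are not given here, so the representation below (a parent-to-children dict of loaded resources, compared via the Jaccard index) is purely hypothetical.

```python
# Hypothetical sketch: model a page's dependency tree as parent->children
# edges, collect all reachable nodes, and compare two measurement setups
# by the Jaccard similarity of their node sets.

def nodes(tree, root):
    """Collect all nodes reachable from root in a parent->children dict."""
    seen, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(tree.get(n, []))
    return seen

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

setup_a = {"page": ["cdn.js", "tracker.js"], "cdn.js": ["font.woff"]}
setup_b = {"page": ["cdn.js"], "cdn.js": ["font.woff"]}
sim = jaccard(nodes(setup_a, "page"), nodes(setup_b, "page"))
print(sim)  # 3 shared of 4 total nodes -> 0.75
```

A per-node diff of the two trees (here: `tracker.js` only appears in setup A) is what allows attributing measurement differences to specific nodes.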
In this work, a mathematical approach to calculate solar panel temperature based on measured irradiance, temperature and wind speed is applied. With the calculated module temperature, the electrical solar module characteristics are determined. A program developed in MATLAB App Designer allows importing measurement data from a weather station and calculates the module temperature based on the mathematical NOCT and stationary approaches with a time step of 5 minutes between measurements. Three commercially available solar panels with different cell and interconnection technologies are used for the verification of the established models. The results show a strong correlation between the measured module temperature and that predicted by the stationary model, with a coefficient of determination R2 close to 1 and a root mean square error (RMSE) of ≤ 2.5 K for a time period of three months. Based on the predicted temperature, the measured irradiance in the module plane and specific module information, the program models the electrical data as time series in 5-minute steps. Predicted versus measured power for a time period of three months shows a linear correlation with an R2 of 0.99 and a mean absolute error (MAE) of 3.5, 2.7 and 4.8 for module IDs 1, 2 and 3. The calculated energy for this time period (exemplarily for module ID 2), based on the measured data and on the NOCT and stationary models, is 118.4 kWh, 116.7 kWh and 117.8 kWh, respectively. This is equivalent to an uncertainty of 1.4% for the NOCT and 0.5% for the stationary model.
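The NOCT approach mentioned above has a well-known closed form, T_mod = T_amb + (NOCT − 20 °C)/800 W/m² · G, which can be sketched in a few lines. The parameter values below (NOCT, rated power, power temperature coefficient) are illustrative, not the ones used in the study, and the simple power model is a generic approximation rather than the paper's stationary model.

```python
# Sketch of the standard NOCT module-temperature approximation and a
# simple temperature-corrected power model (illustrative parameters only).

def module_temp_noct(t_ambient, irradiance, noct=45.0):
    """NOCT approximation: T_mod = T_amb + (NOCT - 20) / 800 * G."""
    return t_ambient + (noct - 20.0) / 800.0 * irradiance

def module_power(irradiance, t_module, p_stc=300.0, gamma=-0.004):
    """Power relative to STC (1000 W/m2, 25 degC), linear temperature term.

    gamma: power temperature coefficient in 1/K (here -0.4%/K).
    """
    return p_stc * irradiance / 1000.0 * (1.0 + gamma * (t_module - 25.0))

t_mod = module_temp_noct(t_ambient=20.0, irradiance=800.0)  # 45.0 degC
print(round(module_power(800.0, t_mod), 1))                 # 220.8 W
```

Applying these two functions to each 5-minute weather sample yields a power time series of the kind the program described above produces; the stationary model additionally accounts for wind speed, which the plain NOCT formula ignores.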
Advanced Determination of Temperature Coefficients of Photovoltaic Modules by Field Measurements
(2023)
In this work, data from outdoor measurements, acquired over the course of up to three years on commercially available solar panels, is used to determine the temperature coefficients and compare them to the information stated by the producer in the data sheets. A program developed in MATLAB App Designer allows importing the electrical and ambient measurement data. Filter algorithms for solar irradiance narrow the irradiance level down to ~1000 W/m2 before linear regression methods are applied to obtain the temperature coefficients. A repeatability investigation proves the accuracy of the determined temperature coefficients, which are in good agreement with the supplier specification if the specified values for power are not larger than -0.3%/K. Further optimization is achieved by applying wind filter techniques and selecting days with clear-sky conditions. With the large body of measurement data at hand, it was possible to determine the change of the temperature coefficients for varying irradiance. As stated in the literature, we see an increase of the temperature coefficient of voltage and a decline of the temperature coefficient of power with increasing irradiance.
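The filter-then-regress procedure described above can be sketched as follows. The irradiance window, sample data, and normalisation to the power at 25 °C are illustrative assumptions; the study's exact filter thresholds and regression details are not given in the abstract.

```python
# Hedged sketch: keep samples with irradiance near 1000 W/m2, fit power
# vs. module temperature linearly, and report the slope normalised to the
# (extrapolated) power at 25 degC as a temperature coefficient in %/K.

def temp_coefficient(samples, g_min=980.0, g_max=1020.0):
    """samples: list of (irradiance W/m2, module temp degC, power W)."""
    pts = [(t, p) for g, t, p in samples if g_min <= g <= g_max]
    n = len(pts)
    tbar = sum(t for t, _ in pts) / n
    pbar = sum(p for _, p in pts) / n
    slope = sum((t - tbar) * (p - pbar) for t, p in pts) / \
            sum((t - tbar) ** 2 for t, _ in pts)
    p25 = pbar + slope * (25.0 - tbar)  # power extrapolated to 25 degC
    return slope / p25 * 100.0          # %/K

data = [(1000, 25, 300.0), (1000, 35, 288.0), (1000, 45, 276.0),
        (500, 30, 150.0)]  # the low-irradiance sample is filtered out
print(round(temp_coefficient(data), 2))  # -> -0.4 (%/K)
```

The same scheme applied to voltage instead of power yields the temperature coefficient of voltage; repeating the fit for different irradiance windows reveals the irradiance dependence the study reports.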
Cookie notices (or cookie banners) are a popular mechanism for websites to provide (European) Internet users a tool to choose which cookies the site may set. Banner implementations range from merely providing information that a site uses cookies over offering the choice to accepting or denying all cookies to allowing fine-grained control of cookie usage. Users frequently get annoyed by the banner’s pervasiveness as they interrupt “natural” browsing on the Web. As a remedy, different browser extensions have been developed to automate the interaction with cookie banners.
In this work, we perform a large-scale measurement study comparing the effectiveness of extensions for “cookie banner interaction.” We configured the extensions to express different privacy choices (e.g., accepting all cookies, accepting functional cookies, or rejecting all cookies) to understand their capabilities to execute a user’s preferences. The results show statistically significant differences in which cookies are set, how many of them are set, and which types are set—even for extensions that aim to implement the same cookie choice. Extensions for “cookie banner interaction” can effectively reduce the number of set cookies compared to no interaction with the banners. However, all extensions increase the tracking requests significantly except when rejecting all cookies.
An automated pipeline for comprehensive calculation of intermolecular interaction energies based on molecular force-fields using the Tinker molecular modelling package is presented. Starting with non-optimized chemically intuitive monomer structures, the pipeline allows the approximation of global minimum energy monomers and dimers, configuration sampling for various monomer-monomer distances, estimation of coordination numbers by molecular dynamics simulations, and the evaluation of differential pair interaction energies. The latter are used to derive Flory-Huggins parameters and isotropic particle-particle repulsions for Dissipative Particle Dynamics (DPD). The computational results for force fields MM3, MMFF94, OPLSAA and AMOEBA09 are analyzed with Density Functional Theory (DFT) calculations and DPD simulations for a mixture of the non-ionic polyoxyethylene alkyl ether surfactant C10E4 with water to demonstrate the usefulness of the approach.
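The final step of the pipeline, turning differential pair interaction energies into Flory-Huggins parameters and DPD repulsions, can be sketched as follows. The coordination number, energies, and temperature below are illustrative, and the Groot-Warren/Groot-Rabone mapping a_AB ≈ a_AA + 3.27·χ (at reduced density ρ = 3) is one commonly used convention, not necessarily the exact parametrisation used in the paper.

```python
# Hedged sketch: chi_AB = z * dE / (R * T) with
# dE = E_AB - (E_AA + E_BB) / 2 (molar pair interaction energies),
# then a commonly used mapping to an isotropic DPD repulsion.

R = 8.314  # gas constant, J/(mol*K)

def flory_huggins_chi(e_ab, e_aa, e_bb, z, temperature):
    """Flory-Huggins parameter from pair interaction energies in J/mol.

    z: coordination number (here taken from MD-estimated coordination).
    """
    delta_e = e_ab - 0.5 * (e_aa + e_bb)
    return z * delta_e / (R * temperature)

def dpd_repulsion(chi, a_like=25.0):
    """Map chi to a DPD repulsion, a_AB = a_AA + 3.27*chi (rho = 3)."""
    return a_like + 3.27 * chi

chi = flory_huggins_chi(e_ab=-1800.0, e_aa=-2200.0, e_bb=-1600.0,
                        z=6, temperature=298.15)
print(round(chi, 3), round(dpd_repulsion(chi), 2))
```

A positive differential energy (unlike contacts less favourable than the mean of the like contacts) gives χ > 0 and an increased particle-particle repulsion, which is what drives demixing in the DPD simulation of the surfactant/water mixture.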
Inspired by the super-human performance of deep learning models in playing the game of Go after being presented with virtually unlimited training data, we looked into areas in chemistry where similar situations could be achieved. Encountering large amounts of training data in chemistry is still rare, so we turned to two areas where realistic training data can be fabricated in large quantities, namely a) the recognition of machine-readable structures from images of chemical diagrams and b) the conversion of IUPAC(-like) names into structures and vice versa. In this talk, we outline the challenges, technical implementation and results of this study.
Optical Chemical Structure Recognition (OCSR): Vast amounts of chemical information remain hidden in the primary literature and have yet to be curated into open-access databases. To automate the process of extracting chemical structures from scientific papers, we developed the DECIMER.ai project. This open-source platform provides an integrated solution for identifying, segmenting, and recognising chemical structure depictions in scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilises a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine that uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as the SMILES string.
DECIMER.ai is data-driven, relying solely on the training data to make accurate predictions without hand-coded rules or assumptions. The latest model was trained with 127 million structures and 483 million depictions (four different depictions per structure) on Google TPU-V4 VMs.
Name to Structure Conversion: The conversion of structures to IUPAC(-like) or systematic names has been solved in satisfying ways by algorithmic, rule-based systems. This fact, on the other hand, provided us with an opportunity to generate name-structure training pairs at very large scale to train a proof-of-concept transformer network and evaluate its performance.
In this work, the largest model was trained using almost one billion SMILES strings. The Lexichem software utility from OpenEye was employed to generate the IUPAC names used in the training process. STOUT V2 was trained on Google TPU-V4 VMs. The model's accuracy was validated through one-to-one string matching, BLEU scores, and Tanimoto similarity calculations. To further verify the model's reliability, every IUPAC name generated by STOUT V2 was analysed for accuracy and retranslated using OPSIN, a widely used open-source software for converting IUPAC names to SMILES. This additional validation step confirmed the high fidelity of STOUT V2's translations.
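The Tanimoto similarity used in this round-trip validation compares two molecular fingerprints as sets of on-bits: |A ∩ B| / |A ∪ B|. A minimal stdlib sketch follows; the bit sets are invented stand-ins for real fingerprints of an original SMILES and the OPSIN retranslation of its generated IUPAC name.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of
    on-bit indices: |A ∩ B| / |A ∪ B|."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are trivially identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Invented on-bit sets: a perfect round trip and a partial mismatch.
original = {1, 4, 9, 16, 25}
retranslated = {1, 4, 9, 16, 25}
mismatch = {1, 4, 9, 17, 26}

perfect = tanimoto(original, retranslated)  # identical sets
partial = tanimoto(original, mismatch)      # 3 shared bits of 7 total
```

A Tanimoto value of 1.0 on the retranslated structure indicates a lossless name-to-structure round trip; values below 1.0 flag translations that changed the molecule.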
The DECIMER.ai Project
(2024)
Over the past few decades, the number of publications describing chemical structures and their metadata has increased significantly. Chemists have published the majority of this information as bitmap images, along with other important information as human-readable text, in printed literature, and this data has never been retained and preserved in publicly available databases in machine-readable formats. Manually extracting such data from printed literature is error-prone, time-consuming, and tedious. The recognition and translation of images of chemical structures from printed literature into machine-readable format is known as Optical Chemical Structure Recognition (OCSR). In recent years, deep-learning-based OCSR tools have become increasingly popular. While many of these tools claim to be highly accurate, they are either unavailable to the public or proprietary. Meanwhile, the available open-source tools are significantly time-consuming to set up. Furthermore, none of these tools offers an end-to-end workflow capable of detecting chemical structures, segmenting them, classifying them, and translating them into machine-readable formats.
To address this issue, we present the DECIMER.ai project, an open-source platform that provides an integrated solution for identifying, segmenting, and recognizing chemical structure depictions within the scientific literature. DECIMER.ai comprises three main components: DECIMER-Segmentation, which utilizes a Mask-RCNN model to detect and segment images of chemical structure depictions; DECIMER-Image Classifier, an EfficientNet-based classification model that identifies which images contain chemical structures; and DECIMER-Image Transformer, an OCSR engine that uses an encoder-decoder model to convert the segmented chemical structure images into machine-readable formats such as the SMILES string.
A key strength of DECIMER.ai is that its algorithms are data-driven, relying solely on the training data to make accurate predictions without any hand-coded rules or assumptions. By offering this comprehensive, open-source, and transparent pipeline, DECIMER.ai enables automated extraction and representation of chemical data from unstructured publications, facilitating applications in chemoinformatics and drug discovery.