Multimodal Transformer-based Model for Buchwald-Hartwig and
Suzuki-Miyaura Reaction Yield Prediction
- URL: http://arxiv.org/abs/2204.14062v1
- Date: Wed, 27 Apr 2022 07:28:27 GMT
- Authors: Shimaa Baraka and Ahmed M. El Kerdawy
- Abstract summary: The model consists of a pre-trained bidirectional transformer-based encoder (BERT) and a multi-layer perceptron (MLP) with a regression head to predict the yield.
We tested the model's performance on out-of-sample splits of the Buchwald-Hartwig dataset and achieved results comparable to the state of the art.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting the yield percentage of a chemical reaction is useful in many
respects, such as reducing wet-lab experimentation by prioritizing the
reactions with a high predicted yield. In this work we investigated the use of
multiple input types to predict chemical reaction yield. We used simplified
molecular-input line-entry system (SMILES) strings as well as calculated chemical
descriptors as model inputs. The model consists of a pre-trained bidirectional
transformer-based encoder (BERT) and a multi-layer perceptron (MLP) with a
regression head to predict the yield. We experimented on two high-throughput
experimentation (HTE) datasets for Buchwald-Hartwig and Suzuki-Miyaura
reactions. The experiments show improved prediction on both datasets
compared to systems using only SMILES or only chemical descriptors as input. We also
tested the model's performance on out-of-sample splits of the
Buchwald-Hartwig dataset and achieved results comparable to the state of the art. In
addition to predicting the yield, we demonstrated the model's ability to
suggest the optimum (highest-yield) reaction conditions. The model was able to
suggest conditions that achieve 94% of the optimum reported yields. This
shows the model to be useful in achieving the best results in the wet lab
without expensive experimentation.
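The condition-suggestion step described in the abstract can be illustrated with a small sketch. The abstract does not say how the 94% figure was computed, so the metric below is an assumption: for each reaction, take the condition with the highest predicted yield, divide its measured yield by the best measured yield among the candidates, and average over reactions. The function names and the toy data are hypothetical and do not come from the paper.

```python
from statistics import mean


def suggest_condition(predicted):
    """Return the condition label with the highest predicted yield."""
    return max(predicted, key=predicted.get)


def fraction_of_optimum(predicted, measured):
    """Measured yield of the suggested condition divided by the best measured yield.

    Assumed metric for illustration only; not necessarily the paper's definition.
    """
    best = max(measured.values())
    if best <= 0:
        raise ValueError("best measured yield must be positive")
    chosen = suggest_condition(predicted)
    return measured[chosen] / best


# Hypothetical toy data: two reactions, three candidate conditions each.
# Yields are percentages; "predicted" stands in for the model's output.
reactions = [
    {"predicted": {"A": 70.0, "B": 85.0, "C": 40.0},
     "measured": {"A": 75.0, "B": 80.0, "C": 35.0}},
    {"predicted": {"A": 60.0, "B": 55.0, "C": 20.0},
     "measured": {"A": 45.0, "B": 50.0, "C": 10.0}},
]

scores = [fraction_of_optimum(r["predicted"], r["measured"]) for r in reactions]
average_score = mean(scores)
print(f"Average fraction of optimum yield: {average_score:.2f}")
```

In the toy data, the first reaction's suggested condition is also the best measured one, while the second reaction's suggestion reaches 45 out of a possible 50, so the average falls slightly below 1.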
Related papers
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- Comparing Machine Learning Techniques for Alfalfa Biomass Yield Prediction [0.8808021343665321]
The alfalfa crop is globally important as livestock feed, so highly efficient planting and harvesting could benefit many industries.
Recent work using machine learning to predict yields for alfalfa and other crops has shown promise.
Previous efforts used remote sensing, weather, planting, and soil data to train machine learning models for yield prediction.
arXiv Detail & Related papers (2022-10-20T13:00:33Z)
- MetaRF: Differentiable Random Forest for Reaction Yield Prediction with a Few Trails [58.47364143304643]
In this paper, we focus on the reaction yield prediction problem.
We first put forth MetaRF, an attention-based differentiable random forest model specially designed for few-shot yield prediction.
To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method.
arXiv Detail & Related papers (2022-08-22T06:40:13Z)
- Prediction of liquid fuel properties using machine learning models with Gaussian processes and probabilistic conditional generative learning [56.67751936864119]
The present work aims to construct cheap-to-compute machine learning (ML) models to act as closure equations for predicting the physical properties of alternative fuels.
Those models can be trained using the database from MD simulations and/or experimental measurements in a data-fusion-fidelity approach.
The results show that ML models can accurately predict fuel properties over a wide range of pressure and temperature conditions.
arXiv Detail & Related papers (2021-10-18T14:43:50Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to their large parameter capacity, but this capacity also leads to a huge computation cost.
We explore accelerating large-model inference through conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z)
- Data Transfer Approaches to Improve Seq-to-Seq Retrosynthesis [1.6449390849183363]
Retrosynthesis is the problem of inferring the reactant compounds needed to synthesize a given product compound through chemical reactions.
Recent studies on retrosynthesis focus on proposing more sophisticated prediction models.
The dataset to feed the models also plays an essential role in achieving the best generalizing models.
arXiv Detail & Related papers (2020-10-02T05:27:51Z)
- Coupling Machine Learning and Crop Modeling Improves Crop Yield Prediction in the US Corn Belt [2.580765958706854]
This study investigates whether coupling crop modeling and machine learning (ML) improves corn yield predictions in the US Corn Belt.
The main objectives are to explore whether a hybrid approach (crop modeling + ML) would result in better predictions, and to determine which crop-modeling features are most effective to integrate with ML for corn yield prediction.
arXiv Detail & Related papers (2020-07-28T16:22:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.