Holistic chemical evaluation reveals pitfalls in reaction prediction
models
- URL: http://arxiv.org/abs/2312.09004v1
- Date: Thu, 14 Dec 2023 14:54:28 GMT
- Title: Holistic chemical evaluation reveals pitfalls in reaction prediction
models
- Authors: Victor Sabanza Gil, Andres M. Bran, Malte Franke, Remi Schlama, Jeremy
S. Luterbacher, Philippe Schwaller
- Abstract summary: We propose a new assessment scheme that builds on current approaches, steering towards a more holistic evaluation.
ChoRISO is a curated dataset along with multiple tailored splits to recreate chemically relevant scenarios.
Our work paves the way towards robust prediction models that can ultimately accelerate chemical discovery.
- Score: 0.3065062372337749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The prediction of chemical reactions has gained significant interest within
the machine learning community in recent years, owing to its complexity and
crucial applications in chemistry. However, model evaluation for this task has
been mostly limited to simple metrics like top-k accuracy, which obfuscates
fine details of a model's limitations. Inspired by progress in other fields, we
propose a new assessment scheme that builds on top of current approaches,
steering towards a more holistic evaluation. We introduce the following key
components for this goal: CHORISO, a curated dataset along with multiple
tailored splits to recreate chemically relevant scenarios, and a collection of
metrics that provide a holistic view of a model's advantages and limitations.
Application of this method to state-of-the-art models reveals important
differences on sensitive fronts, especially stereoselectivity and chemical
out-of-distribution generalization. Our work paves the way towards robust
prediction models that can ultimately accelerate chemical discovery.
Related papers
- Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
This paper introduces modelname, a novel chemical reaction representation learning model tailored for a variety of organic-reaction-related tasks.
By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction, thereby enhancing the comprehension of the reaction mechanism.
We have designed an adapter structure to incorporate reaction conditions into the chemical reaction representation, allowing the model to handle diverse reaction conditions and adapt to various datasets and downstream tasks, e.g., reaction performance prediction.
arXiv Detail & Related papers (2024-11-26T17:41:44Z) - Beyond Major Product Prediction: Reproducing Reaction Mechanisms with
Machine Learning Models Trained on a Large-Scale Mechanistic Dataset [10.968137261042715]
Mechanistic understanding of organic reactions can facilitate reaction development, impurity prediction, and in principle, reaction discovery.
While several machine learning models have sought to address the task of predicting reaction products, their extension to predicting reaction mechanisms has been impeded by the lack of a corresponding mechanistic dataset.
We construct such a dataset by imputing intermediates between experimentally reported reactants and products using expert reaction templates and train several machine learning models on the resulting dataset of 5,184,184 elementary steps.
arXiv Detail & Related papers (2024-03-07T15:26:23Z) - Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity)
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z) - Bridging the Gap between Chemical Reaction Pretraining and Conditional
Molecule Generation with a Unified Model [3.3031562864527664]
We propose a unified framework that addresses both the reaction representation learning and molecule generation tasks.
Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model.
Our framework achieves state-of-the-art results on challenging downstream tasks.
arXiv Detail & Related papers (2023-03-13T10:06:41Z) - Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
arXiv Detail & Related papers (2022-08-23T17:01:16Z) - MetaRF: Differentiable Random Forest for Reaction Yield Prediction with
a Few Trails [58.47364143304643]
In this paper, we focus on the reaction yield prediction problem.
We first put forth MetaRF, an attention-based differentiable random forest model specially designed for the few-shot yield prediction.
To improve the few-shot learning performance, we further introduce a dimension-reduction based sampling method.
arXiv Detail & Related papers (2022-08-22T06:40:13Z) - Toward Development of Machine Learned Techniques for Production of
Compact Kinetic Models [0.0]
Chemical kinetic models are an essential component in the development and optimisation of combustion devices.
We present a novel automated compute intensification methodology to produce overly-reduced and optimised chemical kinetic models.
arXiv Detail & Related papers (2022-02-16T12:31:24Z) - Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport [49.237577649802034]
We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
arXiv Detail & Related papers (2022-02-13T04:56:18Z) - Unassisted Noise Reduction of Chemical Reaction Data Sets [59.127921057012564]
We propose a machine learning-based, unassisted approach to remove chemically wrong entries from data sets.
Our results show an improved prediction quality for models trained on the cleaned and balanced data sets.
arXiv Detail & Related papers (2021-02-02T09:34:34Z) - Data Transfer Approaches to Improve Seq-to-Seq Retrosynthesis [1.6449390849183363]
Retrosynthesis is a problem to infer reactant compounds to synthesize a given product compound through chemical reactions.
Recent studies on retrosynthesis focus on proposing more sophisticated prediction models.
The dataset to feed the models also plays an essential role in achieving the best generalizing models.
arXiv Detail & Related papers (2020-10-02T05:27:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.