Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn
Atomic Representations
- URL: http://arxiv.org/abs/2402.16882v1
- Date: Mon, 19 Feb 2024 02:21:20 GMT
- Title: Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn
Atomic Representations
- Authors: Wenhao Gao, Priyanka Raghavan, Ron Shprints, Connor W. Coley
- Abstract summary: We introduce a novel pre-training strategy, substrate scope contrastive learning, which learns atomic representations tailored to chemical reactivity.
We focus on 20,798 aryl halides in the CAS Content Collection spanning thousands of publications to learn a representation of aryl halide reactivity.
This work not only presents a chemistry-tailored neural network pre-training strategy to learn reactivity-aligned atomic representations, but also marks a first-of-its-kind approach to benefit from the human bias in substrate scope design.
- Score: 14.528429119352328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning molecular representation is a critical step in molecular machine
learning that significantly influences modeling success, particularly in
data-scarce situations. The concept of broadly pre-training neural networks has
advanced fields such as computer vision, natural language processing, and
protein engineering. However, similar approaches for small organic molecules
have not achieved comparable success. In this work, we introduce a novel
pre-training strategy, substrate scope contrastive learning, which learns
atomic representations tailored to chemical reactivity. This method considers
the grouping of substrates and their yields in published substrate scope tables
as a measure of their similarity or dissimilarity in terms of chemical
reactivity. We focus on 20,798 aryl halides in the CAS Content Collection
spanning thousands of publications to learn a representation of aryl halide
reactivity. We validate our pre-training approach through both intuitive
visualizations and comparisons to traditional reactivity descriptors and
physical organic chemistry principles. The versatility of these embeddings is
further evidenced in their application to yield prediction, regioselectivity
prediction, and the diverse selection of new substrates. This work not only
presents a chemistry-tailored neural network pre-training strategy to learn
reactivity-aligned atomic representations, but also marks a first-of-its-kind
approach to benefit from the human bias in substrate scope design.
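The core idea lends itself to an InfoNCE-style objective in which substrates reported in the same published scope table are treated as positives. The sketch below is a minimal illustration, assuming a generic atom-level encoder, a temperature hyperparameter, and a simple yield-difference weighting; none of these choices are drawn from the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def scope_contrastive_loss(z, table_id, yields, tau=0.1):
    """InfoNCE-style loss over reactive-atom embeddings.

    z        : (N, d) embedding of each substrate's reactive atom
    table_id : (N,) id of the scope table each substrate was reported in
    yields   : (N,) reported yields, scaled to [0, 1]
    """
    z = F.normalize(z, dim=1)                       # cosine-similarity space
    sim = z @ z.t() / tau                           # (N, N) scaled similarities
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, -1e9)                # exclude self-pairs

    # positives: substrates from the same scope table, down-weighted
    # when their reported yields differ (an illustrative choice)
    pos = (table_id.unsqueeze(0) == table_id.unsqueeze(1)) & ~eye
    weight = (1.0 - (yields.unsqueeze(0) - yields.unsqueeze(1)).abs()) * pos

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss = -(weight * log_prob).sum(1) / weight.sum(1).clamp(min=1e-8)
    return loss.mean()

# usage: z would come from an atom-level GNN over each aryl halide,
# indexed at the halide-bearing carbon (hypothetical setup)
z = torch.randn(32, 128, requires_grad=True)
loss = scope_contrastive_loss(z, torch.randint(0, 5, (32,)), torch.rand(32))
loss.backward()
```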
Related papers
- NeuralCRNs: A Natural Implementation of Learning in Chemical Reaction Networks [0.0]
We present a novel supervised learning framework constructed as a collection of deterministic chemical reaction networks (CRNs).
Unlike prior works, the NeuralCRNs framework is founded on dynamical system-based learning implementations and, thus, results in chemically compatible computations.
arXiv Detail & Related papers (2024-08-18T01:43:26Z)
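To make the dynamical-systems framing concrete: a deterministic CRN under mass-action kinetics computes a function at steady state. The toy below is not the NeuralCRNs construction; species and rate constants are invented to show the idea of multiplying two input concentrations.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reactions (all rate constants 1):
#   A + B -> A + B + C   (C produced at rate a*b; A and B act catalytically)
#   C     -> (nothing)   (C degraded at rate c)
# Steady state: dc/dt = a*b - c = 0, so c converges to a0 * b0.
def crn_rhs(t, state):
    a, b, c = state
    return [0.0, 0.0, a * b - c]

a0, b0 = 2.0, 3.0
sol = solve_ivp(crn_rhs, (0.0, 10.0), [a0, b0, 0.0], rtol=1e-8)
print(sol.y[2, -1])   # ~6.0 == a0 * b0
```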
- Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset that encodes the information most representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv Detail & Related papers (2024-04-05T17:15:48Z)
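The paper's causal-discovery and intervention machinery is not reproduced here; as a generic stand-in, a pool-based active-learning loop with uncertainty sampling illustrates how a small, strategically chosen subset can represent a larger space.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
pool = rng.uniform(-3, 3, size=(200, 1))            # candidate "molecules"
oracle = lambda x: np.sin(x).ravel()                # stand-in property

labeled = list(rng.choice(len(pool), 5, replace=False))
for _ in range(20):
    gp = GaussianProcessRegressor().fit(pool[labeled], oracle(pool[labeled]))
    _, std = gp.predict(pool, return_std=True)
    std[labeled] = -np.inf                          # don't re-pick
    labeled.append(int(np.argmax(std)))             # most informative point

print(f"{len(labeled)} labels summarize the whole curve")
```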
- UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment [51.49238426241974]
This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction.
By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules.
arXiv Detail & Related papers (2024-03-25T03:23:03Z)
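The general graph-to-sequence pattern, a GNN encoder whose node states serve as memory for a Transformer decoder over SMILES tokens, can be sketched as follows. Layer sizes, the single toy message-passing step, and the vocabulary are placeholders, not UAlign's actual architecture.

```python
import torch
import torch.nn as nn

class GraphToSeq(nn.Module):
    def __init__(self, n_atom_types=100, vocab=64, d=256):
        super().__init__()
        self.embed = nn.Embedding(n_atom_types, d)
        self.msg = nn.Linear(d, d)                      # one toy GNN layer
        self.tok = nn.Embedding(vocab, d)
        layer = nn.TransformerDecoderLayer(d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.out = nn.Linear(d, vocab)

    def forward(self, atoms, adj, tokens):
        # atoms: (B, N) atom-type ids; adj: (B, N, N); tokens: (B, T)
        h = self.embed(atoms)
        h = h + torch.relu(adj @ self.msg(h))           # neighbor aggregation
        T = tokens.size(1)                              # causal decoder mask
        mask = torch.triu(torch.full((T, T), float('-inf')), diagonal=1)
        y = self.decoder(self.tok(tokens), memory=h, tgt_mask=mask)
        return self.out(y)                              # next-token logits

model = GraphToSeq()
logits = model(torch.randint(0, 100, (2, 9)),
               torch.rand(2, 9, 9), torch.randint(0, 64, (2, 5)))
print(logits.shape)   # torch.Size([2, 5, 64])
```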
- Contextual Molecule Representation Learning from Chemical Reaction Knowledge [24.501564702095937]
We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
arXiv Detail & Related papers (2024-02-21T12:58:40Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
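A minimal sketch of the bilateral idea: pool an atom-wise view and a subgraph-wise view of the same molecule and fuse them into one representation. The pooling and fusion choices here are illustrative, not ASBA's.

```python
import torch
import torch.nn as nn

class BilateralReadout(nn.Module):
    def __init__(self, d=128):
        super().__init__()
        self.fuse = nn.Linear(2 * d, d)

    def forward(self, h, subgraph_masks):
        # h: (N, d) atom embeddings from any GNN
        # subgraph_masks: (S, N) 0/1 membership of atoms in S subgraphs
        #                 (e.g. rings or functional groups)
        atom_view = h.mean(dim=0)                                  # (d,)
        sizes = subgraph_masks.sum(dim=1, keepdim=True).clamp(min=1)
        sub_emb = (subgraph_masks @ h) / sizes                     # (S, d)
        sub_view = sub_emb.mean(dim=0)                             # (d,)
        return self.fuse(torch.cat([atom_view, sub_view]))         # (d,)

h = torch.randn(12, 128)                    # 12 atoms
masks = (torch.rand(3, 12) > 0.5).float()   # 3 toy subgraphs
print(BilateralReadout()(h, masks).shape)   # torch.Size([128])
```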
- Discovery of structure-property relations for molecules via hypothesis-driven active learning over the chemical space [0.0]
We introduce a novel approach for active learning over chemical spaces based on hypothesis learning.
We construct hypotheses about possible relationships between the structures and functionalities of interest from a small subset of the data.
This approach combines elements of symbolic regression methods, such as SISSO, with active learning in a single framework.
arXiv Detail & Related papers (2023-01-06T14:22:43Z)
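A toy version of the loop: fit a few symbolic candidate forms to the labeled data, then acquire the point where they disagree most. The SISSO-based machinery of the paper is not reproduced; the hypotheses, oracle, and acquisition rule below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

hypotheses = [
    lambda x, a, b: a * x + b,             # linear
    lambda x, a, b: a * x**2 + b,          # quadratic
    lambda x, a, b: a * np.exp(b * x),     # exponential
]

oracle = lambda x: 0.5 * x**2 + 1.0        # ground truth (unknown to us)
x_lab = np.array([-2.0, 0.5, 2.0])
grid = np.linspace(-3, 3, 61)

for _ in range(5):
    params = [curve_fit(h, x_lab, oracle(x_lab), p0=[1.0, 0.0],
                        maxfev=5000)[0] for h in hypotheses]
    preds = np.stack([h(grid, *p) for h, p in zip(hypotheses, params)])
    x_new = grid[np.argmax(preds.std(axis=0))]   # max hypothesis disagreement
    x_lab = np.append(x_lab, x_new)              # "run the experiment"

print(sorted(x_lab))   # samples concentrate where hypotheses differ
```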
- Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction [121.97742787439546]
Accurately predicting the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep active learning methods select only the most representative and informative data for annotation.
We propose a task-type-generic active learning framework (termed Tyger) that handles different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z)
- Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)
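Schematically, an energy-based retrosynthesis model assigns a scalar energy E(reactants, product) and ranks candidate reactant sets by a softmax over negative energies. The encoder outputs and candidate set below are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

class EnergyModel(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(),
                                   nn.Linear(d, 1))

    def forward(self, reactants, product):
        # reactants: (K, d), product: (d,) - any molecule encoder's output
        pair = torch.cat([reactants, product.expand_as(reactants)], dim=1)
        return self.score(pair).squeeze(1)          # (K,) energies

model = EnergyModel()
product = torch.randn(64)
candidates = torch.randn(10, 64)                    # K candidate reactant sets
p = torch.softmax(-model(candidates, product), dim=0)   # low energy = likely
print(p.argmax().item(), "is the top-ranked retrosynthetic candidate")
```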
- Graph Neural Networks for the Prediction of Substrate-Specific Organic Reaction Conditions [79.45090959869124]
We present a systematic investigation using graph neural networks (GNNs) to model organic chemical reactions.
We evaluate seven different GNN architectures for classification tasks pertaining to the identification of experimental reagents and conditions.
arXiv Detail & Related papers (2020-07-08T17:21:00Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapidly screening billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
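"Directly optimizing the ROC" is commonly done with a differentiable pairwise surrogate for AUC; the sketch below uses a generic logistic pairwise ranking loss, not necessarily the paper's exact cost function.

```python
import torch
import torch.nn.functional as F

def pairwise_auc_loss(scores, labels):
    # Smooth surrogate for 1 - AUC: push every positive to score above
    # every negative via a logistic loss on all pos/neg score margins.
    pos = scores[labels == 1]                   # (P,)
    neg = scores[labels == 0]                   # (N,)
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)  # (P, N) margins
    return F.softplus(-diff).mean()

# usage: robust to heavy class imbalance, since every pos/neg pair
# contributes equally, unlike per-example cross-entropy
scores = torch.randn(1000, requires_grad=True)
labels = (torch.rand(1000) < 0.02).long()       # ~2% actives
pairwise_auc_loss(scores, labels).backward()
```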
- Multi-View Self-Attention for Interpretable Drug-Target Interaction Prediction [4.307720252429733]
In machine learning approaches, the numerical representation of molecules is critical to the performance of the model.
We propose a self-attention-based multi-view representation learning approach for modeling drug-target interactions.
arXiv Detail & Related papers (2020-05-01T14:28:17Z)
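One way to realize multi-view self-attention: embed each view as a token sequence, concatenate, and let standard multi-head attention mix them, with the attention weights offering a handle for interpretability. The views and dimensions below are placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

d = 128
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

graph_view = torch.randn(1, 20, d)     # e.g. per-atom GNN embeddings
string_view = torch.randn(1, 30, d)    # e.g. per-token SMILES embeddings
views = torch.cat([graph_view, string_view], dim=1)   # (1, 50, d)

fused, weights = attn(views, views, views)  # self-attention across views
print(fused.shape)      # torch.Size([1, 50, 128])
# 'weights' (1, 50, 50) can be inspected for interpretability: which
# atoms attend to which tokens when predicting an interaction.
```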