Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport
- URL: http://arxiv.org/abs/2202.06208v3
- Date: Mon, 30 Oct 2023 02:20:05 GMT
- Title: Improving Molecular Representation Learning with Metric
Learning-enhanced Optimal Transport
- Authors: Fang Wu, Nicolas Courty, Shuting Jin, Stan Z. Li
- Abstract summary: We develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems.
MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances.
- Score: 49.237577649802034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training data are usually limited or heterogeneous in many chemical and
biological applications. Existing machine learning models for chemistry and
materials science rarely consider generalization beyond the training domain. In
this article, we develop a novel optimal transport-based algorithm termed MROT
to enhance their generalization capability for molecular regression problems.
MROT learns a continuous label of the data by measuring a new metric of domain
distances, combined with a posterior-variance regularization over the transport
plan, to bridge the chemical domain gap. Among downstream tasks, we consider basic
chemical regression tasks in unsupervised and semi-supervised settings,
including chemical property prediction and materials adsorption selection.
Extensive experiments show that MROT significantly outperforms state-of-the-art
models, showing promising potential in accelerating the discovery of new
substances with desired properties.
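The abstract does not spell out MROT's algorithm, so the following is only a minimal sketch of the entropic optimal-transport machinery such a method typically builds on: a plain Sinkhorn solver that couples source- and target-domain feature sets. The random features, uniform marginals, and squared-Euclidean cost are illustrative assumptions, not the paper's metric-learned cost or posterior-variance regularizer.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iter=200):
    """Entropic-regularized OT between two uniform marginals.

    cost: (n, m) pairwise cost matrix between source/target samples.
    Returns the (n, m) transport plan.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform source marginal
    b = np.full(m, 1.0 / m)   # uniform target marginal
    K = np.exp(-cost / reg)   # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):  # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(5, 3))   # source-domain features (toy)
tgt = rng.normal(0.5, 1.0, size=(4, 3))   # shifted target-domain features
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)  # squared Euclidean
plan = sinkhorn(cost)
```

The transport plan's entries indicate how much source mass maps onto each target sample; a domain-adaptation method would use it to align the two feature distributions before regression.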
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model that randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
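As a rough illustration of the masking idea only: the paper's tokenizer and masking policy are not given in this summary, so the regex tokenizer, 15% mask ratio, and `[MASK]` token below are assumptions (the tokenizer covers common organic-subset SMILES tokens and is a simplification).

```python
import random
import re

# Illustrative SMILES tokenizer: bracket atoms, two-letter halogens,
# single-letter organic-subset atoms, and bond/ring/branch symbols.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[=#()\d+\-/\\@]")
ATOM_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")

def mask_smiles(smiles, mask_ratio=0.15, mask_token="[MASK]", seed=None):
    """Randomly replace a fraction of atom tokens with a mask token."""
    rng = random.Random(seed)
    tokens = TOKEN_RE.findall(smiles)
    atom_idx = [i for i, t in enumerate(tokens) if ATOM_RE.fullmatch(t)]
    n_mask = max(1, int(len(atom_idx) * mask_ratio))
    for i in rng.sample(atom_idx, n_mask):
        tokens[i] = mask_token
    return "".join(tokens)

masked = mask_smiles("CC(=O)Oc1ccccc1C(=O)O", seed=0)  # aspirin
```

The masked string is the model input; the pre-training objective would then ask the model to recover the hidden atoms from context.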
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- Analysis of Atom-level pretraining with Quantum Mechanics (QM) data for Graph Neural Networks Molecular property models [0.0]
We show how atom-level pretraining with quantum mechanics (QM) data can mitigate violations of assumptions regarding the distributional similarity between training and test data.
This is the first time that hidden state molecular representations are analyzed to compare the effects of molecule-level and atom-level pretraining on QM data.
arXiv Detail & Related papers (2024-05-23T17:51:05Z)
- Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
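The summary does not state the selection criterion, so the following is only a generic sketch of one way to pick a small, informative subset of a chemical feature space: greedy farthest-point sampling, which adds at each step the point farthest from everything already selected. The toy 2-D features and Euclidean distance are assumptions for illustration.

```python
import numpy as np

def farthest_point_subset(X, k):
    """Greedily pick k rows of X that spread over the feature space."""
    chosen = [0]  # seed with the first point
    d = np.linalg.norm(X - X[0], axis=1)  # distance to nearest chosen point
    while len(chosen) < k:
        nxt = int(np.argmax(d))           # most distant remaining point
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

# Two tight clusters plus one outlier: the subset should hit all three regions.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.1], [0.0, 5.0]])
subset = farthest_point_subset(X, 3)
```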
arXiv Detail & Related papers (2024-04-05T17:15:48Z)
- Contextual Molecule Representation Learning from Chemical Reaction
Knowledge [24.501564702095937]
We introduce REMO, a self-supervised learning framework that takes advantage of well-defined atom-combination rules in common chemistry.
REMO pre-trains graph/Transformer encoders on 1.7 million known chemical reactions in the literature.
arXiv Detail & Related papers (2024-02-21T12:58:40Z)
- ChemVise: Maximizing Out-of-Distribution Chemical Detection with the
Novel Application of Zero-Shot Learning [60.02503434201552]
This research proposes learning approximations of complex exposures from training sets of simple ones.
We demonstrate that applying this approach to synthetic sensor responses surprisingly improves the detection of out-of-distribution, obscured chemical analytes.
arXiv Detail & Related papers (2023-02-09T20:19:57Z)
- Transfer learning for chemically accurate interatomic neural network
potentials [0.0]
We show that pre-training the network parameters on data obtained from density functional calculations improves the sample efficiency of models trained on more accurate ab-initio data.
We provide GM-NN potentials pre-trained and fine-tuned on the ANI-1x and ANI-1ccx data sets, which can easily be fine-tuned on and applied to organic molecules.
arXiv Detail & Related papers (2022-12-07T19:21:01Z)
- Retrieval-based Controllable Molecule Generation [63.44583084888342]
We propose a new retrieval-based framework for controllable molecule generation.
We use a small set of molecules to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria.
Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning.
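A minimal sketch of the retrieval step such a framework relies on: ranking a small exemplar set by fingerprint similarity to a query molecule. The toy binary fingerprints and Tanimoto similarity are illustrative assumptions, not the paper's retriever.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between binary fingerprint vectors."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def retrieve(query_fp, exemplar_fps, k=2):
    """Return indices of the k exemplars most similar to the query."""
    sims = np.array([tanimoto(query_fp, fp) for fp in exemplar_fps])
    return np.argsort(-sims)[:k]

# Toy 16-bit fingerprints with distinct on-bit patterns.
exemplars = np.zeros((4, 16), dtype=int)
exemplars[0, :8] = 1
exemplars[1, 8:] = 1
exemplars[2, ::2] = 1
exemplars[3, :4] = 1

query = exemplars[0].copy()
query[0] = 0  # small perturbation of exemplar 0
top = retrieve(query, exemplars, k=2)
print(top)  # → [0 3]
```

The retrieved exemplars would then be fed to the generative model as guidance toward the design criteria they exemplify.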
arXiv Detail & Related papers (2022-08-23T17:01:16Z)
- Semi-Supervised Junction Tree Variational Autoencoder for Molecular
Property Prediction [0.0]
This research modifies a state-of-the-art molecule generation method, the Junction Tree Variational Autoencoder (JT-VAE), to facilitate semi-supervised learning for chemical property prediction.
We leverage JT-VAE architecture to learn an interpretable representation optimal for tasks ranging from molecule property prediction to conditional molecule generation.
arXiv Detail & Related papers (2022-08-10T03:06:58Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for
Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.