Known Unknowns: Out-of-Distribution Property Prediction in Materials and Molecules
- URL: http://arxiv.org/abs/2502.05970v1
- Date: Sun, 09 Feb 2025 17:37:36 GMT
- Title: Known Unknowns: Out-of-Distribution Property Prediction in Materials and Molecules
- Authors: Nofit Segal, Aviv Netanyahu, Kevin P. Greenman, Pulkit Agrawal, Rafael Gomez-Bombarelli
- Abstract summary: Discovery of high-performance materials and molecules requires identifying extremes with property values that fall outside the known distribution.
Our objective is to train predictor models that extrapolate zero-shot to higher ranges than in the training data.
We propose using a transductive approach to OOD property prediction, achieving improvements in prediction accuracy.
- Score: 19.071396780849344
- Abstract: Discovery of high-performance materials and molecules requires identifying extremes with property values that fall outside the known distribution. Therefore, the ability to extrapolate to out-of-distribution (OOD) property values is critical for both solid-state materials and molecular design. Our objective is to train predictor models that extrapolate zero-shot to higher ranges than in the training data, given the chemical compositions of solids or molecular graphs and their property values. We propose using a transductive approach to OOD property prediction, achieving improvements in prediction accuracy. In particular, the True Positive Rate (TPR) of OOD classification of materials and molecules improved by 3x and 2.5x, respectively, and precision improved by 2x and 1.5x compared to non-transductive baselines. Our method leverages analogical input-target relations in the training and test sets, enabling generalization beyond the training target support, and can be applied to any other material and molecular tasks.
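The True Positive Rate and precision quoted in the abstract can be illustrated with a minimal sketch. It assumes a sample counts as OOD when its property value exceeds a threshold, taken here to be the largest target seen in training; that definition, the function name `ood_classification_metrics`, and the toy numbers are illustrative assumptions, not details from the paper.

```python
def ood_classification_metrics(y_true, y_pred, threshold):
    """Return (TPR, precision) for flagging samples whose value exceeds `threshold`.

    Assumption: a sample is "OOD-positive" when its property value is above
    `threshold` (e.g. the maximum target seen in training).
    """
    tp = fp = fn = 0
    for true_value, predicted_value in zip(y_true, y_pred):
        actual = true_value > threshold
        predicted = predicted_value > threshold
        if actual and predicted:
            tp += 1  # truly OOD and flagged as OOD
        elif predicted:
            fp += 1  # flagged as OOD but actually in-distribution
        elif actual:
            fn += 1  # truly OOD but missed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return tpr, precision


# Hypothetical toy data: training targets never exceeded 1.0.
train_max = 1.0
y_true = [0.5, 1.2, 1.5, 0.9, 2.0]
y_pred = [0.6, 1.1, 0.95, 1.05, 1.8]
tpr, precision = ood_classification_metrics(y_true, y_pred, train_max)
```

In this toy case two of the three truly OOD samples are recovered and one in-distribution sample is wrongly flagged, so both metrics come out to 2/3.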
Related papers
- Predicting ionic conductivity in solids from the machine-learned potential energy landscape [68.25662704255433]
Superionic materials are essential for advancing solid-state batteries, which offer improved energy density and safety.
Conventional computational methods for identifying such materials are resource-intensive and not easily scalable.
We propose an approach for the quick and reliable evaluation of ionic conductivity through the analysis of a universal interatomic potential.
arXiv Detail & Related papers (2024-11-11T09:01:36Z) - Molecule Design by Latent Prompt Transformer [76.2112075557233]
This work explores the challenging problem of molecule design by framing it as a conditional generative modeling task.
We propose a novel generative model comprising three components: (1) a latent vector with a learnable prior distribution; (2) a molecule generation model based on a causal Transformer, which uses the latent vector as a prompt; and (3) a property prediction model that predicts a molecule's target properties and/or constraint values using the latent prompt.
arXiv Detail & Related papers (2024-02-27T03:33:23Z) - Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey [22.73437302209673]
We review and quantitatively analyze recent deep learning methods based on various benchmarks.
We find that integrating molecular information significantly improves molecular property prediction (MPP) for both regression and classification tasks.
We also discover that enriching 2D graphs with 1D SMILES boosts multi-modal learning performance for regression tasks by up to 9.1%, and augmenting 2D graphs with 3D information increases performance for classification tasks by up to 13.2%.
arXiv Detail & Related papers (2024-02-11T17:29:58Z) - Classifier-free graph diffusion for molecular property targeting [14.488714063757278]
This work focuses on the task of property targeting, that is, generating molecules conditioned on target chemical properties.
We propose a classifier-free DiGress (FreeGress) which works by directly injecting the conditioning information into the training process.
We empirically show that our model yields up to 79% improvement in Mean Absolute Error with respect to DiGress on property targeting tasks.
arXiv Detail & Related papers (2023-12-28T23:34:38Z) - Towards out-of-distribution generalizable predictions of chemical
kinetics properties [61.15970601264632]
ML approaches for kinetic property prediction are required to be out-of-distribution (OOD) generalizable.
In this paper, we categorize OOD kinetic property prediction into three levels (structure, condition, and mechanism).
We create comprehensive datasets to benchmark the state-of-the-art ML approaches for reaction prediction in the OOD setting and the state-of-the-art graph OOD methods in kinetics property prediction problems.
arXiv Detail & Related papers (2023-10-04T20:36:41Z) - Atomic and Subgraph-aware Bilateral Aggregation for Molecular
Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z) - Exploring Chemical Space with Score-based Out-of-distribution Generation [57.15855198512551]
We propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution control in the generative stochastic differential equation (SDE).
Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor.
We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool.
arXiv Detail & Related papers (2022-06-06T06:17:11Z) - Property-aware Adaptive Relation Networks for Molecular Property Prediction [34.13439007658925]
We propose property-aware adaptive relation networks (PAR) for the few-shot molecular property prediction problem.
PAR is compatible with existing graph-based molecular encoders, and is further equipped with the ability to obtain property-aware molecular embeddings and to model the molecular relation graph.
arXiv Detail & Related papers (2021-07-16T16:22:30Z) - BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning (BIGDML) approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 geometries.
arXiv Detail & Related papers (2021-06-08T10:14:57Z) - Stochastic Threshold Model Trees: A Tree-Based Ensemble Method for Dealing with Extrapolation [0.0]
In the development of new materials, it is desirable to search for compounds with unprecedented physical properties.
We propose Stochastic Threshold Model Trees (STMT), which reflect the trend of the data while maintaining the accuracy of conventional methods.
In the case of the real data, although there is no significant overall improvement in accuracy, there is one compound for which the prediction accuracy is notably improved.
arXiv Detail & Related papers (2020-09-19T05:48:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.