MUBen: Benchmarking the Uncertainty of Molecular Representation Models
- URL: http://arxiv.org/abs/2306.10060v4
- Date: Tue, 16 Apr 2024 22:40:40 GMT
- Title: MUBen: Benchmarking the Uncertainty of Molecular Representation Models
- Authors: Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang
- Abstract summary: Uncertainty quantification (UQ) methods can be used to improve the models' calibration of predictions.
We present MUBen, which evaluates different UQ methods for state-of-the-art backbone molecular representation models.
Our study offers insights for selecting UQ for backbone models, which can facilitate research on uncertainty-critical applications.
- Score: 32.41186397454142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large molecular representation models pre-trained on massive unlabeled data have shown great success in predicting molecular properties. However, these models may tend to overfit the fine-tuning data, resulting in over-confident predictions on test data that fall outside of the training distribution. To address this issue, uncertainty quantification (UQ) methods can be used to improve the models' calibration of predictions. Although many UQ approaches exist, not all of them lead to improved performance. While some studies have included UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored. To address this gap, we present MUBen, which evaluates different UQ methods for state-of-the-art backbone molecular representation models to investigate their capabilities. By fine-tuning various backbones using different molecular descriptors as inputs with UQ methods from different categories, we assess the influence of architectural decisions and training strategies. Our study offers insights for selecting UQ for backbone models, which can facilitate research on uncertainty-critical applications in fields such as materials science and drug discovery.
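To make the notion of calibration in the abstract concrete, the sketch below computes expected calibration error (ECE) for a binary classifier. This is only one common calibration metric; the abstract does not say which metrics MUBen reports, and the function name `expected_calibration_error` and the equal-width binning over [0.5, 1] are assumptions made for illustration.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-weighted mean gap between accuracy and confidence.

    Illustrative sketch only, not the MUBen implementation.
    probs  -- predicted probability of the positive class, in [0, 1]
    labels -- ground-truth labels, 0 or 1
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Predicted class and the confidence assigned to that class.
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)

    # For binary predictions the confidence always lies in [0.5, 1].
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    # Interior edges map each confidence to a bin index in 0..n_bins-1.
    bin_ids = np.digitize(conf, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

if __name__ == "__main__":
    # A model that is fully confident but right only half the time
    # is over-confident, so its ECE is large.
    print(expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 0, 1, 0]))
```

A well-calibrated model has an ECE near zero. An over-confident model, such as the one the abstract describes on out-of-distribution test data, has confidences that exceed its accuracy, which raises the value.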
Related papers
- Influence Functions for Scalable Data Attribution in Diffusion Models [52.92223039302037]
Diffusion models have led to significant advancements in generative modelling.
Yet their widespread adoption poses challenges regarding data attribution and interpretability.
In this paper, we aim to help address such challenges by developing an influence functions framework.
arXiv Detail & Related papers (2024-10-17T17:59:02Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Epistemic Uncertainty Quantification For Pre-trained Neural Network [27.444465823508715]
Epistemic uncertainty quantification (UQ) identifies where models lack knowledge.
Traditional UQ methods, often based on Bayesian neural networks, are not suitable for pre-trained non-Bayesian models.
arXiv Detail & Related papers (2024-04-15T20:21:05Z)
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into data aspect and model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- Learning Invariant Molecular Representation in Latent Discrete Space [52.13724532622099]
We propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts.
Our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts.
arXiv Detail & Related papers (2023-10-22T04:06:44Z)
- Machine Learning Small Molecule Properties in Drug Discovery [44.62264781248437]
We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
We discuss existing popular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks.
Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery, are assessed.
arXiv Detail & Related papers (2023-08-02T22:18:41Z)
- Uncertainty Quantification for Molecular Property Predictions with Graph Neural Architecture Search [2.711812013460678]
We introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction.
Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them.
AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
arXiv Detail & Related papers (2023-07-19T20:03:42Z)
- Evaluating Point-Prediction Uncertainties in Neural Networks for Drug Discovery [0.26385121748044166]
Neural Network (NN) models provide potential to speed up the drug discovery process and reduce its failure rates.
The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution.
In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aiming at drug discovery.
arXiv Detail & Related papers (2022-10-31T03:45:11Z)
- Learning continuous models for continuous physics [94.42705784823997]
We develop a test based on numerical analysis theory to validate machine learning models for science and engineering applications.
Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
arXiv Detail & Related papers (2022-02-17T07:56:46Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Uncertainty Quantification Using Neural Networks for Molecular Property Prediction [33.34534208450156]
We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics.
None of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets.
We conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
arXiv Detail & Related papers (2020-05-20T13:31:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.