Kermut: Composite kernel regression for protein variant effects
- URL: http://arxiv.org/abs/2407.00002v3
- Date: Thu, 31 Oct 2024 14:52:28 GMT
- Title: Kermut: Composite kernel regression for protein variant effects
- Authors: Peter Mørch Groth, Mads Herbert Kerrn, Lars Olsen, Jesper Salomon, Wouter Boomsma
- Abstract summary: We provide a Gaussian process regression model, Kermut, with a novel composite kernel for modeling mutation similarity.
An analysis of the quality of the uncertainty estimates demonstrates that our model provides meaningful levels of overall calibration.
- Score: 0.9262403397108374
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable prediction of protein variant effects is crucial for both protein optimization and for advancing biological understanding. For practical use in protein engineering, it is important that we can also provide reliable uncertainty estimates for our predictions, and while prediction accuracy has seen much progress in recent years, uncertainty metrics are rarely reported. We here provide a Gaussian process regression model, Kermut, with a novel composite kernel for modeling mutation similarity, which obtains state-of-the-art performance for supervised protein variant effect prediction while also offering estimates of uncertainty through its posterior. An analysis of the quality of the uncertainty estimates demonstrates that our model provides meaningful levels of overall calibration, but that instance-specific uncertainty calibration remains more challenging.
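The abstract describes Gaussian process regression with a composite kernel whose posterior supplies uncertainty estimates. The following is a minimal sketch of that general idea only, not Kermut's actual kernel or code: the RBF-plus-linear combination, the function names (`composite_kernel`, `gp_posterior`, `interval_coverage`), the fixed hyperparameters, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    # Squared-exponential similarity between rows of a and rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def linear_kernel(a, b):
    # Dot-product similarity between rows of a and rows of b.
    return a @ b.T


def composite_kernel(a, b, weight=0.5):
    # Hypothetical composite: a weighted sum of two valid kernels is itself a valid kernel.
    return weight * rbf_kernel(a, b) + (1.0 - weight) * linear_kernel(a, b)


def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    # Standard GP regression equations, solved through a Cholesky factorization.
    k_tt = composite_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = composite_kernel(x_test, x_train)
    k_ss_diag = np.diag(composite_kernel(x_test, x_test))
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)


def interval_coverage(y_true, mean, var, z=1.96):
    # Fraction of targets inside the central interval mean +/- z * std.
    # For a well-calibrated Gaussian posterior and z = 1.96 this should be near 0.95.
    half_width = z * np.sqrt(var)
    return float(np.mean(np.abs(y_true - mean) <= half_width))


# Toy usage: random 2-D inputs stand in for variant representations (an assumption).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(40, 2))
y_train = np.sin(x_train[:, 0]) + 0.5 * x_train[:, 1]
x_test = rng.normal(size=(20, 2))
y_test = np.sin(x_test[:, 0]) + 0.5 * x_test[:, 1]

noise = 1e-2
mean, var = gp_posterior(x_train, y_train, x_test, noise=noise)
# Add the noise variance to obtain the predictive variance of observations.
print("95% interval coverage:", interval_coverage(y_test, mean, var + noise))
```

Comparing empirical interval coverage with the nominal level is one simple way to probe overall calibration; it does not say whether individual predictions carry appropriately sized uncertainties, which is the harder instance-specific question the abstract raises.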
Related papers
- Mass Balance Approximation of Unfolding Improves Potential-Like Methods for Protein Stability Predictions [0.0]
Deep-learning strategies have pushed the field forward, but their use in standard methods remains limited due to resource demands.
This study shows that incorporating a mass-balance correction (MBC) to account for the unfolded state significantly enhances these methods.
While many machine learning models partially model this balance, our analysis suggests that a refined representation of the unfolded state may improve the predictive performance.
arXiv Detail & Related papers (2025-04-09T11:53:02Z)
- Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials (MLIPs) are crucial for quantifying model error.
We evaluate uncertainty estimations of GPR-based MLIPs, including the predictive GPR standard deviation and ensemble-based uncertainties.
arXiv Detail & Related papers (2024-10-27T10:06:09Z)
- Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict the thermostability changes in protein upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Graph Neural Network Interatomic Potential Ensembles with Calibrated Aleatoric and Epistemic Uncertainty on Energy and Forces [9.378581265532006]
We present a complete framework for training and recalibrating graph neural network ensemble models to produce accurate predictions of energy and forces.
The proposed method considers both epistemic and aleatoric uncertainty and the total uncertainties are recalibrated post hoc.
A detailed analysis of the predictive performance and uncertainty calibration is provided.
arXiv Detail & Related papers (2023-05-10T13:03:06Z)
- On Calibrated Model Uncertainty in Deep Learning [0.0]
We extend approximate inference for the loss-calibrated Bayesian framework to dropweights-based Bayesian neural networks.
We show that decisions informed by loss-calibrated uncertainty can improve diagnostic performance to a greater extent than straightforward alternatives.
arXiv Detail & Related papers (2022-06-15T20:16:32Z)
- Dense Uncertainty Estimation via an Ensemble-based Conditional Latent Variable Model [68.34559610536614]
We argue that the aleatoric uncertainty is an inherent attribute of the data and can only be correctly estimated with an unbiased oracle model.
We propose a new sampling and selection strategy at train time to approximate the oracle model for aleatoric uncertainty estimation.
Our results show that our solution achieves both accurate deterministic results and reliable uncertainty estimation.
arXiv Detail & Related papers (2021-11-22T08:54:10Z)
- Dense Uncertainty Estimation [62.23555922631451]
In this paper, we investigate neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation.
We work on two types of uncertainty estimation solutions, namely ensemble-based methods and generative-model-based methods, and explain their pros and cons when used in fully, semi-, and weakly supervised frameworks.
arXiv Detail & Related papers (2021-10-13T01:23:48Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
- Aleatoric uncertainty for Errors-in-Variables models in deep regression [0.48733623015338234]
We show how the concept of Errors-in-Variables can be used in Bayesian deep regression.
We discuss the approach along various simulated and real examples.
arXiv Detail & Related papers (2021-05-19T12:37:02Z)
- Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
- Calibrated Reliable Regression using Maximum Mean Discrepancy [45.45024203912822]
Modern deep neural networks still produce unreliable predictive uncertainty.
In this paper, we are concerned with getting well-calibrated predictions in regression tasks.
Experiments on non-trivial real datasets show that our method can produce well-calibrated and sharp prediction intervals.
arXiv Detail & Related papers (2020-06-18T03:38:12Z)
- Maximum likelihood estimation and uncertainty quantification for Gaussian process approximation of deterministic functions [10.319367855067476]
This article provides one of the first theoretical analyses in the context of Gaussian process regression with a noiseless dataset.
We show that the maximum likelihood estimation of the scale parameter alone provides significant adaptation against misspecification of the Gaussian process model.
arXiv Detail & Related papers (2020-01-29T17:20:21Z)