Perturbative partial moment matching and gradient-flow adaptive importance sampling transformations for Bayesian leave one out cross-validation
- URL: http://arxiv.org/abs/2402.08151v3
- Date: Mon, 02 Jun 2025 14:06:11 GMT
- Title: Perturbative partial moment matching and gradient-flow adaptive importance sampling transformations for Bayesian leave one out cross-validation
- Authors: Joshua C Chang, Xiangting Li, Shixin Xu, Hao-Ren Yao, Julia Porcino, Carson Chow
- Abstract summary: We motivate the use of perturbative transformations of the form $T(\boldsymbol{\theta})=\boldsymbol{\theta} + h Q(\boldsymbol{\theta})$ for $0<h\ll 1$. We derive closed-form expressions for the Jacobian determinants in the case of logistic regression and shallow ReLU-activated neural networks.
- Score: 0.9895793818721335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Importance sampling (IS) allows one to approximate leave-one-out (LOO) cross-validation for a Bayesian model, without refitting, by inverting the Bayesian update equation to subtract a given data point from a model posterior. For each data point, one computes expectations under the corresponding LOO posterior by weighted averaging over the full data posterior. This task sometimes requires weight stabilization in the form of adapting the posterior distribution via transformation. So long as one is successful in finding a suitable transformation, one avoids refitting. To this end, we motivate the use of bijective perturbative transformations of the form $T(\boldsymbol{\theta})=\boldsymbol{\theta} + h Q(\boldsymbol{\theta}),$ for $0<h\ll 1,$ and introduce two classes of such transformations: 1) partial moment matching and 2) gradient flow evolution. The former extends prior literature on moment-matching under the recognition that adaptation for LOO is a small perturbation on the full data posterior. The latter class of methods define transformations based on relaxing various statistical objectives: in our case the variance of the IS estimator and the KL divergence between the transformed distribution and the statistics of the LOO fold. Being model-specific, the gradient flow transformations require evaluating Jacobian determinants. While these quantities are generally readily available through auto-differentiation, we derive closed-form expressions in the case of logistic regression and shallow ReLU-activated neural networks. We tested the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.
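The IS-LOO recipe described in the abstract (reweight full-data posterior draws by the inverse likelihood of the held-out point, optionally after a perturbative map $T(\boldsymbol{\theta})=\boldsymbol{\theta}+hQ(\boldsymbol{\theta})$) can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' code; the function names, the generic `Q` and `logdet_jac` arguments, and the default step size `h` are hypothetical placeholders under the stated assumptions.

```python
import numpy as np

def loo_is_weights(theta, loglik_i):
    """Plain IS-LOO: the leave-one-out posterior is proportional to
    (full-data posterior) / p(y_i | theta), so draws from the full-data
    posterior get raw log-weights -loglik_i(theta)."""
    log_w = -loglik_i(theta)                 # (S,) raw log-weights
    log_w -= log_w.max()                     # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()                       # self-normalized weights

def transformed_loo_is_weights(theta, loglik_i, log_post, Q, logdet_jac, h=1e-2):
    """IS-LOO weights after the perturbative map T(theta) = theta + h*Q(theta).

    theta      : (S, p) draws from the full-data posterior
    loglik_i   : (S, p) -> (S,) log-likelihood of the held-out point i
    log_post   : (S, p) -> (S,) unnormalized full-data log-posterior
    Q          : (S, p) -> (S, p) perturbation field (e.g. a moment-matching
                 or gradient-flow direction)
    logdet_jac : (S, p) -> (S,) log|det(I + h dQ/dtheta)| at each draw
    """
    theta_T = theta + h * Q(theta)
    # Change of variables: w is proportional to
    # pi_{-i}(T(theta)) * |det J_T(theta)| / pi(theta).
    log_w = (log_post(theta_T) - loglik_i(theta_T)
             + logdet_jac(theta) - log_post(theta))
    log_w -= log_w.max()
    w = np.exp(log_w)
    return w / w.sum(), theta_T
```

A LOO expectation of any functional f is then approximated by (w * f(theta_T)).sum(); the paper's partial moment matching and gradient-flow constructions correspond to particular choices of Q, and hence of the Jacobian correction.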
Related papers
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z) - Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models.
We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction.
We demonstrate the approach on intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
arXiv Detail & Related papers (2024-04-23T09:22:35Z) - Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - A probabilistic, data-driven closure model for RANS simulations with aleatoric, model uncertainty [1.8416014644193066]
We propose a data-driven, closure model for Reynolds-averaged Navier-Stokes (RANS) simulations that incorporates aleatoric, model uncertainty.
A fully Bayesian formulation is proposed, combined with a sparsity-inducing prior in order to identify regions in the problem domain where the parametric closure is insufficient.
arXiv Detail & Related papers (2023-07-05T16:53:31Z) - Monte Carlo inference for semiparametric Bayesian regression [5.488491124945426]
This paper introduces a simple, general, and efficient strategy for joint posterior inference of an unknown transformation and all regression model parameters.
It delivers (1) joint posterior consistency under general conditions, including multiple model misspecifications, and (2) efficient Monte Carlo (not Markov chain Monte Carlo) inference for the transformation and all parameters for important special cases.
arXiv Detail & Related papers (2023-06-08T18:42:42Z) - Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides [0.0]
We develop a methodology that enables real-time inference by learning the Bayesian inverse map, i.e., the map from data to posteriors.
Our approach provides the posterior distribution for a given observation just at the cost of a forward pass of the neural network.
arXiv Detail & Related papers (2023-05-31T16:25:07Z) - Refining Amortized Posterior Approximations using Gradient-Based Summary
Statistics [0.9176056742068814]
We present an iterative framework to improve the amortized approximations of posterior distributions in the context of inverse problems.
We validate our method in a controlled setting by applying it to a stylized problem, and observe improved posterior approximations with each iteration.
arXiv Detail & Related papers (2023-05-15T15:47:19Z) - Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables.
We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption.
We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Bias-variance decomposition of overparameterized regression with random
linear features [0.0]
"Over parameterized models" avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data.
We show how each transition arises due to small nonzero eigenvalues in the Hessian matrix.
We compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression.
arXiv Detail & Related papers (2022-03-10T16:09:21Z) - Inverting brain grey matter models with likelihood-free inference: a
tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI.
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z) - Performance of Bayesian linear regression in a model with mismatch [8.60118148262922]
We analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with a Gaussian prior.
This inference model can be rephrased as a version of the Gardner model in spin glasses.
arXiv Detail & Related papers (2021-07-14T18:50:13Z) - Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased.
We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief.
In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z) - Implicit Bias of Gradient Descent for Mean Squared Error Regression with
Two-Layer Wide Neural Networks [1.3706331473063877]
We show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data.
We also show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
arXiv Detail & Related papers (2020-06-12T17:46:40Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.