Related papers: Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification models

Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification models

URL: http://arxiv.org/abs/2402.08151v1
Date: Tue, 13 Feb 2024 01:03:39 GMT
Title: Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification models
Authors: Joshua C Chang, Xiangting Li, Shixin Xu, Hao-Ren Yao, Julia Porcino, Carson Chow
Abstract summary: gradient-flow-guided adaptive importance sampling (IS) transformations to stabilize Monte-Carlo approximations of point-wise leave one out cross-validated (LOO) predictions for Bayesian classification models. We derive closed-form exact formulae for Jacobian determinants in the cases of logistic regression and shallow ReLU-activated artificial neural networks.
Score: 1.0485739694839669
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce a set of gradient-flow-guided adaptive importance sampling (IS) transformations to stabilize Monte-Carlo approximations of point-wise leave one out cross-validated (LOO) predictions for Bayesian classification models. One can leverage this methodology for assessing model generalizability by for instance computing a LOO analogue to the AIC or computing LOO ROC/PRC curves and derived metrics like the AUROC and AUPRC. By the calculus of variations and gradient flow, we derive two simple nonlinear single-step transformations that utilize gradient information to shift a model's pre-trained full-data posterior closer to the target LOO posterior predictive distributions. In doing so, the transformations stabilize importance weights. Because the transformations involve the gradient of the likelihood function, the resulting Monte Carlo integral depends on Jacobian determinants with respect to the model Hessian. We derive closed-form exact formulae for these Jacobian determinants in the cases of logistic regression and shallow ReLU-activated artificial neural networks, and provide a simple approximation that sidesteps the need to compute full Hessian matrices and their spectra. We test the methodology on an $n\ll p$ dataset that is known to produce unstable LOO IS weights.

Related papers

Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions. The resulting probability model has connections with continuous spin models in statistical physics. For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
Variational Bayesian surrogate modelling with application to robust design optimisation [0.9626666671366836]
Surrogate models provide a quick-to-evaluate approximation to complex computational models. We consider Bayesian inference for constructing statistical surrogates with input uncertainties and dimensionality reduction. We demonstrate intrinsic and robust structural optimisation problems where cost functions depend on a weighted sum of the mean and standard deviation of model outputs.
arXiv Detail & Related papers (2024-04-23T09:22:35Z)
Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest. Recognizing that these models are approximations of reality, it becomes desirable to apply multiple models to the same data and integrate their outcomes.
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z)
A probabilistic, data-driven closure model for RANS simulations with aleatoric, model uncertainty [1.8416014644193066]
We propose a data-driven, closure model for Reynolds-averaged Navier-Stokes (RANS) simulations that incorporates aleatoric, model uncertainty. A fully Bayesian formulation is proposed, combined with a sparsity-inducing prior in order to identify regions in the problem domain where the parametric closure is insufficient.
arXiv Detail & Related papers (2023-07-05T16:53:31Z)
Monte Carlo inference for semiparametric Bayesian regression [5.488491124945426]
This paper introduces a simple, general, and efficient strategy for joint posterior inference of an unknown transformation and all regression model parameters. It delivers (1) joint posterior consistency under general conditions, including multiple model misspecifications, and (2) efficient Monte Carlo (not Markov chain Monte Carlo) inference for the transformation and all parameters for important special cases.
arXiv Detail & Related papers (2023-06-08T18:42:42Z)
Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides [0.0]
We develop a methodology that enables real-time inference by learning the Bayesian inverse map, i.e., the map from data to posteriors. Our approach provides the posterior distribution for a given observation just at the cost of a forward pass of the neural network.
arXiv Detail & Related papers (2023-05-31T16:25:07Z)
Refining Amortized Posterior Approximations using Gradient-Based Summary Statistics [0.9176056742068814]
We present an iterative framework to improve the amortized approximations of posterior distributions in the context of inverse problems. We validate our method in a controlled setting by applying it to a stylized problem, and observe improved posterior approximations with each iteration.
arXiv Detail & Related papers (2023-05-15T15:47:19Z)
Variational Laplace Autoencoders [53.08170674326728]
Variational autoencoders employ an amortized inference model to approximate the posterior of latent variables. We present a novel approach that addresses the limited posterior expressiveness of fully-factorized Gaussian assumption. We also present a general framework named Variational Laplace Autoencoders (VLAEs) for training deep generative models.
arXiv Detail & Related papers (2022-11-30T18:59:27Z)
Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain. We show that an unbiased estimator can be obtained via simple matching the conditional marginal distributions. We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
Bias-variance decomposition of overparameterized regression with random linear features [0.0]
"Over parameterized models" avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data. We show how each transition arises due to small nonzero eigenvalues in the Hessian matrix. We compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression.
arXiv Detail & Related papers (2022-03-10T16:09:21Z)
Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in dMRI. We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells. We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z)
Performance of Bayesian linear regression in a model with mismatch [8.60118148262922]
We analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with gaussian prior. This inference model can be rephrased as a version of the Gardner model in spin glasses.
arXiv Detail & Related papers (2021-07-14T18:50:13Z)
Variational Bayesian Unlearning [54.26984662139516]
We study the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging.
arXiv Detail & Related papers (2020-10-24T11:53:00Z)
Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets. Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z)
Implicit Bias of Gradient Descent for Mean Squared Error Regression with Two-Layer Wide Neural Networks [1.3706331473063877]
We show that the solution of training a width-$n$ shallow ReLU network is within $n- 1/2$ of the function which fits the training data. We also show that the training trajectories are captured by trajectories of smoothing splines with decreasing regularization strength.
arXiv Detail & Related papers (2020-06-12T17:46:40Z)
Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated. We propose a new method for this estimation problem combining sampling and analytic approximation steps. We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.