Related papers: A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives

URL: http://arxiv.org/abs/2509.10393v2
Date: Fri, 17 Oct 2025 11:51:16 GMT
Title: A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives
Authors: Clémentine Chazal, Heishiro Kanagawa, Zheyang Shen, Anna Korba, Chris J. Oates
Abstract summary: Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. We introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel gradient discrepancy' that can be explicitly computed.
Score: 17.212481754312048
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel' gradient discrepancy (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point several novel algorithms are proposed and studied, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and predictively oriented posteriors presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
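As a rough illustration of the standard Bayesian case mentioned in the abstract, where KGD is said to coincide with the kernel Stein discrepancy, the following sketch computes a V-statistic estimate of the squared KSD from samples and a score function. It is not the authors' implementation: the Gaussian kernel, the bandwidth `h`, the standard normal target, and the names `ksd_squared` and `score_std_normal` are all assumptions made for the example.

```python
import numpy as np


def score_std_normal(x):
    """Score (gradient of log density) of a standard normal target: -x."""
    return -x


def ksd_squared(x, score, h=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy.

    x     : (n, d) array of samples
    score : callable returning the (n, d) array of grad log p at x
    h     : bandwidth of the Gaussian kernel exp(-||x - y||^2 / (2 h^2))
    """
    n, d = x.shape
    s = score(x)                              # (n, d)
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)           # squared pairwise distances
    k = np.exp(-sq / (2.0 * h ** 2))          # kernel matrix

    # Four terms of the Langevin Stein kernel for a Gaussian base kernel.
    term1 = (s @ s.T) * k                                      # s_i . s_j k
    term2 = np.einsum("id,ijd->ij", s, diff) / h ** 2 * k      # s_i . grad_y k
    term3 = -np.einsum("jd,ijd->ij", s, diff) / h ** 2 * k     # s_j . grad_x k
    term4 = (d / h ** 2 - sq / h ** 4) * k                     # tr(grad_x grad_y k)
    return float(np.mean(term1 + term2 + term3 + term4))


rng = np.random.default_rng(0)
good = rng.standard_normal((200, 2))   # samples from the target
bad = good + 2.0                       # samples shifted away from the target
print(ksd_squared(good, score_std_normal), ksd_squared(bad, score_std_normal))
```

Only the score of the target is needed here, which is what makes KSD computable from an unnormalised density. The abstract's point is that KGD extends this kind of computation to targets where even the unnormalised density is unavailable.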

Related papers

Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z)
Flow-Induced Diagonal Gaussian Processes [7.720921989821054]
Flow-Induced Diagonal Gaussian Processes (FiD-GP) is a compression framework that incorporates a compact inducing weight matrix. We show how FiD-GP can help to design a single-pass projection for Out-of-Distribution (OoD) detection.
arXiv Detail & Related papers (2025-09-21T16:49:09Z)
Spectral Algorithms in Misspecified Regression: Convergence under Covariate Shift [0.2578242050187029]
Spectral algorithms are a class of regularization methods originating from inverse problems. In this paper, we investigate the convergence properties of spectral algorithms under covariate shift. We provide a theoretical analysis of the more challenging misspecified case, in which the target function does not belong to the reproducing kernel Hilbert space (RKHS).
arXiv Detail & Related papers (2025-09-05T13:42:27Z)
Precise Bayesian Neural Networks [0.0]
We develop a lightweight, implementation-ready variational unit that fits modern normalized architectures and improves calibration without sacrificing accuracy. In short, by aligning the variational posterior with the network's intrinsic geometry, BNNs can be simultaneously principled, practical, and precise.
arXiv Detail & Related papers (2025-06-24T15:42:00Z)
Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces [6.992239210938067]
We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous. Recent research has explored learning the worst-case distribution using neural network-based generative networks. This paper addresses this theoretical challenge by presenting an iterative algorithm to solve such a minimax problem.
arXiv Detail & Related papers (2024-12-29T19:31:23Z)
Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization [7.182234028830364]
This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. Considering neural networks, a practical optimization is derived to build such a distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy.
arXiv Detail & Related papers (2023-09-27T14:46:10Z)
Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density. This method is statistically consistent, and makes the inductive validation model clear and consistent.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution. We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree. In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically. This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression. We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Nonconvex sparse regularization for deep neural networks and its optimality [1.9798034349981162]
Deep neural network (DNN) estimators can attain optimal convergence rates for regression and classification problems. We propose a novel penalized estimation method for sparse DNNs. We prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems.
arXiv Detail & Related papers (2020-03-26T07:15:28Z)
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution. GenDICE is the state-of-the-art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.