Maximum Weight Entropy
- URL: http://arxiv.org/abs/2309.15704v1
- Date: Wed, 27 Sep 2023 14:46:10 GMT
- Title: Maximum Weight Entropy
- Authors: Antoine de Mathelin, François Deheeger, Mathilde Mougeot, Nicolas Vayatis
- Abstract summary: This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods.
Considering stochastic neural networks, a practical optimization is derived to build a maximum-entropy weight distribution, defined as a trade-off between the average empirical risk and the weight distribution entropy.
- Score: 6.821961232645206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper deals with uncertainty quantification and out-of-distribution
detection in deep learning using Bayesian and ensemble methods. It proposes a
practical solution to the lack of prediction diversity observed recently for
standard approaches when used out-of-distribution (Ovadia et al., 2019; Liu et
al., 2021). Considering that this issue is mainly related to a lack of weight
diversity, we claim that standard methods sample in "over-restricted" regions
of the weight space due to the use of "over-regularization" processes, such as
weight decay and zero-mean centered Gaussian priors. We propose to solve the
problem by adopting the maximum entropy principle for the weight distribution,
with the underlying idea of maximizing weight diversity. Under this paradigm,
the epistemic uncertainty is described by the weight distribution of maximal
entropy that produces neural networks "consistent" with the training
observations. Considering stochastic neural networks, a practical optimization
is derived to build such a distribution, defined as a trade-off between the
average empirical risk and the weight distribution entropy. We develop a novel
weight parameterization for the stochastic model, based on the singular value
decomposition of the neural network's hidden representations, which enables a
large increase in weight entropy at the cost of only a small empirical risk penalization.
We provide both theoretical and numerical results to assess the efficiency of
the approach. In particular, the proposed algorithm ranks among the top three
methods in all configurations of an extensive out-of-distribution detection
benchmark including more than thirty competitors.
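Below is a minimal sketch of the risk/entropy trade-off described in the abstract, assuming a PyTorch implementation with a factorized Gaussian weight distribution. The names `StochasticLinear`, `max_entropy_objective`, and the entropy weight `lam` are illustrative choices introduced here, not the authors' code, and the paper's SVD-based weight parameterization is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticLinear(nn.Module):
    """Linear layer with a factorized Gaussian weight distribution (illustrative)."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Reparameterized weight sample: w = mu + sigma * eps
        eps = torch.randn_like(self.mu)
        w = self.mu + self.log_sigma.exp() * eps
        return F.linear(x, w)

    def entropy(self):
        # Entropy of a factorized Gaussian equals sum(log sigma) up to an additive constant.
        return self.log_sigma.sum()


def max_entropy_objective(model, x, y, lam=1e-3, n_samples=4):
    """Average empirical risk over weight samples, minus a weighted entropy bonus."""
    risk = torch.stack(
        [F.cross_entropy(model(x), y) for _ in range(n_samples)]
    ).mean()
    ent = sum(m.entropy() for m in model.modules() if isinstance(m, StochasticLinear))
    return risk - lam * ent


# Usage sketch: a small stochastic classifier trained with the risk/entropy trade-off.
model = nn.Sequential(StochasticLinear(20, 64), nn.ReLU(), StochasticLinear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))
for _ in range(100):
    opt.zero_grad()
    loss = max_entropy_objective(model, x, y)
    loss.backward()
    opt.step()
```

In this sketch, `lam` controls how far the sampled weights may spread while keeping the average training risk low, which is the trade-off the abstract describes.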
Related papers
- Optimization and Generalization Guarantees for Weight Normalization [19.965963460750206]
We provide the first theoretical characterizations of both optimization and generalization of deep WeightNorm models.
We present experimental results which illustrate how the normalization terms and other quantities of theoretical interest relate to the training of WeightNorm networks.
arXiv Detail & Related papers (2024-09-13T15:55:05Z)
- Constrained Reweighting of Distributions: an Optimal Transport Approach [8.461214317999321]
We introduce nonparametrically imbued distributional constraints on the weights, and develop a general framework leveraging the maximum entropy principle and tools from optimal transport.
The framework is demonstrated in the context of three disparate applications: portfolio allocation, semi-parametric inference for complex surveys, and ensuring algorithmic fairness in machine learning algorithms.
arXiv Detail & Related papers (2023-10-19T03:54:31Z)
- Variational autoencoder with weighted samples for high-dimensional non-parametric adaptive importance sampling [0.0]
We extend the existing framework to the case of weighted samples by introducing a new objective function.
In order to add flexibility to the model and to be able to learn multimodal distributions, we consider a learnable prior distribution.
We exploit the proposed procedure in existing adaptive importance sampling algorithms to draw points from a target distribution and to estimate a rare event probability in high dimension.
arXiv Detail & Related papers (2023-10-13T15:40:55Z)
- Improved uncertainty quantification for neural networks with Bayesian last layer [0.0]
Uncertainty quantification is an important task in machine learning.
We present a reformulation of the log-marginal likelihood of a neural network with a Bayesian last layer (BLL), which allows for efficient training using backpropagation.
arXiv Detail & Related papers (2023-02-21T20:23:56Z)
- Multivariate Deep Evidential Regression [77.34726150561087]
A new approach with uncertainty-aware neural networks shows promise over traditional deterministic methods.
We discuss three issues with a proposed solution to extract aleatoric and epistemic uncertainties from regression-based neural networks.
arXiv Detail & Related papers (2021-04-13T12:20:18Z)
- Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise [51.080620762639434]
We propose a more efficient parameterization of the posterior approximation for sampling-free variational inference.
Our approach yields competitive results for standard regression problems and scales well to large-scale image classification tasks.
arXiv Detail & Related papers (2021-03-15T16:16:18Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
- A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
arXiv Detail & Related papers (2020-07-08T11:35:47Z)
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
- Bayesian Deep Learning and a Probabilistic Perspective of Generalization [56.69671152009899]
We show that deep ensembles provide an effective mechanism for approximate Bayesian marginalization.
We also propose a related approach that further improves the predictive distribution by marginalizing within basins of attraction.
arXiv Detail & Related papers (2020-02-20T15:13:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.