Related papers: Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel

URL: http://arxiv.org/abs/2210.09818v1
Date: Tue, 18 Oct 2022 12:55:53 GMT
Title: Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel
Authors: Seijin Kobayashi, Pau Vilimelis Aceituno, Johannes von Oswald
Abstract summary: We study deep ensembles with large layer widths operating in simplified linear training regimes. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance. We propose practical ways to eliminate part of these noise sources leading to significant changes and improved OOD detection in trained deep ensembles.
Score: 4.515606790756141
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Identifying unfamiliar inputs, also known as out-of-distribution (OOD) detection, is a crucial property of any decision making process. A simple and empirically validated technique is based on deep ensembles where the variance of predictions over different neural networks acts as a substitute for input uncertainty. Nevertheless, a theoretical understanding of the inductive biases leading to the performance of deep ensemble's uncertainty estimation is missing. To improve our description of their behavior, we study deep ensembles with large layer widths operating in simplified linear training regimes, in which the functions trained with gradient descent can be described by the neural tangent kernel. We identify two sources of noise, each inducing a distinct inductive bias in the predictive variance at initialization. We further show theoretically and empirically that both noise sources affect the predictive variance of non-linear deep ensembles in toy models and realistic settings after training. Finally, we propose practical ways to eliminate part of these noise sources leading to significant changes and improved OOD detection in trained deep ensembles.

Related papers

Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method. We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms.
arXiv Detail & Related papers (2023-11-09T23:32:06Z)
Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets. This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. We propose a novel method, Fisher Information-based Evidential Deep Learning ($mathcalI$-EDL) In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks [4.376565880192482]
Deep neural network (DNN) models have achieved state-of-the-art predictive accuracy in a wide range of supervised learning applications. accurately quantifying the uncertainty in DNN predictions remains a challenging task. We propose the Bayesian Deep Noise Neural Network (B-DeepNoise), which generalizes standard Bayesian DNNs by extending the random noise variable to all hidden layers. We evaluate B-DeepNoise against existing methods on benchmark regression datasets, demonstrating its superior performance in terms of prediction accuracy, uncertainty quantification accuracy, and uncertainty quantification efficiency.
arXiv Detail & Related papers (2022-06-12T02:47:29Z)
Multivariate Deep Evidential Regression [77.34726150561087]
A new approach with uncertainty-aware neural networks shows promise over traditional deterministic methods. We discuss three issues with a proposed solution to extract aleatoric and epistemic uncertainties from regression-based neural networks.
arXiv Detail & Related papers (2021-04-13T12:20:18Z)
A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise. We develop a novel random matrix theory based damping learner for second order optimiser inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z)
Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee [20.294908538266867]
Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks. In this paper, we train sparse deep neural networks with a fully Bayesian treatment under spike-and-slab priors. We develop a set of computationally efficient variational inferences via continuous relaxation of Bernoulli distribution.
arXiv Detail & Related papers (2020-11-15T03:27:54Z)
Ramifications of Approximate Posterior Inference for Bayesian Deep Learning in Adversarial and Out-of-Distribution Settings [7.476901945542385]
We show that Bayesian deep learning models on certain occasions marginally outperform conventional neural networks. Preliminary investigations indicate the potential inherent role of bias due to choices of initialisation, architecture or activation functions.
arXiv Detail & Related papers (2020-09-03T16:58:15Z)
Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks [22.34227625637843]
We investigate how the parametrization of the probabilities in discriminative classifiers affects the uncertainty estimates. We show that one-vs-all formulations can improve calibration on image classification tasks.
arXiv Detail & Related papers (2020-07-10T01:55:02Z)
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks [85.94999581306827]
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights. Many successful experimental results have been achieved with empirical straight-through (ST) approaches. At the same time, ST methods can be truly derived as estimators in the binary network (SBN) model with Bernoulli weights.
arXiv Detail & Related papers (2020-06-11T23:58:18Z)
Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
empirical optimization is central to modern machine learning, but its role in its success is still unclear. We show that it commonly arises in parameters of discrete multiplicative noise due to variance. A detailed analysis is conducted in which we describe on key factors, including recent step size, and data, all exhibit similar results on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.