On the Pitfalls of Heteroscedastic Uncertainty Estimation with
Probabilistic Neural Networks
- URL: http://arxiv.org/abs/2203.09168v1
- Date: Thu, 17 Mar 2022 08:46:17 GMT
- Title: On the Pitfalls of Heteroscedastic Uncertainty Estimation with
Probabilistic Neural Networks
- Authors: Maximilian Seitzer, Arash Tavakoli, Dimitrije Antic, Georg Martius
- Abstract summary: We present a synthetic example illustrating how this approach can lead to very poor but stable estimates.
We identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue.
We present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate.
- Score: 23.502721524477444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing aleatoric uncertainty is a critical part of many machine learning
systems. In deep learning, a common approach to this end is to train a neural
network to estimate the parameters of a heteroscedastic Gaussian distribution
by maximizing the logarithm of the likelihood function under the observed data.
In this work, we examine this approach and identify potential hazards
associated with the use of log-likelihood in conjunction with gradient-based
optimizers. First, we present a synthetic example illustrating how this
approach can lead to very poor but stable parameter estimates. Second, we
identify the culprit to be the log-likelihood loss, along with certain
conditions that exacerbate the issue. Third, we present an alternative
formulation, termed $\beta$-NLL, in which each data point's contribution to the
loss is weighted by the $\beta$-exponentiated variance estimate. We show that
using an appropriate $\beta$ largely mitigates the issue in our illustrative
example. Fourth, we evaluate this approach on a range of domains and tasks and
show that it achieves considerable improvements and performs more robustly
concerning hyperparameters, both in predictive RMSE and log-likelihood
criteria.
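The weighting admits a compact implementation. Below is a minimal PyTorch sketch of a $\beta$-NLL-style loss, assuming the variance weight is applied with a stop-gradient (detach) so that only the NLL term is differentiated; the function name and defaults are illustrative, not taken from the authors' code.

```python
import torch

def beta_nll_loss(mean, variance, target, beta=0.5):
    """Per-point Gaussian NLL weighted by a detached variance**beta factor.

    beta = 0 recovers the standard NLL; beta = 1 makes the gradient of
    the mean match MSE training. Intermediate values trade off the two,
    mitigating the vanishing gradients that high-variance points suffer
    under plain NLL.
    """
    nll = 0.5 * (torch.log(variance) + (target - mean) ** 2 / variance)
    weight = variance.detach() ** beta  # stop-gradient on the weight
    return (weight * nll).mean()
```

An intermediate value such as $\beta = 0.5$ is the kind of "appropriate $\beta$" the abstract refers to: high-variance points are no longer down-weighted as aggressively as under the plain NLL.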
Related papers
- Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP).
We use transition observations from the original MDP, whether they are generated under the same or a different policy.
Our estimator also admits statistical inference via Wald confidence intervals.
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of i.i.d. observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
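For intuition, here is a hedged sketch of an online Pearson-divergence (uLSIF-style) update on streaming pairs; the small network, Softplus output, and plain SGD are illustrative assumptions, not the paper's OLRE estimator with its accompanying guarantees.

```python
import torch

# Nonnegative parametric model r_theta(x) approximating the ratio p(x)/q(x).
ratio_model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.ReLU(),
    torch.nn.Linear(32, 1), torch.nn.Softplus(),
)
opt = torch.optim.SGD(ratio_model.parameters(), lr=1e-2)

def olre_step(x_p, x_q):
    """One online update from a pair (x_p ~ p, x_q ~ q).

    Minimizes the empirical Pearson-divergence criterion
    0.5 * r(x_q)**2 - r(x_p), whose population minimizer is r = p/q.
    """
    loss = (0.5 * ratio_model(x_q) ** 2 - ratio_model(x_p)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```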
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$\times$ faster than stochastic gradient baselines.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that the behaviour of a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z) - Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z) - Robust computation of optimal transport by $\beta$-potential
regularization [79.24513412588745]
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
We propose regularizing OT with the $\beta$-potential term associated with the so-called $\beta$-divergence.
We experimentally demonstrate that the transport matrix computed with our algorithm helps estimate a probability distribution robustly even in the presence of outliers.
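As a point of reference, the classical entropy-regularized Sinkhorn iteration that such robust formulations modify is sketched below; this is the standard algorithm, not the paper's $\beta$-potential update, which replaces the entropic regularizer with the $\beta$-potential term.

```python
import torch

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropy-regularized OT plan between histograms a and b.

    cost: (n, m) ground-cost matrix; a: (n,) and b: (m,) sum to 1.
    """
    K = torch.exp(-cost / eps)           # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                # match column marginals
        u = a / (K @ v)                  # match row marginals
    return u[:, None] * K * v[None, :]   # transport plan
```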
arXiv Detail & Related papers (2022-12-26T18:37:28Z) - Tractable and Near-Optimal Adversarial Algorithms for Robust Estimation
in Contaminated Gaussian Models [1.609950046042424]
Consider the problem of simultaneous estimation of location and variance matrix under Huber's contaminated Gaussian model.
First, we study minimum $f$-divergence estimation at the population level, corresponding to a generative adversarial method with a nonparametric discriminator.
We develop tractable adversarial algorithms with simple spline discriminators, which can be implemented via nested optimization.
The proposed methods are shown to achieve minimax optimal rates or near-optimal rates depending on the $f$-divergence and the penalty used.
arXiv Detail & Related papers (2021-12-24T02:46:51Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
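A minimal sketch of the clipping idea follows, assuming a generic norm-clipped SGD step; the threshold and step size are illustrative, not the paper's analyzed schedule.

```python
import torch

def clipped_sgd_step(params, grads, lr=0.01, clip=1.0):
    """Update params in place with the joint gradient norm clipped to <= clip.

    Clipping bounds the influence of any single heavy-tailed sample,
    which is the mechanism behind the improved streaming analysis.
    """
    norm = torch.linalg.vector_norm(torch.cat([g.flatten() for g in grads]))
    scale = torch.clamp(clip / (norm + 1e-12), max=1.0)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * scale * g
```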
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - Distributionally Robust Parametric Maximum Likelihood Estimation [13.09499764232737]
We propose a distributionally robust maximum likelihood estimator that minimizes the worst-case expected log-loss uniformly over a parametric nominal distribution.
Our novel robust estimator also enjoys statistical consistency and delivers promising empirical results in both regression and classification tasks.
arXiv Detail & Related papers (2020-10-11T19:05:49Z) - $\gamma$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a
Robust Divergence Estimator [95.71091446753414]
We propose to use a nearest-neighbor-based $\gamma$-divergence estimator as a data discrepancy measure.
Our method achieves significantly higher robustness than existing discrepancy measures.
arXiv Detail & Related papers (2020-06-13T06:09:27Z)