Understanding the bias-variance tradeoff of Bregman divergences
- URL: http://arxiv.org/abs/2202.04167v2
- Date: Thu, 10 Feb 2022 02:02:06 GMT
- Title: Understanding the bias-variance tradeoff of Bregman divergences
- Authors: Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith
- Abstract summary: This paper builds upon the work of Pfau (2013), which generalized the bias-variance tradeoff to any Bregman divergence loss function.
We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself.
- Score: 13.006468721874372
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper builds upon the work of Pfau (2013), which generalized the
bias-variance tradeoff to any Bregman divergence loss function. Pfau (2013) showed
that for Bregman divergences, the bias and variance are defined with respect
to a central label, defined as the mean of the label variable, and a central
prediction, of a more complex form. We show that, similarly to the label, the
central prediction can be interpreted as the mean of a random variable, where
the mean operates in a dual space defined by the loss function itself. Viewing
the bias-variance tradeoff through operations taken in dual space, we
subsequently derive several results of interest. In particular, (a) the
variance terms satisfy a generalized law of total variance; (b) if a source of
randomness cannot be controlled, its contribution to the bias and variance has
a closed form; (c) there exist natural ensembling operations in the label and
prediction spaces which reduce the variance and do not affect the bias.
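To make the decomposition concrete, the sketch below checks the identity numerically. The generator F(x) = x log x (which induces the generalized KL divergence on positive scalars) and the discrete label/prediction distributions are illustrative choices, not taken from the paper; the central prediction is computed as a mean in the dual space (apply the gradient of F, average, map back), and the last lines illustrate result (c) with one natural dual-space ensembling operation, namely averaging dual coordinates.
```python
import numpy as np

def F(x):           # Bregman generator, strictly convex on x > 0 (illustrative choice)
    return x * np.log(x)

def grad_F(x):      # dual map: F'(x) = log(x) + 1
    return np.log(x) + 1.0

def grad_F_inv(t):  # inverse dual map: (F')^{-1}(t) = exp(t - 1)
    return np.exp(t - 1.0)

def bregman(p, q):  # D_F(p, q) = F(p) - F(q) - F'(q) * (p - q)
    return F(p) - F(q) - grad_F(q) * (p - q)

# Discrete, independent label Y and prediction Yhat, so every expectation is an exact sum.
y_vals, y_probs = np.array([0.5, 1.0, 2.0]), np.array([0.2, 0.5, 0.3])
q_vals, q_probs = np.array([0.8, 1.5, 3.0]), np.array([0.4, 0.4, 0.2])

# Central label: ordinary (primal) mean of the label.
y_bar = np.sum(y_probs * y_vals)
# Central prediction: mean taken in the dual space defined by the loss.
y_ring = grad_F_inv(np.sum(q_probs * grad_F(q_vals)))

# Expected loss E[D_F(Y, Yhat)] over the product distribution.
expected_loss = np.sum(
    y_probs[:, None] * q_probs[None, :] * bregman(y_vals[:, None], q_vals[None, :])
)

noise = np.sum(y_probs * bregman(y_vals, y_bar))      # E[D_F(Y, y_bar)]
bias = bregman(y_bar, y_ring)                         # D_F(y_bar, y_ring)
variance = np.sum(q_probs * bregman(y_ring, q_vals))  # E[D_F(y_ring, Yhat)]

assert np.isclose(expected_loss, noise + bias + variance)
print(f"loss={expected_loss:.6f} noise={noise:.6f} bias={bias:.6f} variance={variance:.6f}")

# Result (c), illustrated with dual-space ensembling of two i.i.d. copies of Yhat:
# averaging dual coordinates leaves the central prediction (and hence the bias)
# unchanged, while the variance term can only shrink.
pair_probs = (q_probs[:, None] * q_probs[None, :]).ravel()
q_ens = grad_F_inv(0.5 * (grad_F(q_vals)[:, None] + grad_F(q_vals)[None, :])).ravel()
variance_ens = np.sum(pair_probs * bregman(y_ring, q_ens))
print(f"variance after dual-space ensembling: {variance_ens:.6f} <= {variance:.6f}")
```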
Related papers
- Unbiased Estimating Equation on Inverse Divergence and Its Conditions [0.10742675209112622]
This paper focuses on the Bregman divergence defined by the reciprocal function, called the inverse divergence.
For the loss function defined by the monotonically increasing function $f$ and the inverse divergence, the conditions on the statistical model and the function $f$ under which the estimating equation is unbiased are clarified.
arXiv Detail & Related papers (2024-04-25T11:22:48Z)
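For orientation, taking the generator to be the reciprocal function phi(x) = 1/x on x > 0 (as the summary suggests; the paper's exact conventions may differ), the induced Bregman divergence has the closed form below.
```latex
% Bregman divergence generated by the reciprocal function \phi(x) = 1/x, x > 0
% (a sketch of the "inverse divergence"; conventions assumed, see the paper for details).
\[
  D_\phi(p, q)
  = \phi(p) - \phi(q) - \phi'(q)\,(p - q)
  = \frac{1}{p} - \frac{2}{q} + \frac{p}{q^{2}}
  = \frac{(p - q)^{2}}{p\,q^{2}}, \qquad p, q > 0.
\]
```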
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study, among other questions, whether the predicted covariance truly captures the randomness of the predicted mean.
Our results show that TIC not only learns the covariance accurately but also facilitates improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
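For context, the baseline objective the summary refers to is the heteroscedastic Gaussian negative log-likelihood; the univariate sketch below (with made-up toy values) is only that baseline, not the TIC-TAC method itself.
```python
import numpy as np

def gaussian_nll(y, mu, var):
    """Per-sample negative log-likelihood of y under N(mu, var)."""
    return 0.5 * (np.log(2.0 * np.pi * var) + (y - mu) ** 2 / var)

y   = np.array([0.1, 1.3, -0.4])   # observed targets (toy values)
mu  = np.array([0.0, 1.0,  0.0])   # predicted means
var = np.array([0.5, 1.0,  2.0])   # predicted variances (must stay positive)

# Mean and variance are optimized jointly through this single objective; since the
# residual (y - mu)^2 is weighted by 1/var, a poorly estimated variance re-weights
# the gradient of the mean, which is the convergence issue the paper studies.
print(gaussian_nll(y, mu, var).mean())
```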
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep-learning-based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures the consistency of model predictions under transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
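As a toy illustration of invariance as prediction consistency, the sketch below computes the fraction of inputs whose predicted class is unchanged by a transformation; the `model`, `transform`, and data are hypothetical stand-ins, and the paper defines its own invariance metric.
```python
import numpy as np

def invariance_score(model, transform, inputs):
    """Fraction of inputs whose predicted class is unchanged by the transformation."""
    preds_orig = np.argmax(model(inputs), axis=1)
    preds_tf = np.argmax(model(transform(inputs)), axis=1)
    return float(np.mean(preds_orig == preds_tf))

# Toy demo: a random linear "model" over 8 features and a feature-reversal "transform".
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
model = lambda x: x @ W            # logits for 3 classes
transform = lambda x: x[:, ::-1]   # stand-in data transformation
inputs = rng.normal(size=(100, 8))
print(invariance_score(model, transform, inputs))
```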
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that, despite its simplicity, DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
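A minimal sketch of the difference-of-confidences idea described above: compare the average top-class softmax confidence on the source test set with that on a shifted set, and use the gap to estimate the accuracy change. The Dirichlet-sampled probabilities and the source accuracy below are hypothetical, and the paper's exact estimator (and its variants) may include additional steps.
```python
import numpy as np

def difference_of_confidences(probs_source, probs_target):
    """Mean max-softmax confidence on the source set minus that on the shifted set."""
    return probs_source.max(axis=1).mean() - probs_target.max(axis=1).mean()

rng = np.random.default_rng(0)
probs_source = rng.dirichlet([5.0, 1.0, 1.0], size=1000)  # confident in-distribution outputs
probs_target = rng.dirichlet([2.0, 1.0, 1.0], size=1000)  # less confident outputs under shift

doc = difference_of_confidences(probs_source, probs_target)
acc_source = 0.90                                          # hypothetical source accuracy
print(f"DoC = {doc:.3f}, DoC-predicted target accuracy ~ {acc_source - doc:.3f}")
```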
- Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization [77.24152933825238]
We show that for linear classification tasks, stronger restrictions on the distribution shifts are needed; otherwise, OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
arXiv Detail & Related papers (2021-06-11T20:42:27Z)
- Understanding Generalization in Adversarial Training via the Bias-Variance Decomposition [39.108491135488286]
We decompose the test risk into its bias and variance components.
We find that the bias increases monotonically with perturbation size and is the dominant term in the risk.
We show that popular explanations for the generalization gap instead predict the variance to be monotonic.
arXiv Detail & Related papers (2021-03-17T23:30:00Z)
- Latent Causal Invariant Model [128.7508609492542]
Current supervised learning can learn spurious correlations during the data-fitting process.
We propose a Latent Causal Invariance Model (LaCIM) which pursues causal prediction.
arXiv Detail & Related papers (2020-11-04T10:00:27Z)
- Unbiased Estimation Equation under $f$-Separable Bregman Distortion Measures [0.3553493344868413]
We discuss unbiased estimation equations for a class of objective functions constructed from a monotonically increasing function $f$ and a Bregman divergence.
The choice of the function $f$ gives desirable properties such as robustness against outliers.
In this study, we clarify the combinations of Bregman divergence, statistical model, and function $f$ for which the bias correction term vanishes.
arXiv Detail & Related papers (2020-10-23T10:33:55Z)
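Schematically (notation assumed for illustration rather than taken from the paper), the objective and its estimating equation take the form below; the estimating equation is called unbiased when its left-hand side has expectation zero at the true parameter.
```latex
% Schematic f-separable Bregman objective and the associated estimating equation
% (notation assumed; see the paper for the precise setup and conditions).
\[
  \hat{\theta} \in \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} f\!\bigl(D_{\phi}(x_i, \theta)\bigr),
  \qquad
  \frac{1}{n} \sum_{i=1}^{n} f'\!\bigl(D_{\phi}(x_i, \theta)\bigr)\, \nabla_{\theta} D_{\phi}(x_i, \theta) = 0 .
\]
```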
- On lower bounds for the bias-variance trade-off [0.0]
It is a common phenomenon that for high-dimensional statistical models, rate-optimal estimators balance squared bias and variance.
We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound.
This shows to what extent the bias-variance trade-off is unavoidable and makes it possible to quantify the loss of performance for methods that do not obey it.
arXiv Detail & Related papers (2020-05-30T14:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.