On lower bounds for the bias-variance trade-off
- URL: http://arxiv.org/abs/2006.00278v4
- Date: Mon, 20 Mar 2023 09:04:47 GMT
- Title: On lower bounds for the bias-variance trade-off
- Authors: Alexis Derumigny and Johannes Schmidt-Hieber
- Abstract summary: It is a common phenomenon that for high-dimensional statistical models, rate-optimal estimators balance squared bias and variance.
We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound.
This shows to what extent the bias-variance trade-off is unavoidable and allows us to quantify the loss of performance for methods that do not obey it.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is a common phenomenon that for high-dimensional and nonparametric
statistical models, rate-optimal estimators balance squared bias and variance.
Although this balancing is widely observed, little is known about whether methods
exist that could avoid the trade-off between bias and variance. We propose a
general strategy to obtain lower bounds on the variance of any estimator with
bias smaller than a prespecified bound. This shows to what extent the
bias-variance trade-off is unavoidable and allows us to quantify the loss of
performance for methods that do not obey it. The approach is based on a number
of abstract lower bounds for the variance involving the change of expectation
with respect to different probability measures as well as information measures
such as the Kullback-Leibler or $\chi^2$-divergence. In a second part of the
article, the abstract lower bounds are applied to several statistical models
including the Gaussian white noise model, a boundary estimation problem, the
Gaussian sequence model and the high-dimensional linear regression model. For
these specific statistical applications, different types of bias-variance
trade-offs occur that vary considerably in their strength. For the trade-off
between integrated squared bias and integrated variance in the Gaussian white
noise model, we propose to combine the general strategy for lower bounds with a
reduction technique. This allows us to reduce the original problem to a lower
bound on the bias-variance trade-off for estimators with additional symmetry
properties in a simpler statistical model. In the Gaussian sequence model,
different phase transitions of the bias-variance trade-off occur. Although
there is a non-trivial interplay between bias and variance, the rates of the
squared bias and the variance do not have to be balanced in order to achieve
the minimax estimation rate.
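The abstract keeps the bounds at a high level. As an illustrative sketch of the change-of-expectation idea (our notation and simplifications, not the paper's actual statements), one bound of this type follows from the Cauchy-Schwarz inequality and the $\chi^2$-divergence:

```latex
% Illustrative sketch (our notation, not the paper's statement): a chi-square
% based variance lower bound for an estimator \hat\theta of \theta(P), assuming
% P_1 is absolutely continuous w.r.t. P_0 and the bias is bounded by B at both.
\[
  \bigl|\mathbb{E}_{P_1}[\hat\theta] - \mathbb{E}_{P_0}[\hat\theta]\bigr|
  = \Bigl|\mathbb{E}_{P_0}\Bigl[\bigl(\hat\theta - \mathbb{E}_{P_0}[\hat\theta]\bigr)
      \Bigl(\tfrac{dP_1}{dP_0} - 1\Bigr)\Bigr]\Bigr|
  \le \sqrt{\operatorname{Var}_{P_0}(\hat\theta)\;\chi^2(P_1 \,\|\, P_0)}
  \quad \text{(Cauchy--Schwarz).}
\]
% With |\mathbb{E}_{P_i}[\hat\theta] - \theta(P_i)| \le B for i = 0,1 and
% \Delta := |\theta(P_1) - \theta(P_0)|, the left-hand side is at least \Delta - 2B:
\[
  \operatorname{Var}_{P_0}(\hat\theta)
  \;\ge\; \frac{(\Delta - 2B)_+^{2}}{\chi^2(P_1 \,\|\, P_0)}.
\]
```

For instance, for a single observation from $P_i = \mathcal{N}(\theta_i, 1)$ one has $\chi^2(P_1\,\|\,P_0) = e^{(\theta_1 - \theta_0)^2} - 1$, so tightening the bias bound $B$ forces the variance up; the paper develops bounds of this flavor in general form, including Kullback-Leibler versions, and applies them to the models listed in the abstract.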
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models [51.66015254740692]
We show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level.
We study this phenomenon from two theoretical perspectives: calibration and neural collapse.
arXiv Detail & Related papers (2023-10-13T17:06:34Z)
- Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory [24.540342159350015]
We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation.
We consider perturbations that are either independent or coordinated joint shifts across data points.
We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation.
arXiv Detail & Related papers (2023-08-03T16:19:40Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures the consistency of model predictions under transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
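As a rough, hypothetical illustration of the "deep learning with bias constraints" idea summarized above (our own toy objective, not necessarily the paper's BCE loss), one can penalize the squared empirical bias of a network's estimates alongside the usual squared error:

```python
# Hedged sketch of a bias-constrained training step (toy formulation): add a
# penalty on the squared empirical bias of the estimates to the MSE loss.
import torch
import torch.nn as nn

def bias_constrained_loss(theta_hat, theta, lam=1.0):
    """MSE plus a penalty on the squared empirical bias over the batch."""
    mse = ((theta_hat - theta) ** 2).mean()
    empirical_bias = (theta_hat - theta).mean(dim=0)   # average error per coordinate
    return mse + lam * (empirical_bias ** 2).sum()

# Minimal usage example with a small regression network on synthetic data.
net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, theta = torch.randn(64, 8), torch.randn(64, 2)
opt.zero_grad()
loss = bias_constrained_loss(net(x), theta)
loss.backward()
opt.step()
```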
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Uncertainty Principles in Risk-Aware Statistical Estimation [4.721069729610892]
We present a new uncertainty principle for risk-aware statistical estimation.
It effectively quantifies the inherent trade-off between mean squared error (MSE) and risk.
arXiv Detail & Related papers (2021-04-29T12:06:53Z)
- Reducing the Variance of Variational Estimates of Mutual Information by Limiting the Critic's Hypothesis Space to RKHS [0.0]
Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods parameterize probability distributions or the critic as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
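For context on what "critic as a neural network" means here, below is a minimal, generic Donsker-Varadhan (MINE-style) lower-bound sketch with an unconstrained neural critic; the RKHS restriction proposed in the cited paper is not reproduced, this only illustrates the baseline whose variance that paper seeks to control:

```python
# Hedged sketch: generic Donsker-Varadhan variational lower bound on mutual
# information with an unconstrained neural critic (not the RKHS-restricted
# critic of the cited paper).
import math
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))

def dv_lower_bound(x, y):
    """E_p[T(x,y)] - log E_{p_x p_y}[exp T(x,y)], estimated on a batch."""
    joint_term = critic(torch.cat([x, y], dim=1)).mean()
    y_shuffled = y[torch.randperm(y.shape[0])]          # approximate product of marginals
    marginal_term = torch.logsumexp(
        critic(torch.cat([x, y_shuffled], dim=1)).squeeze(-1), dim=0
    ) - math.log(y.shape[0])
    return joint_term - marginal_term

# Usage on correlated Gaussian toy data; maximizing over the critic tightens the bound.
x = torch.randn(512, 1)
y = x + 0.5 * torch.randn(512, 1)
print(dv_lower_bound(x, y).item())
```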
arXiv Detail & Related papers (2020-11-17T14:32:48Z)
- Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models [0.0]
The bias-variance trade-off is a central concept in supervised learning.
Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance.
arXiv Detail & Related papers (2020-10-26T22:31:04Z)
- Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator [93.05919133288161]
We show that the variance of the straight-through variant of the popular Gumbel-Softmax estimator can be reduced through Rao-Blackwellization.
This provably reduces the mean squared error.
We empirically demonstrate that this leads to variance reduction, faster convergence, and generally improved performance in two unsupervised latent variable models.
arXiv Detail & Related papers (2020-10-09T22:54:38Z)
- Rethinking Bias-Variance Trade-off for Generalization of Neural Networks [40.04927952870877]
We provide a simple explanation for this by measuring the bias and variance of neural networks.
We find that variance unimodality occurs robustly for all models we considered.
Deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
arXiv Detail & Related papers (2020-02-26T07:21:54Z)
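Related to both the preceding entry and the main paper's theme, the sketch below shows the standard way squared bias and variance of an estimator are measured empirically, by retraining on independent data sets; the ridge regressor and the data-generating process are illustrative stand-ins, not models from any of the cited papers.

```python
# Hedged sketch: empirically estimating squared bias and variance of an
# estimator by retraining on independent training sets, using the squared-loss
# decomposition E[(f_hat(x) - f(x))^2] = bias^2(x) + variance(x) + noise.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma, lam, n_repeats = 50, 20, 1.0, 5.0, 200
theta_true = rng.normal(size=d) / np.sqrt(d)
x_test = rng.normal(size=d)                      # single test point for clarity
f_true = x_test @ theta_true                     # noiseless target at x_test

preds = []
for _ in range(n_repeats):
    X = rng.normal(size=(n, d))
    y = X @ theta_true + sigma * rng.normal(size=n)
    theta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # ridge fit
    preds.append(x_test @ theta_hat)

preds = np.array(preds)
bias_sq = (preds.mean() - f_true) ** 2           # squared bias at x_test
variance = preds.var()                           # variance at x_test
print(f"squared bias ~ {bias_sq:.4f}, variance ~ {variance:.4f}")
```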