A Generalized Bias-Variance Decomposition for Bregman Divergences
- URL: http://arxiv.org/abs/2511.08789v1
- Date: Thu, 13 Nov 2025 01:07:54 GMT
- Title: A Generalized Bias-Variance Decomposition for Bregman Divergences
- Authors: David Pfau,
- Abstract summary: We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence.<n>While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes.
- Score: 1.9483779836489397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence, which is relevant to maximum likelihood estimation with exponential families. While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes. A version of this note previously appeared on the author's personal website without context. Here we provide additional discussion and references to the relevant prior literature.
Related papers
- Variational Prediction [95.00085314353436]
We present a technique for learning a variational approximation to the posterior predictive distribution using a variational bound.
This approach can provide good predictive distributions without test time marginalization costs.
arXiv Detail & Related papers (2023-07-14T18:19:31Z) - A Statistical Model for Predicting Generalization in Few-Shot
Classification [6.158812834002346]
We introduce a Gaussian model of the feature distribution to predict the generalization error.
We show that our approach outperforms alternatives such as the leave-one-out cross-validation strategy.
arXiv Detail & Related papers (2022-12-13T10:21:15Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Uncertainty Estimates of Predictions via a General Bias-Variance
Decomposition [7.811916700683125]
We introduce a bias-variance decomposition for proper scores, giving rise to the Bregman Information as the variance term.
We showcase the practical relevance of this decomposition on several downstream tasks, including model ensembles and confidence regions.
arXiv Detail & Related papers (2022-10-21T21:24:37Z) - ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely, Equivariance Regularizer (ER)
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z) - Ensembling over Classifiers: a Bias-Variance Perspective [13.006468721874372]
We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers.
We show that conditional estimates necessarily incur an irreducible error.
Empirically, standard ensembling reducesthe bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction.
arXiv Detail & Related papers (2022-06-21T17:46:35Z) - Understanding the bias-variance tradeoff of Bregman divergences [13.006468721874372]
This paper builds upon the work of Pfau (2013), which generalized the bias variance tradeoff to any Bregman divergence loss function.
We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself.
arXiv Detail & Related papers (2022-02-08T22:06:16Z) - Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective.
We propose an efficient algorithm to search for the subset with minimal worst-case risk.
The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
arXiv Detail & Related papers (2021-07-05T09:07:29Z) - Invariance Principle Meets Information Bottleneck for
Out-of-Distribution Generalization [77.24152933825238]
We show that for linear classification tasks we need stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible.
We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.
arXiv Detail & Related papers (2021-06-11T20:42:27Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Unbiased Estimation Equation under $f$-Separable Bregman Distortion
Measures [0.3553493344868413]
We discuss unbiased estimation equations in a class of objective function using a monotonically increasing function $f$ and Bregman divergence.
The choice of the function $f$ gives desirable properties such as robustness against outliers.
In this study, we clarify the combination of Bregman divergence, statistical model, and function $f$ in which the bias correction term vanishes.
arXiv Detail & Related papers (2020-10-23T10:33:55Z) - Mitigating Gender Bias Amplification in Distribution by Posterior
Regularization [75.3529537096899]
We investigate the gender bias amplification issue from the distribution perspective.
We propose a bias mitigation approach based on posterior regularization.
Our study sheds the light on understanding the bias amplification.
arXiv Detail & Related papers (2020-05-13T11:07:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.