Information-theoretic Generalization Analysis for Expected Calibration Error
- URL: http://arxiv.org/abs/2405.15709v1
- Date: Fri, 24 May 2024 16:59:29 GMT
- Title: Information-theoretic Generalization Analysis for Expected Calibration Error
- Authors: Futoshi Futami, Masahiro Fujisawa,
- Abstract summary: We present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass and uniform width binning.
Our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias.
We extend our bias analysis to generalization error analysis based on the information-theoretic approach.
- Score: 4.005483185111992
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present the first comprehensive analysis of the estimation bias in the two common binning strategies, uniform mass and uniform width binning. Our analysis establishes upper bounds on the bias, achieving an improved convergence rate. Moreover, our bounds reveal, for the first time, the optimal number of bins to minimize the estimation bias. We further extend our bias analysis to generalization error analysis based on the information-theoretic approach, deriving upper bounds that enable the numerical evaluation of how small the ECE is for unknown data. Experiments using deep learning models show that our bounds are nonvacuous thanks to this information-theoretic generalization analysis approach.
Related papers
- PAC-Bayes Analysis for Recalibration in Classification [4.005483185111992]
We conduct a generalization analysis of the calibration error under the probably approximately correct (PAC) Bayes framework.
We then propose a generalization-aware recalibration algorithm based on our generalization theory.
arXiv Detail & Related papers (2024-06-10T12:53:13Z) - A Statistical Theory of Regularization-Based Continual Learning [10.899175512941053]
We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks.
We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously.
A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $ell$-regularization.
arXiv Detail & Related papers (2024-06-10T12:25:13Z) - Overcoming Saturation in Density Ratio Estimation by Iterated Regularization [11.244546184962996]
We show that a class of kernel methods for density ratio estimation suffers from error saturation.
We introduce iterated regularization in density ratio estimation to achieve fast error rates.
arXiv Detail & Related papers (2024-02-21T16:02:14Z) - Generalized Simultaneous Perturbation-based Gradient Search with Reduced
Estimator Bias [7.372983005764439]
We present a family of generalized simultaneous perturbation-based search (GSPGS) estimators that use noisy function measurements.
The number of function measurements required by each estimator is guided by the desired level of accuracy.
arXiv Detail & Related papers (2022-12-20T17:50:36Z) - Vector-Valued Least-Squares Regression under Output Regularity
Assumptions [73.99064151691597]
We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output.
We derive learning bounds for our method, and study under which setting statistical performance is improved in comparison to full-rank method.
arXiv Detail & Related papers (2022-11-16T15:07:00Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - Learning to Estimate Without Bias [57.82628598276623]
Gauss theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimation (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation to BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the em generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - Expected Validation Performance and Estimation of a Random Variable's
Maximum [48.83713377993604]
We analyze three statistical estimators for expected validation performance.
We find the unbiased estimator has the highest variance, and the estimator with the smallest variance has the largest bias.
We find that the two biased estimators lead to the fewest incorrect conclusions.
arXiv Detail & Related papers (2021-10-01T18:48:47Z) - Generalization Bounds For Meta-Learning: An Information-Theoretic
Analysis [8.028776552383365]
We propose a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning algorithms.
We provide a data-dependent generalization bound for a variant of MAML, which is non-vacuous for deep few-shot learning.
arXiv Detail & Related papers (2021-09-29T17:45:54Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $varepsilon*$, which deviates substantially from the test error of worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.