On the Variance, Admissibility, and Stability of Empirical Risk Minimization
- URL: http://arxiv.org/abs/2305.18508v2
- Date: Sun, 02 Nov 2025 02:23:40 GMT
- Title: On the Variance, Admissibility, and Stability of Empirical Risk Minimization
- Authors: Gil Kur, Eli Putterman, Alexander Rakhlin
- Abstract summary: Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error. We prove that under relatively mild assumptions, the suboptimality of ERM must be due to its large bias.
- Score: 57.63331017830154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error (Birgé and Massart, 1993). In this paper, we prove that, under relatively mild assumptions, the suboptimality of ERM must be due to its large bias. Namely, the variance error term of ERM is bounded by the minimax rate. In the fixed design setting, we provide an elementary proof of this result using the probabilistic method. Then, we extend our proof to the random design setting for various models. In addition, we provide a simple proof of Chatterjee's admissibility theorem (Chatterjee, 2014, Theorem 1.4), which states that in the fixed design setting, ERM cannot be ruled out as an optimal method, and then we extend this result to the random design setting. We also show that our estimates imply the stability of ERM, complementing the main result of Caponnetto and Rakhlin (2006) for non-Donsker classes. Finally, we highlight the somewhat irregular nature of the loss landscape of ERM in the non-Donsker regime, by showing that functions can be close to ERM, in terms of $L_2$ distance, while still being far from almost-minimizers of the empirical loss.
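The bias-variance split underlying the abstract's headline claim can be checked numerically in a toy fixed-design problem. The sketch below is illustrative only (the cubic polynomial class, the noise level, and the Monte Carlo setup are assumptions, not the paper's setting): it estimates the variance term E||fhat - E fhat||_n^2 and the squared bias ||E fhat - f*||_n^2 of a least-squares ERM, and verifies that they sum exactly to the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed design: the covariates x stay the same across noise draws.
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))
f_star = np.sin(2.0 * np.pi * x)   # true regression function (illustrative)
sigma = 0.5                        # noise level (illustrative)

def erm_fit(y, degree=3):
    """ERM (least squares) over a small polynomial class."""
    X = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Monte Carlo over fresh noise draws on the same design points.
fits = np.array([erm_fit(f_star + sigma * rng.standard_normal(n))
                 for _ in range(500)])

mean_fit = fits.mean(axis=0)
variance = ((fits - mean_fit) ** 2).mean(axis=0).mean()  # E ||fhat - E fhat||_n^2
bias_sq = ((mean_fit - f_star) ** 2).mean()              # ||E fhat - f*||_n^2
mse = ((fits - f_star) ** 2).mean()                      # E ||fhat - f*||_n^2

# The decomposition mse = variance + bias_sq is an exact algebraic identity
# (the cross term vanishes because mean_fit is the Monte Carlo average).
assert abs(mse - (variance + bias_sq)) < 1e-8
```

In the paper's regime of interest the class is rich enough that ERM is rate-suboptimal; the result says that even then the `variance` term stays at the minimax rate, so any suboptimality must show up in `bias_sq`.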
Related papers
- A Researcher's Guide to Empirical Risk Minimization [3.891921282474929]
This guide provides a reference for high-probability regret bounds in empirical risk minimization. We begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions.
arXiv Detail & Related papers (2026-02-25T02:26:23Z) - Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty [0.0]
We reframe ATE estimation as a domain adaptation problem under distribution shift. We propose the Joint Robust Estimator (JRE) to train outcome models jointly.
arXiv Detail & Related papers (2025-12-19T21:40:46Z) - Reweighting Improves Conditional Risk Bounds [12.944919903533957]
We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions.
Our findings are supported by evidence from synthetic data experiments.
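The reweighting idea in this summary can be sketched in a toy heteroskedastic setting. The linear class f(x) = a*x and the inverse-variance weights below are illustrative assumptions, not the paper's "balanceable" construction: upweighting the low-noise sub-region yields a lower-variance estimate of the same slope.

```python
import numpy as np

rng = np.random.default_rng(1)

# Heteroskedastic regression: noise is 10x larger on x <= 0 (illustrative).
n = 400
x = rng.uniform(-1.0, 1.0, n)
sigma = np.where(x > 0, 0.1, 1.0)
y = 2.0 * x + sigma * rng.standard_normal(n)

def weighted_erm(w):
    """Weighted least squares over the linear class f(x) = a*x."""
    return np.sum(w * x * y) / np.sum(w * x * x)

a_plain = weighted_erm(np.ones(n))        # ordinary (unweighted) ERM
a_wtd = weighted_erm(1.0 / sigma ** 2)    # inverse-variance reweighting
```

Both estimators target the same slope a = 2, but the reweighted one concentrates much more tightly because it discounts the high-noise sub-region, which is the classical weighted-least-squares rationale behind reweighting ERM.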
arXiv Detail & Related papers (2025-01-04T18:16:21Z) - Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z) - Nonparametric logistic regression with deep learning [1.2509746979383698]
In the nonparametric logistic regression, the Kullback-Leibler divergence could diverge easily.
Instead of analyzing the excess risk itself, it suffices to show the consistency of the maximum likelihood estimator.
As an important application, we derive the convergence rates of the NPMLE with deep neural networks.
arXiv Detail & Related papers (2024-01-23T04:31:49Z) - Efficient Stochastic Approximation of Minimax Excess Risk Optimization [36.68685001551774]
We develop efficient approximation approaches which directly target MERO.
We demonstrate that the bias, caused by the estimation error of the minimal risk, is under-control.
We also investigate a practical scenario where the quantity of samples drawn from each distribution may differ, and propose an approach that delivers distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-05-31T02:21:11Z) - Robustness and Accuracy Could Be Reconcilable by (Proper) Definition [109.62614226793833]
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
We find that it may stem from the improperly defined robust error, which imposes an inductive bias of local invariance.
By definition, SCORE facilitates the reconciliation between robustness and accuracy, while still handling the worst-case uncertainty.
arXiv Detail & Related papers (2022-02-21T10:36:09Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - On the Minimal Error of Empirical Risk Minimization [90.09093901700754]
We study the minimal error of the Empirical Risk Minimization (ERM) procedure in the task of regression.
Our sharp lower bounds shed light on the possibility (or impossibility) of adapting to simplicity of the model generating the data.
arXiv Detail & Related papers (2021-02-24T04:47:55Z) - Does Invariant Risk Minimization Capture Invariance? [23.399091822468407]
We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. can fail to capture "natural" invariances.
This can lead to worse generalization on new environments.
arXiv Detail & Related papers (2021-01-04T18:02:45Z) - The Risks of Invariant Risk Minimization [52.7137956951533]
Invariant Risk Minimization is an objective based on the idea for learning deep, invariant features of data.
We present the first analysis of classification under the IRM objective--as well as these recently proposed alternatives--under a fairly natural and general model.
We show that IRM can fail catastrophically unless the test data are sufficiently similar to the training distribution--this is precisely the issue that it was intended to solve.
arXiv Detail & Related papers (2020-10-12T14:54:32Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias with reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.