A Researcher's Guide to Empirical Risk Minimization
- URL: http://arxiv.org/abs/2602.21501v2
- Date: Tue, 03 Mar 2026 16:28:45 GMT
- Title: A Researcher's Guide to Empirical Risk Minimization
- Authors: Lars van der Laan
- Abstract summary: This guide provides a reference for high-probability regret bounds in empirical risk minimization. We begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This guide provides a reference for high-probability regret bounds in empirical risk minimization (ERM). The presentation is modular: we begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions and provide tools for verifying them for specific losses and function classes. We emphasize that many ERM rate derivations can be organized around a three-step recipe -- a basic inequality, a uniform local concentration bound, and a fixed-point argument -- which yields regret bounds in terms of a critical radius, defined via localized Rademacher complexity, under a mild Bernstein-type variance-risk condition. To make these bounds concrete, we upper bound the critical radius using local maximal inequalities and metric-entropy integrals, thereby recovering familiar rates for VC-subgraph, Sobolev/Hölder, and bounded-variation classes. We also study ERM with nuisance components -- including weighted ERM and Neyman-orthogonal losses -- as they arise in causal inference, missing data, and domain adaptation. Following the orthogonal statistical learning framework, we highlight that these problems often admit regret-transfer bounds linking regret under an estimated loss to population regret under the target loss. These bounds typically decompose the regret into (i) statistical error under the estimated loss and (ii) approximation error due to nuisance estimation. Under sample splitting or cross-fitting, the first term can be controlled using standard fixed-loss ERM regret bounds, while the second depends only on nuisance-estimation accuracy. As a novel contribution, we also treat the in-sample regime, in which the nuisances and the ERM are fit on the same data, deriving regret bounds and showing that fast oracle rates remain attainable under suitable smoothness and Donsker-type conditions.
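To make the three-step recipe above concrete, here is a schematic in standard localization notation; the symbols and constants below are ours for illustration and are not quoted from the paper. Write $P_n$ for the empirical measure, $P$ for the population measure, $\hat f$ for the empirical risk minimizer, and $f^*$ for the population minimizer over the class $\mathcal{F}$.

```latex
% Step 1 (basic inequality): since P_n \ell_{\hat f} \le P_n \ell_{f^*},
\[
  P(\ell_{\hat f} - \ell_{f^*}) \;\le\; (P - P_n)(\ell_{\hat f} - \ell_{f^*}).
\]
% Step 2 (uniform local concentration): under a Bernstein-type condition
% Var(\ell_f - \ell_{f^*}) \lesssim P(\ell_f - \ell_{f^*}), with probability
% at least 1 - e^{-t},
\[
  \sup_{f \in \mathcal{F}:\; P(\ell_f - \ell_{f^*}) \le r}
    (P - P_n)(\ell_f - \ell_{f^*})
  \;\lesssim\; \mathcal{R}_n(\mathcal{F}; r) + \sqrt{\frac{r t}{n}} + \frac{t}{n},
\]
% where \mathcal{R}_n(\mathcal{F}; r) denotes a localized Rademacher complexity.
% Step 3 (fixed point): with the critical radius
\[
  r_n^{\ast} \;=\; \inf\bigl\{ r > 0 : \mathcal{R}_n(\mathcal{F}; r) \le r / C \bigr\},
\]
% the two displays self-improve into the high-probability regret bound
\[
  P(\ell_{\hat f} - \ell_{f^*}) \;\lesssim\; r_n^{\ast} + \frac{t}{n}.
\]
```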
Related papers
- On the Generalization and Robustness in Conditional Value-at-Risk
We develop a learning-theoretic analysis of Conditional Value-at-Risk (CVaR)-based empirical risk minimization under heavy-tailed and contaminated data. We establish sharp, high-probability generalization and excess risk bounds under minimal moment assumptions. We show that CVaR decisions themselves can be intrinsically unstable under heavy tails.
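As a concrete illustration of the quantity being analyzed, the following is a minimal sketch of the plug-in empirical CVaR via the Rockafellar-Uryasev representation; the function and synthetic data below are our assumptions, not the paper's estimator.

```python
import numpy as np

def empirical_cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Plug-in CVaR_alpha estimate via the Rockafellar-Uryasev form
    CVaR_alpha(L) = min_t { t + E[(L - t)_+] / (1 - alpha) },
    with t set to the empirical VaR (the alpha-quantile)."""
    t = np.quantile(losses, alpha)
    return t + np.mean(np.maximum(losses - t, 0.0)) / (1.0 - alpha)

rng = np.random.default_rng(0)
losses = rng.pareto(a=2.5, size=10_000)  # heavy-tailed losses, the regime studied here
print(f"mean risk: {losses.mean():.3f}, CVaR_0.95: {empirical_cvar(losses):.3f}")
```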
arXiv Detail & Related papers (2026-02-20T08:10:11Z)
- Learning bounds for doubly-robust covariate shift adaptation
Distribution shift between the training domain and the test domain poses a key challenge for machine learning. The doubly-robust (DR) estimator combines density ratio estimation with a pilot regression model. This paper establishes the first non-asymptotic learning bounds for the DR estimator.
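To fix ideas, here is a schematic of the generic doubly robust risk estimate under covariate shift; this is our sketch of the standard construction, not necessarily the paper's exact estimator. It combines an estimated density ratio w(x) ≈ p_test(x)/p_train(x) with a pilot regression m(x) ≈ E[loss | X = x].

```python
import numpy as np

def dr_target_risk(loss_src, m_hat_src, w_hat_src, m_hat_tgt):
    """Doubly robust estimate of test-domain risk under covariate shift:
    E_test[m(X)] + E_train[w(X) * (loss - m(X))].
    Consistent if either the density ratio w or the pilot model m is correct."""
    return m_hat_tgt.mean() + np.mean(w_hat_src * (loss_src - m_hat_src))
```

The correction term vanishes in expectation when the pilot model is exact, and the pilot model's bias is removed when the density ratio is exact; this is the double robustness that such learning bounds quantify.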
arXiv Detail & Related papers (2025-11-14T06:46:23Z)
- Decision from Suboptimal Classifiers: Excess Risk Pre- and Post-Calibration
We quantify the excess risk incurred by using approximate posterior probabilities in batch binary decision-making. We identify regimes where recalibration alone addresses most of the regret, and regimes where the regret is dominated by the grouping loss. In NLP experiments, we show that these quantities identify when the expected gain of more advanced post-training is worth the operational cost.
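As a toy illustration of the regime where recalibration alone removes most of the regret (entirely our sketch; the paper's analysis is formal), compare the decision risk of thresholding raw versus isotonic-recalibrated scores at the cost-derived Bayes threshold:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
p_true = rng.uniform(size=20_000)                    # true posterior probabilities
y = (rng.uniform(size=p_true.size) < p_true).astype(float)
s = p_true ** 2.0                                    # monotone but miscalibrated scores

def decision_risk(q, y, c_fp=1.0, c_fn=1.0):
    """Empirical cost of thresholding posterior estimates q at the
    Bayes threshold c_fp / (c_fp + c_fn)."""
    d = q >= c_fp / (c_fp + c_fn)
    return np.mean(c_fp * (d & (y == 0)) + c_fn * (~d & (y == 1)))

q_iso = IsotonicRegression(out_of_bounds="clip").fit(s, y).predict(s)
print("risk (raw):", decision_risk(s, y))
print("risk (recalibrated):", decision_risk(q_iso, y))
```

Because the scores here are an injective transform of the true posterior, there is no grouping loss and recalibration recovers essentially all of the regret; grouping loss would appear if distinct posteriors were mapped to the same score.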
arXiv Detail & Related papers (2025-03-23T10:52:36Z)
- Reweighting Improves Conditional Risk Bounds
We show that under a general "balanceable" Bernstein condition, one can design a weighted ERM estimator to achieve superior performance in certain sub-regions. Our findings are supported by evidence from synthetic data experiments.
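A minimal sketch of a weighted ERM estimator of the kind analyzed here (the weighting scheme and ridge penalty below are illustrative choices of ours, not the paper's construction): upweighting a covariate sub-region shifts the estimator's accuracy toward that region.

```python
import numpy as np

def weighted_erm_fit(X, y, weights, lam=1e-3):
    """Weighted ridge ERM: argmin_b sum_i w_i (y_i - x_i @ b)^2 + lam * ||b||^2,
    solved in closed form as (X' W X + lam I)^{-1} X' W y."""
    XtW = X.T * weights
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(1_000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=1_000)
w = np.where(X[:, 0] > 0, 2.0, 1.0)  # emphasize the sub-region {x_0 > 0}
print(weighted_erm_fit(X, y, w))
```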
arXiv Detail & Related papers (2025-01-04T18:16:21Z)
- Efficient Transfer Learning via Causal Bounds
Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments demonstrate the resulting data reduction.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
- On the Variance, Admissibility, and Stability of Empirical Risk Minimization
Empirical Risk Minimization (ERM) may attain minimax suboptimal rates in terms of the mean squared error. We prove that under relatively mild assumptions, the suboptimality of ERM must be due to its large bias.
arXiv Detail & Related papers (2023-05-29T15:25:48Z)
- The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data
We introduce a decaying missing-at-random (decaying MAR) framework and associated approaches for doubly robust causal inference. This simultaneously addresses selection bias in the labeling mechanism and the extreme imbalance between labeled and unlabeled groups. To ensure robust causal conclusions, we propose a bias-reduced semi-supervised (SS) estimator for the average treatment effect.
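For orientation, here is a generic AIPW-style semi-supervised ATE estimator with inverse labeling-propensity weights; this is our schematic of the standard construction, not the paper's bias-reduced estimator.

```python
import numpy as np

def ss_aipw_ate(A, Y, R, e_hat, pi_hat, mu1_hat, mu0_hat):
    """Generic semi-supervised AIPW-style ATE estimate.
    A: treatment indicator; Y: outcome (used only where R == 1);
    R: labeling indicator; e_hat: treatment propensity;
    pi_hat: labeling propensity (may decay toward zero in a decaying-MAR regime);
    mu1_hat, mu0_hat: outcome regressions fit on labeled data.
    Residual corrections are reweighted by R / pi_hat so that only labeled
    units contribute, each inversely weighted by its labeling probability."""
    Y0 = np.where(R == 1, Y, 0.0)  # unlabeled outcomes never enter (weight is 0)
    corr1 = (R / pi_hat) * (A / e_hat) * (Y0 - mu1_hat)
    corr0 = (R / pi_hat) * ((1 - A) / (1 - e_hat)) * (Y0 - mu0_hat)
    return np.mean(mu1_hat - mu0_hat + corr1 - corr0)
```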
arXiv Detail & Related papers (2023-05-22T07:37:12Z)
- Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
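A sketch of the generic importance-sampling weighted ERM objective described above; here e_t denotes the (known) probability with which the adaptive policy collected sample t, and the names and array layout are our assumptions.

```python
import numpy as np

def is_weighted_risks(loss_by_hypothesis, propensities):
    """Importance-sampling weighted empirical risk for each hypothesis:
    (1/n) * sum_t loss(h; Z_t) / e_t, where the 1/e_t weights are the
    standard IS correction for adaptive (non-i.i.d.) data collection.
    loss_by_hypothesis: array of shape (n_hypotheses, n_samples)."""
    return np.mean(loss_by_hypothesis / propensities, axis=1)

# ERM over a finite hypothesis class: pick the empirical minimizer.
# erm_index = np.argmin(is_weighted_risks(L, e))
```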
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
- On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization
We consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm.
We provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability.
arXiv Detail & Related papers (2020-08-20T03:48:14Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- GenDICE: Generalized Offline Estimation of Stationary Values
We show that effective offline estimation of stationary values can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
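The ratio-correction idea can be sketched as follows; this illustrates the generic estimator once a ratio estimate is in hand, not GenDICE's actual optimization (which solves for the ratio via a variational objective).

```python
import numpy as np

def ratio_corrected_value(rewards, ratio_hat):
    """Off-policy stationary value estimate: reweight empirically observed
    rewards by an estimated stationary density ratio
    w(s, a) = d_target(s, a) / d_data(s, a), self-normalized for stability."""
    w = ratio_hat / ratio_hat.mean()
    return np.mean(w * rewards)
```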
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.