Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning
- URL: http://arxiv.org/abs/2601.23031v1
- Date: Fri, 30 Jan 2026 14:39:51 GMT
- Title: Asymptotic Theory of Iterated Empirical Risk Minimization, with Applications to Active Learning
- Authors: Hugo Cui, Yue M. Lu
- Abstract summary: We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp characterization of the test error. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection.
- Score: 15.858234832499585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a class of iterated empirical risk minimization (ERM) procedures in which two successive ERMs are performed on the same dataset, and the predictions of the first estimator enter as an argument in the loss function of the second. This setting, which arises naturally in active learning and reweighting schemes, introduces intricate statistical dependencies across samples and fundamentally distinguishes the problem from classical single-stage ERM analyses. For linear models trained with a broad class of convex losses on Gaussian mixture data, we derive a sharp asymptotic characterization of the test error in the high-dimensional regime where the sample size and ambient dimension scale proportionally. Our results provide explicit, fully asymptotic predictions for the performance of the second-stage estimator despite the reuse of data and the presence of prediction-dependent losses. We apply this theory to revisit a well-studied pool-based active learning problem, removing oracle and sample-splitting assumptions made in prior work. We uncover a fundamental tradeoff in how the labeling budget should be allocated across stages, and demonstrate a double-descent behavior of the test error driven purely by data selection, rather than model size or sample count.
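To make the two-stage procedure concrete, here is a minimal sketch of iterated ERM in the active-learning spirit the abstract describes: a first ERM is fit on the full sample, and a second ERM is run on the *same* data with per-sample weights that depend on the first-stage predictions. The uncertainty-based weighting rule, model, and all names below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))

def fit_logistic(X, y, sample_w, steps=500, lr=0.1):
    """Weighted logistic-loss ERM via plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = np.clip(y * (X @ w), -30, 30)
        # Gradient of mean_i s_i * log(1 + exp(-y_i x_i^T w))
        grad = -(X * (sample_w * y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

# Stage 1: ordinary (unweighted) ERM on the full dataset.
w1 = fit_logistic(X, y, np.ones(n))

# Stage 2: reweight each sample by stage-1 uncertainty (small margin |x^T w1|
# gets a large weight, mimicking selection of hard points), then re-run ERM on
# the same samples. Reusing the data this way creates the cross-sample
# statistical dependence that the paper's asymptotic theory accounts for.
weights = 1.0 / (1.0 + np.abs(X @ w1))
w2 = fit_logistic(X, y, weights)

acc1 = np.mean(np.sign(X @ w1) == y)
acc2 = np.mean(np.sign(X @ w2) == y)
print(f"stage-1 train accuracy: {acc1:.3f}, stage-2 train accuracy: {acc2:.3f}")
```

Because the stage-2 loss depends on `w1`, which was itself fit on `(X, y)`, the stage-2 samples are no longer i.i.d. given the loss; this is the dependence structure that distinguishes the analysis from single-stage ERM.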
Related papers
- Conformalized Regression for Continuous Bounded Outcomes [0.0]
Regression problems with bounded continuous outcomes frequently arise in real-world statistical and machine learning applications. Most of the existing statistical and machine learning literature has focused either on point prediction of bounded outcomes or on interval prediction based on approximations. We develop conformal prediction intervals for bounded outcomes based on transformation models and beta regression.
arXiv Detail & Related papers (2025-07-18T15:51:48Z) - Improved Sample Complexity For Diffusion Model Training Without Empirical Risk Minimizer Access [47.96419637803502]
We present a principled theoretical framework analyzing diffusion models, providing a state-of-the-art sample complexity bound of $\widetilde{\mathcal{O}}(\epsilon^{-4})$. Our structured decomposition of the score estimation error into statistical and optimization components offers critical insights into how diffusion models can be trained efficiently.
arXiv Detail & Related papers (2025-05-23T20:02:15Z) - Pre-validation Revisited [79.92204034170092]
We show properties and benefits of pre-validation in prediction, inference and error estimation through simulations and applications. We derive an analytical distribution of the test statistic for the pre-validated predictor under certain models, and also propose a generic bootstrap procedure to conduct inference.
arXiv Detail & Related papers (2025-05-21T00:20:14Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross-validation (GCV) estimator fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
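For context, here is a minimal sketch of the classical GCV estimator for ridge regression under i.i.d. samples; the paper above shows this estimator breaks down when samples are correlated. The data-generating setup and parameter grid below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 30
X = rng.normal(size=(n, d))
beta = rng.normal(size=d) / np.sqrt(d)
y = X @ beta + rng.normal(size=n)

def gcv(X, y, lam):
    """Generalized cross-validation score for ridge with penalty lam."""
    n = X.shape[0]
    # Hat matrix S = X (X^T X + lam I)^{-1} X^T, so y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    resid = y - S @ y
    # GCV(lam) = (1/n) ||y - y_hat||^2 / (1 - tr(S)/n)^2
    return (resid @ resid / n) / (1 - np.trace(S) / n) ** 2

lams = [0.1, 1.0, 10.0, 100.0]
scores = {lam: gcv(X, y, lam) for lam in lams}
best_lam = min(scores, key=scores.get)
print(f"GCV-selected lambda: {best_lam}")
```

Under i.i.d. sampling, minimizing this score over `lam` is a standard proxy for out-of-sample risk; the cited result is precisely that this proxy loses its validity once the rows of `X` (or the test point) are correlated.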
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - On semi-supervised estimation using exponential tilt mixture models [12.347498345854715]
Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only predictors.
For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models.
arXiv Detail & Related papers (2023-11-14T19:53:26Z) - Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed one allows one to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z) - Coordinated Double Machine Learning [8.808993671472349]
This paper argues that a carefully coordinated learning algorithm for deep neural networks may reduce the estimation bias.
The improved empirical performance of the proposed method is demonstrated through numerical experiments on both simulated and real data.
arXiv Detail & Related papers (2022-06-02T05:56:21Z) - Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation.
We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z) - Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z) - Efficient Estimation and Evaluation of Prediction Rules in Semi-Supervised Settings under Stratified Sampling [6.930951733450623]
We propose a two-step semi-supervised learning (SSL) procedure for evaluating a prediction rule derived from a working binary regression model.
In step I, we impute the missing labels via weighted regression with nonlinear basis functions to account for nonrandom sampling.
In step II, we augment the initial imputations to ensure the consistency of the resulting estimators.
arXiv Detail & Related papers (2020-10-19T12:54:45Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.