Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting
- URL: http://arxiv.org/abs/2602.14029v1
- Date: Sun, 15 Feb 2026 07:28:12 GMT
- Title: Why Self-Training Helps and Hurts: Denoising vs. Signal Forgetting
- Authors: Mingqi Wu, Archer Y. Yang, Qiang Sun
- Abstract summary: Iterative self-training repeatedly refits a model on pseudo-labels generated by its own predictions. We derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations.
- Score: 6.369253528507392
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Iterative self-training (self-distillation) repeatedly refits a model on pseudo-labels generated by its own predictions. We study this procedure in overparameterized linear regression: an initial estimator is trained on noisy labels, and each subsequent iterate is trained on fresh covariates with noiseless pseudo-labels from the previous model. In the high-dimensional regime, we derive deterministic-equivalent recursions for the prediction risk and effective noise across iterations, and prove that the empirical quantities concentrate sharply around these limits. The recursion separates two competing forces: a systematic component that grows with iteration due to progressive signal forgetting, and a stochastic component that decays due to denoising via repeated data-dependent projections. Their interaction yields a $U$-shaped test-risk curve and an optimal early-stopping time. In spiked covariance models, iteration further acts as an iteration-dependent spectral filter that preserves strong eigendirections while suppressing weaker ones, inducing an implicit form of soft feature selection distinct from ridge regression. Finally, we propose an iterated generalized cross-validation criterion and prove its uniform consistency for estimating the risk along the self-training trajectory, enabling fully data-driven selection of the stopping time and regularization. Experiments on synthetic covariances validate the theory and illustrate the predicted denoising-forgetting trade-off.
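The procedure described in the abstract is simple enough to simulate directly. Below is a minimal NumPy sketch (not the authors' code) of the self-training recursion: an initial ridge estimator is fit on noisy labels, and each subsequent iterate is refit on fresh covariates using noiseless pseudo-labels from the previous model, with the prediction risk tracked along the trajectory. The spiked covariance, ridge penalty, sample sizes, and all function names are illustrative assumptions rather than values or code from the paper.
```python
# Minimal sketch of iterative self-training (self-distillation) in
# overparameterized linear regression, loosely following the abstract's setup:
# an initial ridge fit on noisy labels, then each iterate refit on FRESH
# covariates with noiseless pseudo-labels from the previous model.
# Assumptions (not from the paper): a diagonal spiked covariance, a fixed
# ridge penalty, and risk measured as E[(x^T(b - beta))^2] under that covariance.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, lam, T = 200, 400, 2.0, 1.0, 20    # overparameterized: p > n

# Spiked covariance: a few strong eigendirections, the rest weak.
eigvals = np.ones(p)
eigvals[:5] = 25.0
sqrt_cov = np.sqrt(eigvals)

beta_true = rng.normal(size=p) / np.sqrt(p)

def sample_X(m):
    """Fresh design matrix with rows ~ N(0, diag(eigvals))."""
    return rng.normal(size=(m, p)) * sqrt_cov

def ridge_fit(X, y, lam):
    """Standard ridge estimator (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def prediction_risk(b):
    """Excess prediction risk E[(x^T(b - beta_true))^2] under the spiked covariance."""
    d = b - beta_true
    return float(np.sum(eigvals * d ** 2))

# Iteration 0: fit on noisy labels.
X = sample_X(n)
y = X @ beta_true + sigma * rng.normal(size=n)
beta_hat = ridge_fit(X, y, lam)
risks = [prediction_risk(beta_hat)]

# Iterations 1..T: refit on fresh covariates with noiseless pseudo-labels.
for t in range(1, T + 1):
    X = sample_X(n)
    y_pseudo = X @ beta_hat                 # pseudo-labels from the previous model
    beta_hat = ridge_fit(X, y_pseudo, lam)  # denoises noise, but also forgets signal
    risks.append(prediction_risk(beta_hat))

t_star = int(np.argmin(risks))
print("risk along the self-training trajectory:", np.round(risks, 3))
print(f"empirical minimizer (early-stopping candidate): t = {t_star}")
```
The paper's fully data-driven selection via the iterated generalized cross-validation criterion is not reproduced here; the sketch simply reads off the minimizer of the oracle risk curve, which in the regimes covered by the theory is predicted to be U-shaped with a finite optimal stopping time.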
Related papers
- Combating Noisy Labels through Fostering Self- and Neighbor-Consistency [120.4394402099635]
Label noise is pervasive in various real-world scenarios, posing challenges in supervised deep learning. We propose a noise-robust method named Jo-SNC (Joint sample selection and model regularization based on Self- and Neighbor-Consistency). We design a self-adaptive, data-driven thresholding scheme to adjust per-class selection thresholds.
arXiv Detail & Related papers (2026-01-19T07:55:29Z) - Self-diffusion for Solving Inverse Problems [3.8870795921263728]
We propose self-diffusion, a novel framework for solving inverse problems without relying on pretrained generative models. Self-diffusion exploits the spectral bias of neural networks and modulates it through a scheduled noise process. We demonstrate the effectiveness of our approach on a variety of linear inverse problems, showing that self-diffusion achieves competitive or superior performance compared to other methods.
arXiv Detail & Related papers (2025-10-24T12:57:22Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Memorization and Regularization in Generative Diffusion Models [5.128303432235475]
Diffusion models have emerged as a powerful framework for generative modeling. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer. Experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
arXiv Detail & Related papers (2025-01-27T05:17:06Z) - Fréchet regression with implicit denoising and multicollinearity reduction [1.5771347525430772]
Fréchet regression extends linear regression to model complex responses in metric spaces. We present an extension of the Global Fréchet regression model that enables explicit modeling of relationships between input variables and multiple responses.
arXiv Detail & Related papers (2024-12-24T08:02:28Z) - Permutation recovery of spikes in noisy high-dimensional tensor estimation [5.464669506214196]
We study the dynamics of flow in high dimensions for the multi-spiked tensor problem. Our work builds on our companion paper [Ben A, Gerbelot, Langevin], which determines the sample separation conditions for the SNRs necessary for ensuring exact recovery.
arXiv Detail & Related papers (2024-12-19T08:59:49Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We characterize the in- and out-of-sample risks of ridge regression when the training examples have arbitrary correlations. We demonstrate that in this setting the generalized cross-validation (GCV) estimator fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that effectively mitigates the above issues by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Detecting Label Noise via Leave-One-Out Cross Validation [0.0]
We present a simple algorithm for identifying and correcting real-valued noisy labels from a mixture of clean and corrupted samples.
A heteroscedastic noise model is employed, in which additive Gaussian noise terms with independent variances are associated with each and all of the observed labels.
We show that the presented method can pinpoint corrupted samples and lead to better regression models when trained on synthetic and real-world scientific data sets.
arXiv Detail & Related papers (2021-03-21T10:02:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.