The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks
- URL: http://arxiv.org/abs/2603.02293v1
- Date: Mon, 02 Mar 2026 16:39:42 GMT
- Title: The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks
- Authors: Zice Wang
- Abstract summary: We experimentally isolate the Malignant Tail, a failure mode where networks functionally segregate signal and noise. We show that trained networks actively segregate noise, allowing post-hoc Explicit Spectral Truncation to surgically prune the noise-dominated subspace. Our findings suggest that under label noise, excess spectral capacity is not harmless redundancy but a latent structural liability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While implicit regularization facilitates benign overfitting in low-noise regimes, recent theoretical work predicts a sharp phase transition to harmful overfitting as the noise-to-signal ratio increases. We experimentally isolate the geometric mechanism of this transition: the Malignant Tail, a failure mode where networks functionally segregate signal and noise, reducing coherent semantic features into low-rank subspaces while pushing stochastic label noise into high-frequency orthogonal components, distinct from systematic or corruption-aligned noise. Through a Spectral Linear Probe of training dynamics, we demonstrate that Stochastic Gradient Descent (SGD) fails to suppress this noise, instead implicitly biasing it toward high-frequency orthogonal subspaces, effectively preserving signal-noise separability. We show that this geometric separation is distinct from simple variance reduction in untrained models. In trained networks, SGD actively segregates noise, allowing post-hoc Explicit Spectral Truncation (d << D) to surgically prune the noise-dominated subspace. This approach recovers the optimal generalization capability latent in the converged model. Unlike unstable temporal early stopping, Geometric Truncation provides a stable post-hoc intervention. Our findings suggest that under label noise, excess spectral capacity is not harmless redundancy but a latent structural liability that allows for noise memorization, necessitating explicit rank constraints to filter stochastic corruptions for robust generalization.
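To make the intervention concrete, here is a minimal NumPy sketch of rank-d spectral truncation via SVD; the function name, the choice of truncating a single matrix, and the toy signal-plus-noise demo are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def explicit_spectral_truncation(W: np.ndarray, d: int) -> np.ndarray:
    """Keep only the top-d singular directions of W (d << D).

    Under the paper's hypothesis, the discarded high-frequency tail is
    the subspace into which training has segregated stochastic label noise.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :d] @ np.diag(S[:d]) @ Vt[:d, :]

# Toy demo: a rank-3 "signal" matrix corrupted by dense noise.
rng = np.random.default_rng(0)
D = 256
signal = rng.normal(size=(D, 3)) @ rng.normal(size=(3, D))
corrupted = signal + 0.5 * rng.normal(size=(D, D))
truncated = explicit_spectral_truncation(corrupted, d=3)

print("error before:", np.linalg.norm(corrupted - signal))  # ~128
print("error after: ", np.linalg.norm(truncated - signal))  # much smaller
```

In the paper's setting the same idea would be applied post hoc to a converged network's representation, with d chosen well below the ambient dimension D.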
Related papers
- Stabilizing Diffusion Posterior Sampling by Noise-Frequency Continuation [52.736416985173776]
At high noise, data-consistency gradients computed from inaccurate estimates can be geometrically incongruent with the posterior geometry. We propose a noise-frequency continuation framework that constructs a continuous family of intermediate posteriors whose likelihood enforces measurement consistency only within a noise-dependent frequency band. Our method achieves state-of-the-art performance and improves motion deblurring PSNR by up to 5 dB over strong baselines.
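As a hedged illustration of the band-limited likelihood idea, the sketch below masks a data-consistency residual in Fourier space for a 1-D signal; the linear noise-to-cutoff schedule and function names are assumptions, not the paper's construction.

```python
import numpy as np

def band_limited_residual(x_est, y, noise_level, max_noise=1.0):
    """Data-consistency residual enforced only inside a low-frequency band.

    At high noise the band is narrow (only coarse structure is trusted);
    it widens as noise_level -> 0, yielding a continuous family of
    intermediate posteriors. The linear schedule is an illustrative choice.
    """
    freqs = np.fft.fftfreq(x_est.size)
    cutoff = 0.5 * (1.0 - min(noise_level / max_noise, 1.0))
    mask = np.abs(freqs) <= cutoff
    return np.fft.ifft(mask * np.fft.fft(x_est - y)).real

# Usage: the masked residual stands in for the full data-consistency
# gradient at each sampling step, with noise_level taken from the schedule.
rng = np.random.default_rng(0)
x_est, y = rng.normal(size=128), rng.normal(size=128)
print(np.linalg.norm(band_limited_residual(x_est, y, noise_level=0.8)))
```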
arXiv Detail & Related papers (2026-01-30T03:14:01Z)
- PRISM: Deriving the Transformer as a Signal-Denoising Operator via Maximum Coding Rate Reduction [0.0]
We propose PRISM, a white-box attention-based architecture for deep learning. We show that PRISM spontaneously specializes its attention heads into spectrally distinct regimes. Our results suggest that interpretability and performance are not a trade-off, but can be unified through principled construction.
arXiv Detail & Related papers (2026-01-21T23:52:36Z)
- Noise-Adaptive Regularization for Robust Multi-Label Remote Sensing Image Classification [5.658568324275769]
We propose NAR, a noise-adaptive regularization method that distinguishes between additive and subtractive noise. NAR consistently improves robustness compared with existing methods. Performance improvements are most pronounced under subtractive and mixed noise.
arXiv Detail & Related papers (2026-01-13T11:16:45Z)
- The Homogeneity Trap: Spectral Collapse in Doubly-Stochastic Deep Networks [1.7523718031184992]
We identify a critical spectral degradation phenomenon inherent to structure-preserving deep architectures. We show that maximum-entropy bias drives the mixing operator towards the uniform barycenter, suppressing the subdominant singular value. We derive a spectral bound linking the subdominant singular value to the network's effective depth, showing that high-entropy constraints restrict feature transformation to a shallow receptive field.
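The collapse is easy to reproduce numerically: composing a doubly-stochastic mixing operator drives every singular value except the leading one toward zero. The Sinkhorn projection and toy sizes below are illustrative choices, not the paper's setup.

```python
import numpy as np

def sinkhorn(M, iters=200):
    """Alternately normalize rows and columns toward double stochasticity."""
    for _ in range(iters):
        M = M / M.sum(axis=1, keepdims=True)
        M = M / M.sum(axis=0, keepdims=True)
    return M

rng = np.random.default_rng(0)
P = sinkhorn(rng.random((16, 16)))
# sigma_1 of a doubly-stochastic matrix is 1; composing the mixing
# operator suppresses the subdominant singular value sigma_2, pulling
# every feature toward the uniform barycenter.
for depth in (1, 4, 16):
    s = np.linalg.svd(np.linalg.matrix_power(P, depth), compute_uv=False)
    print(f"depth {depth:2d}: sigma_2 = {s[1]:.4f}")
```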
arXiv Detail & Related papers (2026-01-05T13:09:42Z)
- Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
- Noise Balance and Stationary Distribution of Stochastic Gradient Descent [10.621129623557884]
We show that the minibatch noise of SGD regularizes the solution towards a noise-balanced solution whenever the loss function contains a rescaling parameter symmetry. Because the difference between a simple diffusion process and SGD dynamics is most significant when symmetries are present, our theory implies that loss-function symmetries constitute an essential probe of how SGD works. We then apply this result to derive the stationary distribution of SGD for a diagonal linear network with arbitrary depth and width.
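A minimal sketch of the balancing mechanism, using a diagonal linear network f(x) = (u * v) @ x (the toy data and hyperparameters are assumptions): the rescaling (u_i, v_i) -> (c u_i, v_i / c) leaves the loss invariant, each SGD step multiplies u_i^2 - v_i^2 by (1 - lr^2 g_i^2), and persistent minibatch noise therefore contracts the imbalance toward the noise-balanced manifold u_i^2 = v_i^2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 1024, 8
X = rng.normal(size=(n, dim))
y = X @ rng.normal(size=dim) + 2.0 * rng.normal(size=n)   # noisy labels
u = 1.5 * rng.normal(size=dim)
v = 0.5 * rng.normal(size=dim)
lr, batch = 0.01, 2

print("mean |u^2 - v^2| before:", np.abs(u**2 - v**2).mean())
for _ in range(50_000):
    idx = rng.integers(0, n, size=batch)
    err = X[idx] @ (u * v) - y[idx]
    g = X[idx].T @ err / batch              # gradient w.r.t. w = u * v
    u, v = u - lr * g * v, v - lr * g * u   # chain rule through u and v
# Gradient flow would conserve u**2 - v**2; minibatch noise contracts it.
print("mean |u^2 - v^2| after: ", np.abs(u**2 - v**2).mean())
```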
arXiv Detail & Related papers (2023-08-13T03:13:03Z)
- DiffusionAD: Norm-guided One-step Denoising Diffusion for Anomaly Detection [80.20339155618612]
DiffusionAD is a novel anomaly detection pipeline comprising a reconstruction sub-network and a segmentation sub-network. A rapid one-step denoising paradigm achieves hundreds of times acceleration while preserving comparable reconstruction quality. Considering the diversity in the manifestation of anomalies, we propose a norm-guided paradigm to integrate the benefits of multiple noise scales.
arXiv Detail & Related papers (2023-03-15T16:14:06Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples seen in previous methods.
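A sketch of the Gibbs step at the heart of dynamic label regression (variable names and the toy setup are assumptions; the full method also resamples the transition matrix, e.g. from a Dirichlet posterior over (true, noisy) label counts):

```python
import numpy as np

def gibbs_sample_true_labels(probs, noisy_labels, T, rng):
    """Sample latent clean labels z with p(z | x, y) ~ p_model(z | x) * T[z, y].

    probs: (n, K) classifier softmax outputs; T: (K, K) noise transition
    with T[i, j] = p(noisy label j | true label i).
    """
    post = probs * T[:, noisy_labels].T          # (n, K), unnormalized
    post /= post.sum(axis=1, keepdims=True)
    K = probs.shape[1]
    return np.array([rng.choice(K, p=p) for p in post])

# The sampled z then (i) supervise the classifier in place of the noisy
# labels and (ii) update T from (z, noisy label) co-occurrence counts,
# which is what keeps the transition update stable.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=5)
noisy = rng.integers(0, 3, size=5)
T = np.full((3, 3), 0.1) + 0.7 * np.eye(3)       # rows sum to 1
print(gibbs_sample_true_labels(probs, noisy, T, rng))
```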
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- High-Order Qubit Dephasing at Sweet Spots by Non-Gaussian Fluctuators: Symmetry Breaking and Floquet Protection [55.41644538483948]
We study qubit dephasing caused by non-Gaussian fluctuators.
We predict a symmetry-breaking effect that is unique to the non-Gaussian noise.
arXiv Detail & Related papers (2022-06-06T18:02:38Z)
- Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum [26.25434025410027]
Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation.
In this work, we show that SGDm under covariate shift with a fixed step-size can be unstable and diverge.
We approximate the learning system as a time-varying system of ordinary differential equations, and leverage existing theory to characterize the system's divergence/convergence as resonant/nonresonant modes.
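A one-dimensional toy, not the paper's setting, makes the resonance mechanism visible: heavy-ball SGD on a quadratic whose curvature oscillates with the input distribution is a time-varying linear system, and certain oscillation frequencies can excite an unstable mode even though every frozen curvature is individually stable. The parameters below are illustrative; which frequencies fall in a resonant band depends on the step-size and momentum.

```python
import numpy as np

def run_sgdm(freq, lr=0.5, beta=0.99, amp=0.9, steps=5000):
    """Heavy-ball SGD on 0.5 * h(t) * w^2 with oscillating curvature h(t).

    Every frozen h in [1 - amp, 1 + amp] satisfies lr * h < 2 * (1 + beta),
    so each is individually stable; only the time variation can diverge.
    """
    w, m = 1.0, 0.0
    for t in range(steps):
        h = 1.0 + amp * np.sin(2 * np.pi * freq * t)
        m = beta * m - lr * h * w
        w = w + m
        if abs(w) > 1e12:
            return np.inf
    return abs(w)

# Sweep shift frequencies; those inside a resonant band blow up.
for f in np.linspace(0.0, 0.5, 26):
    tag = "DIVERGED" if not np.isfinite(run_sgdm(f)) else "bounded"
    print(f"shift frequency {f:.2f}: {tag}")
```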
arXiv Detail & Related papers (2022-03-22T18:38:13Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training over-parameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
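The contrast can be sketched in a two-parameter toy model (an illustrative assumption, not the paper's experiments): fitting f(x) = w1*w2*x to noisy labels, label-perturbation noise has parameter-dependent covariance and steadily contracts the imbalance w1^2 - w2^2 toward the balanced minimum, while spherical Gaussian noise shows no such systematic bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(noise_kind, lr=0.01, steps=20_000):
    """SGD on f(x) = w1*w2*x with y = x: minima form the curve w1*w2 = 1."""
    w1, w2 = 2.0, 0.5
    for _ in range(steps):
        x = rng.normal()
        if noise_kind == "label":
            r = (w1 * w2) * x - (x + 2.0 * rng.normal())  # perturbed label
            g1, g2 = r * x * w2, r * x * w1
        else:  # spherical Gaussian noise added to the exact gradient
            r = (w1 * w2) * x - x
            g1 = r * x * w2 + 0.5 * rng.normal()
            g2 = r * x * w1 + 0.5 * rng.normal()
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return abs(w1**2 - w2**2)

# Label-perturbation noise multiplies w1^2 - w2^2 by (1 - b^2) each step,
# so the imbalance contracts; spherical noise only random-walks it.
print("label noise:   ", train("label"))
print("gaussian noise:", train("gaussian"))
```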
arXiv Detail & Related papers (2020-06-15T18:31:02Z)