Related papers: Noise Stability of Transformer Models

Noise Stability of Transformer Models

URL: http://arxiv.org/abs/2602.08287v1
Date: Mon, 09 Feb 2026 05:43:22 GMT
Title: Noise Stability of Transformer Models
Authors: Themistoklis Haris, Zihan Zhang, Yuichi Yoshida,
Abstract summary: We argue that average sensitivity lacks a natural generalization to real-valued domains.<n>Noise stability expresses a model's robustness to correlated noise applied to coordinates simultaneously.<n>Our results sculpt a new connection between signal propagation in neural networks and interpretability.
Score: 28.608164171197483
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results sculpt a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.

Related papers

Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification [55.56234913868664]
We propose Test-time Adaptive Hierarchical Co-enhanced Denoising Network (TAHCD) for reliable learning on multimodal data.<n>The proposed method achieves superior classification performance, robustness, and generalization compared with state-of-the-art reliable multimodal learning approaches.
arXiv Detail & Related papers (2026-01-12T03:14:12Z)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions [58.69551510148673]
contexts drift is the gradual divergence of a model's outputs from goal-consistent behavior across turns.<n>Unlike single-turn errors, drift unfolds temporally and is poorly captured by static evaluation metrics.<n>We show that multi-turn drift can be understood as a controllable equilibrium phenomenon rather than as inevitable decay.
arXiv Detail & Related papers (2025-10-09T04:48:49Z)
Embedding-Aware Noise Modeling of Quantum Annealing [3.446273281167618]
We develop a framework that connects embedding-induced overhead with hardware noise in D-Wave's Zephyr topology.<n>Our findings establish a general embedding-aware noise framework that explains the trade-off between chain stability and logical coupler fidelity.
arXiv Detail & Related papers (2025-10-06T08:48:25Z)
Towards Robust Image Denoising with Scale Equivariance [10.894808298340994]
We argue that incorporating scale-equivariant structures enables models to better adapt from training on spatially uniform noise to inference on spatially non-uniform degradations.<n>We propose a robust blind denoising framework equipped with two key components: a Heterogeneous Normalization Module (HNM) and an Interactive Gating Module (IGM)
arXiv Detail & Related papers (2025-08-05T00:06:28Z)
Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content-hallucinations.<n>Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness.<n>NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using a dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z)
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation [91.83820250747935]
Pseudo-label noise is mainly contained in unstable samples in which predictions of most pixels undergo significant variations during self-training. We introduce the Stable Neighbor Denoising (SND) approach, which effectively discovers highly correlated stable and unstable samples. SND consistently outperforms state-of-the-art methods in various SFUDA semantic segmentation settings.
arXiv Detail & Related papers (2024-06-10T21:44:52Z)
Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method which is capable of adaptively learning a hyper parameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity) Four SOTA robust loss functions are attempted to be integrated with our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z)
Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adrial robustness has become an emerging challenge for neural network owing to its over-sensitivity to small input perturbations. We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
Noisy Recurrent Neural Networks [45.94390701863504]
We study recurrent neural networks (RNNs) trained by injecting noise into hidden states as discretizations of differential equations driven by input data. We find that, under reasonable assumptions, this implicit regularization promotes flatter minima; it biases towards models with more stable dynamics; and, in classification tasks, it favors models with larger classification margin. Our theory is supported by empirical results which demonstrate improved robustness with respect to various input perturbations, while maintaining state-of-the-art performance.
arXiv Detail & Related papers (2021-02-09T15:20:50Z)
Neural Control Variates [71.42768823631918]
We show that a set of neural networks can face the challenge of finding a good approximation of the integrand. We derive a theoretically optimal, variance-minimizing loss function, and propose an alternative, composite loss for stable online training in practice. Specifically, we show that the learned light-field approximation is of sufficient quality for high-order bounces, allowing us to omit the error correction and thereby dramatically reduce the noise at the cost of negligible visible bias.
arXiv Detail & Related papers (2020-06-02T11:17:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.