Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval
- URL: http://arxiv.org/abs/2601.22652v1
- Date: Fri, 30 Jan 2026 07:12:58 GMT
- Title: Spectral Gradient Descent Mitigates Anisotropy-Driven Misalignment: A Case Study in Phase Retrieval
- Authors: Guillaume Braun, Han Bao, Wei Huang, Masaaki Imaizumi
- Abstract summary: Spectral gradient methods modify gradient updates by preserving directional information while discarding scale. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model.
- Score: 13.218607858857295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral gradient methods, such as the Muon optimizer, modify gradient updates by preserving directional information while discarding scale, and have shown strong empirical performance in deep learning. We investigate the mechanisms underlying these gains through a dynamical analysis of a nonlinear phase retrieval model with anisotropic Gaussian inputs, equivalent to training a two-layer neural network with the quadratic activation and fixed second-layer weights. Focusing on a spiked covariance setting where the dominant variance direction is orthogonal to the signal, we show that gradient descent (GD) suffers from a variance-induced misalignment: during the early escaping stage, the high-variance but uninformative spike direction is multiplicatively amplified, degrading alignment with the true signal under strong anisotropy. In contrast, spectral gradient descent (SpecGD) removes this spike amplification effect, leading to stable alignment and accelerated noise contraction. Numerical experiments confirm the theory and show that these phenomena persist under broader anisotropic covariances.
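As a rough numerical illustration of the escaping-stage contrast described in the abstract, the sketch below runs plain GD and a Muon-style spectral update on a toy instance of this model: a quadratic-activation network on spiked Gaussian inputs, with the spike orthogonal to the planted signal. The dimensions, step sizes, horizon, and the spike-to-signal ratio used as a readout are our own choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, kappa = 20, 4, 8192, 20.0   # dims, width, samples, spike strength (assumed)

# Spiked covariance: high variance along e_1; planted signal along the orthogonal e_2.
w_star = np.zeros(d); w_star[1] = 1.0
scale = np.ones(d); scale[0] = np.sqrt(kappa)
X = rng.standard_normal((n, d)) * scale           # anisotropic Gaussian inputs
y = (X @ w_star) ** 2                             # phase-retrieval targets

def grad(W):
    """Gradient of 0.5*mean((f(x)-y)^2) for f(x) = sum_j (x^T w_j)^2."""
    pre = X @ W                                   # (n, k)
    resid = (pre ** 2).sum(axis=1) - y            # (n,)
    return (2.0 / n) * X.T @ (resid[:, None] * pre)

def spectral(G):
    """Muon-style direction: keep the singular vectors of G, drop the singular values."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

W_gd = 0.01 * rng.standard_normal((d, k)); W_sp = W_gd.copy()
for t in range(50):                               # early "escaping" stage only
    W_gd -= 1e-3 * grad(W_gd)                     # plain GD
    W_sp -= 1e-3 * spectral(grad(W_sp))           # SpecGD

def ratio(W):  # mass on the uninformative spike row vs the signal row
    return np.linalg.norm(W[0]) / np.linalg.norm(W[1])

print(f"spike/signal ratio  GD: {ratio(W_gd):.2f}   SpecGD: {ratio(W_sp):.2f}")
```

Under GD the spike coordinate grows multiplicatively during this stage, so the ratio climbs; the spectral update has no singular values to amplify, keeping the ratio near its initialization.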
Related papers
- On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes [15.63629978994481]
We present the first comprehensive theoretical analysis of this gradient cascade setting. We identify conditions under which perturbations do not deteriorate the gradient convergence order.
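The summary leaves the perturbation model abstract; one minimal reading is SGD in which every gradient evaluation is corrupted by a bounded error. The sketch below shows that update shape under an assumed additive-noise model; the function names and the toy least-squares objective are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_with_perturbed_grads(grad_fn, w, lr=0.05, steps=200, delta=0.01):
    """SGD where each gradient evaluation carries a bounded perturbation
    (an assumed additive model for perturbed forward-backward passes)."""
    for _ in range(steps):
        e = rng.standard_normal(w.shape)
        e *= delta / max(np.linalg.norm(e), 1e-12)   # enforce ||e|| <= delta
        w = w - lr * (grad_fn(w) + e)                # descend on the corrupted gradient
    return w

# Toy least squares: bounded perturbations leave the iterate O(delta)-close to optimal.
A = rng.standard_normal((50, 10)); b = rng.standard_normal(50)
w = sgd_with_perturbed_grads(lambda w: A.T @ (A @ w - b) / 50, np.zeros(10))
print("residual:", np.linalg.norm(A @ w - b))
```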
arXiv Detail & Related papers (2026-02-24T07:47:15Z)
- Laplacian-LoRA: Delaying Oversmoothing in Deep GCNs via Spectral Low-Rank Adaptation [0.0]
We propose Laplacian-LoRA, a low-rank adaptation of standard graph convolutional networks (GCNs). Rather than redesigning message passing, Laplacian-LoRA introduces a learnable, spectrally anchored correction to the fixed Laplacian propagation operator. We show that Laplacian-LoRA consistently delays the onset of oversmoothing, extending the effective depth of GCNs by up to a factor of two.
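The exact parameterization is not spelled out in the summary, but the shape of the idea, a learnable low-rank additive correction to a fixed normalized propagation operator, can be sketched as follows. U and V stand in for the learned LoRA-style factors; all sizes and the ReLU layer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, f, r = 6, 4, 2          # nodes, features, correction rank (illustrative)

# Fixed propagation operator: symmetrically normalized adjacency with self-loops.
A = (rng.random((n, n)) < 0.4).astype(float); A = np.maximum(A, A.T)
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
P = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

# Hypothetical low-rank correction (the LoRA-style part); U, V would be learned.
U = 0.01 * rng.standard_normal((n, r))
V = 0.01 * rng.standard_normal((n, r))

def layer(H, W):
    """One GCN layer propagating through the corrected operator P + U V^T."""
    return np.maximum((P + U @ V.T) @ H @ W, 0.0)   # ReLU activation

H = rng.standard_normal((n, f)); W = rng.standard_normal((f, f)) / np.sqrt(f)
print(layer(H, W).shape)   # (6, 4)
```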
arXiv Detail & Related papers (2026-02-07T00:03:19Z)
- Fast Escape, Slow Convergence: Learning Dynamics of Phase Retrieval under Power-Law Data [15.766916122461923]
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic Gaussian inputs whose covariance spectrum follows a power law. Unlike the isotropic case, where dynamics collapse to a two-dimensional system, anisotropy yields a qualitatively new regime in which an infinite hierarchy of equations governs the evolution of the summary statistics.
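The input model is straightforward to instantiate: Gaussian inputs whose covariance eigenvalues decay as a power law. A minimal generator, with an assumed decay exponent:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, beta = 100, 2000, 1.5              # dimension, samples, decay exponent (assumed)

eigvals = np.arange(1, d + 1, dtype=float) ** (-beta)   # lambda_j ~ j^(-beta)
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)      # cov = diag(eigvals)

w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
y = (X @ w_star) ** 2                                   # phase-retrieval labels
print(X.shape, y.mean())
```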
arXiv Detail & Related papers (2025-11-24T00:21:17Z)
- Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
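For reference, the vanilla two-point estimator this entry builds on, with Gaussian perturbation directions; the paper's contribution is the choice of perturbation distribution, which this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(4)

def two_point_grad(f, x, mu=1e-4):
    """Two-point zeroth-order estimate: (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u."""
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

f = lambda x: 0.5 * np.sum(x ** 2)        # toy objective with known gradient x
x = np.ones(5)
est = np.mean([two_point_grad(f, x) for _ in range(2000)], axis=0)
print("estimate:", np.round(est, 1), " true:", x)   # averages to ~ the true gradient
```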
arXiv Detail & Related papers (2025-10-22T19:06:39Z)
- TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling [53.61290359948953]
Tangential Amplifying Guidance (TAG) operates solely on trajectory signals without modifying the underlying diffusion model. We formalize this guidance process by leveraging a first-order Taylor expansion. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal computational overhead.
arXiv Detail & Related papers (2025-10-06T06:53:29Z)
- Generative Model Inversion Through the Lens of the Manifold Hypothesis [98.37040155914595]
Model inversion attacks (MIAs) aim to reconstruct class-representative samples from trained models. Recent generative MIAs utilize generative adversarial networks to learn image priors that guide the inversion process.
arXiv Detail & Related papers (2025-09-24T14:39:25Z)
- Kernel-Smoothed Scores for Denoising Diffusion: A Bias-Variance Study [3.265950484493743]
Diffusion models can be prone to memorization. Regularizing the score has the same effect as increasing the size of the training dataset. This perspective highlights two regularization mechanisms at play in denoising diffusions.
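For a Gaussian-kernel-smoothed empirical distribution the score is available in closed form, so the smoothing bandwidth can play the regularization role directly. A minimal sketch (our own formulation, with sigma as the assumed regularization knob):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.standard_normal((200, 2))          # toy training set

def smoothed_score(x, sigma=0.5):
    """Score of the kernel density (1/n) sum_i N(x; x_i, sigma^2 I).
    Larger sigma = more smoothing = less memorization of individual points."""
    sq = ((x - data) ** 2).sum(axis=1)        # ||x - x_i||^2
    logw = -sq / (2 * sigma ** 2)
    w = np.exp(logw - logw.max()); w /= w.sum()   # softmax weights, computed stably
    return (w[:, None] * (data - x)).sum(axis=0) / sigma ** 2

print(smoothed_score(np.array([0.3, -0.1])))
```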
arXiv Detail & Related papers (2025-05-28T20:22:18Z)
- Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
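Both mechanisms are one-line modifications of the SGD step; a minimal sketch of each, with an assumed Cauchy noise model standing in for heavy tails:

```python
import numpy as np

def normalized_step(w, g, lr):
    """Normalized SGD: keep the direction, discard the (possibly heavy-tailed) magnitude."""
    return w - lr * g / max(np.linalg.norm(g), 1e-12)

def clipped_step(w, g, lr, c=1.0):
    """Clipped SGD: cap the update norm at c, leaving small gradients untouched."""
    return w - lr * g * min(1.0, c / max(np.linalg.norm(g), 1e-12))

rng = np.random.default_rng(6)
w = np.ones(10)
g = rng.standard_cauchy(10)                            # heavy-tailed gradient noise
print(np.linalg.norm(normalized_step(w, g, 0.1) - w),  # always exactly lr
      np.linalg.norm(clipped_step(w, g, 0.1) - w))     # at most lr * c
```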
arXiv Detail & Related papers (2024-10-21T22:40:42Z)
- Learning in PINNs: Phase transition, total diffusion, and generalization [1.8802875123957965]
We investigate the learning dynamics of fully-connected neural networks through the lens of the gradient signal-to-noise ratio (SNR).
We identify a third phase, termed "total diffusion".
We explore the information-induced compression phenomenon, pinpointing a significant compression of activations at the total diffusion phase.
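One concrete way to operationalize a gradient SNR (an assumed definition; the paper's may differ) is the ratio of the batch-mean gradient norm to the average per-sample deviation:

```python
import numpy as np

def gradient_snr(per_sample_grads):
    """SNR of a batch of per-sample gradients: ||mean|| / mean deviation (assumed definition)."""
    g = np.asarray(per_sample_grads)          # shape (batch, n_params)
    mean = g.mean(axis=0)
    noise = np.linalg.norm(g - mean, axis=1).mean()
    return np.linalg.norm(mean) / max(noise, 1e-12)

rng = np.random.default_rng(7)
grads = 0.1 * np.ones((64, 100)) + rng.standard_normal((64, 100))  # signal + noise
print(f"SNR = {gradient_snr(grads):.3f}")
```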
arXiv Detail & Related papers (2024-03-27T12:10:30Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T) / T^{1 - \frac{1}{\alpha}})$.
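A sketch of the server-side loop such a scheme implies: client gradients arrive as a noisy analog sum, and the server applies an AdaGrad step to the aggregate. The channel-noise model, toy client objectives, and step size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
d, clients = 10, 5
w, acc = np.zeros(d), np.zeros(d)            # model and AdaGrad accumulator

def client_grad(w, i):
    """Toy local objective per client: quadratic pulling w toward a client target."""
    return w - np.full(d, float(i))

for t in range(100):
    analog_sum = sum(client_grad(w, i) for i in range(clients))
    g = analog_sum / clients + 0.1 * rng.standard_normal(d)   # over-the-air channel noise
    acc += g ** 2                                             # AdaGrad accumulation
    w -= 0.5 * g / (np.sqrt(acc) + 1e-8)                      # adaptive step

print("final w ~ mean client target:", np.round(w, 1))        # approaches 2.0
```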
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion [83.90492831583997]
We show that a batch-normalized network can keep the optimal signal propagation properties, but avoid exploding gradients in depth.
We use a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
We also design an activation shaping scheme that empirically achieves the same properties for certain non-linear activations.
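The forward half of this claim is easy to check empirically: a deep linear MLP with batch normalization keeps activations well-scaled at any depth. The sketch below verifies signal propagation only (checking gradient boundedness would require autodiff); the widths and depth are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
batch, width, depth = 128, 64, 50

def batchnorm(H):
    """Normalize each feature over the batch (no learned scale/shift, for simplicity)."""
    return (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-8)

H = rng.standard_normal((batch, width))
for layer in range(depth):
    W = rng.standard_normal((width, width)) / np.sqrt(width)   # linear activation
    H = batchnorm(H @ W)
    if layer % 10 == 0:   # activation norms stay ~ sqrt(width) at every depth
        print(f"layer {layer:3d}: mean activation norm = {np.linalg.norm(H, axis=1).mean():.2f}")
```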
arXiv Detail & Related papers (2023-10-03T12:35:02Z)
- Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
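A sketch of the two updates being contrasted, assuming the usual formulations: a spherical gradient step (project onto the tangent space of the sphere, then renormalize) versus a weight-normalized step that trains an unconstrained parameter v with w = v/||v||. The paper's exact normalization may differ.

```python
import numpy as np

def spherical_step(w, g, lr):
    """Spherical gradient step: descend along the tangent component of g, renormalize."""
    tangent = g - (g @ w) * w
    w = w - lr * tangent
    return w / np.linalg.norm(w)

def weightnorm_step(v, g_w, lr):
    """Weight-normalized step: w = v/||v||; descend on v via the chain rule.
    The 1/||v|| factors rescale the effective learning rate."""
    r = np.linalg.norm(v)
    g_v = (g_w - (g_w @ v / r**2) * v) / r      # (dw/dv)^T applied to g_w
    return v - lr * g_v

rng = np.random.default_rng(10)
w = rng.standard_normal(5); w /= np.linalg.norm(w)
print(f"{np.linalg.norm(spherical_step(w, rng.standard_normal(5), 0.1)):.3f}")  # stays on sphere
```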
arXiv Detail & Related papers (2023-09-07T16:55:50Z)
- On regularization of gradient descent, layer imbalance and flat minima [9.08659783613403]
We analyze the training dynamics of deep linear networks using a new metric, imbalance, which characterizes the flatness of a solution.
We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way.
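A plausible instantiation of the imbalance metric for a two-layer linear network is the norm of W1 W1^T - W2^T W2, a quantity conserved under gradient flow; the paper's exact definition may differ.

```python
import numpy as np

def imbalance(W1, W2):
    """Layer imbalance for a two-layer linear net f(x) = W2 @ W1 @ x:
    the norm of W1 W1^T - W2^T W2 (conserved under gradient flow)."""
    return np.linalg.norm(W1 @ W1.T - W2.T @ W2)

rng = np.random.default_rng(11)
W1 = rng.standard_normal((8, 10))
W2 = rng.standard_normal((3, 8))
print(f"imbalance = {imbalance(W1, W2):.2f}")   # 0 for perfectly balanced layers
```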
arXiv Detail & Related papers (2020-07-18T00:09:14Z)