Gradient Descent as a Shrinkage Operator for Spectral Bias
- URL: http://arxiv.org/abs/2504.18207v1
- Date: Fri, 25 Apr 2025 09:36:17 GMT
- Title: Gradient Descent as a Shrinkage Operator for Spectral Bias
- Authors: Simon Lucey
- Abstract summary: Gradient descent (GD) can be reinterpreted as a shrinkage operator that masks the singular values of a neural network's Jacobian. We show how GD implicitly selects the number of frequency components to retain, thereby controlling the spectral bias.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We generalize the connection between activation function and spline regression/smoothing and characterize how this choice may influence spectral bias within a 1D shallow network. We then demonstrate how gradient descent (GD) can be reinterpreted as a shrinkage operator that masks the singular values of a neural network's Jacobian. Viewed this way, GD implicitly selects the number of frequency components to retain, thereby controlling the spectral bias. An explicit relationship is proposed between the choice of GD hyperparameters (learning rate & number of iterations) and bandwidth (the number of active components). GD regularization is shown to be effective only with monotonic activation functions. Finally, we highlight the utility of non-monotonic activation functions (sinc, Gaussian) as iteration-efficient surrogates for spectral bias.
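The shrinkage view described in the abstract has a classical analogue for linear least squares, where k steps of GD from a zero initialization scale each singular direction by a filter factor 1 - (1 - eta * s_i^2)^k. The sketch below is illustrative only (a fixed matrix stands in for the network Jacobian; all names are ours, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))   # stand-in for a (fixed) Jacobian
b = rng.standard_normal(30)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eta = 0.9 / s[0] ** 2   # learning rate below the stability limit 2 / s_max^2
k = 5                   # number of iterations

# k steps of gradient descent on 0.5 * ||A x - b||^2, starting from zero
x = np.zeros(A.shape[1])
for _ in range(k):
    x -= eta * A.T @ (A @ x - b)

# Closed form: direction i of the least-squares solution is shrunk by the
# filter factor f_i = 1 - (1 - eta * s_i**2)**k, i.e. GD acts as a soft
# mask on the singular values -- directions with small s_i stay suppressed
# until k (or eta) grows, which is the bandwidth-selection effect.
f = 1.0 - (1.0 - eta * s ** 2) ** k
x_filtered = Vt.T @ (f * (U.T @ b) / s)

assert np.allclose(x, x_filtered)
```

Increasing the learning rate or the iteration count pushes more filter factors toward 1, i.e. retains more frequency components, matching the explicit hyperparameter/bandwidth relationship the abstract proposes.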
Related papers
- Sinusoidal Approximation Theorem for Kolmogorov-Arnold Networks [0.0]
Kolmogorov-Arnold Networks (KANs) have been recently proposed as an alternative to multilayer perceptrons. We propose a novel KAN variant by replacing both the inner and outer functions in the Kolmogorov-Arnold representation with weighted sinusoidal functions of learnable frequencies. Inspired by simplifications introduced by Lorentz and Sprecher, we fix the phases of the sinusoidal activations to linearly spaced constant values and provide a proof of its theoretical validity.
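A minimal sketch of the idea in that summary (the function name and shapes are ours, not the paper's API): a KAN-style univariate function built as a weighted sum of sinusoids with learnable frequencies and fixed, linearly spaced phases.

```python
import numpy as np

def sinusoidal_unit(x, amplitudes, frequencies):
    """One KAN-style univariate function: a weighted sum of sinusoids
    with learnable amplitudes/frequencies and fixed, linearly spaced
    constant phases (a Lorentz/Sprecher-style simplification)."""
    k = len(amplitudes)
    phases = np.linspace(0.0, np.pi, k, endpoint=False)  # fixed, not trained
    return sum(a * np.sin(w * x + p)
               for a, w, p in zip(amplitudes, frequencies, phases))
```

Only the amplitudes and frequencies would be trained; the phases stay constant, mirroring the fixed-phase simplification described above.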
arXiv Detail & Related papers (2025-08-01T01:16:09Z) - The Spectral Bias of Shallow Neural Network Learning is Shaped by the Choice of Non-linearity [0.7499722271664144]
We study how non-linear activation functions contribute to shaping neural networks' implicit bias. We show that local dynamical attractors facilitate the formation of clusters of hyperplanes where the input to a neuron's activation function is zero.
arXiv Detail & Related papers (2025-03-13T17:36:46Z) - Making Sense Of Distributed Representations With Activation Spectroscopy [44.94093096989921]
There is growing evidence to suggest that relevant features are encoded across many neurons in a distributed fashion.
This work explores one feasible path to both detecting and tracing the joint influence of neurons in a distributed representation.
arXiv Detail & Related papers (2025-01-26T07:33:42Z) - Point-Calibrated Spectral Neural Operators [54.13671100638092]
We introduce the Point-Calibrated Spectral Transform: Point-Calibrated Spectral Neural Operators learn operator mappings by approximating functions with a point-level adaptive spectral basis.
arXiv Detail & Related papers (2024-10-15T08:19:39Z) - A Non-negative VAE:the Generalized Gamma Belief Network [49.970917207211556]
The gamma belief network (GBN) has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data.
We introduce the generalized gamma belief network (Generalized GBN) in this paper, which extends the original linear generative model to a more expressive non-linear generative model.
We also propose an upward-downward Weibull inference network to approximate the posterior distribution of the latent variables.
arXiv Detail & Related papers (2024-08-06T18:18:37Z) - FINER++: Building a Family of Variable-periodic Functions for Activating Implicit Neural Representation [39.116375158815515]
Implicit Neural Representation (INR) is causing a revolution in the field of signal processing.
INR techniques suffer from the "frequency"-specified spectral bias and capacity-convergence gap.
We propose the FINER++ framework by extending existing periodic/non-periodic activation functions to variable-periodic ones.
arXiv Detail & Related papers (2024-07-28T09:24:57Z) - Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks, and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z) - Spectral Feature Augmentation for Graph Contrastive Learning and Beyond [64.78221638149276]
We present a novel spectral feature augmentation for contrastive learning on graphs (and images).
For each data view, we estimate a low-rank approximation per feature map and subtract that approximation from the map to obtain its complement.
This is achieved by the incomplete power iteration proposed herein, a non-standard power regime that enjoys two valuable byproducts (under merely one or two iterations).
Experiments on graph/image datasets show that our spectral feature augmentation outperforms baselines.
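The two-step recipe above can be sketched as follows (a minimal NumPy illustration under our own naming, not the authors' implementation): estimate a rank-1 approximation with a few, deliberately incomplete, power iterations and subtract it from the feature map.

```python
import numpy as np

def spectral_complement(X, iters=2, seed=0):
    """Return X minus a rank-1 approximation of X estimated with
    `iters` (incomplete) power iterations; requires iters >= 1."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = X @ v
        u /= np.linalg.norm(u)
        v = X.T @ u
        v /= np.linalg.norm(v)
    sigma = u @ X @ v            # estimated leading singular value
    return X - sigma * np.outer(u, v)
```

With only one or two iterations the rank-1 estimate is crude, so the subtraction only partially suppresses the leading spectral component; the summary's point is that this cheap, incomplete regime is already enough to rebalance the feature spectrum.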
arXiv Detail & Related papers (2022-12-02T08:48:11Z) - On the Activation Function Dependence of the Spectral Bias of Neural Networks [0.0]
We study the phenomenon from the point of view of the spectral bias of neural networks.
We provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods.
We show that neural networks with the Hat activation function are trained significantly faster using gradient descent and ADAM.
arXiv Detail & Related papers (2022-08-09T17:40:57Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks [0.0]
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow.
We show that the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK).
We conclude that the damped-deviations view offers a simple and unifying perspective on the dynamics when optimizing the squared error.
arXiv Detail & Related papers (2022-01-12T23:28:41Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z) - When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.