Understanding the Spectral Bias of Coordinate Based MLPs Via Training
Dynamics
- URL: http://arxiv.org/abs/2301.05816v4
- Date: Thu, 4 May 2023 01:46:24 GMT
- Title: Understanding the Spectral Bias of Coordinate Based MLPs Via Training
Dynamics
- Authors: John Lazzari, Xiuwen Liu
- Abstract summary: We study the connection between the computations of ReLU networks and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
- Score: 2.9443230571766854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spectral bias is an important observation of neural network training, stating
that the network will learn a low frequency representation of the target
function before converging to higher frequency components. This property is
interesting due to its link to good generalization in over-parameterized
networks. However, in low dimensional settings, a severe spectral bias occurs
that obstructs convergence to high frequency components entirely. In order to
overcome this limitation, one can encode the inputs using a high frequency
sinusoidal encoding. Previous works attempted to explain this phenomenon using the
Neural Tangent Kernel (NTK) and Fourier analysis. However, the NTK does not capture
real network dynamics, and Fourier analysis only offers a global perspective on
the network properties that induce this bias. In this paper, we provide a novel
approach towards understanding spectral bias by directly studying ReLU MLP
training dynamics. Specifically, we focus on the connection between the
computations of ReLU networks (activation regions), and the speed of gradient
descent convergence. We study these dynamics in relation to the spatial
information of the signal to understand how they influence spectral bias. We
then use this formulation to study the severity of spectral bias in low
dimensional settings, and how positional encoding overcomes this.
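To make the sinusoidal encoding mentioned above concrete, here is a minimal sketch of a Fourier-feature-style positional encoding for low-dimensional coordinates. The power-of-two frequency schedule, the feature dimensions, and the high-frequency target in the example are illustrative assumptions, not the exact configuration studied in the paper.

```python
import numpy as np

def positional_encoding(x, num_frequencies=8):
    """Map coordinates in [0, 1]^d to sine/cosine features.

    Illustrative NeRF-style encoding: for each input dimension and each
    frequency 2^k * pi, emit one sine and one cosine feature. The frequency
    schedule is an assumption, not necessarily the paper's choice.
    """
    x = np.atleast_2d(x)                                  # (n, d)
    freqs = (2.0 ** np.arange(num_frequencies)) * np.pi   # (F,)
    angles = x[..., None] * freqs                         # (n, d, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(x.shape[0], -1)                  # (n, 2 * d * F)

# Example: encode 1D coordinates before fitting a high-frequency signal.
coords = np.linspace(0.0, 1.0, 256)[:, None]         # (256, 1)
target = np.sin(2 * np.pi * 32 * coords[:, 0])       # high-frequency target
encoded = positional_encoding(coords)                 # (256, 16)
```

Feeding `encoded` rather than raw `coords` into a ReLU MLP is the standard recipe; in the abstract's terms, the encoding changes how activation regions are distributed over the input domain and thereby how quickly high-frequency components can be fit.

The abstract's central object, the activation regions of a ReLU MLP (the pieces of the input domain on which the network computes a single affine function), can also be illustrated directly. The sketch below counts how many distinct activation patterns a small, randomly initialized ReLU MLP realizes along a dense 1D grid; the architecture and Gaussian initialization are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random ReLU MLP parameters (illustrative Gaussian initialization)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def activation_patterns(params, x):
    """Forward pass recording which hidden units are active (pre-activation > 0)."""
    h, pattern = x, []
    for W, b in params[:-1]:          # all layers except the linear output layer
        pre = h @ W + b
        pattern.append(pre > 0)
        h = np.maximum(pre, 0.0)
    return np.concatenate(pattern, axis=-1)   # (n_points, total_hidden_units)

params = init_mlp([1, 64, 64, 1])              # 1D coordinate input
grid = np.linspace(0.0, 1.0, 10_000)[:, None]  # dense grid over [0, 1]
patterns = activation_patterns(params, grid)

# Each maximal run of grid points sharing one activation pattern lies in a
# single activation region, so pattern changes count region boundaries.
boundaries = np.any(patterns[1:] != patterns[:-1], axis=1)
print("activation regions crossed along [0, 1]:", boundaries.sum() + 1)
```

Running this with and without the positional encoding above (i.e., widening the input from 1 to 16 features) is one way to probe how the encoding changes the number of regions the network carves the coordinate line into.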
Related papers
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree
Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
arXiv Detail & Related papers (2023-05-16T20:06:01Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed
Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - Overcoming the Spectral Bias of Neural Value Approximation [17.546011419043644]
Value approximation using deep neural networks is often the primary module that provides learning signals to the rest of the algorithm.
Recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones.
We re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural kernel.
arXiv Detail & Related papers (2022-06-09T17:59:57Z) - The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of the higher frequencies.
arXiv Detail & Related papers (2022-02-27T23:12:43Z) - Spectral Complexity-scaled Generalization Bound of Complex-valued Neural
Networks [78.64167379726163]
This paper is the first work that proves a generalization bound for complex-valued neural networks.
We conduct experiments by training complex-valued convolutional neural networks on different datasets.
arXiv Detail & Related papers (2021-12-07T03:25:25Z) - Understanding Layer-wise Contributions in Deep Neural Networks through
Spectral Analysis [6.0158981171030685]
We analyze the layer-wise spectral bias of Deep Neural Networks and relate it to the contributions of different layers in the reduction of error for a given target function.
We provide empirical results on high-dimensional datasets validating our theory for Deep Neural Networks.
arXiv Detail & Related papers (2021-11-06T22:49:46Z) - Spectral Bias in Practice: The Role of Function Frequency in
Generalization [10.7218588164913]
We propose methodologies for measuring spectral bias in modern image classification networks.
We find that networks that generalize well strike a balance between having enough complexity to fit the data while being simple enough to avoid overfitting.
Our work enables measuring and ultimately controlling the spectral behavior of neural networks used for image classification.
arXiv Detail & Related papers (2021-10-06T00:16:10Z) - Fourier Features Let Networks Learn High Frequency Functions in Low
Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions.
These results shed light on advances in computer vision and graphics that achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-06-18T17:59:11Z) - Frequency Bias in Neural Networks for Input of Non-Uniform Density [27.75835200173761]
We use the Neural Tangent Kernel (NTK) model to explore the effect of variable density on training dynamics.
Our results show convergence at a point $x \in \mathbb{S}^{d-1}$ occurs in time $O(\kappa^d / p(x))$, where $p(x)$ denotes the local density at $x$.
arXiv Detail & Related papers (2020-03-10T07:20:14Z)
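Several of the related papers above, and the main paper's discussion of prior work, rely on the Neural Tangent Kernel. As a rough, self-contained illustration of that object, the sketch below computes the empirical NTK $\Theta(x, x') = \nabla_\theta f(x) \cdot \nabla_\theta f(x')$ of a one-hidden-layer ReLU network on a 1D grid; the architecture, initialization, and grid are assumptions for illustration, not the setup of any specific paper listed here.

```python
import numpy as np

rng = np.random.default_rng(0)
width = 256

# One-hidden-layer ReLU network f(x) = sum_j v_j * relu(w_j * x + b_j), scalar input.
w = rng.normal(0.0, 1.0, width)
b = rng.normal(0.0, 1.0, width)
v = rng.normal(0.0, 1.0 / np.sqrt(width), width)

def param_gradient(x):
    """Gradient of f(x) with respect to all parameters (w, b, v)."""
    pre = w * x + b
    act = np.maximum(pre, 0.0)
    ind = (pre > 0).astype(float)
    return np.concatenate([v * ind * x,   # df/dw_j
                           v * ind,       # df/db_j
                           act])          # df/dv_j

# Empirical NTK on a 1D grid: Theta[i, j] = grad(x_i) . grad(x_j).
xs = np.linspace(-1.0, 1.0, 50)
J = np.stack([param_gradient(x) for x in xs])   # (50, 3 * width)
ntk = J @ J.T

eigvals = np.linalg.eigvalsh(ntk)               # ascending order
print("smallest / largest NTK eigenvalues:", eigvals[0], eigvals[-1])
```

In the kernel-regression picture, directions of the target aligned with large NTK eigenvalues are fit quickly while those aligned with small eigenvalues converge slowly, which is how spectral bias is usually phrased in these NTK-based analyses.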
This list is automatically generated from the titles and abstracts of the papers in this site.