A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
- URL: http://arxiv.org/abs/2305.09779v2
- Date: Sat, 10 Jun 2023 09:10:14 GMT
- Title: A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
- Authors: Ali Gorji, Andisheh Amrollahi, Andreas Krause
- Abstract summary: Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards ``simpler'' functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
- Score: 79.28094304325116
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the capacity of neural nets to learn arbitrary functions, models
trained through gradient descent often exhibit a bias towards ``simpler''
functions. Various notions of simplicity have been introduced to characterize
this behavior. Here, we focus on the case of neural networks with discrete
(zero-one), high-dimensional, inputs through the lens of their Fourier
(Walsh-Hadamard) transforms, where the notion of simplicity can be captured
through the degree of the Fourier coefficients. We empirically show that neural
networks have a tendency to learn lower-degree frequencies. We show how this
spectral bias towards low-degree frequencies can in fact hurt the neural
network's generalization on real-world datasets. To remedy this we propose a
new scalable functional regularization scheme that aids the neural network to
learn higher degree frequencies. Our regularizer also helps avoid erroneous
identification of low-degree frequencies, which further improves
generalization. We extensively evaluate our regularizer on synthetic datasets
to gain insights into its behavior. Finally, we show significantly improved
generalization on four different datasets compared to standard neural networks
and other relevant baselines.
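To make the notion of "degree" concrete: for a function on zero-one inputs, each Walsh-Hadamard frequency corresponds to a subset of input coordinates, and its degree is the size of that subset. The sketch below is a minimal, self-contained illustration rather than the paper's implementation: it computes the exact Walsh-Hadamard spectrum of a small toy function and groups spectral energy by degree, the quantity the spectral-bias argument is about. The helper names (walsh_hadamard, energy_by_degree) are ours, and the exact transform is only tractable for small input dimension; the paper's regularizer is specifically designed to avoid this exponential cost.

import itertools
import numpy as np

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform of a length-2^n vector of function
    # values f(x), with x read as an n-bit integer index.
    vals = np.asarray(values, dtype=float).copy()
    size = vals.size
    assert size & (size - 1) == 0, "length must be a power of two"
    h = 1
    while h < size:
        for i in range(0, size, 2 * h):
            for j in range(i, i + h):
                a, b = vals[j], vals[j + h]
                vals[j], vals[j + h] = a + b, a - b
        h *= 2
    return vals / size  # normalize so entries are the Fourier coefficients

def energy_by_degree(f_values, n):
    # Sum of squared Fourier coefficients, grouped by degree: each frequency
    # index encodes a subset S of the n coordinates, and its degree is |S|,
    # i.e. the number of ones in the index.
    coeffs = walsh_hadamard(f_values)
    energy = np.zeros(n + 1)
    for freq, c in enumerate(coeffs):
        energy[bin(freq).count("1")] += c ** 2
    return energy

# Toy example: the parity of the first two input bits is a pure degree-2
# function, so all of its spectral energy sits at degree 2.
n = 4
inputs = list(itertools.product([0, 1], repeat=n))
f = np.array([(-1) ** (x[0] ^ x[1]) for x in inputs], dtype=float)
print(energy_by_degree(f, n))  # [0. 0. 1. 0. 0.]

In this picture, the spectral bias described above means that a trained network's energy concentrates at low degrees even when the target has substantial high-degree energy; the proposed regularizer counteracts that concentration, and the scalable way it estimates and penalizes the spectrum is detailed in the paper rather than reproduced here.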
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z)
- Fourier Sensitivity and Regularization of Computer Vision Models [11.79852671537969]
We study the frequency sensitivity characteristics of deep neural networks using a principled approach.
We find that computer vision models are consistently sensitive to particular frequencies dependent on the dataset, training method and architecture.
arXiv Detail & Related papers (2023-01-31T10:05:35Z)
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Understanding robustness and generalization of artificial neural networks through Fourier masks [8.94889125739046]
Recent literature suggests that robust networks with good generalization properties tend to be biased towards processing low frequencies in images.
We develop an algorithm that allows us to learn modulatory masks highlighting the essential input frequencies needed for preserving a trained network's performance.
arXiv Detail & Related papers (2022-03-16T17:32:00Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Spectral Bias in Practice: The Role of Function Frequency in Generalization [10.7218588164913]
We propose methodologies for measuring spectral bias in modern image classification networks.
We find that networks that generalize well strike a balance between having enough complexity to fit the data while being simple enough to avoid overfitting.
Our work enables measuring and ultimately controlling the spectral behavior of neural networks used for image classification.
arXiv Detail & Related papers (2021-10-06T00:16:10Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.