A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
- URL: http://arxiv.org/abs/2305.09779v2
- Date: Sat, 10 Jun 2023 09:10:14 GMT
- Title: A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
- Authors: Ali Gorji, Andisheh Amrollahi, Andreas Krause
- Abstract summary: Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards ``simpler'' functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that aids the neural network to learn higher degree frequencies.
- Score: 79.28094304325116
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the capacity of neural nets to learn arbitrary functions, models
trained through gradient descent often exhibit a bias towards ``simpler''
functions. Various notions of simplicity have been introduced to characterize
this behavior. Here, we focus on the case of neural networks with discrete
(zero-one), high-dimensional, inputs through the lens of their Fourier
(Walsh-Hadamard) transforms, where the notion of simplicity can be captured
through the degree of the Fourier coefficients. We empirically show that neural
networks have a tendency to learn lower-degree frequencies. We show how this
spectral bias towards low-degree frequencies can in fact hurt the neural
network's generalization on real-world datasets. To remedy this we propose a
new scalable functional regularization scheme that aids the neural network to
learn higher degree frequencies. Our regularizer also helps avoid erroneous
identification of low-degree frequencies, which further improves
generalization. We extensively evaluate our regularizer on synthetic datasets
to gain insights into its behavior. Finally, we show significantly improved
generalization on four different datasets compared to standard neural networks
and other relevant baselines.
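To make the notion of "degree" concrete: for a function on zero-one inputs, each Walsh-Hadamard frequency corresponds to a subset of input coordinates, and its degree is the size of that subset. The sketch below is a minimal, self-contained illustration rather than the paper's implementation: it computes the exact Walsh-Hadamard spectrum of a small toy function and groups spectral energy by degree, the quantity the spectral-bias argument is about. The helper names (walsh_hadamard, energy_by_degree) are ours, and the exact transform is only tractable for small input dimension; the paper's regularizer is specifically designed to avoid this exponential cost.

import itertools
import numpy as np

def walsh_hadamard(values):
    # Fast Walsh-Hadamard transform of a length-2^n vector of function
    # values f(x), with x read as an n-bit integer index.
    vals = np.asarray(values, dtype=float).copy()
    size = vals.size
    assert size & (size - 1) == 0, "length must be a power of two"
    h = 1
    while h < size:
        for i in range(0, size, 2 * h):
            for j in range(i, i + h):
                a, b = vals[j], vals[j + h]
                vals[j], vals[j + h] = a + b, a - b
        h *= 2
    return vals / size  # normalize so entries are the Fourier coefficients

def energy_by_degree(f_values, n):
    # Sum of squared Fourier coefficients, grouped by degree: each frequency
    # index encodes a subset S of the n coordinates, and its degree is |S|,
    # i.e. the number of ones in the index.
    coeffs = walsh_hadamard(f_values)
    energy = np.zeros(n + 1)
    for freq, c in enumerate(coeffs):
        energy[bin(freq).count("1")] += c ** 2
    return energy

# Toy example: the parity of the first two input bits is a pure degree-2
# function, so all of its spectral energy sits at degree 2.
n = 4
inputs = list(itertools.product([0, 1], repeat=n))
f = np.array([(-1) ** (x[0] ^ x[1]) for x in inputs], dtype=float)
print(energy_by_degree(f, n))  # [0. 0. 1. 0. 0.]

In this picture, the spectral bias described above means that a trained network's energy concentrates at low degrees even when the target has substantial high-degree energy; the proposed regularizer counteracts that concentration, and the scalable way it estimates and penalizes the spectrum is detailed in the paper rather than reproduced here.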
Related papers
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z)
- Fourier Sensitivity and Regularization of Computer Vision Models [11.79852671537969]
We study the frequency sensitivity characteristics of deep neural networks using a principled approach.
We find that computer vision models are consistently sensitive to particular frequencies dependent on the dataset, training method and architecture.
arXiv Detail & Related papers (2023-01-31T10:05:35Z)
- Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics [2.9443230571766854]
We study the connection between the computations of ReLU networks and the speed of gradient descent convergence.
We then use this formulation to study the severity of spectral bias in low dimensional settings, and how positional encoding overcomes this.
arXiv Detail & Related papers (2023-01-14T04:21:25Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Understanding robustness and generalization of artificial neural networks through Fourier masks [8.94889125739046]
Recent literature suggests that robust networks with good generalization properties tend to be biased towards processing low frequencies in images.
We develop an algorithm that allows us to learn modulatory masks highlighting the essential input frequencies needed for preserving a trained network's performance.
arXiv Detail & Related papers (2022-03-16T17:32:00Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Spectral Bias in Practice: The Role of Function Frequency in Generalization [10.7218588164913]
We propose methodologies for measuring spectral bias in modern image classification networks.
We find that networks that generalize well strike a balance between having enough complexity to fit the data while being simple enough to avoid overfitting.
Our work enables measuring and ultimately controlling the spectral behavior of neural networks used for image classification.
arXiv Detail & Related papers (2021-10-06T00:16:10Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.