On the Activation Function Dependence of the Spectral Bias of Neural Networks
- URL: http://arxiv.org/abs/2208.04924v2
- Date: Wed, 10 Aug 2022 18:38:28 GMT
- Title: On the Activation Function Dependence of the Spectral Bias of Neural Networks
- Authors: Qingguo Hong and Qinyang Tan and Jonathan W. Siegel and Jinchao Xu
- Abstract summary: We study the generalization of dramatically overparameterized neural networks from the point of view of their spectral bias.
We provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods.
We show that neural networks with the Hat activation function are trained significantly faster using stochastic gradient descent and ADAM.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are universal function approximators which are known to
generalize well despite being dramatically overparameterized. We study this
phenomenon from the point of view of the spectral bias of neural networks. Our
contributions are two-fold. First, we provide a theoretical explanation for the
spectral bias of ReLU neural networks by leveraging connections with the theory
of finite element methods. Second, based upon this theory we predict that
switching the activation function to a piecewise linear B-spline, namely the
Hat function, will remove this spectral bias, which we verify empirically in a
variety of settings. Our empirical studies also show that neural networks with
the Hat activation function are trained significantly faster using stochastic
gradient descent and ADAM. Combined with previous work showing that the Hat
activation function also improves generalization accuracy on image
classification tasks, this indicates that using the Hat activation provides
significant advantages over the ReLU on certain problems.
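To make the comparison concrete, here is a minimal sketch of the two activations, assuming the common normalization of the hat (piecewise linear B-spline) function supported on [0, 2]; the paper's exact scaling may differ.

```python
import numpy as np

def relu(x):
    """Standard ReLU activation."""
    return np.maximum(0.0, x)

def hat(x):
    """Piecewise linear B-spline 'hat' activation supported on [0, 2]:
    rises from 0 to 1 on [0, 1] and falls back to 0 on [1, 2].
    (Assumed normalization; the paper's exact scaling may differ.)"""
    return np.maximum(0.0, np.minimum(x, 2.0 - x))

# Evaluate both activations on a small grid.
xs = np.linspace(-1.0, 3.0, 9)
print("ReLU:", relu(xs))
print("Hat :", hat(xs))

# The hat function is itself a fixed linear combination of shifted ReLUs:
assert np.allclose(hat(xs), relu(xs) - 2 * relu(xs - 1.0) + relu(xs - 2.0))
```

The final assertion checks the standard identity expressing the hat function in terms of shifted ReLUs, which is one way to see that the two activations span closely related (finite-element-like) function classes.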
Related papers
- ReLUs Are Sufficient for Learning Implicit Neural Representations [17.786058035763254]
We revisit the use of ReLU activation functions for learning implicit neural representations.
Inspired by second-order B-spline wavelets, we incorporate a set of simple constraints into the ReLU neurons in each layer of a deep neural network (DNN).
We demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs based on a DNN composed of only ReLU neurons.
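As an illustration of the setting only, below is a minimal, untrained coordinate MLP (implicit neural representation) built from plain ReLU neurons; the B-spline-wavelet-inspired constraints described in the abstract are not reproduced here, and the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """He-style random initialization of a plain ReLU MLP (illustrative, untrained)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
            for fan_in, fan_out in zip(sizes[:-1], sizes[1:])]

def inr_forward(params, coords):
    """Map (x, y) coordinates in [0, 1]^2 to a scalar intensity.
    Hidden layers use ReLU; the output layer is linear."""
    h = coords
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(0.0, h)   # ReLU
    return h

# A 2 -> 64 -> 64 -> 1 coordinate network evaluated on a 4x4 pixel grid.
params = init_mlp([2, 64, 64, 1])
grid = np.stack(np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4)), -1).reshape(-1, 2)
print(inr_forward(params, grid).shape)   # (16, 1)
```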
arXiv Detail & Related papers (2024-06-04T17:51:08Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks [79.28094304325116]
Despite the capacity of neural nets to learn arbitrary functions, models trained through gradient descent often exhibit a bias towards "simpler" functions.
We show how this spectral bias towards low-degree frequencies can in fact hurt the neural network's generalization on real-world datasets.
We propose a new scalable functional regularization scheme that helps the neural network learn higher-degree frequencies.
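The regularizer itself is not reproduced here; the sketch below only illustrates the underlying notion of a low-degree spectrum, computing the Walsh-Hadamard coefficients of a function on the Boolean cube and grouping their energy by degree. The dimension and example function are illustrative choices.

```python
import numpy as np
from itertools import product

def fwht(a):
    """Fast Walsh-Hadamard transform (length must be a power of 2),
    returning normalized Walsh/Fourier coefficients."""
    a = a.astype(float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    return a / len(a)

n = 4
inputs = np.array(list(product([0, 1], repeat=n)))      # all 2^n points of the cube
f = inputs[:, 0] ^ inputs[:, 1] ^ inputs[:, 2]           # a degree-3 parity function
coeffs = fwht((-1.0) ** f)                               # Walsh coefficients of (-1)^f

# Group squared coefficients by the degree (popcount) of their frequency index.
degree = inputs.sum(axis=1)
energy_by_degree = {d: float(np.sum(coeffs[degree == d] ** 2)) for d in range(n + 1)}
print(energy_by_degree)   # all spectral energy sits at degree 3 for this parity
```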
arXiv Detail & Related papers (2023-05-16T20:06:01Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Data-aware customization of activation functions reduces neural network error [0.35172332086962865]
We show that data-aware customization of activation functions can result in striking reductions in neural network error.
A simple substitution with the "seagull" activation function in an already-refined neural network can lead to an order-of-magnitude reduction in error.
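The abstract does not define the activation; assuming the seagull function is log(1 + x^2), as reported in that line of work (an assumption that should be checked against the paper), a minimal sketch would be:

```python
import numpy as np

def seagull(x):
    """'Seagull' activation, assumed here to be log(1 + x**2); verify the exact
    definition in the cited paper before relying on this."""
    return np.log1p(x ** 2)

# Shape check: even, zero at the origin, growing only logarithmically in |x|.
xs = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(seagull(xs))
```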
arXiv Detail & Related papers (2023-01-16T23:38:37Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
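The NTK analysis is not reproduced here; the sketch below only spells out the heavy-ball (gradient descent with momentum) update whose dynamics the paper studies, applied to a toy quadratic with illustrative hyperparameters.

```python
import numpy as np

def sgdm_step(theta, velocity, grad, lr=1e-3, momentum=0.9):
    """One step of (stochastic) gradient descent with momentum, heavy-ball form:
    v_{t+1} = momentum * v_t + grad,  theta_{t+1} = theta_t - lr * v_{t+1}."""
    velocity = momentum * velocity + grad
    theta = theta - lr * velocity
    return theta, velocity

# Minimize f(theta) = 0.5 * ||theta||^2 as a toy check; grad f(theta) = theta.
theta, v = np.ones(3), np.zeros(3)
for _ in range(200):
    theta, v = sgdm_step(theta, v, grad=theta, lr=0.1, momentum=0.9)
print(theta)   # close to the minimizer at 0
```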
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- Phenomenology of Double Descent in Finite-Width Neural Networks [29.119232922018732]
Double descent describes how a model's generalization behaviour differs depending on whether it lies in the under- or over-parameterized regime.
We use influence functions to derive suitable expressions of the population loss and its lower bound.
Building on our analysis, we investigate how the loss function affects double descent.
arXiv Detail & Related papers (2022-03-14T17:39:49Z)
- The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the Π-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of higher frequencies.
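As a rough illustration of where the higher-frequency capacity comes from, the sketch below shows the kind of Hadamard-product block used in polynomial networks, where each block raises the polynomial degree of the features in the input; this is a schematic, not the exact Π-Net parametrization, and all shapes and scales are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def pi_style_block(z, x, W, U):
    """One multiplicative block in the spirit of polynomial networks: the
    elementwise (Hadamard) product of two linear maps raises the polynomial
    degree of the features in the input z by one, and the skip connection
    keeps the lower-degree terms.
    (Schematic only; not the exact Pi-Net parametrization.)"""
    return (x @ W) * (z @ U) + x

d, k = 8, 16
z = rng.normal(size=(4, d))                      # input batch
W0 = rng.normal(size=(d, k)) / np.sqrt(d)
x = z @ W0                                       # degree-1 features
W1 = rng.normal(size=(k, k)) / np.sqrt(k)
U1 = rng.normal(size=(d, k)) / np.sqrt(d)
x = pi_style_block(z, x, W1, U1)                 # now degree-2 in z
print(x.shape)                                   # (4, 16)
```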
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
- Periodic Activation Functions Induce Stationarity [19.689175123261613]
We show that periodic activation functions in Bayesian neural networks establish a connection between the prior on the network weights and translation-invariant, stationary Gaussian process priors.
In a series of experiments, we show that periodic activation functions obtain comparable performance for in-domain data and capture sensitivity to perturbed inputs in deep neural networks for out-of-domain detection.
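A quick Monte Carlo check of the stated connection, for a single sine-activated hidden unit with a Gaussian input weight and uniform phase: the prior covariance between unit outputs at x and x' depends only on x - x', i.e., it is stationary. The weight scale and construction are illustrative, not the paper's exact prior.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_prior_cov(x1, x2, n_samples=200_000, w_scale=1.0):
    """Monte Carlo estimate of the prior covariance of a hidden unit
    h(x) = sin(w * x + b) with w ~ N(0, w_scale^2) and b ~ U[0, 2*pi).
    (Illustrative construction; not the exact prior used in the paper.)"""
    w = rng.normal(0.0, w_scale, n_samples)
    b = rng.uniform(0.0, 2 * np.pi, n_samples)
    return np.mean(np.sin(w * x1 + b) * np.sin(w * x2 + b))

# The estimate depends (approximately) only on the lag x - x':
print(mc_prior_cov(0.0, 0.5), mc_prior_cov(3.0, 3.5))   # ~equal -> stationary
print(mc_prior_cov(0.0, 2.0), mc_prior_cov(3.0, 5.0))   # ~equal for a larger lag too
```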
arXiv Detail & Related papers (2021-10-26T11:10:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.