Neural Networks Fail to Learn Periodic Functions and How to Fix It
- URL: http://arxiv.org/abs/2006.08195v2
- Date: Sun, 25 Oct 2020 01:57:51 GMT
- Title: Neural Networks Fail to Learn Periodic Functions and How to Fix It
- Authors: Liu Ziyin, Tilman Hartwig, Masahito Ueda
- Abstract summary: We prove and demonstrate experimentally that standard activation functions, such as ReLU, tanh, and sigmoid, fail to learn to extrapolate simple periodic functions.
We propose a new activation, $x + \sin^2(x)$, which achieves the desired periodic inductive bias to learn a periodic function.
Experimentally, we apply the proposed method to temperature and financial data prediction.
- Score: 6.230751621285322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous literature offers limited clues on how to learn a periodic function
using modern neural networks. We start with a study of the extrapolation
properties of neural networks; we prove and demonstrate experimentally that the
standard activation functions, such as ReLU, tanh, and sigmoid, along with their
variants, all fail to learn to extrapolate simple periodic functions. We
hypothesize that this is due to their lack of a "periodic" inductive bias. As a
fix of this problem, we propose a new activation, namely, $x + \sin^2(x)$,
which achieves the desired periodic inductive bias to learn a periodic function
while maintaining a favorable optimization property of the ReLU-based
activations. Experimentally, we apply the proposed method to temperature and
financial data prediction.
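The proposed activation is straightforward to implement. Below is a minimal PyTorch sketch of the "snake" activation named in the abstract's $x + \sin^2(x)$ form; the frequency parameter `a` is an assumed generalization for tuning the period (it reduces to the abstract's form at $a = 1$) and is not stated in the abstract itself.

```python
import torch
import torch.nn as nn


class Snake(nn.Module):
    """Periodic activation x + sin^2(a*x) / a.

    At a=1 this is exactly the x + sin^2(x) activation from the abstract;
    the frequency parameter `a` is an assumption added here for
    illustration, not something the abstract specifies.
    """

    def __init__(self, a: float = 1.0):
        super().__init__()
        self.register_buffer("a", torch.tensor(float(a)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity term keeps the monotone, ReLU-like trend that favors
        # easy optimization; the sin^2 term supplies the periodic inductive bias.
        return x + torch.sin(self.a * x) ** 2 / self.a


# Usage sketch: drop it in wherever ReLU/tanh would go, e.g. in a small MLP
# meant to fit and then extrapolate a 1-D periodic signal.
model = nn.Sequential(nn.Linear(1, 64), Snake(), nn.Linear(64, 1))
```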
Related papers
- Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning [9.6812227037557]
We show that periodic activations learn low-frequency representations and, as a result, avoid overfitting to bootstrapped targets.
We also show that weight decay regularization can partially offset the overfitting of periodic activation functions.
arXiv Detail & Related papers (2024-07-09T11:07:41Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight-decay-regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Periodic Extrapolative Generalisation in Neural Networks [10.482805367361818]
We formalise the problem of extrapolative generalisation for periodic signals.
We investigate the generalisation abilities of classical, population-based, and recently proposed periodic architectures.
We find that periodic and "snake" activation functions consistently fail at periodic extrapolation.
arXiv Detail & Related papers (2022-09-21T11:47:30Z)
- On the Activation Function Dependence of the Spectral Bias of Neural Networks [0.0]
We study the phenomenon from the point of view of the spectral bias of neural networks.
We provide a theoretical explanation for the spectral bias of ReLU neural networks by leveraging connections with the theory of finite element methods.
We show that neural networks with the Hat activation function are trained significantly faster using gradient descent and ADAM.
arXiv Detail & Related papers (2022-08-09T17:40:57Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Because samples have complex non-linear characteristics, these activation functions aim to project samples from their original feature space into a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- The Spectral Bias of Polynomial Neural Networks [63.27903166253743]
Polynomial neural networks (PNNs) have been shown to be particularly effective at image generation and face recognition, where high-frequency information is critical.
Previous studies have revealed that neural networks demonstrate a spectral bias towards low-frequency functions, which yields faster learning of low-frequency components during training.
Inspired by such studies, we conduct a spectral analysis of the Neural Tangent Kernel (NTK) of PNNs.
We find that the $\Pi$-Net family, i.e., a recently proposed parametrization of PNNs, speeds up the learning of higher frequencies.
arXiv Detail & Related papers (2022-02-27T23:12:43Z)
- Periodic Activation Functions Induce Stationarity [19.689175123261613]
We show that periodic activation functions in Bayesian neural networks establish a connection between the prior on the network weights and translation-invariant, stationary Gaussian process priors.
In a series of experiments, we show that periodic activation functions obtain comparable performance for in-domain data and capture sensitivity to perturbed inputs in deep neural networks for out-of-domain detection.
arXiv Detail & Related papers (2021-10-26T11:10:37Z)
- Learning a Single Neuron with Bias Using Gradient Descent [53.15475693468925]
We study the fundamental problem of learning a single neuron with a bias term.
We show that this is a significantly different and more challenging problem than the bias-less case.
arXiv Detail & Related papers (2021-06-02T12:09:55Z)
- A Use of Even Activation Functions in Neural Networks [0.35172332086962865]
We propose an alternative approach to integrating existing knowledge or hypotheses about data structure by constructing custom activation functions.
We show that using an even activation function in one of the fully connected layers improves neural network performance; a minimal sketch follows this list.
arXiv Detail & Related papers (2020-11-23T20:33:13Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient starvation arises when the cross-entropy loss is minimized by capturing only a subset of the features relevant to the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
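For the even-activation idea above, here is a minimal sketch. The cosine choice and the layer placement are illustrative assumptions; the abstract only reports that using some even activation in one fully connected layer helps.

```python
import torch
import torch.nn as nn


class EvenActivation(nn.Module):
    """An even activation, f(x) = f(-x); cos is one convenient choice.

    Which even function to use and where to place it are assumptions made
    for this sketch, not the paper's specific design.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # cos(-x) == cos(x), so symmetry about the origin is built in.
        return torch.cos(x)


# One fully connected layer uses the even activation; the rest stay ReLU.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), EvenActivation(),
    nn.Linear(64, 1),
)
```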
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.