Gaussian Error Linear Units (GELUs)
- URL: http://arxiv.org/abs/1606.08415v5
- Date: Tue, 6 Jun 2023 01:53:32 GMT
- Title: Gaussian Error Linear Units (GELUs)
- Authors: Dan Hendrycks and Kevin Gimpel
- Abstract summary: We propose a neural network activation function that weights inputs by their value, rather than gates by their sign.
We find performance improvements across all considered computer vision, natural language processing, and speech tasks.
- Score: 58.195342948092964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the Gaussian Error Linear Unit (GELU), a high-performing neural
network activation function. The GELU activation function is $x\Phi(x)$, where
$\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU
nonlinearity weights inputs by their value, rather than gates inputs by their
sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of
the GELU nonlinearity against the ReLU and ELU activations and find performance
improvements across all considered computer vision, natural language
processing, and speech tasks.
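For reference, here is a minimal Python sketch of the activation defined above, $x\Phi(x)$, alongside the tanh-based approximation given in the paper and the ReLU baseline; the function names are illustrative.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard Gaussian CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation from the paper:
    0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def relu(x: float) -> float:
    """ReLU gates by sign: x * 1_{x > 0}."""
    return x if x > 0.0 else 0.0
```

Unlike ReLU, which zeroes every negative input, GELU shrinks them smoothly: for example, gelu(-0.5) ≈ -0.154 while relu(-0.5) = 0.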
Related papers
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
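As a rough illustration of the magnitude-threshold idea summarized above, the Python sketch below zeroes out neuron outputs whose magnitude falls below a fixed cutoff so that downstream computation can be skipped; the squared-ReLU form, the threshold value, and the masking rule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def relu2(x: np.ndarray) -> np.ndarray:
    """Squared ReLU: (max(x, 0))^2."""
    return np.square(np.maximum(x, 0.0))

def magnitude_sparse_activation(h: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Treat a neuron as active only if its output magnitude exceeds `threshold`;
    inactive neurons are zeroed, exposing sparsity for efficient inference.
    The threshold value here is purely illustrative."""
    out = relu2(h)
    return np.where(np.abs(out) > threshold, out, 0.0)

# Example: with standard-normal pre-activations, negative inputs are already zero
# after squared ReLU, and tiny positive outputs below the cutoff are dropped as well.
h = np.random.default_rng(0).normal(size=8)
print(magnitude_sparse_activation(h))
```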
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - A Unifying Generator Loss Function for Generative Adversarial Networks [5.5575224613422725]
A unifying $\alpha$-parametrized generator loss function is introduced for a dual-objective generative adversarial network (GAN).
The generator loss function is based on a symmetric class probability estimation type function, $\mathcal{L}_\alpha$, and the resulting GAN system is termed $\mathcal{L}_\alpha$-GAN.
arXiv Detail & Related papers (2023-08-14T16:16:31Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Learning a Single Neuron for Non-monotonic Activation Functions [3.890410443467757]
Non-monotonic activation functions outperform the traditional monotonic ones in many applications.
We show that mild conditions on $\sigma$ are sufficient to guarantee learnability with polynomial sample and time complexity.
We also discuss how our positive results are related to existing negative results on training two-layer neural networks.
arXiv Detail & Related papers (2022-02-16T13:44:25Z) - Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z) - Convergence and Sample Complexity of SGD in GANs [15.25030172685628]
We provide convergence guarantees on training Generative Adversarial Networks (GANs) via SGD.
We consider learning a target distribution modeled by a 1-layer Generator network with a non-linear activation function.
Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs, and are enabled by a connection with truncated statistics.
arXiv Detail & Related papers (2020-12-01T18:50:38Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance; and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Overcoming Overfitting and Large Weight Update Problem in Linear
Rectifiers: Thresholded Exponential Rectified Linear Units [0.0]
"Thresholded exponential rectified linear unit" (TERELU) activation function works better in alleviating in overfitting: large weight update problem.
We will show better performance on the various using neural networks, considering TERELU activation method compared to other activation datasets.
arXiv Detail & Related papers (2020-06-04T11:55:47Z) - Least $k$th-Order and R\'{e}nyi Generative Adversarial Networks [12.13405065406781]
Experimental results indicate that the proposed loss functions, applied to the MNIST and CelebA datasets, confer performance benefits by virtue of the extra degrees of freedom provided by the parameters $k$ and $\alpha$, respectively.
While it was applied to GANs in this study, the proposed approach is generic and can be used in other applications of information theory to deep learning, e.g., the issues of fairness or privacy in artificial intelligence.
arXiv Detail & Related papers (2020-06-03T18:44:05Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
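As a toy illustration of the single-neuron setting in the last entry (fitting one ReLU neuron under the square loss), the sketch below runs plain gradient descent on the empirical square loss; the data generation, step size, and sample size are arbitrary illustrative choices, and nothing here reproduces the paper's agnostic-learning guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr, steps = 5, 2000, 0.1, 500

# Synthetic data: labels come from a ReLU neuron plus label noise
# (a stand-in for agnostic misspecification).
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.normal(size=n)

w = 0.1 * rng.normal(size=d)  # small random init (an all-zero start would give a zero gradient)
for _ in range(steps):
    pred = np.maximum(X @ w, 0.0)              # ReLU(w . x)
    grad = ((pred - y) * (X @ w > 0)) @ X / n  # gradient of the empirical loss 0.5 * mean((pred - y)^2)
    w -= lr * grad

print("final empirical square loss:", 0.5 * np.mean((np.maximum(X @ w, 0.0) - y) ** 2))
```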
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.