Saturated Non-Monotonic Activation Functions
- URL: http://arxiv.org/abs/2305.07537v2
- Date: Thu, 25 May 2023 06:41:18 GMT
- Title: Saturated Non-Monotonic Activation Functions
- Authors: Junjia Chen and Zhibin Pan
- Abstract summary: We present three new activation functions built with our proposed method: SGELU, SSiLU, and SMish, which are composed of the negative portion of GELU, SiLU, and Mish, respectively, and ReLU's positive portion.
The results of image classification experiments on CIFAR-100 indicate that our proposed activation functions are highly effective and outperform state-of-the-art baselines across multiple deep learning architectures.
- Score: 21.16866749728754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions are essential to deep learning networks. Popular and
versatile activation functions are mostly monotonic functions, some
non-monotonic activation functions are being explored and show promising
performance. But by introducing non-monotonicity, they also alter the positive
input, which is proved to be unnecessary by the success of ReLU and its
variants. In this paper, we double down on the non-monotonic activation
functions' development and propose the Saturated Gaussian Error Linear Units by
combining the characteristics of ReLU and non-monotonic activation functions.
We present three new activation functions built with our proposed method:
SGELU, SSiLU, and SMish, which are composed of the negative portion of GELU,
SiLU, and Mish, respectively, and ReLU's positive portion. The results of image
classification experiments on CIFAR-100 indicate that our proposed activation
functions are highly effective and outperform state-of-the-art baselines across
multiple deep learning architectures.
Related papers
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated.
We propose PPL-$p%$ sparsity, a precise and performance-aware activation sparsity metric.
We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z) - Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features [115.33889811527533]
Diffusion models are initially designed for image generation.
Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative tasks.
arXiv Detail & Related papers (2024-10-04T16:05:14Z) - ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models [74.59731375779934]
Activation sparsity refers to the existence of weakly-contributed elements among activation outputs.
This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity.
arXiv Detail & Related papers (2024-02-21T03:58:49Z) - ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - Parametric Leaky Tanh: A New Hybrid Activation Function for Deep
Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs)
We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z) - Empirical study of the modulus as activation function in computer vision
applications [1.5099465160569119]
We show that using the proposed function on computer vision tasks the models generalize better than with other nonlinearities.
The simplicity of the proposed function and its derivative make this solution specially suitable for TinyML and hardware applications.
arXiv Detail & Related papers (2023-01-15T00:32:03Z) - Neural Estimation of Submodular Functions with Applications to
Differentiable Subset Selection [50.14730810124592]
Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization.
We propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions.
arXiv Detail & Related papers (2022-10-20T06:00:45Z) - Nish: A Novel Negative Stimulated Hybrid Activation Function [5.482532589225552]
We propose a novel non-monotonic activation function called Negative Stimulated Hybrid Activation Function (Nish)
It behaves like a Rectified Linear Unit (ReLU) function for values greater than zero, and a sinus-sigmoidal function for values less than zero.
The proposed function incorporates the sigmoid and sine wave, allowing new dynamics over traditional ReLU activations.
arXiv Detail & Related papers (2022-10-17T13:32:52Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use Rational Activation Function (RAF) to learn optimal activation functions during training according to input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - Activation Functions: Dive into an optimal activation function [1.52292571922932]
We find an optimal activation function by defining it as a weighted sum of existing activation functions.
The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets.
arXiv Detail & Related papers (2022-02-24T12:44:11Z) - Growing Cosine Unit: A Novel Oscillatory Activation Function That Can
Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolution neural networks have been successful in solving many socially important and economically significant problems.
Key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
New activation function C(z) = z cos z outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures.
arXiv Detail & Related papers (2021-08-30T01:07:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.