Activation Functions: Dive into an optimal activation function
- URL: http://arxiv.org/abs/2202.12065v1
- Date: Thu, 24 Feb 2022 12:44:11 GMT
- Title: Activation Functions: Dive into an optimal activation function
- Authors: Vipul Bansal
- Abstract summary: We find an optimal activation function by defining it as a weighted sum of existing activation functions.
The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets.
- Score: 1.52292571922932
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Activation functions have become one of the essential components of neural networks. The choice of an adequate activation function can impact the accuracy of these networks. In this study, we experiment with finding an optimal activation function by defining it as a weighted sum of existing activation functions and then optimizing these weights while training the network. The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets, MNIST, FashionMNIST, and KMNIST. We observe that the ReLU activation function can easily dominate the other activation functions. We also see that initial layers prefer ReLU- or LeakyReLU-type activation functions, while deeper layers tend to prefer more convergent activation functions.
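A minimal sketch of the weighted-sum idea described in the abstract, written as a PyTorch-style module. The weight initialization, the absence of any normalization on the weights, and the per-layer usage below are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class WeightedSumActivation(nn.Module):
    """Activation defined as a learnable weighted sum of ReLU, tanh, and sin."""

    def __init__(self):
        super().__init__()
        # One trainable weight per base activation; the uniform initialization
        # is an assumption, not the paper's choice.
        self.weights = nn.Parameter(torch.full((3,), 1.0 / 3.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weights
        # The weights are optimized jointly with the rest of the network.
        return w[0] * torch.relu(x) + w[1] * torch.tanh(x) + w[2] * torch.sin(x)

# Using a separate instance per layer lets each layer learn its own mixture,
# which is how layer-wise preferences (ReLU-like early, others deeper) can emerge.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), WeightedSumActivation(),
    nn.Linear(256, 10),
)
```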
Related papers
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated.
We propose PPL-$p\%$ sparsity, a precise and performance-aware activation sparsity metric.
We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
- Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features [115.33889811527533]
Diffusion models are initially designed for image generation.
Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative tasks.
arXiv Detail & Related papers (2024-10-04T16:05:14Z)
- Trainable Highly-expressive Activation Functions [8.662179223772089]
We introduce DiTAC, a trainable highly-expressive activation function.
DiTAC enhances model expressiveness and performance, often yielding substantial improvements.
It also outperforms existing activation functions (regardless of whether the latter are fixed or trainable) in tasks such as semantic segmentation, image generation, regression problems, and image classification.
arXiv Detail & Related papers (2024-07-10T11:49:29Z)
- ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models [74.59731375779934]
Activation sparsity refers to the existence of weakly-contributed elements among activation outputs.
This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity.
arXiv Detail & Related papers (2024-02-21T03:58:49Z)
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
- ErfReLU: Adaptive Activation Function for Deep Neural Network [1.9336815376402716]
Recent research has found that the activation function selected to add non-linearity to the output can significantly affect how effectively deep learning networks perform.
Researchers have recently begun developing activation functions that can be trained throughout the learning process.
State-of-the-art activation functions such as Sigmoid, ReLU, and Tanh, along with their properties, are briefly explained.
arXiv Detail & Related papers (2023-06-02T13:41:47Z)
- Saturated Non-Monotonic Activation Functions [21.16866749728754]
We present three new activation functions built with our proposed method: SGELU, SSiLU, and SMish, which are composed of the negative portions of GELU, SiLU, and Mish, respectively, and ReLU's positive portion (see the sketch after this list).
The results of image classification experiments on CIFAR-100 indicate that our proposed activation functions are highly effective and outperform state-of-the-art baselines across multiple deep learning architectures.
arXiv Detail & Related papers (2023-05-12T15:01:06Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Evaluating CNN with Oscillatory Activation Function [0.0]
The key to a CNN's capability to learn high-dimensional, complex features from images is the non-linearity introduced by the activation function.
This paper explores the performance of the AlexNet CNN architecture on the MNIST and CIFAR10 datasets using an oscillatory activation function (GCU) and other commonly used activation functions such as ReLU, PReLU, and Mish.
arXiv Detail & Related papers (2022-11-13T11:17:13Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
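The composition described in the Saturated Non-Monotonic Activation Functions entry above can be read as keeping ReLU's identity behaviour for positive inputs and GELU's saturating, non-monotonic behaviour for negative inputs. Below is a minimal sketch of that reading for SGELU, assuming the exact erf-based GELU; the paper's precise definitions of SGELU, SSiLU, and SMish may differ.

```python
import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact (erf-based) GELU: x * Phi(x).
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

def sgelu(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical reading of SGELU: ReLU's positive portion (identity for x > 0)
    # combined with GELU's saturating negative portion (for x <= 0).
    return torch.where(x > 0, x, gelu(x))

# Example: negative inputs saturate toward 0 instead of being hard-clipped.
print(sgelu(torch.linspace(-3.0, 3.0, 7)))
```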
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.