SwishReLU: A Unified Approach to Activation Functions for Enhanced Deep Neural Networks Performance
- URL: http://arxiv.org/abs/2407.08232v1
- Date: Thu, 11 Jul 2024 07:14:34 GMT
- Title: SwishReLU: A Unified Approach to Activation Functions for Enhanced Deep Neural Networks Performance
- Authors: Jamshaid Ul Rahman, Rubiqa Zulfiqar, Asad Khan, Nimra
- Abstract summary: ReLU, a commonly used activation function in deep neural networks, is prone to the issue of "Dying ReLU"
Several enhanced variants, such as ELU, SeLU, and Swish, have been introduced but remain less commonly used.
This paper proposes SwishReLU, a novel activation function combining elements of ReLU and Swish.
- Score: 1.2724528787590168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: ReLU, a commonly used activation function in deep neural networks, is prone to the issue of "Dying ReLU". Several enhanced variants, such as ELU, SeLU, and Swish, have been introduced but remain less commonly used. Moreover, replacing ReLU can be challenging because the benefits of doing so are inconsistent. While Swish offers a smoother transition similar to ReLU, it generally incurs a greater computational cost than ReLU. This paper proposes SwishReLU, a novel activation function combining elements of ReLU and Swish. Our findings reveal that SwishReLU outperforms ReLU while carrying a lower computational cost than Swish. This paper examines and compares several ReLU variants against SwishReLU, specifically ELU, SeLU, and Tanh, on three datasets: CIFAR-10, CIFAR-100, and MNIST. Notably, applying SwishReLU in the VGG16 model described in Algorithm 2 yields a 6% accuracy improvement on the CIFAR-10 dataset.
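The abstract does not spell out SwishReLU's exact formula, so the following is only a minimal sketch contrasting standard ReLU and Swish with one plausible way of blending them; the learnable interpolation weight is an assumption for illustration, not the paper's definition.

```python
import torch
import torch.nn as nn

class SwishReLUSketch(nn.Module):
    """Illustrative blend of ReLU and Swish.

    NOTE: the paper's exact SwishReLU formula is not given in the abstract;
    the learnable mixing weight used here is an assumption for illustration.
    """
    def __init__(self, beta: float = 1.0, mix: float = 0.0):
        super().__init__()
        self.beta = beta
        self.mix = nn.Parameter(torch.tensor(mix))  # learnable blend weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        relu_out = torch.relu(x)                      # ReLU(x) = max(0, x)
        swish_out = x * torch.sigmoid(self.beta * x)  # Swish(x) = x * sigmoid(beta * x)
        w = torch.sigmoid(self.mix)                   # keep the weight in (0, 1)
        return w * relu_out + (1.0 - w) * swish_out

if __name__ == "__main__":
    act = SwishReLUSketch()
    print(act(torch.linspace(-3.0, 3.0, 7)))
```

Dropping such a module in place of `nn.ReLU()` inside a VGG16 definition is how the CIFAR-10 comparison reported above would typically be set up.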
Related papers
- R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning [62.742230250513025]
Retrieval-Augmented Generation (RAG) integrates external knowledge with Large Language Models (LLMs) to enhance factual correctness and reduce hallucination.
We propose $\textbf{R}^3$-RAG, which uses $\textbf{R}$einforcement learning to make the LLM learn how to $\textbf{R}$eason and $\textbf{R}$etrieve step by step, thus retrieving comprehensive external knowledge and arriving at correct answers.
arXiv Detail & Related papers (2025-05-26T12:25:37Z) - VeLU: Variance-enhanced Learning Unit for Deep Neural Networks [38.363465138060086]
We propose VeLU as an activation function that scales based on input variance.
VeLU is superior to ReLU, ReLU6, Swish, and GELU on six vision benchmarks.
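VeLU's actual formulation is not given in this summary, so the sketch below only illustrates the general idea of an activation whose output is rescaled by a statistic of the input variance; the specific per-sample rescaling rule is an assumption.

```python
import torch
import torch.nn as nn

class VarianceScaledActivation(nn.Module):
    """Toy variance-aware activation (not VeLU's actual formula).

    Rescaling by 1/sqrt(var + eps) per sample is an illustrative assumption,
    chosen only to show what "scales based on input variance" can look like.
    """
    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # per-sample variance over all non-batch dimensions
        var = x.flatten(1).var(dim=1, unbiased=False).view(-1, *([1] * (x.dim() - 1)))
        return torch.relu(x) * torch.rsqrt(var + self.eps)

print(VarianceScaledActivation()(torch.randn(2, 3, 8, 8)).shape)  # torch.Size([2, 3, 8, 8])
```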
arXiv Detail & Related papers (2025-04-21T12:20:46Z) - LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization [78.93425154518705]
Low-rank adaption (LoRA) is a widely used parameter-efficient finetuning method for LLM that reduces memory requirements.
This paper introduces LoRA-RITE, a novel adaptive matrix preconditioning method for LoRA optimization.
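The RITE preconditioner itself is not described in this summary; as background for what is being optimized, a minimal LoRA layer looks like the sketch below (rank, scaling, and initialization values are illustrative).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA layer: y = base(x) + (alpha / r) * x A^T B^T.

    Only the rank-r factors A and B are trained; the base weight stays frozen.
    The RITE preconditioning step from the paper is not shown here.
    """
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)               # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

print(LoRALinear(64, 32)(torch.randn(4, 64)).shape)  # torch.Size([4, 32])
```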
arXiv Detail & Related papers (2024-10-27T22:57:12Z) - ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
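The notion of "activation" used here can be made concrete with a magnitude threshold over neuron outputs; the sketch below pairs that with squared ReLU (ReLU$^2$), with the threshold value being an illustrative assumption.

```python
import torch

def relu_squared(x: torch.Tensor) -> torch.Tensor:
    """ReLU^2: apply ReLU, then square, which pushes small outputs even closer to zero."""
    return torch.relu(x) ** 2

def inactive_fraction(h: torch.Tensor, threshold: float = 1e-2) -> float:
    """Fraction of neuron outputs treated as inactive under a magnitude threshold.

    The threshold value is an illustrative assumption, not the paper's tailored one.
    """
    return (h.abs() < threshold).float().mean().item()

h = relu_squared(torch.randn(4, 1024))
print(f"inactive fraction: {inactive_fraction(h):.2f}")
```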
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear
Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU.
A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
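The summary does not give TaLU's exact definition; a hedged sketch of a Tanh/ReLU hybrid, assuming tanh on the negative side (bounded, with non-zero gradient) and identity on the positive side, looks like this:

```python
import torch

def talu_sketch(x: torch.Tensor) -> torch.Tensor:
    """Assumed Tanh/ReLU hybrid (not necessarily the paper's exact TaLU):
    tanh(x) for x < 0, x for x >= 0."""
    return torch.where(x >= 0, x, torch.tanh(x))

print(talu_sketch(torch.tensor([-2.0, -0.5, 0.0, 1.5])))
```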
arXiv Detail & Related papers (2023-05-08T01:13:59Z) - A Study on ReLU and Softmax in Transformer [51.0740713922741]
The Transformer architecture consists of self-attention and feed-forward networks (FFNs) which can be viewed as key-value memories.
We first rebuild the connections between FFN and key-value memory by conducting extensive studies on ReLU and Softmax.
In addition, ReLU outperforms Softmax on both FFN and key-value memory when the number of value slots is large.
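The "FFN as key-value memory" view can be made concrete in a few lines: rows of the first projection act as keys, the activation turns similarities into memory coefficients, and rows of the second projection act as values. The sketch below compares ReLU and Softmax as the coefficient function; the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def ffn_as_memory(x, keys, values, use_relu=True):
    """FFN viewed as key-value memory: coeff = act(x @ keys^T), out = coeff @ values."""
    scores = x @ keys.t()                                            # similarity to each memory slot
    coeff = torch.relu(scores) if use_relu else F.softmax(scores, dim=-1)
    return coeff @ values

x = torch.randn(2, 64)          # token representations
keys = torch.randn(4096, 64)    # rows of the first FFN projection ("keys")
values = torch.randn(4096, 64)  # rows of the second FFN projection ("values")
print(ffn_as_memory(x, keys, values, use_relu=True).shape)  # torch.Size([2, 64])
```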
arXiv Detail & Related papers (2023-02-13T15:41:20Z) - Rotate the ReLU to implicitly sparsify deep networks [13.203765985718201]
We propose a novel idea of rotating the ReLU activation to give one more degree of freedom to the architecture.
We show that this activation, with the rotation learned during training, eliminates the parameters/filters in the network that are not important for the task.
arXiv Detail & Related papers (2022-06-01T13:38:45Z) - SAU: Smooth activation function using convolution with approximate
identities [1.5267236995686555]
Well-known activation functions like ReLU or Leaky ReLU are non-differentiable at the origin.
We propose new smooth approximations of a non-differentiable activation function by convolving it with approximate identities.
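The construction described here is the classical mollifier idea: convolve the non-smooth function with a narrow approximate identity such as a Gaussian. A numerical sketch for Leaky ReLU follows; the Gaussian width and grid size are illustrative choices, not the paper's parameters.

```python
import torch

def smoothed_leaky_relu(x: torch.Tensor, alpha: float = 0.01,
                        sigma: float = 0.1, n: int = 2001) -> torch.Tensor:
    """Numerically convolve Leaky ReLU with a narrow Gaussian (an approximate identity).

    As sigma -> 0 the Gaussian tends to a delta function and the result tends back
    to Leaky ReLU itself; away from that limit the kink at the origin is smoothed out.
    """
    t = torch.linspace(-5 * sigma, 5 * sigma, n)   # integration grid
    g = torch.exp(-0.5 * (t / sigma) ** 2)
    g = g / g.sum()                                # normalized Gaussian weights
    shifted = x.unsqueeze(-1) - t                  # x - t for every grid point
    leaky = torch.where(shifted >= 0, shifted, alpha * shifted)
    return (leaky * g).sum(dim=-1)                 # (f * g)(x) by quadrature

print(smoothed_leaky_relu(torch.tensor([-1.0, 0.0, 1.0])))
```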
arXiv Detail & Related papers (2021-09-27T17:31:04Z) - Reducing ReLU Count for Privacy-Preserving CNN Speedup [25.86435513157795]
Privacy-Preserving Machine Learning algorithms must balance classification accuracy with data privacy.
CNNs typically consist of two types of operations: a convolutional or linear layer, followed by a non-linear function such as ReLU.
Recent research suggests that ReLU is responsible for most of the communication bandwidth.
We propose to share ReLU operations. Specifically, the ReLU decision of one activation can be used by others, and we explore different ways to determine the ReLU for such a group of activations.
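The sharing idea can be sketched directly: pick one representative per group of activations, compute its ReLU on/off decision once, and apply that decision to the whole group. Using the group mean as the representative is an assumption here; the paper explores several ways to choose it.

```python
import torch

def group_shared_relu(x: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Apply one ReLU on/off decision per group of channels.

    The representative (the group mean) is an illustrative choice; in a
    privacy-preserving setting only that single decision must be computed securely.
    """
    n, c, h, w = x.shape
    assert c % group_size == 0
    grouped = x.view(n, c // group_size, group_size, h, w)
    rep = grouped.mean(dim=2, keepdim=True)   # one representative per group
    mask = (rep > 0).to(x.dtype)              # single ReLU decision, reused by the group
    return (grouped * mask).view(n, c, h, w)

print(group_shared_relu(torch.randn(1, 8, 4, 4)).shape)  # torch.Size([1, 8, 4, 4])
```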
arXiv Detail & Related papers (2021-01-28T06:49:31Z) - ALReLU: A different approach on Leaky ReLU activation function to
improve Neural Networks Performance [0.0]
The classical ReLU activation function (AF) has been extensively applied in Deep Neural Networks (DNN).
The common gradient issues of ReLU pose challenges in both academic and industrial applications.
The Absolute Leaky ReLU (ALReLU) AF, a variation of LReLU, is proposed as an alternative method to resolve the common 'dying ReLU' problem.
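Reading "Absolute Leaky ReLU" as taking the absolute value of Leaky ReLU's negative-slope branch gives the sketch below; that reading and the slope value are assumptions for illustration.

```python
import torch

def leaky_relu(x: torch.Tensor, alpha: float = 0.01) -> torch.Tensor:
    """Standard Leaky ReLU: alpha * x for x < 0, x otherwise."""
    return torch.where(x >= 0, x, alpha * x)

def alrelu_sketch(x: torch.Tensor, alpha: float = 0.01) -> torch.Tensor:
    """Assumed ALReLU form: |alpha * x| on the negative side, x on the positive side,
    so negative inputs still yield a small positive output and a non-zero gradient."""
    return torch.where(x >= 0, x, torch.abs(alpha * x))

print(alrelu_sketch(torch.tensor([-2.0, -0.5, 0.0, 1.5])))
```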
arXiv Detail & Related papers (2020-12-11T06:46:42Z) - Comparisons among different stochastic selection of activation layers
for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations from the following pool: ReLU, Leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, and Soft Root Sign.
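The stochastic selection step can be sketched as drawing one activation per block from a fixed pool and building several such networks for the ensemble; the pool below is a readily available subset of the listed activations and the CNN is a toy stand-in, both illustrative.

```python
import random
import torch.nn as nn

# illustrative subset of the activation pool named above
ACTIVATION_POOL = [nn.ReLU, nn.LeakyReLU, nn.ELU, nn.SiLU, nn.Mish]

def random_activation_cnn(num_classes: int = 2) -> nn.Sequential:
    """Toy CNN whose activation in each block is drawn at random from the pool."""
    def block(cin: int, cout: int) -> nn.Sequential:
        act = random.choice(ACTIVATION_POOL)()   # stochastic selection per layer
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), act, nn.MaxPool2d(2))
    return nn.Sequential(block(3, 16), block(16, 32),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

# averaging the predictions of several such networks gives the ensemble
ensemble = [random_activation_cnn() for _ in range(5)]
```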
arXiv Detail & Related papers (2020-11-24T01:53:39Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the performance decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
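Bounded ReLU clips activations to a fixed maximum so they map onto a known integer range; a ReLU6-style clip plus uniform fake-quantization is sketched below, with the bound and bit width as illustrative choices.

```python
import torch

def bounded_relu(x: torch.Tensor, bound: float = 6.0) -> torch.Tensor:
    """Clip activations to [0, bound] so they fit a fixed integer range."""
    return torch.clamp(x, min=0.0, max=bound)

def quantize_activation(x: torch.Tensor, bound: float = 6.0, bits: int = 8) -> torch.Tensor:
    """Uniformly quantize a bounded activation to `bits`-bit levels and dequantize."""
    levels = 2 ** bits - 1
    scale = bound / levels
    q = torch.round(bounded_relu(x, bound) / scale)   # integer levels used at inference
    return q * scale                                  # dequantized value for comparison

print(quantize_activation(torch.tensor([-1.0, 3.14159, 10.0])))
```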
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Dynamic ReLU [74.973224160508]
We propose dynamic ReLU (DY-ReLU), a dynamic rectifier whose parameters are generated by a hyper function over all input elements.
Compared to its static counterpart, DY-ReLU has negligible extra computational cost, but significantly more representation capability.
By simply using DY-ReLU for MobileNetV2, the top-1 accuracy on ImageNet classification is boosted from 72.0% to 76.2% with only 5% additional FLOPs.
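A simplified DY-ReLU-style module is sketched below: a small hyper network over globally pooled input emits per-channel coefficients for K linear pieces, and the output is the element-wise max over those pieces. The hyper-network design and sizes here are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DyReLUSketch(nn.Module):
    """Simplified dynamic ReLU: per-channel slopes/intercepts for k linear pieces
    come from a hyper network over pooled input; output is the max over the pieces."""
    def __init__(self, channels: int, k: int = 2, reduction: int = 4):
        super().__init__()
        self.k = k
        self.hyper = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, 2 * k * channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        coeffs = self.hyper(x).view(n, c, 2 * self.k, 1, 1)
        a = 1.0 + coeffs[:, :, :self.k]     # slopes, biased toward the identity
        b = coeffs[:, :, self.k:]           # intercepts
        pieces = a * x.unsqueeze(2) + b     # (n, c, k, h, w)
        return pieces.max(dim=2).values

print(DyReLUSketch(8)(torch.randn(2, 8, 4, 4)).shape)  # torch.Size([2, 8, 4, 4])
```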
arXiv Detail & Related papers (2020-03-22T23:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.