Zorro: A Flexible and Differentiable Parametric Family of Activation Functions That Extends ReLU and GELU
- URL: http://arxiv.org/abs/2409.19239v1
- Date: Sat, 28 Sep 2024 05:04:56 GMT
- Title: Zorro: A Flexible and Differentiable Parametric Family of Activation Functions That Extends ReLU and GELU
- Authors: Matias Roodschild, Jorge Gotay-Sardiñas, Victor A. Jimenez, Adrian Will
- Abstract summary: More than 400 activation functions have been proposed over the last 30 years, with fixed or trainable parameters, but only a few are widely used.
This article introduces a novel set of activation functions called Zorro, a continuously differentiable and flexible family comprising five main functions that fuse ReLU and Sigmoid.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Even in recent neural network architectures such as Transformers and Extended LSTM (xLSTM), and in traditional ones like Convolutional Neural Networks, activation functions are an integral part of nearly all neural networks. They enable more effective training and capture nonlinear data patterns. More than 400 functions have been proposed over the last 30 years, with fixed or trainable parameters, but only a few are widely used. ReLU is one of the most frequently used, with GELU and Swish variants appearing increasingly often. However, ReLU has non-differentiable points and exploding-gradient issues, while GELU and Swish variants produce varying results under different parameter settings and need additional parameters to adapt to datasets and architectures. This article introduces a novel set of activation functions called Zorro, a continuously differentiable and flexible family comprising five main functions that fuse ReLU and Sigmoid. Zorro functions are smooth and adaptable, act as information gates aligning with ReLU in the 0-1 range, and offer an alternative to ReLU that avoids dying neurons and exploding gradients without requiring normalization. Zorro also approximates functions such as Swish, GELU, and DGELU, providing parameters that can be adjusted to different datasets and architectures. We tested it on fully connected, convolutional, and transformer architectures to demonstrate its effectiveness.
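The abstract does not reproduce Zorro's five formulas, so the snippet below is only a loose, hypothetical illustration of the underlying idea: a linear unit gated by a sigmoid, with shape parameters that let it move between ReLU-like and GELU/Swish-like behavior. The names `alpha` and `beta` are placeholders, not the paper's parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gated_unit(x, alpha=1.0, beta=1.702):
    """Hypothetical sketch, not the paper's Zorro definition.

    alpha scales the output; beta controls how sharp the sigmoid gate is.
    beta ~ 1.702 gives a GELU-like curve, and as beta -> infinity the gate
    becomes a hard step and the function approaches ReLU.
    """
    return alpha * x * sigmoid(beta * x)

x = np.linspace(-4.0, 4.0, 9)
print(sigmoid_gated_unit(x))      # smooth, GELU/Swish-like values
print(np.maximum(x, 0.0))         # ReLU, the beta -> infinity limit
```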
Related papers
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm)
arXiv Detail & Related papers (2024-08-28T11:12:27Z)
- A Non-monotonic Smooth Activation Function
Activation functions are crucial in deep learning models since they introduce non-linearity into the networks.
In this study, we propose a new activation function called Sqish, which is a non-monotonic and smooth function.
We show its superiority in classification, object detection, and segmentation tasks, as well as in adversarial robustness experiments.
arXiv Detail & Related papers (2023-10-16T07:09:47Z)
- Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection [50.14730810124592]
Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization.
We propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions.
arXiv Detail & Related papers (2022-10-20T06:00:45Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- SAU: Smooth activation function using convolution with approximate identities [1.5267236995686555]
Well-known activation functions like ReLU or Leaky ReLU are non-differentiable at the origin.
We propose new smooth approximations of a non-differentiable activation function by convolving it with approximate identities (an illustration of this smoothing idea appears after this list).
arXiv Detail & Related papers (2021-09-27T17:31:04Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but require huge computational costs.
We propose to use a shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations from among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
- Dynamic ReLU [74.973224160508]
We propose Dynamic ReLU (DY-ReLU), a dynamic rectifier whose parameters are generated by a hyper function over all input elements (a simplified sketch appears after this list).
Compared to its static counterpart, DY-ReLU has negligible extra computational cost, but significantly more representation capability.
By simply using DY-ReLU for MobileNetV2, the top-1 accuracy on ImageNet classification is boosted from 72.0% to 76.2% with only 5% additional FLOPs.
arXiv Detail & Related papers (2020-03-22T23:45:35Z)
- Soft-Root-Sign Activation Function [21.716884634290516]
"Soft-Root-Sign" (SRS) is smooth, non-monotonic, and bounded.
In contrast to ReLU, SRS can adaptively adjust its output via a pair of independent trainable parameters (a sketch of its commonly cited form appears after this list).
Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities.
arXiv Detail & Related papers (2020-03-01T18:38:11Z)
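For the SAU entry above, the paper's exact construction is not reproduced here; as a standard illustration of smoothing by convolution with an approximate identity, convolving ReLU with a Gaussian kernel of width sigma has the closed form x * Phi(x/sigma) + sigma * phi(x/sigma), which tends back to ReLU as sigma -> 0. The sketch below checks that closed form against a direct numerical convolution.

```python
import numpy as np
from math import erf, sqrt, pi

def smoothed_relu(x, sigma=0.5):
    """Closed form of (ReLU * Gaussian_sigma)(x):
    x * Phi(x/sigma) + sigma * phi(x/sigma), with Phi/phi the standard
    normal CDF/PDF. Recovers ReLU as sigma -> 0."""
    z = x / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    phi = np.exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return x * Phi + sigma * phi

def smoothed_relu_numeric(x, sigma=0.5, n=20001, half_width=8.0):
    """Direct numerical convolution of ReLU with the Gaussian kernel."""
    t = np.linspace(-half_width, half_width, n)
    dt = t[1] - t[0]
    kernel = np.exp(-0.5 * (t / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
    return np.sum(np.maximum(x - t, 0.0) * kernel) * dt

for x in (-1.0, 0.0, 1.0):
    print(x, smoothed_relu(x), smoothed_relu_numeric(x))  # values agree
```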
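For the Dynamic ReLU entry, the sketch below is a simplified, channel-wise variant written in PyTorch: the per-channel slopes and intercepts of K linear pieces are predicted by a small hyper network (assumed here to be global average pooling followed by two linear layers), and the activation takes the maximum over the pieces. It follows the general recipe described in the abstract, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DyReLUSketch(nn.Module):
    """Simplified DY-ReLU-style activation (sketch, not the paper's exact
    module): y_c = max_k(a_{k,c}(x) * x_c + b_{k,c}(x)), with the 2*K*C
    coefficients predicted per channel by a small hyper network."""

    def __init__(self, channels, k=2, reduction=4):
        super().__init__()
        self.channels, self.k = channels, k
        self.hyper = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # global context
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * k * channels),
        )
        # Residual coefficients around identity-like defaults:
        # slopes start at (1, 0, ...), intercepts at 0.
        init_a = torch.zeros(k)
        init_a[0] = 1.0
        self.register_buffer("init_a", init_a.view(1, k, 1))
        self.register_buffer("init_b", torch.zeros(1, k, 1))

    def forward(self, x):                          # x: (N, C, H, W)
        n, c, h, w = x.shape
        theta = torch.tanh(self.hyper(x)).view(n, 2 * self.k, c)
        a = theta[:, : self.k] + self.init_a       # (N, K, C) slopes
        b = theta[:, self.k :] + self.init_b       # (N, K, C) intercepts
        x_flat = x.view(n, 1, c, h * w)
        out = a.unsqueeze(-1) * x_flat + b.unsqueeze(-1)
        return out.max(dim=1).values.view(n, c, h, w)

# Usage: act = DyReLUSketch(64); y = act(torch.randn(2, 64, 8, 8))
```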
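For the Soft-Root-Sign entry, the commonly cited form is SRS(x) = x / (x/alpha + exp(-x/beta)) with two independent trainable parameters. The module below sketches that form; the default values of alpha and beta are illustrative assumptions rather than the paper's recommended settings.

```python
import torch
import torch.nn as nn

class SoftRootSign(nn.Module):
    """Soft-Root-Sign in its commonly cited form:
        SRS(x) = x / (x / alpha + exp(-x / beta))
    with trainable alpha, beta. Defaults here are illustrative assumptions."""

    def __init__(self, alpha=2.0, beta=3.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(float(alpha)))
        self.beta = nn.Parameter(torch.tensor(float(beta)))

    def forward(self, x):
        return x / (x / self.alpha + torch.exp(-x / self.beta))

# Smooth, non-monotonic, and bounded: approaches alpha for large x
# and returns toward 0 for very negative x.
srs = SoftRootSign()
print(srs(torch.linspace(-5.0, 5.0, 5)))
```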
This list is automatically generated from the titles and abstracts of the papers in this site.