Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear
Unit (TeLU)
- URL: http://arxiv.org/abs/2402.02790v1
- Date: Mon, 5 Feb 2024 07:56:02 GMT
- Title: Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear
Unit (TeLU)
- Authors: Alfredo Fernandez and Ankur Mali
- Abstract summary: We introduce a novel neural network activation function, represented as $f(x) = x \cdot \tanh(e^x)$.
TeLU is designed to overcome the limitations of conventional activation functions like ReLU, GELU, and Mish.
Our theoretical analysis and empirical assessments reveal that TeLU outperforms existing activation functions in stability and robustness.
- Score: 2.1485350418225244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce the Hyperbolic Tangent Exponential Linear Unit
(TeLU), a novel neural network activation function, represented as $f(x) =
x \cdot \tanh(e^x)$. TeLU is designed to overcome the limitations of
conventional activation functions like ReLU, GELU, and Mish by addressing the
vanishing and, to an extent, the exploding gradient problems. Our theoretical
analysis and empirical assessments reveal that TeLU outperforms existing
activation functions in stability and robustness, effectively adjusting
activation outputs' mean towards zero for enhanced training stability and
convergence. Extensive evaluations against popular activation functions (ReLU,
GELU, SiLU, Mish, Logish, Smish) across advanced architectures, including
ResNet-50, demonstrate TeLU's lower variance and superior performance, even
under hyperparameter conditions optimized for other functions. In large-scale
tests with challenging datasets like CIFAR-10, CIFAR-100, and TinyImageNet,
encompassing 860 scenarios, TeLU consistently showcased its effectiveness,
positioning itself as a potential new standard for neural network activation
functions, boosting stability and performance in diverse deep learning
applications.
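
The closed-form definition makes TeLU easy to prototype. The PyTorch sketch below implements only the formula $f(x) = x \cdot \tanh(e^x)$ stated in the abstract; it is an illustrative implementation, not the authors' reference code.

```python
import torch
import torch.nn as nn

class TeLU(nn.Module):
    """Hyperbolic Tangent Exponential Linear Unit: f(x) = x * tanh(exp(x))."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate tanh(exp(x)) approaches 1 for large positive x (so the unit
        # behaves like the identity there) and decays smoothly toward 0 for
        # large negative x.
        return x * torch.tanh(torch.exp(x))

if __name__ == "__main__":
    x = torch.linspace(-4.0, 4.0, steps=9, requires_grad=True)
    y = TeLU()(x)
    y.sum().backward()
    print(y.detach())  # activation values
    print(x.grad)      # smooth, non-zero gradients across the sampled range
```

Structurally, TeLU follows the same self-gated pattern as SiLU ($x \cdot \sigma(x)$) and Mish ($x \cdot \tanh(\mathrm{softplus}(x))$), with $\tanh(e^x)$ acting as the gate.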
Related papers
- Hysteresis Activation Function for Efficient Inference [3.5223695602582614]
We propose a Hysteresis Rectified Linear Unit (HeLU) to address the "dying ReLU" problem with minimal complexity.
Unlike traditional activation functions with fixed thresholds for training and inference, HeLU employs a variable threshold that refines the backpropagation.
arXiv Detail & Related papers (2024-11-15T20:46:58Z) - Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm)
arXiv Detail & Related papers (2024-08-28T11:12:27Z) - Stabilizing Extreme Q-learning by Maclaurin Expansion [51.041889588036895]
Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution.
It has demonstrated strong performance in both offline and online reinforcement learning settings.
We propose Maclaurin Expanded Extreme Q-learning to enhance stability.
arXiv Detail & Related papers (2024-06-07T12:43:17Z) - A Method on Searching Better Activation Functions [15.180864683908878]
We propose Entropy-based Activation Function Optimization (EAFO) methodology for designing static activation functions in deep neural networks.
We derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU)
arXiv Detail & Related papers (2024-05-19T03:48:05Z) - ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - The Implicit Bias of Minima Stability in Multivariate Shallow ReLU
Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z) - TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear
Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU.
Deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
arXiv Detail & Related papers (2023-05-08T01:13:59Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use Rational Activation Function (RAF) to learn optimal activation functions during training according to input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
To ease the computational burden of the cascaded pattern, we construct a self-calibrated module that drives the results of each stage toward convergence.
We comprehensively explore SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z) - Adversarially Robust Learning for Security-Constrained Optimal Power
Flow [55.816266355623085]
We tackle the problem of N-k security-constrained optimal power flow (SCOPF)
N-k SCOPF is a core problem for the operation of electrical grids.
Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem.
arXiv Detail & Related papers (2021-11-12T22:08:10Z) - Soft-Root-Sign Activation Function [21.716884634290516]
"Soft-Root-Sign" (SRS) is smooth, non-monotonic, and bounded.
In contrast to ReLU, SRS can adaptively adjust the output by a pair of independent trainable parameters.
Our SRS matches or exceeds models with ReLU and other state-of-the-art nonlinearities.
arXiv Detail & Related papers (2020-03-01T18:38:11Z)
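
As a point of contrast with TeLU's fixed gate, the Soft-Root-Sign entry above uses trainable parameters. The sketch below is a minimal SRS-style module assuming the commonly cited form $f(x) = x / (x/\alpha + e^{-x/\beta})$; the initial values for $\alpha$ and $\beta$ are illustrative assumptions, not settings taken from the paper.

```python
import torch
import torch.nn as nn

class SoftRootSign(nn.Module):
    """SRS-style activation sketch: f(x) = x / (x / alpha + exp(-x / beta))."""

    def __init__(self, alpha: float = 2.0, beta: float = 3.0):
        super().__init__()
        # A pair of independent trainable parameters, as described in the entry
        # above; the starting values here are illustrative guesses.
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Bounded above (the output approaches alpha as x -> +inf) and decaying
        # toward 0 as x -> -inf, giving a smooth, non-monotonic response.
        return x / (x / self.alpha + torch.exp(-x / self.beta))
```

Here $\alpha$ bounds the positive output and $\beta$ shapes the negative tail, so the output range can adapt during training.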