TeLU Activation Function for Fast and Stable Deep Learning
- URL: http://arxiv.org/abs/2412.20269v2
- Date: Thu, 02 Jan 2025 02:32:43 GMT
- Title: TeLU Activation Function for Fast and Stable Deep Learning
- Authors: Alfredo Fernandez, Ankur Mali
- Abstract summary: Hyperbolic Tangent Exponential Linear Unit (TeLU) is a neural network hidden activation function defined as TeLU(x) = x·tanh(exp(x)).
TeLU's design is grounded in the core principles of key activation functions, achieving strong convergence.
Our results highlight TeLU's potential to set a new standard in activation functions, driving more efficient and stable learning in deep neural networks.
- Abstract: We propose the Hyperbolic Tangent Exponential Linear Unit (TeLU), a neural network hidden activation function defined as TeLU(x) = x·tanh(exp(x)). TeLU's design is grounded in the core principles of key activation functions, achieving strong convergence by closely approximating the identity function in its active region while effectively mitigating the vanishing gradient problem in its saturating region. Its simple formulation enhances computational efficiency, leading to improvements in scalability and convergence speed. Unlike many modern activation functions, TeLU seamlessly combines the simplicity and effectiveness of ReLU with the smoothness and analytic properties essential for learning stability in deep neural networks. TeLU's ability to mimic the behavior and optimal hyperparameter settings of ReLU, while introducing the benefits of smoothness and curvature, makes it an ideal drop-in replacement. Its analytic nature positions TeLU as a powerful universal approximator, enhancing both robustness and generalization across a multitude of experiments. We rigorously validate these claims through theoretical analysis and experimental validation, demonstrating TeLU's performance across challenging benchmarks, including ResNet18 on ImageNet, Dynamic-Pooling Transformers on Text8, and Recurrent Neural Networks (RNNs) on the Penn Treebank dataset. These results highlight TeLU's potential to set a new standard in activation functions, driving more efficient and stable learning in deep neural networks, thereby accelerating scientific discoveries across various fields.
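For concreteness, here is a minimal PyTorch sketch of the activation exactly as defined in the abstract (an illustrative implementation, not the authors' released code):

```python
import torch
import torch.nn as nn

class TeLU(nn.Module):
    """TeLU(x) = x * tanh(exp(x)), as defined in the abstract."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # For large positive x, exp(x) overflows to inf, but tanh(inf) == 1,
        # so the output approaches the identity, matching the near-identity
        # behaviour described for the active region.
        return x * torch.tanh(torch.exp(x))

# Hypothetical drop-in usage, replacing ReLU in a small MLP:
mlp = nn.Sequential(nn.Linear(128, 256), TeLU(), nn.Linear(256, 10))
out = mlp(torch.randn(4, 128))
```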
Related papers
- Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics [39.0860823332923]
GoLU is a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$.
GoLU's superior performance relative to state-of-the-art activation functions highlights it as a robust alternative to existing activations.
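As a quick illustration (a sketch based only on the formula quoted above, not the authors' implementation), the Gompertz gate can be written in PyTorch as:

```python
import torch

def golu(x: torch.Tensor) -> torch.Tensor:
    # GoLU(x) = x * Gompertz(x), with Gompertz(x) = exp(-exp(-x)).
    return x * torch.exp(-torch.exp(-x))
```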
arXiv Detail & Related papers (2025-02-05T22:32:22Z)
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm).
arXiv Detail & Related papers (2024-08-28T11:12:27Z)
- Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU) [2.1485350418225244]
We introduce a novel neural network activation function, represented as $f(x) = x \cdot \tanh(e^x)$.
TeLU is designed to overcome the limitations of conventional activation functions like ReLU, GELU, and Mish.
Our theoretical analysis and empirical assessments reveal that TeLU outperforms existing activation functions in stability and robustness.
arXiv Detail & Related papers (2024-02-05T07:56:02Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs).
We propose PLanh, a novel hybrid activation function designed to combine the strengths of the Tanh and Leaky ReLU activations.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates ReLU's dying-gradient problem.
A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
arXiv Detail & Related papers (2023-05-08T01:13:59Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition [67.46012350241969]
This paper proposes a general acceleration methodology called NeuralStagger.
It decomposes the original learning task into several coarser-resolution subtasks.
We demonstrate the successful application of NeuralStagger on 2D and 3D fluid dynamics simulations.
arXiv Detail & Related papers (2023-02-20T19:36:52Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, and Soft Root Sign.
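A minimal sketch of the general idea, assuming PyTorch and a reduced candidate set (not the authors' exact selection procedure):

```python
import random
import torch.nn as nn

# Candidate activations; the paper's full list is longer (see above).
CANDIDATES = [nn.ReLU, nn.LeakyReLU, nn.ELU, nn.SiLU, nn.Mish]

def random_activation_cnn(seed: int) -> nn.Sequential:
    """Build a small CNN whose activation layers are sampled at random,
    so an ensemble of such networks uses a stochastic mix of activations."""
    rng = random.Random(seed)
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), rng.choice(CANDIDATES)(),
        nn.Conv2d(16, 32, 3, padding=1), rng.choice(CANDIDATES)(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
    )

# An ensemble in which each member samples its own activation layers:
ensemble = [random_activation_cnn(seed=s) for s in range(5)]
```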
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.