TeLU Activation Function for Fast and Stable Deep Learning
- URL: http://arxiv.org/abs/2412.20269v2
- Date: Thu, 02 Jan 2025 02:32:43 GMT
- Title: TeLU Activation Function for Fast and Stable Deep Learning
- Authors: Alfredo Fernandez, Ankur Mali
- Abstract summary: Hyperbolic Tangent Exponential Linear Unit (TeLU) is a neural network hidden activation function defined as TeLU(x) = x·tanh(exp(x)).
TeLU's design is grounded in the core principles of key activation functions, achieving strong convergence.
Our results highlight TeLU's potential to set a new standard in activation functions, driving more efficient and stable learning in deep neural networks.
- Abstract: We propose the Hyperbolic Tangent Exponential Linear Unit (TeLU), a neural network hidden activation function defined as TeLU(x) = x·tanh(exp(x)). TeLU's design is grounded in the core principles of key activation functions, achieving strong convergence by closely approximating the identity function in its active region while effectively mitigating the vanishing gradient problem in its saturating region. Its simple formulation enhances computational efficiency, leading to improvements in scalability and convergence speed. Unlike many modern activation functions, TeLU seamlessly combines the simplicity and effectiveness of ReLU with the smoothness and analytic properties essential for learning stability in deep neural networks. TeLU's ability to mimic the behavior and optimal hyperparameter settings of ReLU, while introducing the benefits of smoothness and curvature, makes it an ideal drop-in replacement. Its analytic nature positions TeLU as a powerful universal approximator, enhancing both robustness and generalization across a multitude of experiments. We rigorously validate these claims through theoretical analysis and experimental validation, demonstrating TeLU's performance across challenging benchmarks, including ResNet18 on ImageNet, Dynamic-Pooling Transformers on Text8, and Recurrent Neural Networks (RNNs) on the Penn Treebank dataset. These results highlight TeLU's potential to set a new standard in activation functions, driving more efficient and stable learning in deep neural networks, thereby accelerating scientific discoveries across various fields.
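For concreteness, here is a minimal PyTorch sketch of the activation exactly as defined in the abstract (an illustrative implementation, not the authors' released code):

```python
import torch
import torch.nn as nn

class TeLU(nn.Module):
    """TeLU(x) = x * tanh(exp(x)), as defined in the abstract."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # For large positive x, exp(x) overflows to inf, but tanh(inf) == 1,
        # so the output approaches the identity, matching the near-identity
        # behaviour described for the active region.
        return x * torch.tanh(torch.exp(x))

# Hypothetical drop-in usage, replacing ReLU in a small MLP:
mlp = nn.Sequential(nn.Linear(128, 256), TeLU(), nn.Linear(256, 10))
out = mlp(torch.randn(4, 128))
```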
Related papers
- Gompertz Linear Units: Leveraging Asymmetry for Enhanced Learning Dynamics [39.0860823332923]
GoLU is a novel self-gated activation function defined as $\mathrm{GoLU}(x) = x \, \mathrm{Gompertz}(x)$, where $\mathrm{Gompertz}(x) = e^{-e^{-x}}$.
GoLU's superior performance relative to state-of-the-art activation functions highlights it as a robust alternative to existing activations.
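As a quick illustration (a sketch based only on the formula quoted above, not the authors' implementation), the Gompertz gate can be written in PyTorch as:

```python
import torch

def golu(x: torch.Tensor) -> torch.Tensor:
    # GoLU(x) = x * Gompertz(x), with Gompertz(x) = exp(-exp(-x)).
    return x * torch.exp(-torch.exp(-x))
```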
arXiv Detail & Related papers (2025-02-05T22:32:22Z)
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm).
arXiv Detail & Related papers (2024-08-28T11:12:27Z)
- Stable and Robust Deep Learning By Hyperbolic Tangent Exponential Linear Unit (TeLU) [2.1485350418225244]
We introduce a novel neural network activation function, represented as $f(x) = x \cdot \tanh(e^x)$.
TeLU is designed to overcome the limitations of conventional activation functions like ReLU, GELU, and Mish.
Our theoretical analysis and empirical assessments reveal that TeLU outperforms existing activation functions in stability and robustness.
arXiv Detail & Related papers (2024-02-05T07:56:02Z)
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs).
We propose PLanh, a novel hybrid activation function designed to combine the strengths of the Tanh and Leaky ReLU activations.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates ReLU's dying-gradient problem.
A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
arXiv Detail & Related papers (2023-05-08T01:13:59Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition [67.46012350241969]
This paper proposes a general acceleration methodology called NeuralStagger.
It decomposes the original learning task into several coarser-resolution subtasks.
We demonstrate the successful application of NeuralStagger on 2D and 3D fluid dynamics simulations.
arXiv Detail & Related papers (2023-02-20T19:36:52Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, and Soft Root Sign.
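A minimal sketch of the general idea, assuming PyTorch and a reduced candidate set (not the authors' exact selection procedure):

```python
import random
import torch.nn as nn

# Candidate activations; the paper's full list is longer (see above).
CANDIDATES = [nn.ReLU, nn.LeakyReLU, nn.ELU, nn.SiLU, nn.Mish]

def random_activation_cnn(seed: int) -> nn.Sequential:
    """Build a small CNN whose activation layers are sampled at random,
    so an ensemble of such networks uses a stochastic mix of activations."""
    rng = random.Random(seed)
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), rng.choice(CANDIDATES)(),
        nn.Conv2d(16, 32, 3, padding=1), rng.choice(CANDIDATES)(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
    )

# An ensemble in which each member samples its own activation layers:
ensemble = [random_activation_cnn(seed=s) for s in range(5)]
```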
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.