Related papers: Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning

URL: http://arxiv.org/abs/2310.07720v1
Date: Fri, 11 Aug 2023 08:59:27 GMT
Title: Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning
Authors: Stamatis Mastromichalakis
Abstract summary: Activation functions (AFs) are crucial components of deep neural networks (DNNs). We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
Score: 0.0
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU, utilizing five diverse datasets.
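The 'dying ReLU' problem mentioned in the abstract can be illustrated with a minimal sketch. It does not implement PLTanh itself, since the abstract does not state its formula; it only compares the gradients of the three standard functions the abstract names (ReLU, Leaky ReLU, Tanh). The slope alpha = 0.01 and the helper numerical_grad are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: zero for negative inputs."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU with an assumed illustrative negative slope alpha."""
    return x if x > 0 else alpha * x


def numerical_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of df/dx (hypothetical helper)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Compare gradients on both sides of zero.
for x in (-2.0, -0.5, 0.5, 2.0):
    print(
        f"x={x:+.1f}  "
        f"dReLU={numerical_grad(relu, x):.4f}  "
        f"dLReLU={numerical_grad(leaky_relu, x):.4f}  "
        f"dTanh={numerical_grad(math.tanh, x):.4f}"
    )
```

For negative inputs the ReLU gradient is exactly zero, so a unit stuck in that region receives no weight updates, while Leaky ReLU keeps a small constant gradient equal to alpha. Tanh has a non-zero gradient everywhere, although it shrinks as the magnitude of x grows.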

Related papers

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z)
Deriving Activation Functions via Integration [0.0]
Activation functions play a crucial role in introducing non-linearities to deep neural networks. We propose a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding functions through integration. Our work introduces the Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations applied on the ELU activation function.
arXiv Detail & Related papers (2024-11-20T03:24:21Z)
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated. We propose PPL-$p%$ sparsity, a precise and performance-aware activation sparsity metric. We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold. To find the most efficient activation function for sparse computation, we propose a systematic framework. We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
Generalized Activation via Multivariate Projection [46.837481855573145]
Activation functions are essential to introduce nonlinearity into neural networks. We consider ReLU as a projection from R onto the nonnegative half-line R+. We extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection.
arXiv Detail & Related papers (2023-09-29T12:44:27Z)
Layer-wise Feedback Propagation [53.00944147633484]
We present Layer-wise Feedback Propagation (LFP), a novel training approach for neural-network-like predictors. LFP assigns rewards to individual connections based on their respective contributions to solving a given task. We demonstrate its effectiveness in achieving comparable performance to gradient descent on various models and datasets.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Saturated Non-Monotonic Activation Functions [21.16866749728754]
We present three new activation functions built with our proposed method: SGELU, SSiLU, and SMish, which are composed of the negative portion of GELU, SiLU, and Mish, respectively, and ReLU's positive portion. The results of image classification experiments on CIFAR-100 indicate that our proposed activation functions are highly effective and outperform state-of-the-art baselines across multiple deep learning architectures.
arXiv Detail & Related papers (2023-05-12T15:01:06Z)
TaLU: A Hybrid Activation Function Combining Tanh and Rectified Linear Unit to Enhance Neural Networks [1.3477333339913569]
TaLU is a modified activation function combining Tanh and ReLU, which mitigates the dying gradient problem of ReLU. A deep learning model with the proposed activation function was tested on MNIST and CIFAR-10.
arXiv Detail & Related papers (2023-05-08T01:13:59Z)
Nish: A Novel Negative Stimulated Hybrid Activation Function [5.482532589225552]
We propose a novel non-monotonic activation function called Negative Stimulated Hybrid Activation Function (Nish). It behaves like a Rectified Linear Unit (ReLU) function for values greater than zero, and a sinus-sigmoidal function for values less than zero. The proposed function incorporates the sigmoid and sine wave, allowing new dynamics over traditional ReLU activations.
arXiv Detail & Related papers (2022-10-17T13:32:52Z)
Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs). Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linearly separable feature space. This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data. We propose Graph-adaptive Rectified Linear Unit (GReLU) which is a new parametric activation function incorporating the neighborhood information in a novel and efficient way. We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems. A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function. The new activation function C(z) = z cos z outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures.
arXiv Detail & Related papers (2021-08-30T01:07:05Z)
Investigating the interaction between gradient-only line searches and different activation functions [0.0]
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions in neural network training. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures.
arXiv Detail & Related papers (2020-02-23T12:28:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.