SAU: Smooth activation function using convolution with approximate
identities
- URL: http://arxiv.org/abs/2109.13210v1
- Date: Mon, 27 Sep 2021 17:31:04 GMT
- Title: SAU: Smooth activation function using convolution with approximate
identities
- Authors: Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey
- Abstract summary: Well-known activation functions like ReLU or Leaky ReLU are non-differentiable at the origin.
We propose new smooth approximations of a non-differentiable activation function by convolving it with approximate identities.
- Score: 1.5267236995686555
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Well-known activation functions like ReLU or Leaky ReLU are
non-differentiable at the origin. Over the years, many smooth approximations of
ReLU have been proposed using various smoothing techniques. We propose new
smooth approximations of a non-differentiable activation function by convolving
it with approximate identities. In particular, we present smooth approximations
of Leaky ReLU and show that they outperform several well-known activation
functions on a variety of datasets and models. We call this function the Smooth
Activation Unit (SAU). Replacing ReLU with SAU, we obtain a 5.12% improvement
with the ShuffleNet V2 (2.0x) model on the CIFAR100 dataset.
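To make the construction concrete, here is a minimal NumPy sketch of the idea: convolving Leaky ReLU with a zero-mean Gaussian (one possible choice of approximate identity) gives a smooth closed-form function that converges back to Leaky ReLU as the kernel width shrinks. The Gaussian kernel and the values of alpha and sigma below are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.special import erf

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def smoothed_leaky_relu(x, alpha=0.01, sigma=1.0):
    """Leaky ReLU convolved with a zero-mean Gaussian of standard deviation sigma.

    Leaky ReLU = alpha*x + (1 - alpha)*ReLU(x); convolution is linear, a
    zero-mean kernel leaves the linear term alpha*x unchanged, and
    ReLU * Gaussian has the closed form x*Phi(x/sigma) + sigma*phi(x/sigma).
    """
    z = x / sigma
    Phi = 0.5 * (1.0 + erf(z / np.sqrt(2.0)))         # standard normal CDF
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)  # standard normal PDF
    return alpha * x + (1.0 - alpha) * (x * Phi + sigma * phi)

# As sigma -> 0 the Gaussian approaches a Dirac delta (an approximate
# identity), so the smooth function converges to Leaky ReLU itself.
x = np.linspace(-3.0, 3.0, 7)
print(leaky_relu(x))
print(smoothed_leaky_relu(x, sigma=0.1))
```

One natural extension (not necessarily what the paper does) is to treat sigma, and possibly alpha, as trainable parameters, giving a family of smooth activations that interpolates toward Leaky ReLU.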
Related papers
- Zorro: A Flexible and Differentiable Parametric Family of Activation Functions That Extends ReLU and GELU [0.0]
More than 400 activation functions, with fixed or trainable parameters, have been proposed over the last 30 years, but only a few are widely used.
This article introduces a novel set of activation functions called Zorro, a continuously differentiable and flexible family comprising five main functions fusing ReLU and Sigmoid.
arXiv Detail & Related papers (2024-09-28T05:04:56Z)
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold (a toy sketch of this thresholding idea appears after this list).
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
- A Non-monotonic Smooth Activation Function [4.269446061678759]
Activation functions are crucial in deep learning models since they introduce non-linearity into the networks.
In this study, we propose a new activation function called Sqish, which is a non-monotonic and smooth function.
We showed its superiority in classification, object detection, segmentation tasks, and adversarial robustness experiments.
arXiv Detail & Related papers (2023-10-16T07:09:47Z)
- Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning [0.0]
Activation functions (AFs) are crucial components of deep neural networks (DNNs).
We propose a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU activation functions.
PLanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs.
arXiv Detail & Related papers (2023-08-11T08:59:27Z)
- On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular because of their good cache locality and the small number of cache misses they incur.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z)
- Neural Estimation of Submodular Functions with Applications to Differentiable Subset Selection [50.14730810124592]
Submodular functions and variants, through their ability to characterize diversity and coverage, have emerged as a key tool for data selection and summarization.
We propose FLEXSUBNET, a family of flexible neural models for both monotone and non-monotone submodular functions.
arXiv Detail & Related papers (2022-10-20T06:00:45Z)
- Nish: A Novel Negative Stimulated Hybrid Activation Function [5.482532589225552]
We propose a novel non-monotonic activation function called the Negative Stimulated Hybrid Activation Function (Nish).
It behaves like a Rectified Linear Unit (ReLU) function for values greater than zero, and like a sine-sigmoidal function for values less than zero.
The proposed function incorporates the sigmoid and sine wave, allowing new dynamics over traditional ReLU activations.
arXiv Detail & Related papers (2022-10-17T13:32:52Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in an efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- SMU: smooth activation function for deep networks using smoothing maximum technique [1.5267236995686555]
We propose a novel activation function based on approximation of known activation functions like Leaky ReLU.
We obtain a 6.22% improvement on the CIFAR100 dataset with the ShuffleNet V2 model.
arXiv Detail & Related papers (2021-11-08T17:54:08Z)
- Sparse Attention with Linear Units [60.399814410157425]
We introduce a novel, simple method for achieving sparsity in attention: we replace the softmax activation with a ReLU (a toy sketch of this swap appears after this list).
Our model, which we call Rectified Linear Attention (ReLA), is easy to implement and more efficient than previously proposed sparse attention mechanisms.
Our analysis shows that ReLA delivers high sparsity rate and head diversity, and the induced cross attention achieves better accuracy with respect to source-target word alignment.
arXiv Detail & Related papers (2021-04-14T17:52:38Z)
- Dynamic ReLU [74.973224160508]
We propose dynamic ReLU (DY-ReLU), a dynamic rectifier whose parameters are generated by a hyper function over all input elements.
Compared to its static counterpart, DY-ReLU has negligible extra computational cost, but significantly more representation capability.
By simply using DY-ReLU for MobileNetV2, the top-1 accuracy on ImageNet classification is boosted from 72.0% to 76.2% with only 5% additional FLOPs.
arXiv Detail & Related papers (2020-03-22T23:45:35Z)
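As referenced in the ReLU$^2$ entry above, here is a toy NumPy illustration of defining neuron activation by output magnitude: entries whose magnitude falls below a threshold are treated as inactive and could be skipped by a sparse kernel. The layer shapes, the threshold value tau, and the use of squared ReLU as the activation are assumptions for illustration, not details taken from that paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feed-forward block: h = act(x @ W1), y = h @ W2
x = rng.normal(size=(4, 16))            # a small batch of token vectors
W1 = 0.1 * rng.normal(size=(16, 64))
W2 = 0.1 * rng.normal(size=(64, 16))

h = np.maximum(x @ W1, 0.0) ** 2        # squared-ReLU activation (assumption)

tau = 1e-3                              # hypothetical magnitude threshold
active = np.abs(h) > tau                # neurons counted as "activated"
print(f"skippable neurons: {1.0 - active.mean():.1%}")

# A sparse kernel would only touch the W2 rows of active neurons; the dense
# masked product below is the readable equivalent.
y = (h * active) @ W2
```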
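And as referenced in the Sparse Attention with Linear Units (ReLA) entry above, the core swap is easy to sketch: apply ReLU instead of softmax to the attention scores, which produces exactly-zero attention weights. This is a minimal single-head NumPy sketch of that swap only; the paper pairs it with additional normalization for stability, which is not reproduced here.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, rectified=False):
    """Single-head scaled dot-product attention; the ReLA-style variant swaps
    the softmax for a ReLU, so many attention weights become exactly zero."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.maximum(scores, 0.0) if rectified else softmax(scores)
    return weights @ v, weights

rng = np.random.default_rng(1)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
_, w = attention(q, k, v, rectified=True)
print("fraction of exactly-zero attention weights:", float((w == 0.0).mean()))
```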