LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation
Function for Neural Networks
- URL: http://arxiv.org/abs/1901.05894v4
- Date: Fri, 17 Feb 2023 01:49:12 GMT
- Title: LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation
Function for Neural Networks
- Authors: Swalpa Kumar Roy, Suvojit Manna, Shiv Ram Dubey, Bidyut Baran
Chaudhuri
- Abstract summary: We propose a Linearly Scaled Hyperbolic Tangent (LiSHT) activation for Neural Networks (NNs), obtained by scaling Tanh linearly.
We observe superior performance using a Multi-Layer Perceptron (MLP), a Residual Network (ResNet) and a Long Short-Term Memory (LSTM) network on data classification, image classification and tweet classification tasks, respectively.
- Score: 14.943863837083496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The activation function in a neural network introduces the non-linearity
required to deal with complex tasks. Several activation/non-linearity
functions have been developed for deep learning models. However, most existing
activation functions suffer from the dying gradient problem and do not
utilize large negative input values. In this paper, we propose a
Linearly Scaled Hyperbolic Tangent (LiSHT) activation for Neural Networks (NNs), obtained by scaling
Tanh linearly. The proposed LiSHT is non-parametric and tackles the dying
gradient problem. We perform experiments on benchmark datasets of different
types, such as vector data, image data and natural language data. We observe
superior performance using a Multi-Layer Perceptron (MLP), a Residual Network
(ResNet) and a Long Short-Term Memory (LSTM) network for data classification, image
classification and tweet classification tasks, respectively. The accuracy on the
CIFAR-100 dataset using a ResNet model with LiSHT is improved by 9.48, 3.40, 3.16,
4.26, and 1.17% compared to Tanh, ReLU, PReLU, LReLU, and Swish,
respectively. We also show qualitative results using the loss landscape, weight
distributions and activation maps in support of the proposed activation
function.
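For reference, LiSHT scales Tanh by its input, i.e. LiSHT(x) = x · tanh(x), so large negative inputs still yield positive activations and non-vanishing gradients. Below is a minimal, unofficial PyTorch sketch of the activation and a drop-in use in a toy MLP; the class name `LiSHT` and the layer sizes are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn


class LiSHT(nn.Module):
    """Linearly Scaled Hyperbolic Tangent: f(x) = x * tanh(x).

    Non-parametric and non-negative; its derivative
    tanh(x) + x * (1 - tanh(x)^2) does not vanish for large |x|
    the way the plain Tanh gradient does.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.tanh(x)


# Illustrative drop-in use inside a small MLP (hypothetical sizes).
mlp = nn.Sequential(
    nn.Linear(784, 256),
    LiSHT(),
    nn.Linear(256, 10),
)

if __name__ == "__main__":
    x = torch.randn(4, 784)
    print(mlp(x).shape)  # torch.Size([4, 10])
```

Because the activation has no learnable parameters, it can replace ReLU or Tanh in existing architectures without changing optimizer settings or parameter counts.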
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- A Novel Explanation Against Linear Neural Networks [1.223779595809275]
Linear Regression and neural networks are widely used to model data.
We show that neural networks without activation functions, i.e. linear neural networks (LNNs), actually reduce both training and testing performance.
We prove this hypothesis through an analysis of LNN optimization and rigorous testing comparing the performance of LNNs and linear regression on noisy datasets.
arXiv Detail & Related papers (2023-12-30T09:44:51Z)
- ReLU soothes the NTK condition number and accelerates optimization for wide neural networks [9.374151703899047]
We show that ReLU leads to: (i) better separation for similar data, and (ii) better conditioning of the neural tangent kernel (NTK).
Our results imply that the ReLU activation, as well as the depth of a ReLU network, helps improve the gradient descent convergence rate.
arXiv Detail & Related papers (2023-05-15T17:22:26Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristics of samples, the objective of those activation functions is to project samples from their original feature space to a linearly separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z)
- Graph-adaptive Rectified Linear Unit for Graph Neural Networks [64.92221119723048]
Graph Neural Networks (GNNs) have achieved remarkable success by extending traditional convolution to learning on non-Euclidean data.
We propose the Graph-adaptive Rectified Linear Unit (GReLU), a new parametric activation function that incorporates neighborhood information in a novel and efficient way.
We conduct comprehensive experiments to show that our plug-and-play GReLU method is efficient and effective given different GNN backbones and various downstream tasks.
arXiv Detail & Related papers (2022-02-13T10:54:59Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, which can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z)
- Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations among the following ones: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
arXiv Detail & Related papers (2020-11-24T01:53:39Z)
- Overcoming Overfitting and Large Weight Update Problem in Linear Rectifiers: Thresholded Exponential Rectified Linear Units [0.0]
"Thresholded exponential rectified linear unit" (TERELU) activation function works better in alleviating in overfitting: large weight update problem.
We will show better performance on the various using neural networks, considering TERELU activation method compared to other activation datasets.
arXiv Detail & Related papers (2020-06-04T11:55:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.