Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
- URL: http://arxiv.org/abs/2011.03155v2
- Date: Sat, 26 Feb 2022 12:50:20 GMT
- Title: Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning
- Authors: Hock Hung Chieng, Noorhaniza Wahid and Pauline Ong
- Abstract summary: Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community.
This paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU.
PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The activation function is a key component of deep learning that performs
non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been
the most popular activation function across the deep learning community. However, ReLU has
several shortcomings that can result in inefficient training of deep neural networks:
1) the negative cancellation property of ReLU treats negative inputs as unimportant
information for learning, resulting in performance degradation; 2) the inherently
predefined nature of ReLU offers the networks little additional flexibility, expressivity,
or robustness; 3) the mean activation of ReLU is highly positive, leading to a bias shift
effect in the network layers; and 4) the multilinear structure of ReLU restricts the
non-linear approximation power of the networks. To address these shortcomings, this paper
introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the
baseline method, the experiments showed that PFTS improved classification accuracy on the
SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A,
DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved
the highest mean rank among the comparison methods. The proposed PFTS manifested higher
non-linear approximation power during training and thereby improved the predictive
performance of the networks.
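As a concrete reference point, the following is a minimal PyTorch sketch of a PFTS-style activation. It assumes the Flatten-T Swish form f(x) = x*sigmoid(x) + T for x >= 0 and f(x) = T otherwise, with the flattening threshold T made learnable; the authors' exact parameterisation (per-layer vs. per-channel T, its initialisation) may differ from what is shown here.

```python
# Hedged sketch of a PFTS-style activation in PyTorch (not the authors' reference code).
# Assumes the Flatten-T Swish form: f(x) = x*sigmoid(x) + T for x >= 0, and f(x) = T
# for x < 0, with the flattening threshold T treated as a learnable parameter.
import torch
import torch.nn as nn


class PFTS(nn.Module):
    """Parametric Flatten-T Swish-style activation with a learnable threshold T."""

    def __init__(self, init_t: float = -0.2):
        super().__init__()
        # Single learnable threshold shared across the layer (an assumption; the
        # paper may learn T per channel or initialise it differently).
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish (x * sigmoid(x)) on the non-negative part, zero on the negative part,
        # then shift everything by the learnable flattening value T.
        gated = torch.where(x >= 0, x * torch.sigmoid(x), torch.zeros_like(x))
        return gated + self.t


if __name__ == "__main__":
    act = PFTS()
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))  # negative inputs map to T; positive inputs map to x*sigmoid(x) + T
```

In a network, this module would simply take the place of an nn.ReLU(); because T is an nn.Parameter, it is updated by the same optimiser as the layer weights.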
Related papers
- Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
arXiv Detail & Related papers (2023-09-26T17:42:52Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Learning to Linearize Deep Neural Networks for Secure and Efficient Private Inference [5.293553970082942]
Existing techniques to reduce ReLU operations often involve manual effort and sacrifice accuracy.
We first present ReLU sensitivity, a novel measure of a layer's non-linearity, which mitigates the need for time-consuming manual effort.
We then present SENet, a three-stage training method that automatically assigns per-layer ReLU counts, decides the ReLU locations for each layer's activation map, and trains a model with significantly fewer ReLUs.
arXiv Detail & Related papers (2023-01-23T03:33:38Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness [172.61581010141978]
Certifiable robustness is a desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios.
We propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity.
arXiv Detail & Related papers (2022-06-15T22:42:29Z) - Piecewise Linear Units Improve Deep Neural Networks [0.0]
The activation function is at the heart of a deep neural network's non-linearity.
Currently, many practitioners prefer the Rectified Linear Unit (ReLU) due to its simplicity and reliability.
We propose an adaptive piecewise linear activation function, the Piecewise Linear Unit (PiLU), which can be learned independently for each dimension of the neural network.
arXiv Detail & Related papers (2021-08-02T08:09:38Z) - Learning to Solve the AC-OPF using Sensitivity-Informed Deep Neural Networks [52.32646357164739]
We propose a deep neural network (DNN) to learn the solutions of the AC optimal power flow (AC-OPF).
The proposed SIDNN is compatible with a broad range of OPF schemes.
It can be seamlessly integrated into other learning-to-OPF schemes.
arXiv Detail & Related papers (2021-03-27T00:45:23Z) - A Novel Neural Network Training Framework with Data Assimilation [2.948167339160823]
A gradient-free training framework based on data assimilation is proposed to avoid the calculation of gradients.
The results show that the proposed training framework performed better than the gradient descent method.
arXiv Detail & Related papers (2020-10-06T11:12:23Z) - RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z) - LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks [14.943863837083496]
We propose a Linearly Scaled Hyperbolic Tangent (LiSHT) for Neural Networks (NNs) by scaling the Tanh linearly (a sketch of this activation appears after this list).
We observe superior performance with Multi-Layer Perceptron (MLP), Residual Network (ResNet), and Long Short-Term Memory (LSTM) models on data classification, image classification, and tweet classification tasks.
arXiv Detail & Related papers (2019-01-01T02:24:06Z)
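As an illustration of the LiSHT entry above, here is a minimal sketch assuming the definition f(x) = x * tanh(x) (the Tanh scaled linearly by its input); unlike the PFTS sketch earlier, it carries no learnable parameters.

```python
# Minimal sketch of a LiSHT-style activation, assuming f(x) = x * tanh(x)
# as described in the LiSHT entry above; non-parametric, so no learnable state.
import torch
import torch.nn as nn


class LiSHT(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x * tanh(x) is non-negative and symmetric about zero.
        return x * torch.tanh(x)


if __name__ == "__main__":
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(LiSHT()(x))
```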
This list is automatically generated from the titles and abstracts of the papers in this site.