Piecewise Linear Units Improve Deep Neural Networks
- URL: http://arxiv.org/abs/2108.00700v1
- Date: Mon, 2 Aug 2021 08:09:38 GMT
- Title: Piecewise Linear Units Improve Deep Neural Networks
- Authors: Jordan Inturrisi, Sui Yang Khoo, Abbas Kouzani, Riccardo Pagliarella
- Abstract summary: The activation function is at the heart of a deep neural network's nonlinearity.
Currently, many practitioners prefer the Rectified Linear Unit (ReLU) due to its simplicity and reliability.
We propose an adaptive piecewise linear activation function, the Piecewise Linear Unit (PiLU), which can be learned independently for each dimension of the neural network.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The activation function is at the heart of a deep neural network's
nonlinearity; the choice of function has a great impact on the success of
training. Currently, many practitioners prefer the Rectified Linear Unit (ReLU)
due to its simplicity and reliability, despite its few drawbacks. While most
previous functions proposed to supplant ReLU have been hand-designed, recent
work on learning the function during training has shown promising results. In
this paper we propose an adaptive piecewise linear activation function, the
Piecewise Linear Unit (PiLU), which can be learned independently for each
dimension of the neural network. We demonstrate how PiLU is a generalised
rectifier unit and note its similarities with the Adaptive Piecewise Linear
Units, namely that both are adaptive and piecewise linear. Across a distribution of 30
experiments, we show that for the same model architecture, hyperparameters, and
pre-processing, PiLU significantly outperforms ReLU: reducing classification
error by 18.53% on CIFAR-10 and 13.13% on CIFAR-100, for a minor increase in
the number of neurons. Further work should be dedicated to exploring
generalised piecewise linear units, as well as verifying these results across
other challenging domains and larger problems.
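As a rough illustration of the idea, the sketch below implements a per-channel, two-segment piecewise linear activation in PyTorch, with learnable slopes and a learnable breakpoint initialised so that it starts as ReLU. The parameterisation (the names alpha, beta, breakpoint, and the choice of where the segments meet) is an assumption made for clarity, not the paper's exact definition.
```python
import torch
import torch.nn as nn

class PiecewiseLinearUnit(nn.Module):
    """Sketch of a per-channel, two-segment adaptive piecewise linear activation.

    Each channel c has a learnable breakpoint b_c and learnable slopes
    alpha_c (for x >= b_c) and beta_c (for x < b_c). With alpha=1, beta=0,
    b=0 it reduces to ReLU, which is used as the initialisation here.
    This is an illustrative interpretation, not the paper's exact definition.
    """

    def __init__(self, num_channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_channels))    # right-hand slope
        self.beta = nn.Parameter(torch.zeros(num_channels))    # left-hand slope
        self.breakpoint = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reshape per-channel parameters so they broadcast over (N, C, ...) inputs.
        shape = (1, -1) + (1,) * (x.dim() - 2)
        a = self.alpha.view(shape)
        b = self.beta.view(shape)
        c = self.breakpoint.view(shape)
        # Both segments meet at (c, 0), so the function is continuous.
        return torch.where(x >= c, a * (x - c), b * (x - c))

# Example: drop-in replacement for ReLU after a conv layer with 64 channels.
act = PiecewiseLinearUnit(num_channels=64)
y = act(torch.randn(8, 64, 32, 32))
```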
Related papers
- Activation function optimization method: Learnable series linear units (LSLUs) [12.089173508371246]
We propose a series-based learnable activation function called LSLU (Learnable Series Linear Units).
This method simplifies deep learning networks while improving accuracy.
We evaluate LSLU's performance on CIFAR10, CIFAR100, and specific task datasets (e.g., Silkworm).
arXiv Detail & Related papers (2024-08-28T11:12:27Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable via our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but it struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
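As a minimal sketch of the weight-sharing idea (an assumed construction, not the paper's code), the layer below lets narrower sub-networks reuse the leading slice of one shared weight matrix, so a single pre-training run yields networks of several widths.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Module):
    """Illustrative weight-sharing layer: narrower sub-networks use the leading
    rows/columns of one shared weight matrix (an assumption for this sketch)."""

    def __init__(self, max_in: int, max_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out, max_in) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out))

    def forward(self, x: torch.Tensor, width_mult: float = 1.0) -> torch.Tensor:
        out_f = max(1, int(self.weight.shape[0] * width_mult))
        in_f = x.shape[-1]  # caller passes the already-slimmed input width
        return F.linear(x, self.weight[:out_f, :in_f], self.bias[:out_f])

layer = SlimmableLinear(max_in=128, max_out=256)
x = torch.randn(4, 128)
full = layer(x, width_mult=1.0)   # 256-dim output: the full network
half = layer(x, width_mult=0.5)   # 128-dim output: a weight-sharing sub-network
```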
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - A Comprehensive Survey and Performance Analysis of Activation Functions
in Deep Learning [23.83339228535986]
Various types of neural networks have been introduced to deal with different types of problems.
The main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features.
The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish.
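For reference, all of the listed non-linearities have one-line closed forms; the snippet below writes them out using standard PyTorch operations (these are the textbook formulas, not material taken from the survey itself).
```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)

sigmoid = torch.sigmoid(x)            # 1 / (1 + exp(-x))
tanh = torch.tanh(x)
relu = F.relu(x)                      # max(0, x)
elu = F.elu(x)                        # x if x > 0 else alpha * (exp(x) - 1)
swish = x * torch.sigmoid(x)          # also called SiLU
mish = x * torch.tanh(F.softplus(x))  # x * tanh(ln(1 + exp(x)))
```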
arXiv Detail & Related papers (2021-09-29T16:41:19Z) - Growing Cosine Unit: A Novel Oscillatory Activation Function That Can
Speedup Training and Reduce Parameters in Convolutional Neural Networks [0.1529342790344802]
Convolutional neural networks have been successful in solving many socially important and economically significant problems.
A key discovery that made training deep networks feasible was the adoption of the Rectified Linear Unit (ReLU) activation function.
The new activation function C(z) = z cos(z) outperforms Sigmoids, Swish, Mish and ReLU on a variety of architectures.
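The Growing Cosine Unit is the one-line function given in the abstract; a minimal PyTorch version is shown below.
```python
import torch

def growing_cosine_unit(z: torch.Tensor) -> torch.Tensor:
    # C(z) = z * cos(z): an oscillatory, non-monotonic activation
    return z * torch.cos(z)

z = torch.linspace(-6.0, 6.0, steps=5)
print(growing_cosine_unit(z))
```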
arXiv Detail & Related papers (2021-08-30T01:07:05Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Comparisons among different stochastic selection of activation layers
for convolutional neural networks for healthcare [77.99636165307996]
We classify biomedical images using ensembles of neural networks.
We select our activations from among the following: ReLU, leaky ReLU, Parametric ReLU, ELU, Adaptive Piecewise Linear Unit, S-Shaped ReLU, Swish, Mish, Mexican Linear Unit, Parametric Deformable Linear Unit, Soft Root Sign.
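A rough sketch of the idea (assumed structure, not the authors' code): each ensemble member is built by drawing its activation layers at random from a candidate pool. The pool below keeps only the listed activations that ship with PyTorch.
```python
import random
import torch.nn as nn

# Candidate pool restricted to activations available directly in PyTorch;
# the paper's pool is larger (e.g., Adaptive Piecewise Linear Unit,
# Mexican Linear Unit, Soft Root Sign).
ACTIVATIONS = [nn.ReLU, nn.LeakyReLU, nn.PReLU, nn.ELU, nn.SiLU, nn.Mish]

def make_random_activation_mlp(dims, seed=None):
    """Build one ensemble member with an independently drawn activation per layer."""
    rng = random.Random(seed)
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers.append(nn.Linear(d_in, d_out))
        layers.append(rng.choice(ACTIVATIONS)())
    return nn.Sequential(*layers)

ensemble = [make_random_activation_mlp([64, 128, 128, 10], seed=i) for i in range(5)]
```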
arXiv Detail & Related papers (2020-11-24T01:53:39Z) - Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function
For Deep Learning [0.0]
Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community.
This paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU.
PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
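As a hedged sketch only: assuming PFTS follows the Flatten-T Swish form, f(x) = x * sigmoid(x) + T for x >= 0 and f(x) = T otherwise, with the threshold T made learnable, a minimal module might look like the following; consult the paper for the exact definition and initialisation.
```python
import torch
import torch.nn as nn

class ParametricFlattenTSwish(nn.Module):
    """Sketch under the assumption f(x) = x*sigmoid(x) + T for x >= 0, else T,
    with T a learnable parameter; the paper's exact formulation may differ."""

    def __init__(self, t_init: float = -0.2):  # initial threshold value is an assumption
        super().__init__()
        self.T = nn.Parameter(torch.tensor(t_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.where(x >= 0, x * torch.sigmoid(x) + self.T, self.T.expand_as(x))

act = ParametricFlattenTSwish()
y = act(torch.randn(4, 16))
```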
arXiv Detail & Related papers (2020-11-06T01:50:46Z) - Activation Functions: Do They Represent A Trade-Off Between Modular
Nature of Neural Networks And Task Performance [2.5919242494186037]
Key factors in designing neural network architectures include choosing the number of filters for every convolution layer, the number of hidden neurons for every fully connected layer, dropout, and pruning.
The default activation function in most cases is ReLU, as it has empirically shown faster training convergence.
arXiv Detail & Related papers (2020-09-16T16:38:16Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules that demonstrate better performance than other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)