Learning specialized activation functions with the Piecewise Linear Unit
- URL: http://arxiv.org/abs/2104.03693v1
- Date: Thu, 8 Apr 2021 11:29:11 GMT
- Title: Learning specialized activation functions with the Piecewise Linear Unit
- Authors: Yucong Zhou, Zezhou Zhu, Zhao Zhong
- Abstract summary: We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, so it can be widely applied in real-world applications.
- Score: 7.820667552233989
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of activation functions is crucial for modern deep neural
networks. Popular hand-designed activation functions like Rectified Linear
Unit (ReLU) and its variants show promising performance in various tasks and
models. Swish, an automatically discovered activation function, has been
proposed and outperforms ReLU on many challenging datasets. However, it has two
main drawbacks. First, the tree-based search space is highly discrete and
restricted, which makes it difficult to search. Second, the sample-based
searching method is inefficient, making it infeasible to find specialized
activation functions for each dataset or neural architecture. To tackle these
drawbacks, we propose a new activation function called Piecewise Linear
Unit (PWLU), which incorporates a carefully designed formulation and learning
method. It can learn specialized activation functions and achieves SOTA
performance on large-scale datasets like ImageNet and COCO. For example, on the
ImageNet classification dataset, PWLU improves top-1 accuracy over Swish by
0.9%/0.53%/1.0%/1.7%/1.0% for
ResNet-18/ResNet-50/MobileNet-V2/MobileNet-V3/EfficientNet-B0. PWLU is also
easy to implement and efficient at inference, so it can be widely applied in
real-world applications.
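For intuition, the following is a minimal sketch (in PyTorch) of a learnable piecewise linear activation in the spirit of PWLU. It assumes uniform breakpoints on a fixed interval with learnable heights and boundary slopes; the paper's exact formulation, per-channel parameterization, and statistics-based re-initialization are not reproduced here, and all names below are illustrative.

import torch
import torch.nn as nn

class PiecewiseLinearUnit(nn.Module):
    """Toy learnable piecewise linear activation (illustrative, not the paper's exact PWLU)."""

    def __init__(self, num_segments: int = 16, left: float = -3.0, right: float = 3.0):
        super().__init__()
        self.num_segments = num_segments
        self.left = left
        self.right = right
        # Learnable function values at the num_segments + 1 uniform breakpoints,
        # initialized to ReLU so training starts from a familiar shape.
        points = torch.linspace(left, right, num_segments + 1)
        self.heights = nn.Parameter(torch.relu(points))
        # Learnable slopes used outside [left, right].
        self.left_slope = nn.Parameter(torch.tensor(0.0))
        self.right_slope = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seg_width = (self.right - self.left) / self.num_segments
        # Segment index of each input, clamped to the valid range.
        idx = torch.clamp(((x - self.left) / seg_width).floor().long(),
                          0, self.num_segments - 1)
        x0 = self.left + idx.to(x.dtype) * seg_width
        slope = (self.heights[idx + 1] - self.heights[idx]) / seg_width
        y = self.heights[idx] + slope * (x - x0)
        # Linear extrapolation beyond the learned interval.
        y = torch.where(x < self.left, self.heights[0] + self.left_slope * (x - self.left), y)
        y = torch.where(x > self.right, self.heights[-1] + self.right_slope * (x - self.right), y)
        return y

# Usage: drop-in replacement for nn.ReLU or Swish inside a block.
block = nn.Sequential(nn.Linear(64, 64), PiecewiseLinearUnit(num_segments=16))
out = block(torch.randn(8, 64))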
Related papers
- APALU: A Trainable, Adaptive Activation Function for Deep Learning
Networks [0.0]
We introduce a novel trainable activation function, the adaptive piecewise approximated activation linear unit (APALU).
Experiments reveal significant improvements over widely used activation functions for different tasks.
APALU achieves 100% accuracy on a sign language recognition task with a limited dataset.
arXiv Detail & Related papers (2024-02-13T06:18:42Z) - ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z) - A Non-monotonic Smooth Activation Function [4.269446061678759]
Activation functions are crucial in deep learning models since they introduce non-linearity into the networks.
In this study, we propose a new activation function called Sqish, which is a non-monotonic and smooth function.
We show its superiority in classification, object detection, and segmentation tasks, as well as in adversarial robustness experiments.
arXiv Detail & Related papers (2023-10-16T07:09:47Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions from the input data during training.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z) - Efficient Feature Transformations for Discriminative and Generative
Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z) - Lightweight Single-Image Super-Resolution Network with Attentive
Auxiliary Feature Learning [73.75457731689858]
We develop a computation-efficient yet accurate network based on the proposed attentive auxiliary features (A$^2$F) for SISR.
Experimental results on large-scale datasets demonstrate the effectiveness of the proposed model against the state-of-the-art (SOTA) SR methods.
arXiv Detail & Related papers (2020-11-13T06:01:46Z) - Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance.
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z) - Activation functions are not needed: the ratio net [3.9636371287541086]
This paper focuses on designing a new function approximator.
Instead of designing new activation functions or kernel functions, the proposed network uses a fractional (ratio) form.
It shows that, in most cases, the ratio net converges faster and outperforms both of the compared baselines, including the RBF.
arXiv Detail & Related papers (2020-05-14T01:07:56Z) - Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z) - Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks; a toy sketch of such a search is given below.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.