ErfReLU: Adaptive Activation Function for Deep Neural Network
- URL: http://arxiv.org/abs/2306.01822v1
- Date: Fri, 2 Jun 2023 13:41:47 GMT
- Title: ErfReLU: Adaptive Activation Function for Deep Neural Network
- Authors: Ashish Rajanand, Pradeep Singh
- Abstract summary: Recent research has found that the activation function selected for adding non-linearity into the output can have a significant impact on how effectively deep learning networks perform.
Researchers have recently started developing activation functions that can be trained throughout the learning process.
State-of-the-art activation functions such as Sigmoid, ReLU, and Tanh, along with their properties, are briefly explained.
- Score: 1.9336815376402716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has found that the activation function (AF) selected for
adding non-linearity into the output can have a significant impact on how
effectively deep learning networks perform. Developing activation functions
that can adapt alongside learning has become a pressing need. Researchers have
recently started developing activation functions that can be trained
throughout the learning process, known as trainable, or adaptive activation
functions (AAF). Research on AAFs that enhance the outcomes is still in its
early stages. In this paper, a novel activation function, 'ErfReLU', is
developed based on the error function (erf) and ReLU, exploiting the
advantages of both. State-of-the-art activation functions such as Sigmoid,
ReLU, and Tanh, along with their properties, are briefly explained. Adaptive
activation functions such as Tanhsoft1, Tanhsoft2, Tanhsoft3, TanhLU, SAAF,
ErfAct, Pserf, Smish, and Serf are also described. Lastly, a performance
analysis of nine trainable activation functions (Tanhsoft1, Tanhsoft2,
Tanhsoft3, TanhLU, SAAF, ErfAct, Pserf, Smish, and Serf) together with the
proposed ErfReLU is presented by applying these activation functions in
MobileNet, VGG16, and ResNet models on the CIFAR-10, MNIST, and FMNIST
benchmark datasets.
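The abstract does not give the exact parameterization of ErfReLU, so the sketch below only illustrates the idea of a trainable erf/ReLU blend; the learnable scalar `alpha` and the additive form are assumptions for illustration, not the paper's definition.
```python
import torch
import torch.nn as nn

class ErfReLUSketch(nn.Module):
    """Minimal sketch of a trainable erf/ReLU blend (assumed form, not the
    paper's exact definition). ReLU supplies the piecewise-linear backbone,
    while a learnable scalar `alpha` scales a smooth, bounded erf correction
    that is trained jointly with the network weights."""

    def __init__(self, alpha_init: float = 0.1):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + self.alpha * torch.erf(x)
```
In experiments like those described above, such a module would be dropped in wherever MobileNet, VGG16, or ResNet use `nn.ReLU`.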
Related papers
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [62.09617609556697]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated.
We propose PPL-$p%$ sparsity, a precise and performance-aware activation sparsity metric.
We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
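The PPL-$p%$ metric itself is defined in that paper; the sketch below only shows the naive notion of activation sparsity it refines, namely the fraction of (near-)zero activation entries, with an illustrative threshold `eps`.
```python
import torch

def activation_sparsity(h: torch.Tensor, eps: float = 1e-3) -> float:
    """Fraction of activation entries whose magnitude is (near) zero.
    The PPL-p% metric additionally ties the zeroing threshold to a tolerated
    perplexity degradation, which is not reproduced here."""
    return (h.abs() <= eps).float().mean().item()

# ReLU outputs exact zeros for negative pre-activations, which is one reason
# it tends to yield higher activation sparsity than SiLU.
h = torch.relu(torch.randn(8, 4096))
print(activation_sparsity(h))  # roughly 0.5 for Gaussian pre-activations
```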
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
- ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models [74.59731375779934]
Activation sparsity refers to the existence of weakly-contributed elements among activation outputs.
This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity.
arXiv Detail & Related papers (2024-02-21T03:58:49Z)
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
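For context, the compared activations are commonly defined as below (standard definitions of squared ReLU and the GLU variants; the paper's exact layer shapes and placement are not given in this summary).
```python
import torch
import torch.nn.functional as F

def relu_squared(x):
    # ReLU^2: square of the rectified input; preserves exact zeros, which is
    # what makes it attractive for sparse computation.
    return F.relu(x) ** 2

def reglu(x, w, v):
    # ReGLU: ReLU-gated linear unit, elementwise product of a gate path and a
    # value path (x: [batch, d], w and v: [d, d_ff]).
    return F.relu(x @ w) * (x @ v)

def swiglu(x, w, v):
    # SwiGLU: same gating structure, but with SiLU (Swish) as the gate.
    return F.silu(x @ w) * (x @ v)
```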
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
- FIND: A Function Description Benchmark for Evaluating Interpretability Methods [86.80718559904854]
This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating automated interpretability methods.
FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate.
We evaluate methods that use pretrained language models to produce descriptions of function behavior in natural language and code.
arXiv Detail & Related papers (2023-09-07T17:47:26Z)
- Evaluating CNN with Oscillatory Activation Function [0.0]
A CNN's capability to learn high-dimensional complex features from images stems from the non-linearity introduced by the activation function.
This paper explores the performance of the AlexNet CNN architecture on the MNIST and CIFAR10 datasets using the oscillatory activation function GCU and other commonly used activation functions such as ReLU, PReLU, and Mish.
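GCU (Growing Cosine Unit) is commonly defined as $z\cos z$; a minimal sketch under that assumption:
```python
import torch

def gcu(z: torch.Tensor) -> torch.Tensor:
    # Growing Cosine Unit: an oscillatory activation, commonly given as z * cos(z).
    # Unlike ReLU/PReLU/Mish it changes sign repeatedly as |z| grows.
    return z * torch.cos(z)
```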
arXiv Detail & Related papers (2022-11-13T11:17:13Z)
- How important are activation functions in regression and classification? A survey, performance comparison, and future directions [0.0]
We survey the activation functions that have been employed in the past as well as the current state-of-the-art.
In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computations.
arXiv Detail & Related papers (2022-09-06T17:51:52Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
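A rational activation function is a ratio of polynomials whose coefficients are learned along with the network; the sketch below uses illustrative degrees and a standard positive-denominator trick, not necessarily the exact RAF configuration of that paper.
```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Sketch of a rational activation P(x) / Q(x) with learnable polynomial
    coefficients (degrees chosen for illustration)."""

    def __init__(self, p_degree: int = 3, q_degree: int = 2):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(p_degree + 1))
        self.q = nn.Parameter(0.1 * torch.randn(q_degree))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        num = sum(c * x ** i for i, c in enumerate(self.p))
        # 1 + |sum_j q_j x^(j+1)| keeps the denominator positive and away from zero.
        den = 1.0 + torch.abs(sum(c * x ** (j + 1) for j, c in enumerate(self.q)))
        return num / den
```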
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- Activation Functions: Dive into an optimal activation function [1.52292571922932]
We find an optimal activation function by defining it as a weighted sum of existing activation functions.
The study uses three activation functions, ReLU, tanh, and sin, over three popular image datasets.
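That construction can be written down directly: a learnable weighted sum of ReLU, tanh, and sin. Whether the weights are constrained or normalized is a detail of the paper, so plain learnable scalars are used in this sketch.
```python
import torch
import torch.nn as nn

class WeightedSumActivation(nn.Module):
    """Activation defined as a weighted sum of ReLU, tanh, and sin, with the
    weights trained jointly with the network (initialized to plain ReLU)."""

    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor([1.0, 0.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (self.w[0] * torch.relu(x)
                + self.w[1] * torch.tanh(x)
                + self.w[2] * torch.sin(x))
```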
arXiv Detail & Related papers (2022-02-24T12:44:11Z)
- eQE 2.0: Subsystem DFT Beyond GGA Functionals [58.720142291102135]
Subsystem DFT (sDFT) can dramatically reduce the computational cost of large-scale electronic structure calculations.
The key ingredients of sDFT are the nonadditive kinetic energy and exchange-correlation functionals, which dominate its accuracy.
eQE 2.0 delivers excellent interaction energies compared to conventional Kohn-Sham DFT and CCSD(T).
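For reference, the nonadditive terms are the parts of the kinetic and exchange-correlation energies that cannot be written as a sum over subsystems; in the standard sDFT convention (not specific to eQE 2.0) they read:
```latex
% Nonadditive kinetic-energy and exchange-correlation functionals in sDFT,
% written for subsystem densities rho_I (standard convention).
\begin{aligned}
T_s^{\mathrm{nadd}}[\{\rho_I\}] &= T_s\!\Big[\sum_I \rho_I\Big] - \sum_I T_s[\rho_I],\\
E_{xc}^{\mathrm{nadd}}[\{\rho_I\}] &= E_{xc}\!\Big[\sum_I \rho_I\Big] - \sum_I E_{xc}[\rho_I].
\end{aligned}
```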
arXiv Detail & Related papers (2021-03-12T22:26:36Z)
- Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance.
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)