Evolution of Activation Functions: An Empirical Investigation
- URL: http://arxiv.org/abs/2105.14614v1
- Date: Sun, 30 May 2021 20:08:20 GMT
- Title: Evolution of Activation Functions: An Empirical Investigation
- Authors: Andrew Nader and Danielle Azar
- Abstract summary: This work presents an evolutionary algorithm to automate the search for completely new activation functions.
We compare these new evolved activation functions to other existing and commonly used activation functions.
- Score: 0.30458514384586394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The hyper-parameters of a neural network are traditionally designed through a
time-consuming process of trial and error that requires substantial expert
knowledge. Neural Architecture Search (NAS) algorithms aim to take the human
out of the loop by automatically finding a good set of hyper-parameters for the
problem at hand. These algorithms have mostly focused on hyper-parameters such
as the architectural configurations of the hidden layers and the connectivity
of the hidden neurons, but there has been relatively little work on automating
the search for completely new activation functions, which are among the most
crucial hyper-parameters to choose. Several widely used activation functions
are simple and work well, but there is nonetheless interest in finding better
ones. Work in the literature has mostly focused on designing new activation
functions by hand or on choosing from a set of predefined functions, whereas
this work presents an evolutionary algorithm to automate the search for
completely new activation functions. We compare these newly evolved activation
functions to existing and commonly used activation functions. The results are
favorable and are obtained by averaging the performance of the activation
functions found over 30 runs, with experiments conducted on 10 different
datasets and architectures to ensure the statistical robustness of the study.
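To make the search procedure concrete, here is a minimal sketch of an evolutionary search over activation functions: each candidate is a small expression built from basic unary and binary operations, and a population of candidates is mutated and selected by fitness over generations. The primitive set, the fixed three-gene representation, and the toy proxy fitness are illustrative assumptions; in the paper, the fitness of a candidate activation would instead be the performance of networks trained with it, averaged over runs.

```python
import random
import numpy as np

# Candidate building blocks for evolved activation functions.
# These specific primitives are an assumption for illustration,
# not necessarily the operator set used in the paper.
UNARY = {
    "identity": lambda x: x,
    "tanh":     np.tanh,
    "relu":     lambda x: np.maximum(x, 0.0),
    "sin":      np.sin,
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": np.maximum,
}

def random_candidate():
    """An activation of the form binary(unary1(x), unary2(x))."""
    return (random.choice(list(BINARY)),
            random.choice(list(UNARY)),
            random.choice(list(UNARY)))

def apply(candidate, x):
    b, u1, u2 = candidate
    return BINARY[b](UNARY[u1](x), UNARY[u2](x))

def fitness(candidate):
    """Toy proxy fitness. In the paper, the fitness of an activation is the
    performance of networks trained with it; here we only score how closely
    the candidate matches an arbitrary smooth reference curve, purely so the
    sketch runs end to end."""
    x = np.linspace(-3.0, 3.0, 200)
    target = x * np.tanh(x)            # arbitrary reference shape (assumption)
    return -np.mean((apply(candidate, x) - target) ** 2)

def evolve(pop_size=20, generations=30, mutation_rate=0.3):
    population = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]   # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            child = list(random.choice(survivors))
            if random.random() < mutation_rate:   # mutate one gene
                i = random.randrange(3)
                child[i] = (random.choice(list(BINARY)) if i == 0
                            else random.choice(list(UNARY)))
            children.append(tuple(child))
        population = survivors + children
    return max(population, key=fitness)

if __name__ == "__main__":
    print("best evolved activation:", evolve())
```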
Related papers
- Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features (arXiv, 2024-10-04)
  Diffusion models are initially designed for image generation. Recent research shows that the internal signals within their backbones, named activations, can also serve as dense features for various discriminative tasks.
- Efficient Search for Customized Activation Functions with Gradient Descent (arXiv, 2024-08-13)
  Different activation functions work best for different deep learning models. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions (a minimal sketch of such a cell appears after this list). Our approach enables the identification of specialized activations, leading to improved performance in every model we tried.
- Active Learning for Derivative-Based Global Sensitivity Analysis with Gaussian Processes (arXiv, 2024-07-13)
  We consider the problem of active learning for global sensitivity analysis of expensive black-box functions. Since function evaluations are expensive, we use active learning to prioritize experimental resources where they yield the most value. We propose novel active learning acquisition functions that directly target key quantities of derivative-based global sensitivity measures.
- Evaluating CNN with Oscillatory Activation Function (arXiv, 2022-11-13)
  What gives CNNs the capability to learn high-dimensional complex features from images is the non-linearity introduced by the activation function. This paper explores the performance of the CNN architecture AlexNet on the MNIST and CIFAR-10 datasets using the oscillatory activation function GCU and other commonly used activation functions such as ReLU, PReLU, and Mish (these activation shapes are sketched after this list).
- Transformers with Learnable Activation Functions (arXiv, 2022-08-30)
  We use the Rational Activation Function (RAF) to learn optimal activation functions during training according to the input data. RAF opens a new research direction for analyzing and interpreting pre-trained models according to their learned activation functions.
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning (arXiv, 2022-02-21)
  Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered. Existing CL approaches often keep a buffer of previously seen samples, perform knowledge distillation, or use regularization techniques toward this goal. We propose to activate and select only sparse neurons for learning current and past tasks at any stage.
- Discovering Parametric Activation Functions (arXiv, 2020-06-05)
  This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
- A survey on modern trainable activation functions (arXiv, 2020-05-02)
  We propose a taxonomy of trainable activation functions and highlight common and distinctive properties of recent and past models. We show that many of the proposed approaches are equivalent to adding neuron layers which use fixed (non-trainable) activation functions.
- Evolving Normalization-Activation Layers (arXiv, 2020-04-06)
  We develop efficient rejection protocols to quickly filter out candidate layers that do not work well. Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising, structures. Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets, and EfficientNets.
- Evolutionary Optimization of Deep Learning Activation Functions (arXiv, 2020-02-17)
  We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU); replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. These novel activation functions are shown to generalize, achieving high performance across tasks.
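For the gradient-descent search entry above, one plausible reading of a "fine-grained search cell" is a small differentiable mixture of basic operations whose mixing weights are trained jointly with the network, in the spirit of a DARTS-style continuous relaxation. The cell below is a sketch under that assumption and may not match the exact design in the cited paper.

```python
import numpy as np

# Candidate unary operations mixed by the cell; this operation set is an
# illustrative assumption, not the one used in the cited paper.
OPS = [lambda x: x, np.tanh, lambda x: np.maximum(x, 0.0), np.sin]

def search_cell(x, alpha):
    """Mix the basic operations with softmax weights derived from alpha.

    In a gradient-based search, alpha would be trained by backpropagation
    alongside the network weights; here it is just a fixed parameter vector.
    """
    w = np.exp(alpha - alpha.max())
    w = w / w.sum()                       # softmax over the operation set
    return sum(wi * op(x) for wi, op in zip(w, OPS))

if __name__ == "__main__":
    alpha = np.zeros(len(OPS))            # start from a uniform mixture
    x = np.linspace(-3.0, 3.0, 7)
    print(search_cell(x, alpha))
```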
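For reference, the activation shapes named in the entries above (ReLU, PReLU, Mish, the oscillatory GCU, and a rational activation of the RAF family) can be written down in a few lines. The PReLU slope and the rational degrees and coefficients below are illustrative placeholders; in the corresponding papers those quantities are learned during training.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def prelu(x, a=0.25):
    # The slope a is a learnable parameter in practice; 0.25 is used here
    # purely for illustration.
    return np.where(x > 0, x, a * x)

def mish(x):
    # Mish: x * tanh(softplus(x)); logaddexp gives a stable softplus.
    return x * np.tanh(np.logaddexp(0.0, x))

def gcu(x):
    # Growing Cosine Unit, the oscillatory activation compared against
    # ReLU, PReLU, and Mish in the AlexNet study above.
    return x * np.cos(x)

def rational_activation(x, p=(0.03, 0.5, 1.0, 0.0), q=(0.1, 1.0)):
    # RAF-style rational activation P(x) / Q(x) with learnable coefficients.
    # The degrees, coefficients, and the positive "safe" denominator used
    # here are assumptions for illustration.
    return np.polyval(p, x) / (1.0 + np.abs(np.polyval(q, x)))

if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 5)
    for f in (relu, prelu, mish, gcu, rational_activation):
        print(f.__name__, f(x))
```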
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.