Discovering Parametric Activation Functions
- URL: http://arxiv.org/abs/2006.03179v5
- Date: Fri, 21 Jan 2022 19:39:36 GMT
- Title: Discovering Parametric Activation Functions
- Authors: Garrett Bingham and Risto Miikkulainen
- Abstract summary: This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance.
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
- Score: 17.369163074697475
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent studies have shown that the choice of activation function can
significantly affect the performance of deep learning networks. However, the
benefits of novel activation functions have been inconsistent and task
dependent, and therefore the rectified linear unit (ReLU) is still the most
commonly used. This paper proposes a technique for customizing activation
functions automatically, resulting in reliable improvements in performance.
Evolutionary search is used to discover the general form of the function, and
gradient descent to optimize its parameters for different parts of the network
and over the learning process. Experiments with four different neural network
architectures on the CIFAR-10 and CIFAR-100 image classification datasets show
that this approach is effective. It discovers both general activation functions
and specialized functions for different architectures, consistently improving
accuracy over ReLU and other activation functions by significant margins. The
approach can therefore be used as an automated optimization step in applying
deep learning to new tasks.
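The abstract describes a two-level scheme: evolutionary search finds the general form of the activation function, while gradient descent tunes its parameters during training. As a minimal, hypothetical sketch of the gradient-descent half (not the paper's actual method), the toy below trains the parameters of an ELU-like parametric activation f(z; a, b) = a*z for z >= 0 and b*(exp(z) - 1) otherwise, jointly with a single weight, on synthetic data:

```python
import math
import random

def pact(z, a, b):
    """Toy parametric activation: a*z for z >= 0, b*(exp(z) - 1) otherwise."""
    return a * z if z >= 0 else b * (math.exp(z) - 1.0)

def dpact(z, a, b):
    """Partial derivatives of pact with respect to (z, a, b)."""
    if z >= 0:
        return a, z, 0.0
    e = math.exp(z)
    return b * e, 0.0, e - 1.0

def train(samples, steps=500, lr=0.05):
    """Fit y = pact(w*x; a, b) by gradient descent on (w, a, b) under squared loss."""
    w, a, b = 0.5, 1.0, 1.0
    for _ in range(steps):
        gw = ga = gb = 0.0
        for x, y in samples:
            z = w * x
            dz, da, db = dpact(z, a, b)
            err = 2.0 * (pact(z, a, b) - y) / len(samples)
            gw += err * dz * x   # chain rule through z = w*x
            ga += err * da
            gb += err * db
        w -= lr * gw
        a -= lr * ga
        b -= lr * gb
    return w, a, b

random.seed(0)
data = [(x, 2.0 * max(x, 0.0)) for x in [random.uniform(-2, 2) for _ in range(64)]]
# Loss with the initial parameters, for comparison.
loss0 = sum((pact(0.5 * x, 1.0, 1.0) - y) ** 2 for x, y in data) / len(data)
w, a, b = train(data)
loss = sum((pact(w * x, a, b) - y) ** 2 for x, y in data) / len(data)
```

In the paper, the analogous parameters are optimized per layer and throughout training alongside the network weights; here they simply adapt one activation to a toy regression target.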
Related papers
- Active Learning for Derivative-Based Global Sensitivity Analysis with Gaussian Processes [70.66864668709677]
We consider the problem of active learning for global sensitivity analysis of expensive black-box functions.
Since function evaluations are expensive, we use active learning to prioritize experimental resources where they yield the most value.
We propose novel active learning acquisition functions that directly target key quantities of derivative-based global sensitivity measures.
arXiv Detail & Related papers (2024-07-13T01:41:12Z)
- APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks [0.0]
We introduce a novel trainable activation function, the adaptive piecewise approximated activation linear unit (APALU).
Experiments reveal significant improvements over widely used activation functions for different tasks.
APALU achieves 100% accuracy on a sign language recognition task with a limited dataset.
arXiv Detail & Related papers (2024-02-13T06:18:42Z)
- GELU Activation Function in Deep Learning: A Comprehensive Mathematical Analysis and Performance [2.458437232470188]
We investigate the differentiability, boundedness, stationarity, and smoothness properties of the GELU activation function.
Our results demonstrate the superior performance of GELU compared to other activation functions.
arXiv Detail & Related papers (2023-05-20T03:22:43Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims to optimize decision-making strategies from historical data, has been extensively applied in real-life applications.
We take a step toward this goal by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions during training, adapting to the input data.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Evolution of Activation Functions: An Empirical Investigation [0.30458514384586394]
This work presents an evolutionary algorithm to automate the search for completely new activation functions.
We compare these new evolved activation functions to other existing and commonly used activation functions.
arXiv Detail & Related papers (2021-05-30T20:08:20Z)
- Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method.
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, which can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
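Several of the papers above evolve the general form of an activation function rather than hand-designing it. As an illustrative, hypothetical toy (not any paper's actual algorithm), the sketch below runs a tiny evolutionary search over candidate activations of the form u(s * x), where u is drawn from a small operator set and s is a mutable scale; fitness is how closely the candidate matches a target function, a stand-in for the validation accuracy a real search would use:

```python
import math
import random

# Toy search space: compose a unary operator with a scaled input, f(x) = u(s * x).
UNARY = {
    "relu": lambda z: max(z, 0.0),
    "tanh": math.tanh,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "identity": lambda z: z,
}

def fitness(genome, samples, target):
    """Negative mean squared error against a target function (toy proxy for accuracy)."""
    name, scale = genome
    u = UNARY[name]
    return -sum((u(scale * x) - target(x)) ** 2 for x in samples) / len(samples)

def evolve(target, generations=30, pop_size=20, seed=0):
    """Truncation-selection evolution: keep the top quarter, refill with mutated copies."""
    rng = random.Random(seed)
    samples = [rng.uniform(-3.0, 3.0) for _ in range(64)]
    pop = [(rng.choice(list(UNARY)), rng.uniform(0.1, 2.0)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, samples, target), reverse=True)
        parents = pop[: pop_size // 4]          # elitism: parents survive unchanged
        children = []
        for _ in range(pop_size - len(parents)):
            name, scale = rng.choice(parents)
            if rng.random() < 0.3:              # mutate the operator
                name = rng.choice(list(UNARY))
            scale *= math.exp(rng.gauss(0.0, 0.2))  # log-normal scale mutation
            children.append((name, scale))
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, samples, target))

# The target 1.5 * relu(x) equals relu(1.5 * x), so the search should recover relu.
best = evolve(target=lambda x: max(1.5 * x, 0.0))
```

A full system would evaluate each candidate by training a network with it, and would search over richer function compositions; this toy keeps the same select-and-mutate loop at negligible cost.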
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.