Efficient Activation Function Optimization through Surrogate Modeling
- URL: http://arxiv.org/abs/2301.05785v6
- Date: Wed, 8 Nov 2023 23:01:16 GMT
- Title: Efficient Activation Function Optimization through Surrogate Modeling
- Authors: Garrett Bingham and Risto Miikkulainen
- Abstract summary: This paper aims to improve the state of the art through three steps.
First, the benchmark datasets Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures from scratch with 2,913 systematically generated activation functions.
Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization.
Third, the surrogate was used to discover improved activation functions in several real-world tasks.
- Score: 15.219959721479835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Carefully designed activation functions can improve the performance of neural
networks in many machine learning tasks. However, it is difficult for humans to
construct optimal activation functions, and current activation function search
algorithms are prohibitively expensive. This paper aims to improve the state of
the art through three steps: First, the benchmark datasets Act-Bench-CNN,
Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional,
residual, and vision transformer architectures from scratch with 2,913
systematically generated activation functions. Second, a characterization of
the benchmark space was developed, leading to a new surrogate-based method for
optimization. More specifically, the spectrum of the Fisher information matrix
associated with the model's predictive distribution at initialization and the
activation function's output distribution were found to be highly predictive of
performance. Third, the surrogate was used to discover improved activation
functions in several real-world tasks, with a surprising finding: a sigmoidal
design that outperformed all other activation functions was discovered,
challenging the status quo of always using rectifier nonlinearities in deep
learning. Each of these steps is a contribution in its own right; together they
serve as a practical and theoretical foundation for further research on
activation function optimization.
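To make the two surrogate features concrete, here is a small, self-contained sketch (not the authors' code). It computes (1) the eigenvalue spectrum of an empirical Fisher information matrix (FIM) for a tiny randomly initialized classifier, restricted to the output-layer weights for brevity, and (2) samples of a candidate activation's output distribution under unit-Gaussian inputs. The network shape, sample counts, and the Swish candidate are illustrative assumptions.

```python
# Hedged sketch of the two surrogate features named in the abstract:
# the FIM eigenvalue spectrum at initialization and the activation's
# output distribution. Not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

def activation_output_distribution(act, n_samples=10_000):
    """Sample the activation's outputs for x ~ N(0, 1) (feature 2)."""
    x = rng.standard_normal(n_samples)
    return act(x)

def empirical_fim_spectrum(act, in_dim=16, hidden=32, n_classes=4, n_batch=256):
    """Eigenvalues of an empirical FIM for a tiny one-hidden-layer classifier
    at random initialization (feature 1). Only the output-layer weights are
    included to keep the example small."""
    W1 = rng.standard_normal((hidden, in_dim)) / np.sqrt(in_dim)
    W2 = rng.standard_normal((n_classes, hidden)) / np.sqrt(hidden)
    X = rng.standard_normal((n_batch, in_dim))
    H = act(X @ W1.T)                      # hidden activations
    logits = H @ W2.T
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)      # predictive distribution at init

    fim = np.zeros((n_classes * hidden, n_classes * hidden))
    for h, p in zip(H, P):
        for c in range(n_classes):
            # d log p(c|x) / d W2 = (one_hot(c) - p) outer h
            g = (np.eye(n_classes)[c] - p)[:, None] * h[None, :]
            fim += p[c] * np.outer(g.ravel(), g.ravel())
    fim /= n_batch
    return np.linalg.eigvalsh(fim)         # ascending eigenvalues

swish = lambda x: x / (1.0 + np.exp(-x))
eigs = empirical_fim_spectrum(swish)
outs = activation_output_distribution(swish)
print("top FIM eigenvalues:", eigs[-5:])
print("output mean/std:", outs.mean(), outs.std())
```

Features of this kind could then be fed to a surrogate that predicts benchmark performance of a candidate activation function without training it to convergence, which is the role the surrogate plays in the paper.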
Related papers
- Adaptive Activation Functions for Predictive Modeling with Sparse
Experimental Data [2.012425476229879]
This study investigates the influence of adaptive or trainable activation functions on classification accuracy and predictive uncertainty in settings characterized by limited data availability.
Our investigation reveals that adaptive activation functions, such as the Exponential Linear Unit (ELU) and Softplus with individual trainable parameters, result in accurate and confident prediction models. (A minimal trainable-activation sketch appears after this list.)
arXiv Detail & Related papers (2024-02-08T04:35:09Z)
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$. (A minimal magnitude-threshold sketch appears after this list.)
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Bayesian optimization for sparse neural networks with trainable
activation functions [0.0]
We propose a trainable activation function whose parameters need to be estimated.
A fully Bayesian model is developed to automatically estimate from the learning data both the model weights and activation function parameters.
arXiv Detail & Related papers (2023-04-10T08:44:44Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Transformers with Learnable Activation Functions [63.98696070245065]
We use the Rational Activation Function (RAF) to learn optimal activation functions from the input data during training.
RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Efficient Neural Network Analysis with Sum-of-Infeasibilities [64.31536828511021]
Inspired by sum-of-infeasibilities methods in convex optimization, we propose a novel procedure for analyzing verification queries on networks with extensive branching functions.
An extension to a canonical case-analysis-based complete search procedure can be achieved by replacing the convex procedure executed at each search state with DeepSoI.
arXiv Detail & Related papers (2022-03-19T15:05:09Z)
- Evolution of Activation Functions: An Empirical Investigation [0.30458514384586394]
This work presents an evolutionary algorithm to automate the search for completely new activation functions.
We compare these new evolved activation functions to other existing and commonly used activation functions.
arXiv Detail & Related papers (2021-05-30T20:08:20Z)
- Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance.
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU); replacing ReLU with the evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks. (A minimal evolutionary-search sketch appears after this list.)
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
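Relating to the ReLU$^2$ entry above, here is a minimal sketch of defining neuron activation by output magnitude: a neuron counts as inactive when its output magnitude falls below a threshold, and the resulting sparsity can be compared across activation functions. The threshold value, layer width, and the SiLU stand-in for a gated unit are assumptions, not that paper's setup.

```python
# Hedged sketch (not the ReLU^2 paper's code): activation sparsity measured by
# a magnitude threshold on neuron outputs.
import numpy as np

rng = np.random.default_rng(0)

def sparsity_ratio(act, threshold=0.1, n_inputs=4096, width=1024):
    """Fraction of hidden outputs whose magnitude falls below the threshold."""
    x = rng.standard_normal((n_inputs, width))
    y = act(x)
    return float(np.mean(np.abs(y) <= threshold))

relu = lambda x: np.maximum(x, 0.0)
relu2 = lambda x: np.maximum(x, 0.0) ** 2               # ReLU^2
silu = lambda x: x * (1.0 / (1.0 + np.exp(-x)))         # SiLU gate, stand-in

for name, fn in [("ReLU", relu), ("ReLU^2", relu2), ("SiLU", silu)]:
    print(f"{name:7s} sparsity @ |y|<=0.1: {sparsity_ratio(fn):.3f}")
```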
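Relating to the adaptive and trainable activation entries above (ELU and Softplus with individual trainable parameters, and the Bayesian and rational variants), here is a minimal PyTorch sketch of an activation with per-neuron learnable parameters. The PELU-like form alpha * ELU(x / beta), the parameter names, and the initialization are illustrative assumptions; none of the listed papers necessarily uses this exact form.

```python
# Hedged sketch: an ELU whose per-neuron scale parameters are learned jointly
# with the network weights. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainableELU(nn.Module):
    """ELU with one trainable (alpha, beta) pair per neuron."""
    def __init__(self, num_features: int):
        super().__init__()
        # initialized at (1, 1), i.e. the plain ELU
        self.alpha = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.ones(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # clamp beta away from zero to keep the division well behaved
        return self.alpha * F.elu(x / self.beta.clamp(min=1e-3))

# usage: drop it into a small MLP; the activation parameters receive gradients
# from the task loss just like the linear-layer weights
model = nn.Sequential(nn.Linear(16, 32), TrainableELU(32), nn.Linear(32, 2))
out = model(torch.randn(4, 16))
print(out.shape, sum(p.numel() for p in model.parameters()))
```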
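Relating to the evolutionary-search entries above, here is a toy sketch of evolving activation functions: candidates are small symbolic compositions, each scored by the accuracy of a cheap proxy model. The operator pool, the product-of-unaries candidate form, the synthetic task, and the selection scheme (keep the best half, refill with fresh random candidates in place of mutation) are all illustrative assumptions, far simpler than the papers' search spaces.

```python
# Toy evolutionary search over activation functions (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# candidate activations are products of two unary ops drawn from a small pool
UNARY = {
    "identity": lambda x: x,
    "tanh": np.tanh,
    "relu": lambda x: np.maximum(x, 0.0),
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}

def random_candidate():
    return tuple(rng.choice(list(UNARY)) for _ in range(2))

def build(candidate):
    u1, u2 = (UNARY[name] for name in candidate)
    return lambda x: u1(x) * u2(x)

def fitness(act, hidden=64, steps=300, lr=0.2):
    """Accuracy of a logistic head trained on random features passed through
    the candidate activation (a cheap, noisy proxy, not full training)."""
    X = rng.standard_normal((512, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)      # XOR-like synthetic labels
    H = act(X @ rng.standard_normal((2, hidden)))
    w = np.zeros(hidden)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(H @ w, -30, 30)))
        w -= lr * H.T @ (p - y) / len(y)            # logistic-regression step
    p = 1.0 / (1.0 + np.exp(-np.clip(H @ w, -30, 30)))
    return float(np.mean((p > 0.5) == (y > 0.5)))

population = [random_candidate() for _ in range(8)]
for generation in range(5):
    ranked = sorted(population, key=lambda c: fitness(build(c)), reverse=True)
    population = ranked[:4] + [random_candidate() for _ in range(4)]
best = max(population, key=lambda c: fitness(build(c)))
print("best candidate:", " * ".join(best), "fitness:", fitness(build(best)))
```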
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.