Related papers: Efficient Search for Customized Activation Functions with Gradient Descent

Efficient Search for Customized Activation Functions with Gradient Descent

URL: http://arxiv.org/abs/2408.06820v1
Date: Tue, 13 Aug 2024 11:27:31 GMT
Title: Efficient Search for Customized Activation Functions with Gradient Descent
Authors: Lukas Strack, Mahmoud Safari, Frank Hutter,
Abstract summary: Different activation functions work best for different deep learning models. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions. Our approach enables the identification of specialized activations, leading to improved performance in every model we tried.
Score: 42.20716255578699
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Different activation functions work best for different deep learning models. To exploit this, we leverage recent advancements in gradient-based search techniques for neural architectures to efficiently identify high-performing activation functions for a given application. We propose a fine-grained search cell that combines basic mathematical operations to model activation functions, allowing for the exploration of novel activations. Our approach enables the identification of specialized activations, leading to improved performance in every model we tried, from image classification to language models. Moreover, the identified activations exhibit strong transferability to larger models of the same type, as well as new datasets. Importantly, our automated process for creating customized activation functions is orders of magnitude more efficient than previous approaches. It can easily be applied on top of arbitrary deep learning pipelines and thus offers a promising practical avenue for enhancing deep learning architectures.

Related papers

Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition. A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z)
Efficient Activation Function Optimization through Surrogate Modeling [15.219959721479835]
This paper aims to improve the state of the art through three steps. First, the benchmark Act-Bench-CNN, Act-Bench-ResNet, and Act-Bench-ViT were created by training convolutional, residual, and vision transformer architectures. Second, a characterization of the benchmark space was developed, leading to a new surrogate-based method for optimization.
arXiv Detail & Related papers (2023-01-13T23:11:14Z)
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications. We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA) Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
Transformers with Learnable Activation Functions [63.98696070245065]
We use Rational Activation Function (RAF) to learn optimal activation functions during training according to input data. RAF opens a new research direction for analyzing and interpreting pre-trained models according to the learned activation functions.
arXiv Detail & Related papers (2022-08-30T09:47:31Z)
Energy-based Latent Aligner for Incremental Learning [83.0135278697976]
Deep learning models tend to forget their earlier knowledge while incrementally learning new tasks. This behavior emerges because the parameter updates optimized for the new tasks may not align well with the updates suitable for older tasks. We propose ELI: Energy-based Latent Aligner for Incremental Learning.
arXiv Detail & Related papers (2022-03-28T17:57:25Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
Evolution of Activation Functions: An Empirical Investigation [0.30458514384586394]
This work presents an evolutionary algorithm to automate the search for completely new activation functions. We compare these new evolved activation functions to other existing and commonly used activation functions.
arXiv Detail & Related papers (2021-05-30T20:08:20Z)
Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning. Theses provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture. We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z)
Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU) replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy. These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.