Adaptively Customizing Activation Functions for Various Layers
- URL: http://arxiv.org/abs/2112.09442v1
- Date: Fri, 17 Dec 2021 11:23:03 GMT
- Title: Adaptively Customizing Activation Functions for Various Layers
- Authors: Haigen Hu, Aizhu Liu, Qiu Guan, Xiaoxin Li, Shengyong Chen, Qianwei Zhou
- Abstract summary: In this work, a novel methodology is proposed to adaptively customize activation functions by adding only a few parameters to traditional activation functions such as Sigmoid, Tanh, and ReLU.
To verify the effectiveness of the proposed methodology, theoretical and experimental analyses of convergence acceleration and performance improvement are presented.
The results show that the proposed methodology is very simple yet yields significant gains in convergence speed, precision, and generalization, surpassing popular activation functions such as ReLU and adaptive functions such as Swish in almost all experiments in terms of overall performance.
- Score: 10.522556291990437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Activation functions play a crucial role in enhancing the nonlinearity
of neural networks and their ability to map inputs to response variables,
allowing them to model more complex relationships and patterns in the data. In
this work, a novel methodology is proposed to adaptively customize activation
functions by adding only a few parameters to traditional activation functions
such as Sigmoid, Tanh, and ReLU. To verify the effectiveness of the proposed
methodology, theoretical and experimental analyses of convergence acceleration
and performance improvement are presented, and a series of experiments are
conducted on various network models (such as AlexNet, VGGNet, GoogLeNet, ResNet,
and DenseNet) and various datasets (such as CIFAR10, CIFAR100, miniImageNet,
PASCAL VOC, and COCO). To further verify its validity and suitability across
optimization strategies and usage scenarios, comparison experiments are also
conducted with different optimizers (such as SGD, Momentum, AdaGrad, AdaDelta,
and Adam) and different recognition tasks such as classification and detection.
The results show that the proposed methodology is very simple yet brings
significant improvements in convergence speed, precision, and generalization,
surpassing popular activation functions such as ReLU and adaptive functions
such as Swish in almost all experiments in terms of overall performance. The
code is publicly available at
https://github.com/HuHaigen/Adaptively-Customizing-Activation-Functions. The
package includes the three proposed adaptive activation functions for
reproducibility.
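The abstract does not spell out the exact parameterization, so the following is only a minimal sketch of the general idea: wrapping a traditional activation with a couple of trainable scalars per layer. The module name, the form g(x) = a * f(b * x), and the initialization are illustrative assumptions rather than the paper's definition; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Hypothetical per-layer adaptive activation g(x) = a * f(b * x),
    where f is a traditional activation (Sigmoid, Tanh, or ReLU) and
    (a, b) are the only extra trainable parameters in the layer."""

    def __init__(self, base=torch.relu):
        super().__init__()
        self.base = base
        self.a = nn.Parameter(torch.ones(1))  # trainable output scale
        self.b = nn.Parameter(torch.ones(1))  # trainable input slope

    def forward(self, x):
        return self.a * self.base(self.b * x)

# Each layer gets its own instance, so every layer can adapt the shape
# of its activation independently during training.
model = nn.Sequential(
    nn.Linear(128, 64), AdaptiveActivation(torch.relu),
    nn.Linear(64, 10), AdaptiveActivation(torch.sigmoid),
)
```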
Related papers
- A Method on Searching Better Activation Functions [15.180864683908878]
We propose the Entropy-based Activation Function Optimization (EAFO) methodology for designing static activation functions in deep neural networks.
We derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU).
arXiv Detail & Related papers (2024-05-19T03:48:05Z)
- Bayesian optimization for sparse neural networks with trainable activation functions [0.0]
We propose a trainable activation function whose parameters are estimated from data.
A fully Bayesian model is developed to automatically estimate both the model weights and the activation function parameters from the training data.
arXiv Detail & Related papers (2023-04-10T08:44:44Z)
- Stochastic Adaptive Activation Function [1.9199289015460212]
This study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs.
Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.
arXiv Detail & Related papers (2022-10-21T01:57:25Z)
- ECO-TR: Efficient Correspondences Finding Via Coarse-to-Fine Refinement [80.94378602238432]
We propose an efficient structure named Correspondence Efficient Transformer (ECO-TR) that finds correspondences in a coarse-to-fine manner.
To achieve this, multiple transformer blocks are connected stage by stage to gradually refine the predicted coordinates.
Experiments on various sparse and dense matching tasks demonstrate the superiority of our method in both efficiency and effectiveness against existing state-of-the-art methods.
arXiv Detail & Related papers (2022-09-25T13:05:33Z)
- Learning specialized activation functions with the Piecewise Linear Unit [7.820667552233989]
We propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method (see the illustrative sketch after this list).
It can learn specialized activation functions and achieves SOTA performance on large-scale datasets like ImageNet and COCO.
PWLU is also easy to implement and efficient at inference, so it can be widely applied in real-world applications.
arXiv Detail & Related papers (2021-04-08T11:29:11Z)
- Efficient Feature Transformations for Discriminative and Generative Continual Learning [98.10425163678082]
We propose a simple task-specific feature map transformation strategy for continual learning.
These transformations provide powerful flexibility for learning new tasks, achieved with minimal parameters added to the base architecture.
We demonstrate the efficacy and efficiency of our method with an extensive set of experiments in discriminative (CIFAR-100 and ImageNet-1K) and generative sequences of tasks.
arXiv Detail & Related papers (2021-03-25T01:48:14Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
- Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance.
Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
arXiv Detail & Related papers (2020-12-07T08:16:32Z)
- Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem).
AdaRem adjusts the parameter-wise learning rate according to whether a parameter's past update direction is aligned with the direction of the current gradient (see the toy sketch after this list).
Our method outperforms previous adaptive learning rate-based algorithms in terms of training speed and test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z)
- Discovering Parametric Activation Functions [17.369163074697475]
This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance (a simple example of a parametric activation follows this list).
Experiments with four different neural network architectures on the CIFAR-10 and CIFAR-100 image classification datasets show that this approach is effective.
arXiv Detail & Related papers (2020-06-05T00:25:33Z)
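For the Piecewise Linear Unit entry above, the summary names the method but not its formula. The sketch below shows one common way to realize a learnable piecewise-linear activation: uniform breakpoints on a fixed interval with a trainable value at each breakpoint. It is an illustration of the idea under those assumptions, not the PWLU authors' exact formulation.

```python
import torch
import torch.nn as nn

class PiecewiseLinearActivation(nn.Module):
    """Illustrative learnable piecewise-linear activation: [left, right]
    is split into n equal segments, the values at the n + 1 breakpoints
    are trainable, and inputs outside the interval follow the slope of
    the nearest boundary segment."""

    def __init__(self, n_segments=8, left=-3.0, right=3.0):
        super().__init__()
        self.n = n_segments
        self.left = left
        self.width = (right - left) / n_segments
        # Initialize breakpoint values to ReLU so training starts
        # from a familiar, well-behaved shape.
        xs = torch.linspace(left, right, n_segments + 1)
        self.values = nn.Parameter(torch.relu(xs))

    def forward(self, x):
        # Segment index of each input, clamped so out-of-range inputs
        # extrapolate linearly from the boundary segments.
        idx = torch.clamp(((x - self.left) / self.width).floor(),
                          0, self.n - 1).long()
        x0 = self.left + idx * self.width  # start of each segment
        slope = (self.values[idx + 1] - self.values[idx]) / self.width
        return self.values[idx] + slope * (x - x0)
```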
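The AdaRem entry describes scaling each parameter's learning rate by how well its past update direction agrees with the current gradient. The toy optimizer below implements that idea with an exponential moving average of past gradient signs; the specific damping form, hyper-parameters, and class name are guesses for illustration, not the paper's algorithm.

```python
import torch

class ToyAdaRem(torch.optim.Optimizer):
    """Toy parameter-wise adaptive learning rate: keep an EMA `m` of
    past gradient signs and scale each element's step by its agreement
    with the current gradient (agreement gives a larger step, conflict
    a smaller one)."""

    def __init__(self, params, lr=0.01, beta=0.9):
        super().__init__(params, dict(lr=lr, beta=beta))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta = group["lr"], group["beta"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "m" not in state:
                    state["m"] = torch.zeros_like(p)
                m = state["m"]
                # Alignment in [-1, 1] between the averaged past
                # direction and the current gradient direction.
                align = m * torch.sign(p.grad)
                # Element-wise step size in [0, 2 * lr].
                p.add_(-lr * (1.0 + align) * p.grad)
                m.mul_(beta).add_((1 - beta) * torch.sign(p.grad))

# Usage: opt = ToyAdaRem(model.parameters(), lr=0.01)
```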
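For the "Discovering Parametric Activation Functions" entry, a standard concrete example of a parametric activation is Swish with a trainable slope beta, the same adaptive function cited as a baseline in the abstract above; the sketch assumes one trainable beta per layer.

```python
import torch
import torch.nn as nn

class TrainableSwish(nn.Module):
    """Swish with a trainable slope: f(x) = x * sigmoid(beta * x).
    beta = 1 recovers the standard Swish/SiLU, and large beta
    approaches ReLU, so each layer can interpolate between them."""

    def __init__(self, beta_init=1.0):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x):
        return x * torch.sigmoid(self.beta * x)
```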