PEA: Improving the Performance of ReLU Networks for Free by Using
Progressive Ensemble Activations
- URL: http://arxiv.org/abs/2207.14074v1
- Date: Thu, 28 Jul 2022 13:29:07 GMT
- Title: PEA: Improving the Performance of ReLU Networks for Free by Using
Progressive Ensemble Activations
- Authors: Ákos Utasi
- Abstract summary: Novel activation functions have been proposed to improve the performance of neural networks.
We propose methods that can be used to improve the performance of ReLU networks by using these efficient novel activations during model training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years novel activation functions have been proposed to improve the
performance of neural networks, and they show superior performance compared to
the ReLU counterpart. However, there are environments where the availability
of complex activations is limited, and usually only the ReLU is supported. In
this paper we propose methods that can be used to improve the performance of
ReLU networks by using these efficient novel activations during model training.
More specifically, we propose ensemble activations that are composed of the
ReLU and one of these novel activations. Furthermore, the coefficients of the
ensemble are neither fixed nor learned, but are progressively updated during
the training process in a way that by the end of the training only the ReLU
activations remain active in the network and the other activations can be
removed. This means that at inference time the network contains ReLU
activations only. We perform extensive evaluations on the ImageNet
classification task using various compact network architectures and various
novel activation functions. Results show a 0.2-0.8% top-1 accuracy gain, which
confirms the applicability of the proposed methods. Furthermore, we demonstrate
the proposed methods on semantic segmentation, boosting the performance of a
compact segmentation network by 0.34% mIoU on the Cityscapes dataset.
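The abstract specifies the mechanism (a scheduled, non-learned mix of ReLU and a novel activation that collapses to pure ReLU by the end of training) but not the exact schedule or the choice of secondary activation. A minimal PyTorch sketch, assuming a linear schedule, an equal starting mix, and SiLU as the secondary activation, could look like this:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProgressiveEnsembleActivation(nn.Module):
    """Sketch of a progressive ensemble activation (PEA).

    During training the output is a weighted sum of ReLU and a secondary
    "novel" activation; the mixing coefficient follows a fixed schedule (it is
    neither constant nor learned) that moves all weight onto ReLU by the end of
    training. At inference time the module is a plain ReLU. The SiLU choice,
    the linear schedule, and the 0.5 starting weight are illustrative
    assumptions, not the paper's exact configuration.
    """

    def __init__(self, secondary=F.silu, total_steps=100_000, start_relu_weight=0.5):
        super().__init__()
        self.secondary = secondary                  # novel activation, used only during training
        self.total_steps = total_steps              # length of the annealing schedule in steps
        self.start_relu_weight = start_relu_weight  # assumed initial ReLU share of the ensemble
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def forward(self, x):
        if not self.training:
            return F.relu(x)                        # inference: ReLU only, no extra cost
        progress = torch.clamp(self.step.float() / self.total_steps, 0.0, 1.0)
        self.step += 1
        # Scheduled (not learned) coefficient: the ReLU share grows linearly to 1.
        w_relu = self.start_relu_weight + (1.0 - self.start_relu_weight) * progress
        return w_relu * F.relu(x) + (1.0 - w_relu) * self.secondary(x)
```
In this sketch the module can stand in for nn.ReLU in a compact backbone during training; once the schedule completes it computes plain ReLU, so it can be replaced by nn.ReLU before exporting the model to a ReLU-only runtime.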
Related papers
- A Method on Searching Better Activation Functions [15.180864683908878]
We propose an Entropy-based Activation Function Optimization (EAFO) methodology for designing static activation functions in deep neural networks.
We derive a novel activation function from ReLU, known as the Correction Regularized ReLU (CRReLU).
arXiv Detail & Related papers (2024-05-19T03:48:05Z)
- Improving Classification Neural Networks by using Absolute activation function (MNIST/LeNET-5 example) [0.0]
It is shown that in deep networks the Absolute activation does not cause vanishing or exploding gradients, and can therefore be used in both simple and deep neural networks.
It is shown that solving the MNIST problem with LeNet-like architectures based on the Absolute activation makes it possible to significantly reduce the number of trained parameters while improving prediction accuracy.
arXiv Detail & Related papers (2023-04-23T22:17:58Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning [54.7584721943286]
Continual Learning (CL) methods aim to enable machine learning models to learn new tasks without catastrophic forgetting of those that have been previously mastered.
Existing CL approaches often keep a buffer of previously-seen samples, perform knowledge distillation, or use regularization techniques towards this goal.
We propose to only activate and select sparse neurons for learning current and past tasks at any stage.
arXiv Detail & Related papers (2022-02-21T13:25:03Z)
- SMU: smooth activation function for deep networks using smoothing maximum technique [1.5267236995686555]
We propose a novel activation function based on a smooth approximation of known activation functions such as Leaky ReLU.
We obtain a 6.22% improvement on the CIFAR100 dataset with the ShuffleNet V2 model. (A minimal sketch of the smoothing-maximum idea appears after this list.)
arXiv Detail & Related papers (2021-11-08T17:54:08Z)
- Bridging the Gap Between Target Networks and Functional Regularization [61.051716530459586]
We show that Target Networks act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages.
We propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space.
Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement.
arXiv Detail & Related papers (2021-06-04T17:21:07Z)
- CondenseNet V2: Sparse Feature Reactivation for Deep Networks [87.38447745642479]
Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.
We propose an alternative approach named sparse feature reactivation (SFR), aiming at actively increasing the utility of features for reusing.
Our experiments show that the proposed models achieve promising performance on image classification (ImageNet and CIFAR) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
arXiv Detail & Related papers (2021-04-09T14:12:43Z)
- Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle the catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named the zero-shot translation class-incremental method (ZSTCI).
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap from real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms the state of the art by a large margin.
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
- Evolutionary Optimization of Deep Learning Activation Functions [15.628118691027328]
We show that evolutionary algorithms can discover novel activation functions that outperform the Rectified Linear Unit (ReLU).
Replacing ReLU with evolved activation functions results in statistically significant increases in network accuracy.
These novel activation functions are shown to generalize, achieving high performance across tasks.
arXiv Detail & Related papers (2020-02-17T19:54:26Z)
- Cooperative Initialization based Deep Neural Network Training [35.14235994478142]
Our approach uses multiple activation functions in the initial few epochs for the update of all sets of weight parameters while training the network.
Our approach outperforms various baselines and, at the same time, performs well over various tasks such as classification and detection.
arXiv Detail & Related papers (2020-01-05T14:08:46Z)
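For context on the "smoothing maximum technique" referenced in the SMU entry above: max(a, b) can be written as ((a + b) + |a - b|) / 2, and |z| can be smoothly approximated by z * erf(mu * z), which turns Leaky ReLU, max(x, alpha * x), into a smooth function. A minimal sketch under these assumptions follows; the alpha and mu values, and the use of a fixed rather than learned mu, are illustrative choices, not the paper's settings:
```python
import torch


def smooth_max(a, b, mu):
    # max(a, b) = ((a + b) + |a - b|) / 2, with |z| approximated by z * erf(mu * z)
    d = a - b
    return 0.5 * ((a + b) + d * torch.erf(mu * d))


def smu(x, alpha=0.25, mu=2.5):
    # Smooth stand-in for Leaky ReLU max(x, alpha * x); alpha and mu are illustrative values
    return smooth_max(x, alpha * x, mu)
```
This reconstructs only the general smoothing-maximum idea; the SMU paper's exact parameterization may differ.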