Cooperative Initialization based Deep Neural Network Training
- URL: http://arxiv.org/abs/2001.01240v1
- Date: Sun, 5 Jan 2020 14:08:46 GMT
- Title: Cooperative Initialization based Deep Neural Network Training
- Authors: Pravendra Singh, Munender Varshney, Vinay P. Namboodiri
- Abstract summary: Our approach uses multiple activation functions in the initial few epochs for the update of all sets of weight parameters while training the network.
Our approach outperforms various baselines and, at the same time, performs well across various tasks such as classification and detection.
- Score: 35.14235994478142
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Researchers have proposed various activation functions. These activation
functions help the deep network to learn non-linear behavior with a significant
effect on training dynamics and task performance. The performance of these
activations also depends on the initial state of the weight parameters, i.e.,
different initial states lead to differences in the performance of a network.
In this paper, we propose a cooperative initialization for training deep
networks with the ReLU activation function to improve network performance.
Our approach uses multiple activation functions in the initial few epochs for
the update of all sets of weight parameters while training the network. These
activation functions cooperate to overcome their individual drawbacks in the update of
weight parameters, which in effect helps the network learn better "feature
representations" and boosts its performance later. Cooperative initialization based training
also helps in reducing overfitting and does not increase the number of
parameters or the inference (test) time of the final model while improving its
performance. Experiments show that our approach outperforms various baselines
and, at the same time, performs well across various tasks such as classification
and detection. The Top-1 classification accuracy of the model trained using our
approach improves by 2.8% for VGG-16 and 2.1% for ResNet-56 on the CIFAR-100
dataset.
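The abstract does not give the exact update rule, so the following is a minimal, hypothetical sketch of the idea in PyTorch: during the first few epochs, every batch is evaluated under a small pool of activation functions whose gradients accumulate on the same shared weights, after which training continues with ReLU only. The network, activation pool, and warm-up schedule are illustrative assumptions, not the authors' implementation.

```python
# A minimal, hypothetical sketch of "cooperative initialization": during the
# first few (warm-up) epochs every batch is evaluated under several activation
# functions and their gradients accumulate on the same shared weights; after
# the warm-up, training continues with ReLU only. This is an illustrative
# reading of the abstract, not the authors' exact algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """A tiny CIFAR-style CNN whose activation can be swapped at call time."""
    def __init__(self, num_classes=100):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x, act=F.relu):
        x = F.max_pool2d(act(self.conv1(x)), 2)   # 32x32 -> 16x16
        x = F.max_pool2d(act(self.conv2(x)), 2)   # 16x16 -> 8x8
        return self.fc(x.flatten(1))

def train_one_epoch(model, loader, optimizer, activations):
    # Gradients from every activation in the pool accumulate on the shared
    # weights before a single optimizer step (one possible "cooperation").
    for images, labels in loader:
        optimizer.zero_grad()
        for act in activations:
            loss = F.cross_entropy(model(images, act=act), labels)
            (loss / len(activations)).backward()
        optimizer.step()

model = SmallNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
pool = [F.relu, torch.tanh, F.leaky_relu]  # assumed activation pool
warmup_epochs, total_epochs = 5, 100       # hypothetical schedule

# for epoch in range(total_epochs):                          # train_loader is
#     acts = pool if epoch < warmup_epochs else [F.relu]     # assumed to be a
#     train_one_epoch(model, train_loader, optimizer, acts)  # CIFAR-100 loader
```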
Related papers
- Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks [0.0]
T-batching is a valuable technique for training dynamic network models.
We have identified a limitation in the training loss function used with t-batching.
We propose two alternative loss functions that overcome these issues, resulting in enhanced training performance.
arXiv Detail & Related papers (2023-08-13T23:34:36Z)
- ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, providing an accuracy gap of more than 40% in some scenarios.
arXiv Detail & Related papers (2023-07-02T21:46:30Z)
- Improving Classification Neural Networks by using Absolute activation function (MNIST/LeNET-5 example) [0.0]
It is shown that in deep networks Absolute activation does not cause vanishing or exploding gradients, and therefore Absolute activation can be used in both simple and deep neural networks.
It is shown that solving the MNIST problem with LeNet-like architectures based on Absolute activation allows the number of trained parameters to be significantly reduced while improving the prediction accuracy (see the sketch after this list).
arXiv Detail & Related papers (2023-04-23T22:17:58Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- PEA: Improving the Performance of ReLU Networks for Free by Using Progressive Ensemble Activations [0.0]
Novel activation functions have been proposed to improve the performance of neural networks.
We propose methods that can be used to improve the performance of ReLU networks by using these efficient novel activations during model training.
arXiv Detail & Related papers (2022-07-28T13:29:07Z)
- An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have seen wide success in various application domains.
Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods notably attempt to reduce the size of this parameter set by identifying and removing irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
- CondenseNet V2: Sparse Feature Reactivation for Deep Networks [87.38447745642479]
Reusing features in deep networks through dense connectivity is an effective way to achieve high computational efficiency.
We propose an alternative approach named sparse feature reactivation (SFR), aiming at actively increasing the utility of features for reuse.
Our experiments show that the proposed models achieve promising performance on image classification (ImageNet and CIFAR) and object detection (MS COCO) in terms of both theoretical efficiency and practical speed.
arXiv Detail & Related papers (2021-04-09T14:12:43Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
- Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training [70.2914594796002]
We propose Dynamic R-CNN to adjust the label assignment criteria and the shape of regression loss function.
Our method improves upon the ResNet-50-FPN baseline by 1.9% AP and 5.5% AP$_{90}$ on the MS COCO dataset with no extra overhead.
arXiv Detail & Related papers (2020-04-13T15:20:25Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
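As referenced in the Absolute-activation entry above, the following is a minimal sketch of that idea: a LeNet-style MNIST classifier in which the only change versus a ReLU baseline is the elementwise absolute-value activation. The layer sizes are a standard illustrative LeNet variant, not the reduced-parameter architecture reported in that paper.

```python
# Minimal sketch of the Absolute activation idea: a LeNet-style MNIST
# classifier where the only change versus a ReLU baseline is the elementwise
# |x| activation. Layer sizes are a standard illustrative LeNet variant, not
# the reduced-parameter architecture reported in the paper.
import torch
import torch.nn as nn

class AbsAct(nn.Module):
    def forward(self, x):
        return torch.abs(x)  # gradient magnitude is 1 everywhere except at 0

class LeNetAbs(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), AbsAct(), nn.MaxPool2d(2),   # 28x28 -> 12x12
            nn.Conv2d(6, 16, 5), AbsAct(), nn.MaxPool2d(2),  # 12x12 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120), AbsAct(),
            nn.Linear(120, 84), AbsAct(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNetAbs()(torch.randn(1, 1, 28, 28))  # MNIST-sized dummy input
```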
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including any of the content above) and is not responsible for any consequences.