Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks
for Lighter, Faster and Robust Models
- URL: http://arxiv.org/abs/2205.12050v1
- Date: Mon, 23 May 2022 13:51:06 GMT
- Title: Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks
for Lighter, Faster and Robust Models
- Authors: Sabeesh Ethiraj, Bharath Kumar Bolla
- Abstract summary: We demonstrate how an efficient deep convolutional network can be built in a phased manner by sequentially reducing the number of trainable parameters.
We achieved a SOTA accuracy of 99.2% on MNIST with just 1500 parameters and an accuracy of 86.01% with just over 140K parameters on the CIFAR-10 dataset.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep Learning has revolutionized the fields of computer vision, natural
language understanding, speech recognition, information retrieval, and more.
Many techniques have evolved over the past decade that have made models
lighter, faster, and more robust, with better generalization. However, many
deep learning practitioners persist with pre-trained models and architectures
trained mostly on standard datasets such as ImageNet, MS-COCO, the IMDB-Wiki
dataset, and Kinetics-700, and are either hesitant to redesign an architecture
from scratch or unaware that doing so can lead to better performance. This
leads to inefficient models that are not suitable for deployment on mobile,
edge, and fog devices. In addition, these conventional training methods are of
concern because they consume a large amount of computing power. In this paper,
we revisit various SOTA techniques that address architecture efficiency
(Global Average Pooling, depth-wise convolutions, squeeze-and-excitation,
BlurPool), the learning rate (Cyclical Learning Rate), data augmentation
(Mixup, Cutout), label manipulation (label smoothing), weight-space
manipulation (stochastic weight averaging), and the optimizer (sharpness-aware
minimization). We demonstrate how an efficient deep convolutional network can
be built in a phased manner by sequentially reducing the number of trainable
parameters while applying the techniques mentioned above. We achieved a SOTA
accuracy of 99.2% on MNIST with just 1500 parameters and an accuracy of 86.01%
with just over 140K parameters on the CIFAR-10 dataset.
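To make the architecture-efficiency techniques concrete, the following is a minimal PyTorch sketch of a parameter-light classifier that combines depth-wise separable convolutions, squeeze-and-excitation, BlurPool-style anti-aliased downsampling, and a global-average-pooling head. It is an illustrative reconstruction, not the authors' released code; the `TinyCNN` name, layer widths, and the squeeze-and-excitation reduction ratio are assumptions made for this example.

```python
# Illustrative sketch only -- layer sizes are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: re-weight channels via global pooling + two small FC layers."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        s = x.mean(dim=(2, 3))                            # squeeze: global average pool
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s[:, :, None, None]                    # excite: per-channel scaling


class BlurPool2d(nn.Module):
    """BlurPool-style downsampling: fixed 3x3 binomial blur per channel, then stride-2 subsampling."""
    def __init__(self, channels: int):
        super().__init__()
        k = torch.tensor([1.0, 2.0, 1.0])
        k = torch.outer(k, k)
        self.register_buffer("kernel", (k / k.sum()).expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):
        return F.conv2d(x, self.kernel, stride=2, padding=1, groups=self.channels)


def dw_separable(in_ch: int, out_ch: int) -> nn.Sequential:
    """Depth-wise 3x3 conv + point-wise 1x1 conv: far fewer parameters than a full 3x3 conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class TinyCNN(nn.Module):
    """Parameter-light classifier in the spirit of the paper (widths are illustrative)."""
    def __init__(self, num_classes: int = 10, in_ch: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            dw_separable(in_ch, 16), BlurPool2d(16), SqueezeExcite(16),
            dw_separable(16, 32), BlurPool2d(32), SqueezeExcite(32),
        )
        self.classifier = nn.Conv2d(32, num_classes, 1)   # 1x1 conv head, no dense layer

    def forward(self, x):
        x = self.classifier(self.features(x))
        return x.mean(dim=(2, 3))                         # global average pooling -> logits


if __name__ == "__main__":
    model = TinyCNN()
    print(sum(p.numel() for p in model.parameters()), "parameters")
    print(model(torch.randn(2, 1, 28, 28)).shape)         # torch.Size([2, 10])
```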
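The training-side techniques can be wired into a single loop with standard PyTorch utilities. The sketch below combines a cyclical learning rate, mixup, label smoothing, and stochastic weight averaging; it reuses the `TinyCNN` sketch above and trains on random stand-in data, and every hyperparameter (cycle length, mixup alpha, SWA start epoch) is an assumption rather than the paper's setting. Cutout-style augmentation would be applied in the data pipeline and is omitted here for brevity.

```python
# Illustrative training recipe -- hyperparameters are assumptions, not the paper's settings.
import numpy as np
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data so the sketch runs end to end (replace with MNIST/CIFAR-10 loaders).
data, labels = torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(data, labels), batch_size=64, shuffle=True)

model = TinyCNN()                                          # from the architecture sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
clr = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-4, max_lr=0.1,
                                        step_size_up=200)  # cyclical learning rate
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)       # label smoothing
swa_model = AveragedModel(model)                           # stochastic weight averaging
swa_scheduler = SWALR(optimizer, swa_lr=0.01)
swa_start = 15                                             # assumed SWA start epoch


def mixup(x, y, alpha=0.2):
    """Blend each example with a randomly paired one; return both label sets and the mix weight."""
    lam = float(np.random.beta(alpha, alpha))
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], y, y[idx], lam


for epoch in range(20):
    for x, y in train_loader:
        x_mix, y_a, y_b, lam = mixup(x, y)
        out = model(x_mix)
        loss = lam * criterion(out, y_a) + (1 - lam) * criterion(out, y_b)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if epoch < swa_start:
            clr.step()                           # cycle the learning rate every batch
    if epoch >= swa_start:
        swa_model.update_parameters(model)       # accumulate the weight-space average
        swa_scheduler.step()

update_bn(train_loader, swa_model)               # recompute BatchNorm stats for the averaged model
```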
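Sharpness-aware minimization (SAM), the optimizer-level technique the abstract lists, takes a two-pass step: it first perturbs the weights toward the locally worst-case direction, then applies the base optimizer using the gradient computed at that perturbed point. The function below is a generic, minimal re-implementation of that step, not the authors' code; `rho` (the perturbation radius) is an assumed hyperparameter.

```python
# Generic SAM step (illustrative re-implementation; rho is an assumed hyperparameter).
import torch


def sam_step(model, criterion, optimizer, x, y, rho=0.05):
    """One sharpness-aware update: ascend to the worst-case nearby weights,
    take the gradient there, then step the base optimizer from the original weights."""
    # First pass: gradient at the current weights.
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()

    # Perturb each parameter by rho * g / ||g|| (global L2 norm over all gradients).
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)

    # Second pass: gradient at the perturbed weights.
    optimizer.zero_grad()
    criterion(model(x), y).backward()

    # Restore the original weights, then apply the base optimizer with the SAM gradient.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    return loss.item()


# Usage (with the model, criterion, and optimizer from the sketches above):
#   for x, y in train_loader:
#       sam_step(model, criterion, optimizer, x, y)
```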
Related papers
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel method for constructing feed-forward neural networks based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Learning Rate Curriculum [75.98230528486401]
We propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC).
LeRaC uses a different learning rate for each layer of a neural network to create a data-agnostic curriculum during the initial training epochs.
We compare our approach with Curriculum by Smoothing (CBS), a state-of-the-art data-agnostic curriculum learning approach.
arXiv Detail & Related papers (2022-05-18T18:57:36Z)
- Training Efficiency and Robustness in Deep Learning [2.6451769337566406]
We study approaches to improve the training efficiency and robustness of deep learning models.
We find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data.
We show that a redundancy-aware modification to the sampling of training data improves training speed, and we develop an efficient method for detecting the diversity of the training signal.
arXiv Detail & Related papers (2021-12-02T17:11:33Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- Improving the Accuracy of Early Exits in Multi-Exit Architectures via Curriculum Learning [88.17413955380262]
Multi-exit architectures allow deep neural networks to terminate their execution early in order to adhere to tight deadlines at the cost of accuracy.
We introduce a novel method called Multi-Exit Curriculum Learning that utilizes curriculum learning.
Our method consistently improves the accuracy of early exits compared to the standard training approach.
arXiv Detail & Related papers (2021-04-21T11:12:35Z)
- Dataset Condensation with Differentiable Siamese Augmentation [30.571335208276246]
We focus on condensing large training sets into significantly smaller synthetic sets which can be used to train deep neural networks.
We propose Differentiable Siamese Augmentation that enables effective use of data augmentation to synthesize more informative synthetic images.
We show that, with less than 1% of the data, our method achieves 99.6%, 94.9%, 88.5%, and 71.5% relative performance on MNIST, FashionMNIST, SVHN, and CIFAR10, respectively.
arXiv Detail & Related papers (2021-02-16T16:32:21Z)
- Weight Update Skipping: Reducing Training Time for Artificial Neural Networks [0.30458514384586394]
We propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variations.
During such time windows, we keep updating the biases, which ensures the network still trains and avoids overfitting.
Such a training approach achieves virtually the same accuracy with considerably less computational cost and thus lower training time.
arXiv Detail & Related papers (2020-12-05T15:12:10Z)
- RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned on larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.