Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
- URL: http://arxiv.org/abs/2007.03213v1
- Date: Tue, 7 Jul 2020 05:52:37 GMT
- Title: Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
- Authors: Yawen Wu, Zhepeng Wang, Yiyu Shi, Jingtong Hu
- Abstract summary: This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
- Score: 17.272561332310303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to enable on-device training of convolutional neural networks
(CNNs) by reducing the computation cost at training time. CNN models are
usually trained on high-performance computers and only the trained models are
deployed to edge devices. But the statically trained model cannot adapt
dynamically in a real environment and may result in low accuracy for new
inputs. On-device training by learning from the real-world data after
deployment can greatly improve accuracy. However, the high computation cost
makes training prohibitive for resource-constrained devices. To tackle this
problem, we explore the computational redundancies in training and reduce the
computation cost by two complementary approaches: self-supervised early
instance filtering on data level and error map pruning on the algorithm level.
The early instance filter selects important instances from the input stream to
train the network and drops trivial ones. The error map pruning further prunes
out insignificant computations when training with the selected instances.
Extensive experiments show that the computation cost is substantially reduced
without any or with marginal accuracy loss. For example, when training
ResNet-110 on CIFAR-10, we achieve 68% computation saving while preserving full
accuracy and 75% computation saving with a marginal accuracy loss of 1.3%.
Aggressive computation saving of 96% is achieved with less than 0.1% accuracy
loss when quantization is integrated into the proposed approaches. Besides,
when training LeNet on MNIST, we save 79% computation while boosting accuracy
by 0.2%.
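A minimal runnable sketch of the two ideas follows; the loss-based importance score, the top-k magnitude criterion for the error map, and the assumed features/classifier model split are illustrative assumptions, not the paper's exact design.
```python
# Illustrative sketch only: loss-based instance filtering plus magnitude-based
# pruning of the error map (dL/d activation). The threshold, keep ratio, and the
# assumed model.features / model.classifier split are not the paper's exact design.
import torch
import torch.nn.functional as F

def prune_error_map(grad, keep_ratio=0.25):
    """Zero all but the largest-magnitude entries of a backward error map."""
    k = max(1, int(grad.numel() * keep_ratio))
    threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    return grad * (grad.abs() >= threshold)

def train_step(model, batch_x, batch_y, optimizer, loss_threshold=0.5):
    # early instance filtering: drop "easy" (low-loss) instances before training
    with torch.no_grad():
        per_sample_loss = F.cross_entropy(model(batch_x), batch_y, reduction="none")
    keep = per_sample_loss > loss_threshold
    if not keep.any():
        return None  # nothing informative in this batch
    x, y = batch_x[keep], batch_y[keep]

    # error map pruning: sparsify the gradient flowing through the feature maps
    feats = model.features(x)             # assumes a features/classifier split
    feats.register_hook(prune_error_map)  # prune dL/d(feats) during backward
    loss = F.cross_entropy(model.classifier(feats.flatten(1)), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```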
Related papers
- Effective pruning of web-scale datasets based on complexity of concept clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
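One hedged reading of the approach, sketched below: cluster the embeddings, score each cluster's complexity by its spread, and allocate the pruning budget accordingly (the clustering choice and complexity proxy are assumptions, not the paper's recipe).
```python
# Illustrative sketch, not the paper's exact recipe: cluster embeddings, use each
# cluster's spread as a complexity proxy, and keep more samples from complex
# clusters than from dense, redundant ones.
import numpy as np
from sklearn.cluster import KMeans

def prune_by_cluster_complexity(embeddings, keep_fraction=0.3, n_clusters=100, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(embeddings)
    labels, centers = km.labels_, km.cluster_centers_

    # complexity proxy: mean distance of a cluster's points to its centroid
    complexity = np.array([
        np.linalg.norm(embeddings[labels == c] - centers[c], axis=1).mean()
        if (labels == c).any() else 0.0
        for c in range(n_clusters)
    ])
    budget = int(keep_fraction * len(embeddings))
    per_cluster = np.maximum(1, (budget * complexity / complexity.sum()).astype(int))

    rng = np.random.default_rng(seed)
    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        keep.extend(rng.choice(idx, size=min(per_cluster[c], len(idx)), replace=False))
    return np.array(keep)  # indices of the retained training samples
```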
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
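A minimal sketch of the sample-hiding idea; the loss-based importance proxy and the fixed hiding fraction are illustrative choices, not KAKURENBO's exact criteria.
```python
# Minimal sketch of adaptive sample hiding: the samples with the smallest recorded
# loss are skipped in the next epoch. Importance proxy and fraction are assumptions.
import numpy as np

class SampleHider:
    def __init__(self, num_samples, hide_fraction=0.3):
        self.losses = np.full(num_samples, np.inf)  # unseen samples are never hidden
        self.hide_fraction = hide_fraction

    def record(self, indices, per_sample_losses):
        """Store the latest training loss observed for each sample."""
        self.losses[indices] = per_sample_losses

    def visible_indices(self):
        """Indices to train on next epoch: the hardest (1 - hide_fraction) samples."""
        n_hide = int(self.hide_fraction * len(self.losses))
        order = np.argsort(self.losses)   # ascending: easiest (lowest loss) first
        return np.sort(order[n_hide:])
```
Each epoch feeds per-sample losses to record(), and the next epoch's data loader iterates only over visible_indices(), so the easiest samples are skipped adaptively.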
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
By using locality-sensitive hashing (LSH), we are able to drastically compress latent feature maps without sacrificing much accuracy.
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
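A rough sketch of LSH-based feature-map compression, not the actual HASTE module: channels hashed to the same random-hyperplane bucket are treated as redundant and merged by averaging.
```python
# Rough sketch (not HASTE itself): channels whose activations fall into the same
# random-hyperplane hash bucket are merged by averaging, shrinking the latent map.
import numpy as np

def lsh_merge_channels(feature_map, n_bits=8, seed=0):
    """feature_map: (C, H, W) activations for one input; returns (C', H, W), C' <= C."""
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, -1)                    # one vector per channel
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((n_bits, H * W))
    codes = (flat @ hyperplanes.T > 0).astype(int)       # (C, n_bits) sign bits
    keys = codes @ (1 << np.arange(n_bits))              # one bucket id per channel

    merged = [flat[keys == key].mean(axis=0) for key in np.unique(keys)]
    return np.stack(merged).reshape(-1, H, W)
```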
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
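A hedged sketch of one way to realize such a structure: average the backward gradient map over spatial patches so it contains few unique elements; the patch size and hook placement here are assumptions.
```python
# Hedged sketch of gradient filtering: replace the backward gradient map with its
# patch-wise average so it has few unique elements and the backward convolution
# gets cheaper. Patch size and hook placement are illustrative assumptions.
import torch
import torch.nn.functional as F

def filter_gradient(grad, patch=4):
    """grad: (N, C, H, W) gradient w.r.t. a convolution output."""
    pooled = F.avg_pool2d(grad, kernel_size=patch, stride=patch)
    # broadcast each patch average back to the original spatial resolution
    return F.interpolate(pooled, size=grad.shape[-2:], mode="nearest")

# usage (illustrative): y = conv(x); y.register_hook(filter_gradient); loss.backward()
```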
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time [1.926678651590519]
Training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions.
This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm.
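An illustrative sketch of the temporal truncation for a single LIF layer; the surrogate gradient and truncation window are assumptions, not the paper's exact training rule.
```python
# Illustrative sketch of temporally-truncated BPTT for one LIF layer: the membrane
# state is detached every `trunc` steps, so gradients stay within short windows.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 1.0).float()                          # fire at threshold 1.0

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + (5.0 * (v - 1.0)) ** 2)  # smooth stand-in for the step
        return grad_out * surrogate

def run_lif(inputs, w, decay=0.9, trunc=8):
    """inputs: (T, N, D_in) spike trains; w: (D_in, D_out) weights. Returns spike counts."""
    T, N, _ = inputs.shape
    v = torch.zeros(N, w.shape[1], device=inputs.device)
    out = torch.zeros(N, w.shape[1], device=inputs.device)
    for t in range(T):
        if t % trunc == 0:
            v = v.detach()            # cut the temporal graph: gradients stay local in time
        v = decay * v + inputs[t] @ w
        spikes = SpikeFn.apply(v)
        v = v - spikes                # soft reset after firing
        out = out + spikes
    return out
```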
arXiv Detail & Related papers (2021-12-13T07:44:58Z)
- Low Precision Decentralized Distributed Training with Heterogeneous Data [5.43185002439223]
We show the convergence of low precision decentralized training that aims to reduce the computational complexity of training and inference.
Experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with heterogeneous data.
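A hedged sketch of 8-bit gossip averaging on a ring of nodes; the per-tensor scaling scheme and the ring topology are illustrative assumptions.
```python
# Hedged sketch: each node quantizes its parameters to int8 before exchanging them
# with neighbors, then averages the dequantized copies with its own parameters.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def gossip_step(params_per_node):
    """params_per_node: one flat float32 parameter vector per node; returns mixed copies."""
    n = len(params_per_node)
    quantized = [quantize_int8(p) for p in params_per_node]  # what actually gets sent
    mixed = []
    for i in range(n):
        left, right = quantized[(i - 1) % n], quantized[(i + 1) % n]
        mixed.append((params_per_node[i] + dequantize(*left) + dequantize(*right)) / 3.0)
    return mixed
```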
arXiv Detail & Related papers (2021-11-17T20:48:09Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
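A rough sketch of a one-dimensional compressible subspace: two trained endpoints define a line in weight space, and the chosen point also determines how aggressively the weights are pruned; the alpha-to-sparsity mapping is an illustrative assumption.
```python
# Rough sketch: pick a point on a line between two weight endpoints and prune it
# by magnitude, with the interpolation coefficient doubling as the sparsity level.
import numpy as np

def subspace_weights(w_a, w_b, alpha):
    return (1.0 - alpha) * w_a + alpha * w_b        # point on the line segment

def magnitude_prune(w, sparsity):
    k = int(sparsity * w.size)
    if k == 0:
        return w
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)

def inference_weights(w_a, w_b, alpha):
    # larger alpha -> sparser weights, trading accuracy for efficiency at inference time
    return magnitude_prune(subspace_weights(w_a, w_b, alpha), sparsity=alpha)
```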
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
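A minimal sketch of the alternating schedule: sparse phases fix a magnitude mask and train under it, dense phases lift the mask; phase lengths, the sparsity level, and the grad_fn callable are illustrative assumptions.
```python
# Minimal sketch of alternating compressed/decompressed training phases.
import numpy as np

def magnitude_mask(w, sparsity):
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones_like(w)
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return (np.abs(w) > threshold).astype(w.dtype)

def acdc_train(w, grad_fn, lr=0.1, phase_len=100, n_phases=6, sparsity=0.9):
    # grad_fn is a hypothetical callable returning the gradient at w
    mask = np.ones_like(w)
    for phase in range(n_phases):
        sparse_phase = (phase % 2 == 1)                   # dense, sparse, dense, sparse, ...
        mask = magnitude_mask(w, sparsity) if sparse_phase else np.ones_like(w)
        w = w * mask                                      # compress (no-op in dense phases)
        for _ in range(phase_len):
            w = w - lr * grad_fn(w) * mask                # masked SGD step
    return w, mask   # sparse weights and their mask after the final compressed phase
```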
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
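A hedged sketch of progressive precision: a simple uniform quantizer whose bit width grows with training progress; the linear schedule is an assumption.
```python
# Hedged sketch: early epochs run at low precision, later epochs approach full
# precision via a growing bit width. The linear schedule is an assumption.
import numpy as np

def quantize(x, bits):
    levels = 2 ** bits - 1
    scale = np.abs(x).max() + 1e-12
    return np.round(x / scale * levels) / levels * scale

def bit_schedule(epoch, total_epochs, min_bits=3, max_bits=8):
    frac = epoch / max(1, total_epochs - 1)
    return int(round(min_bits + frac * (max_bits - min_bits)))

# inside a training loop (illustrative): weights, activations, and gradients are
# all passed through quantize(., bit_schedule(epoch, total_epochs)) before use.
```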
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
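One hedged illustration of the linearized-dynamics idea under squared loss: residuals decay along the eigendirections of an empirical kernel built from per-example Jacobians, so the step count to reach a target loss can be solved for without running the fine-tuning itself; the kernel construction and loss choice are assumptions.
```python
# Hedged illustration: under a linearized model and squared loss, the residual
# along each kernel eigendirection decays geometrically, so the number of steps
# to hit a target loss can be predicted from a probe batch.
import numpy as np

def predict_steps_to_loss(jacobian, residual0, lr, target_loss, max_steps=100_000):
    """jacobian: (n, p) per-example Jacobian on a probe batch; residual0: (n,) initial errors."""
    kernel = jacobian @ jacobian.T                 # empirical kernel on the probe batch
    eigvals, eigvecs = np.linalg.eigh(kernel)
    coeffs = eigvecs.T @ residual0                 # residual in the kernel eigenbasis
    for t in range(max_steps):
        residual_t = coeffs * (1.0 - lr * eigvals) ** t
        if 0.5 * np.mean(residual_t ** 2) <= target_loss:
            return t                               # predicted number of optimization steps
    return max_steps
```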
arXiv Detail & Related papers (2020-08-28T04:29:54Z)