Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
- URL: http://arxiv.org/abs/2007.03213v1
- Date: Tue, 7 Jul 2020 05:52:37 GMT
- Title: Enabling On-Device CNN Training by Self-Supervised Instance Filtering and Error Map Pruning
- Authors: Yawen Wu, Zhepeng Wang, Yiyu Shi, Jingtong Hu
- Abstract summary: This work aims to enable on-device training of convolutional neural networks (CNNs) by reducing the computation cost at training time.
CNN models are usually trained on high-performance computers and only the trained models are deployed to edge devices.
- Score: 17.272561332310303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work aims to enable on-device training of convolutional neural networks
(CNNs) by reducing the computation cost at training time. CNN models are
usually trained on high-performance computers and only the trained models are
deployed to edge devices. But the statically trained model cannot adapt
dynamically in a real environment and may result in low accuracy for new
inputs. On-device training by learning from the real-world data after
deployment can greatly improve accuracy. However, the high computation cost
makes training prohibitive for resource-constrained devices. To tackle this
problem, we explore the computational redundancies in training and reduce the
computation cost by two complementary approaches: self-supervised early
instance filtering on data level and error map pruning on the algorithm level.
The early instance filter selects important instances from the input stream to
train the network and drops trivial ones. The error map pruning further prunes
out insignificant computations when training with the selected instances.
Extensive experiments show that the computation cost is substantially reduced
without any or with marginal accuracy loss. For example, when training
ResNet-110 on CIFAR-10, we achieve 68% computation saving while preserving full
accuracy and 75% computation saving with a marginal accuracy loss of 1.3%.
Aggressive computation saving of 96% is achieved with less than 0.1% accuracy
loss when quantization is integrated into the proposed approaches. Besides,
when training LeNet on MNIST, we save 79% computation while boosting accuracy
by 0.2%.
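A minimal runnable sketch of the two ideas follows; the loss-based importance score, the top-k magnitude criterion for the error map, and the assumed features/classifier model split are illustrative assumptions, not the paper's exact design.
```python
# Illustrative sketch only: loss-based instance filtering plus magnitude-based
# pruning of the error map (dL/d activation). The threshold, keep ratio, and the
# assumed model.features / model.classifier split are not the paper's exact design.
import torch
import torch.nn.functional as F

def prune_error_map(grad, keep_ratio=0.25):
    """Zero all but the largest-magnitude entries of a backward error map."""
    k = max(1, int(grad.numel() * keep_ratio))
    threshold = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    return grad * (grad.abs() >= threshold)

def train_step(model, batch_x, batch_y, optimizer, loss_threshold=0.5):
    # early instance filtering: drop "easy" (low-loss) instances before training
    with torch.no_grad():
        per_sample_loss = F.cross_entropy(model(batch_x), batch_y, reduction="none")
    keep = per_sample_loss > loss_threshold
    if not keep.any():
        return None  # nothing informative in this batch
    x, y = batch_x[keep], batch_y[keep]

    # error map pruning: sparsify the gradient flowing through the feature maps
    feats = model.features(x)             # assumes a features/classifier split
    feats.register_hook(prune_error_map)  # prune dL/d(feats) during backward
    loss = F.cross_entropy(model.classifier(feats.flatten(1)), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```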
Related papers
- Effective pruning of web-scale datasets based on complexity of concept clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
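One hedged reading of the approach, sketched below: cluster the embeddings, score each cluster's complexity by its spread, and allocate the pruning budget accordingly (the clustering choice and complexity proxy are assumptions, not the paper's recipe).
```python
# Illustrative sketch, not the paper's exact recipe: cluster embeddings, use each
# cluster's spread as a complexity proxy, and keep more samples from complex
# clusters than from dense, redundant ones.
import numpy as np
from sklearn.cluster import KMeans

def prune_by_cluster_complexity(embeddings, keep_fraction=0.3, n_clusters=100, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(embeddings)
    labels, centers = km.labels_, km.cluster_centers_

    # complexity proxy: mean distance of a cluster's points to its centroid
    complexity = np.array([
        np.linalg.norm(embeddings[labels == c] - centers[c], axis=1).mean()
        if (labels == c).any() else 0.0
        for c in range(n_clusters)
    ])
    budget = int(keep_fraction * len(embeddings))
    per_cluster = np.maximum(1, (budget * complexity / complexity.sum()).astype(int))

    rng = np.random.default_rng(seed)
    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        keep.extend(rng.choice(idx, size=min(per_cluster[c], len(idx)), replace=False))
    return np.array(keep)  # indices of the retained training samples
```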
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
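A minimal sketch of the sample-hiding idea; the loss-based importance proxy and the fixed hiding fraction are illustrative choices, not KAKURENBO's exact criteria.
```python
# Minimal sketch of adaptive sample hiding: the samples with the smallest recorded
# loss are skipped in the next epoch. Importance proxy and fraction are assumptions.
import numpy as np

class SampleHider:
    def __init__(self, num_samples, hide_fraction=0.3):
        self.losses = np.full(num_samples, np.inf)  # unseen samples are never hidden
        self.hide_fraction = hide_fraction

    def record(self, indices, per_sample_losses):
        """Store the latest training loss observed for each sample."""
        self.losses[indices] = per_sample_losses

    def visible_indices(self):
        """Indices to train on next epoch: the hardest (1 - hide_fraction) samples."""
        n_hide = int(self.hide_fraction * len(self.losses))
        order = np.argsort(self.losses)   # ascending: easiest (lowest loss) first
        return np.sort(order[n_hide:])
```
Each epoch feeds per-sample losses to record(), and the next epoch's data loader iterates only over visible_indices(), so the easiest samples are skipped adaptively.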
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
By using locality-sensitive hashing (LSH), we are able to drastically compress latent feature maps without sacrificing much accuracy.
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
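A rough sketch of LSH-based feature-map compression, not the actual HASTE module: channels hashed to the same random-hyperplane bucket are treated as redundant and merged by averaging.
```python
# Rough sketch (not HASTE itself): channels whose activations fall into the same
# random-hyperplane hash bucket are merged by averaging, shrinking the latent map.
import numpy as np

def lsh_merge_channels(feature_map, n_bits=8, seed=0):
    """feature_map: (C, H, W) activations for one input; returns (C', H, W), C' <= C."""
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, -1)                    # one vector per channel
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((n_bits, H * W))
    codes = (flat @ hyperplanes.T > 0).astype(int)       # (C, n_bits) sign bits
    keys = codes @ (1 << np.arange(n_bits))              # one bucket id per channel

    merged = [flat[keys == key].mean(axis=0) for key in np.unique(keys)]
    return np.stack(merged).reshape(-1, H, W)
```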
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Efficient On-device Training via Gradient Filtering [14.484604762427717]
We propose a new gradient filtering approach which enables on-device CNN model training.
Our approach creates a special structure with fewer unique elements in the gradient map.
Our approach opens up a new direction of research with a huge potential for on-device training.
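A hedged sketch of one way to realize such a structure: average the backward gradient map over spatial patches so it contains few unique elements; the patch size and hook placement here are assumptions.
```python
# Hedged sketch of gradient filtering: replace the backward gradient map with its
# patch-wise average so it has few unique elements and the backward convolution
# gets cheaper. Patch size and hook placement are illustrative assumptions.
import torch
import torch.nn.functional as F

def filter_gradient(grad, patch=4):
    """grad: (N, C, H, W) gradient w.r.t. a convolution output."""
    pooled = F.avg_pool2d(grad, kernel_size=patch, stride=patch)
    # broadcast each patch average back to the original spatial resolution
    return F.interpolate(pooled, size=grad.shape[-2:], mode="nearest")

# usage (illustrative): y = conv(x); y.register_hook(filter_gradient); loss.backward()
```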
arXiv Detail & Related papers (2023-01-01T02:33:03Z)
- Efficient Training of Spiking Neural Networks with Temporally-Truncated Local Backpropagation through Time [1.926678651590519]
Training spiking neural networks (SNNs) has remained challenging due to complex neural dynamics and intrinsic non-differentiability in firing functions.
This work proposes an efficient and direct training algorithm for SNNs that integrates a locally-supervised training method with a temporally-truncated BPTT algorithm.
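An illustrative sketch of the temporal truncation for a single LIF layer; the surrogate gradient and truncation window are assumptions, not the paper's exact training rule.
```python
# Illustrative sketch of temporally-truncated BPTT for one LIF layer: the membrane
# state is detached every `trunc` steps, so gradients stay within short windows.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 1.0).float()                          # fire at threshold 1.0

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + (5.0 * (v - 1.0)) ** 2)  # smooth stand-in for the step
        return grad_out * surrogate

def run_lif(inputs, w, decay=0.9, trunc=8):
    """inputs: (T, N, D_in) spike trains; w: (D_in, D_out) weights. Returns spike counts."""
    T, N, _ = inputs.shape
    v = torch.zeros(N, w.shape[1], device=inputs.device)
    out = torch.zeros(N, w.shape[1], device=inputs.device)
    for t in range(T):
        if t % trunc == 0:
            v = v.detach()            # cut the temporal graph: gradients stay local in time
        v = decay * v + inputs[t] @ w
        spikes = SpikeFn.apply(v)
        v = v - spikes                # soft reset after firing
        out = out + spikes
    return out
```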
arXiv Detail & Related papers (2021-12-13T07:44:58Z)
- Low Precision Decentralized Distributed Training with Heterogeneous Data [5.43185002439223]
We show the convergence of low precision decentralized training that aims to reduce the computational complexity of training and inference.
Experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart even with heterogeneous data.
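A hedged sketch of 8-bit gossip averaging on a ring of nodes; the per-tensor scaling scheme and the ring topology are illustrative assumptions.
```python
# Hedged sketch: each node quantizes its parameters to int8 before exchanging them
# with neighbors, then averages the dequantized copies with its own parameters.
import numpy as np

def quantize_int8(x):
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def gossip_step(params_per_node):
    """params_per_node: one flat float32 parameter vector per node; returns mixed copies."""
    n = len(params_per_node)
    quantized = [quantize_int8(p) for p in params_per_node]  # what actually gets sent
    mixed = []
    for i in range(n):
        left, right = quantized[(i - 1) % n], quantized[(i + 1) % n]
        mixed.append((params_per_node[i] + dequantize(*left) + dequantize(*right)) / 3.0)
    return mixed
```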
arXiv Detail & Related papers (2021-11-17T20:48:09Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
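A rough sketch of a one-dimensional compressible subspace: two trained endpoints define a line in weight space, and the chosen point also determines how aggressively the weights are pruned; the alpha-to-sparsity mapping is an illustrative assumption.
```python
# Rough sketch: pick a point on a line between two weight endpoints and prune it
# by magnitude, with the interpolation coefficient doubling as the sparsity level.
import numpy as np

def subspace_weights(w_a, w_b, alpha):
    return (1.0 - alpha) * w_a + alpha * w_b        # point on the line segment

def magnitude_prune(w, sparsity):
    k = int(sparsity * w.size)
    if k == 0:
        return w
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) > threshold, w, 0.0)

def inference_weights(w_a, w_b, alpha):
    # larger alpha -> sparser weights, trading accuracy for efficiency at inference time
    return magnitude_prune(subspace_weights(w_a, w_b, alpha), sparsity=alpha)
```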
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks [78.62086125399831]
We present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of deep neural networks (DNNs).
AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets.
An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process.
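A minimal sketch of the alternating schedule: sparse phases fix a magnitude mask and train under it, dense phases lift the mask; phase lengths, the sparsity level, and the grad_fn callable are illustrative assumptions.
```python
# Minimal sketch of alternating compressed/decompressed training phases.
import numpy as np

def magnitude_mask(w, sparsity):
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones_like(w)
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return (np.abs(w) > threshold).astype(w.dtype)

def acdc_train(w, grad_fn, lr=0.1, phase_len=100, n_phases=6, sparsity=0.9):
    # grad_fn is a hypothetical callable returning the gradient at w
    mask = np.ones_like(w)
    for phase in range(n_phases):
        sparse_phase = (phase % 2 == 1)                   # dense, sparse, dense, sparse, ...
        mask = magnitude_mask(w, sparsity) if sparse_phase else np.ones_like(w)
        w = w * mask                                      # compress (no-op in dense phases)
        for _ in range(phase_len):
            w = w - lr * grad_fn(w) * mask                # masked SGD step
    return w, mask   # sparse weights and their mask after the final compressed phase
```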
arXiv Detail & Related papers (2021-06-23T13:23:00Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better accuracy (-0.12% to +1.87%).
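A hedged sketch of progressive precision: a simple uniform quantizer whose bit width grows with training progress; the linear schedule is an assumption.
```python
# Hedged sketch: early epochs run at low precision, later epochs approach full
# precision via a growing bit width. The linear schedule is an assumption.
import numpy as np

def quantize(x, bits):
    levels = 2 ** bits - 1
    scale = np.abs(x).max() + 1e-12
    return np.round(x / scale * levels) / levels * scale

def bit_schedule(epoch, total_epochs, min_bits=3, max_bits=8):
    frac = epoch / max(1, total_epochs - 1)
    return int(round(min_bits + frac * (max_bits - min_bits)))

# inside a training loop (illustrative): weights, activations, and gradients are
# all passed through quantize(., bit_schedule(epoch, total_epochs)) before use.
```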
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
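One hedged illustration of the linearized-dynamics idea under squared loss: residuals decay along the eigendirections of an empirical kernel built from per-example Jacobians, so the step count to reach a target loss can be solved for without running the fine-tuning itself; the kernel construction and loss choice are assumptions.
```python
# Hedged illustration: under a linearized model and squared loss, the residual
# along each kernel eigendirection decays geometrically, so the number of steps
# to hit a target loss can be predicted from a probe batch.
import numpy as np

def predict_steps_to_loss(jacobian, residual0, lr, target_loss, max_steps=100_000):
    """jacobian: (n, p) per-example Jacobian on a probe batch; residual0: (n,) initial errors."""
    kernel = jacobian @ jacobian.T                 # empirical kernel on the probe batch
    eigvals, eigvecs = np.linalg.eigh(kernel)
    coeffs = eigvecs.T @ residual0                 # residual in the kernel eigenbasis
    for t in range(max_steps):
        residual_t = coeffs * (1.0 - lr * eigvals) ** t
        if 0.5 * np.mean(residual_t ** 2) <= target_loss:
            return t                               # predicted number of optimization steps
    return max_steps
```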
arXiv Detail & Related papers (2020-08-28T04:29:54Z)