Enabling Binary Neural Network Training on the Edge
- URL: http://arxiv.org/abs/2102.04270v6
- Date: Sun, 24 Sep 2023 23:07:32 GMT
- Title: Enabling Binary Neural Network Training on the Edge
- Authors: Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie
Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A.
Constantinides
- Abstract summary: Existing binary neural network training methods require concurrent storage of high-precision activations for all layers.
We introduce a low-cost binary neural network training strategy exhibiting sizable memory footprint reductions.
We also demonstrate from-scratch ImageNet training of binarized ResNet-18, achieving a 3.78$\times$ memory reduction.
- Score: 7.32770338248516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ever-growing computational demands of increasingly complex machine
learning models frequently necessitate the use of powerful cloud-based
infrastructure for their training. Binary neural networks are known to be
promising candidates for on-device inference due to their extreme compute and
memory savings over higher-precision alternatives. However, their existing
training methods require the concurrent storage of high-precision activations
for all layers, generally making learning on memory-constrained devices
infeasible. In this article, we demonstrate that the backward propagation
operations needed for binary neural network training are strongly robust to
quantization, thereby making on-the-edge learning with modern models a
practical proposition. We introduce a low-cost binary neural network training
strategy exhibiting sizable memory footprint reductions while inducing little
to no accuracy loss vs Courbariaux & Bengio's standard approach. These
decreases are primarily enabled through the retention of activations
exclusively in binary format. Against the latter algorithm, our drop-in
replacement sees memory requirement reductions of 3--5$\times$, while reaching
similar test accuracy in comparable time, across a range of small-scale models
trained to classify popular datasets. We also demonstrate from-scratch ImageNet
training of binarized ResNet-18, achieving a 3.78$\times$ memory reduction. Our
work is open-source, and includes the Raspberry Pi-targeted prototype we used
to verify our modeled memory decreases and capture the associated energy drops.
Such savings will allow for unnecessary cloud offloading to be avoided,
reducing latency, increasing energy efficiency, and safeguarding end-user
privacy.
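The key mechanism described above, retaining activations exclusively in binary format and running the quantization-robust backward pass from that record, can be illustrated with a short sketch. The PyTorch code below is a hedged illustration of the general idea, not the authors' released implementation: it binarizes an activation in the forward pass and saves only a 1-bit clipping mask for the clipped straight-through-estimator backward, instead of the full-precision input.

```python
# Minimal sketch (assumes PyTorch): a sign activation whose backward pass
# relies only on 1-bit saved state, avoiding storage of float32 activations.
import torch

class BinaryActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Save a boolean (1-bit per element) mask rather than x itself;
        # the clipped straight-through estimator only needs to know |x| <= 1.
        ctx.save_for_backward(x.abs() <= 1.0)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # Pass gradients straight through wherever the input was unclipped.
        return grad_out * mask.to(grad_out.dtype)

# Usage: drop the binary activation into a model in place of a float one.
x = torch.randn(4, 8, requires_grad=True)
y = BinaryActivation.apply(x)
y.sum().backward()  # backward runs from the 1-bit saved record only
```

Applied layer by layer in a full training loop, this is the source of the memory reductions reported above; the exact bookkeeping is in the paper's open-source release.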
Related papers
- Enabling On-device Continual Learning with Binary Neural Networks [3.180732240499359]
We propose a solution that combines recent advancements in the field of Continual Learning (CL) and Binary Neural Networks (BNNs).
Specifically, our approach leverages binary latent replay activations and a novel quantization scheme that significantly reduces the number of bits required for gradient computation.
arXiv Detail & Related papers (2024-01-18T11:57:05Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- OLLA: Decreasing the Memory Usage of Neural Networks by Optimizing the Lifetime and Location of Arrays [6.418232942455968]
OLLA is an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.
We present several techniques to simplify the encoding of the problem, and enable our approach to scale to the size of state-of-the-art neural networks.
arXiv Detail & Related papers (2022-10-24T02:39:13Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can roughly halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for back propagation; a minimal sketch of this idea appears after this list.
ActNN reduces the memory footprint of activations by 12x, and it enables training with a 6.6x to 14x larger batch size.
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
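For the ActNN entry above, the following hedged sketch illustrates the general idea of 2-bit activation compressed training; the names compress_2bit, decompress_2bit and CompressedLinear are invented for illustration and are not the ActNN library's API. Activations are stochastically quantized to four levels after the forward pass and dequantized only when gradients are computed.

```python
# Illustrative sketch (assumes PyTorch): store stochastically quantized
# 2-bit activations for backpropagation instead of full-precision copies.
import torch

def compress_2bit(x):
    """Stochastically round a tensor onto the 2-bit levels {0, 1, 2, 3}."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo).clamp(min=1e-8) / 3.0
    q = (x - lo) / scale                             # map into [0, 3]
    q = torch.floor(q + torch.rand_like(q))          # stochastic rounding
    return q.clamp(0, 3).to(torch.uint8), lo, scale  # real code would bit-pack

def decompress_2bit(q, lo, scale):
    return q.to(torch.float32) * scale + lo

class CompressedLinear(torch.autograd.Function):
    """Linear layer that keeps only compressed activations for backprop."""

    @staticmethod
    def forward(ctx, x, w):
        q, lo, scale = compress_2bit(x)
        ctx.save_for_backward(q, lo, scale, w)       # no float32 copy of x kept
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        q, lo, scale, w = ctx.saved_tensors
        x_hat = decompress_2bit(q, lo, scale)        # approximate activations
        grad_x = grad_out @ w
        grad_w = grad_out.t() @ x_hat                # weight grad from 2-bit record
        return grad_x, grad_w

# Usage: run a forward/backward pass with only the 2-bit record retained.
x = torch.randn(16, 32)
w = torch.randn(8, 32, requires_grad=True)
CompressedLinear.apply(x, w).sum().backward()
```

The Mesa entry above follows the same pattern with a higher-precision stored copy, and the main paper's binary-activation strategy can be read as the 1-bit end of this spectrum.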