Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
- URL: http://arxiv.org/abs/1912.12675v1
- Date: Sun, 29 Dec 2019 15:28:13 GMT
- Title: Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
- Authors: Lifu Zhang, Tarek S. Abdelrahman
- Abstract summary: We explore the impact of stale weights on the statistical efficiency and performance in a pipelined backpropagation scheme.
We show that when pipelining is limited to early layers in a network, training with stale weights converges and results in models with comparable inference accuracies.
We propose combining pipelined and non-pipelined training in a hybrid scheme to address this drop.
- Score: 0.1921787217122713
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growth in the complexity of Convolutional Neural Networks (CNNs) is
increasing interest in partitioning a network across multiple accelerators
during training and pipelining the backpropagation computations over the
accelerators. Existing approaches avoid or limit the use of stale weights
through techniques such as micro-batching or weight stashing. These techniques
either underutilize accelerators or increase memory footprint. We explore
the impact of stale weights on the statistical efficiency and performance in a
pipelined backpropagation scheme that maximizes accelerator utilization and
keeps memory overhead modest. We use 4 CNNs (LeNet-5, AlexNet, VGG and ResNet)
and show that when pipelining is limited to early layers in a network, training
with stale weights converges and results in models with comparable inference
accuracies to those resulting from non-pipelined training on MNIST and CIFAR-10
datasets, with accuracy drops of 0.4%, 4%, 0.83% and 1.45% for the 4 networks,
respectively. However, when pipelining is deeper in the network, inference
accuracies drop significantly. We propose combining pipelined and non-pipelined
training in a hybrid scheme to address this drop. We demonstrate the
implementation and performance of our pipelined backpropagation in PyTorch on 2
GPUs using ResNet, achieving speedups of up to 1.8X over a 1-GPU baseline, with
a small drop in inference accuracy.
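In the scheme described above, the early-layer partition runs ahead of the backward pass, so its forward computations use weights that have not yet absorbed the most recent gradients. The sketch below is a minimal single-process emulation of that staleness effect, not the authors' PyTorch implementation (which places the two partitions on separate GPUs and overlaps their execution); the two-stage split, the STALENESS depth, and the dummy data are illustrative assumptions.

```python
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

STALENESS = 3  # pipeline depth: the front stage's gradients arrive 3 steps late


class FrontStage(nn.Module):
    """Early layers: the part that would be pipelined with stale weights."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))


class BackStage(nn.Module):
    """Later layers: trained conventionally with fresh gradients."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8 * 28 * 28, 10)

    def forward(self, x):
        return self.fc(x.flatten(1))


front, back = FrontStage(), BackStage()
opt_front = torch.optim.SGD(front.parameters(), lr=0.01)
opt_back = torch.optim.SGD(back.parameters(), lr=0.01)
delayed = deque()  # gradients waiting to be applied to the front stage

for step in range(20):
    x = torch.randn(32, 1, 28, 28)             # dummy batch standing in for MNIST
    y = torch.randint(0, 10, (32,))

    opt_front.zero_grad()
    opt_back.zero_grad()
    loss = F.cross_entropy(back(front(x)), y)
    loss.backward()

    opt_back.step()                            # back stage: fresh update

    # Front stage: enqueue this step's gradients and apply the ones computed
    # STALENESS steps ago, emulating the delay a pipeline would introduce.
    delayed.append([p.grad.detach().clone() for p in front.parameters()])
    if len(delayed) > STALENESS:
        for p, g in zip(front.parameters(), delayed.popleft()):
            p.grad = g
        opt_front.step()
```

Keeping FrontStage shallow relative to BackStage mirrors the paper's observation that confining staleness to early layers preserves accuracy; the hybrid scheme mentioned in the abstract roughly corresponds to running part of the training without this delay.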
Related papers
- ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation [2.0181279529015925]
ReCycle is a system designed for efficient training in the presence of failures.
It exploits the inherent functional redundancy in distributed training systems.
We show it achieves high training throughput under multiple failures.
arXiv Detail & Related papers (2024-05-22T21:35:56Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
arXiv Detail & Related papers (2021-10-16T02:41:35Z)
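A minimal sketch of the neighborhood-sampling idea summarized in the entry above: each mini-batch of target nodes aggregates features from at most a fixed number of sampled neighbors instead of the full graph. The synthetic graph, the single GraphSAGE-style layer, and the fanout value are assumptions for illustration; the paper's performance-engineered sampler and multi-GPU pipelining are not reproduced here.

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NODES, FEAT_DIM, NUM_CLASSES, FANOUT = 1000, 16, 5, 10

# Synthetic graph: adjacency lists, node features, and labels stand in for real data.
adj = {v: random.sample(range(NUM_NODES), k=20) for v in range(NUM_NODES)}
features = torch.randn(NUM_NODES, FEAT_DIM)
labels = torch.randint(0, NUM_CLASSES, (NUM_NODES,))


class OneHopSAGE(nn.Module):
    """One GraphSAGE-style layer: concat(self features, mean of sampled neighbors)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, self_feats, neigh_feats):
        agg = neigh_feats.mean(dim=1)          # mean over the sampled neighbors
        return self.lin(torch.cat([self_feats, agg], dim=1))


model = OneHopSAGE(FEAT_DIM, NUM_CLASSES)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(100):
    batch = random.sample(range(NUM_NODES), k=64)          # mini-batch of target nodes
    # Neighborhood sampling: keep at most FANOUT neighbors per target node.
    sampled = [random.sample(adj[v], k=min(FANOUT, len(adj[v]))) for v in batch]
    neigh_feats = torch.stack([features[idx] for idx in sampled])  # (64, FANOUT, d)
    logits = model(features[batch], neigh_feats)
    loss = F.cross_entropy(logits, labels[batch])
    opt.zero_grad()
    loss.backward()
    opt.step()
```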
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We present a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which input-dependently adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
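A minimal sketch of dynamic weight slicing as described in the DS-Net++ entry above: a gate inspects the input and selects how many filters of a shared convolution weight to use, so easier inputs run a thinner slice. The SliceableConv module, the per-batch gating, and the width choices are illustrative assumptions; DS-Net's double-headed gate and its training procedure (the hard argmax here is not trainable by plain backpropagation) are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SliceableConv(nn.Module):
    """A conv layer whose effective width is chosen per batch by a small gate."""
    def __init__(self, in_ch, max_out_ch, widths=(0.25, 0.5, 1.0)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out_ch, in_ch, 3, 3) * 0.05)
        self.bias = nn.Parameter(torch.zeros(max_out_ch))
        self.widths = widths
        self.gate = nn.Linear(in_ch, len(widths))  # one score per width option

    def forward(self, x):
        # Gate on globally pooled input statistics; one width per batch keeps the
        # sketch simple (per-sample widths would require regrouping the batch).
        scores = self.gate(x.mean(dim=(2, 3))).mean(dim=0)
        k = max(1, int(self.widths[scores.argmax().item()] * self.weight.shape[0]))
        # Slice the first k filters of the shared weight tensor.
        return F.conv2d(x, self.weight[:k], self.bias[:k], padding=1)


layer = SliceableConv(in_ch=3, max_out_ch=64)
out = layer(torch.randn(8, 3, 32, 32))
print(out.shape)  # torch.Size([8, k, 32, 32]) with k selected by the gate
```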
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
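A minimal sketch of the pruning-plus-quantization idea from the entry above: magnitude pruning keeps a small fraction of weights, the survivors are quantized to 8 bits, and the layer is stored as indices, codes, and a scale. The 10% keep ratio and the symmetric 8-bit scheme are assumptions; the paper's source-coding-based storage format is not reproduced.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1024, 1024)
w = layer.weight.data.flatten()

# 1) Magnitude pruning: keep only the 10% largest-magnitude weights.
k = int(0.10 * w.numel())
keep_idx = w.abs().topk(k).indices
kept = w[keep_idx]

# 2) Quantize the survivors to symmetric 8-bit integers.
scale = kept.abs().max() / 127.0
codes = torch.clamp((kept / scale).round(), -127, 127).to(torch.int8)

# Compact representation: indices + int8 codes + a single float scale.
compact = {"idx": keep_idx.to(torch.int32), "codes": codes, "scale": scale}

# Reconstruction for inference: scatter dequantized values into a dense tensor.
w_hat = torch.zeros_like(w)
w_hat[compact["idx"].long()] = compact["codes"].float() * compact["scale"]
layer.weight.data = w_hat.view_as(layer.weight)

dense_bits = w.numel() * 32
compact_bits = k * (32 + 8) + 32  # an index and a code per kept weight, plus the scale
print(f"compression ratio ~ {dense_bits / compact_bits:.1f}x")
```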
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net gains the ability of dynamic inference through the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
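A minimal sketch of N:M fine-grained structured sparsity from the entry above, using the common 2:4 pattern: within every group of 4 consecutive weights, only the 2 largest-magnitude entries survive. The mask-recomputation schedule is an assumption; the paper's training-from-scratch recipe is not reproduced.

```python
import torch
import torch.nn as nn


def nm_mask(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Binary mask keeping the n largest-magnitude weights in every group of m."""
    groups = weight.reshape(-1, m)                 # assumes numel is divisible by m
    topk = groups.abs().topk(n, dim=1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(1, topk, 1.0)
    return mask.reshape(weight.shape)


layer = nn.Linear(128, 64)
with torch.no_grad():
    mask = nm_mask(layer.weight)
    layer.weight.mul_(mask)                        # enforce the 2:4 pattern in place

# When training from scratch, the mask would typically be recomputed and re-applied
# after every optimizer step so the sparsity pattern can adapt as weights change.
print(f"kept fraction: {mask.mean().item():.2f}")  # ~0.50 for 2:4
```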
- BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training [9.551339069298011]
BaPipe is a pipeline parallelism training framework for distributed deep learning.
It automatically explores pipeline parallelism training methods and balanced partition strategies for distributed training.
BaPipe provides up to 3.2x speedup and 4x memory reduction on various platforms.
arXiv Detail & Related papers (2020-12-23T08:57:39Z)
- Pipelined Backpropagation at Scale: Training Large Models without Batches [0.9580895202050946]
We evaluate the use of small batch, fine-grained Pipelined Backpropagation, an asynchronous pipeline parallel training algorithm.
We show that appropriate normalization and small batch sizes can also aid training.
arXiv Detail & Related papers (2020-03-25T22:26:28Z)
- Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks [6.534515590778012]
Pruning is one of the predominant approaches used for deep network compression.
We present a simple yet effective methodology for gradual channel pruning while training, using a novel data-driven metric.
We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet.
arXiv Detail & Related papers (2020-02-23T17:56:18Z)
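A minimal sketch of gradual channel pruning during training, as in the last entry above: every few steps the least relevant output channels of a convolution are zeroed until a budget is reached. The mean-absolute-activation proxy used here stands in for the paper's feature relevance scores, and the dummy data, placeholder loss, and pruning schedule are illustrative assumptions; a real implementation would also re-apply a persistent mask after every update so pruned channels stay zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 32, 3, padding=1)
opt = torch.optim.SGD(conv.parameters(), lr=0.01)
prune_per_round, budget, pruned = 4, 16, set()

for step in range(1, 301):
    x = torch.randn(16, 3, 32, 32)                  # dummy batch standing in for data
    feats = F.relu(conv(x))
    loss = feats.mean()                             # placeholder training loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    if step % 100 == 0 and len(pruned) < budget:    # prune gradually during training
        # Data-driven relevance proxy: mean absolute activation per output channel.
        relevance = feats.detach().abs().mean(dim=(0, 2, 3))
        if pruned:
            relevance[list(pruned)] = float("inf")  # ignore already-pruned channels
        victims = relevance.topk(prune_per_round, largest=False).indices.tolist()
        pruned.update(victims)
        with torch.no_grad():                       # zero out the pruned channels
            conv.weight[victims] = 0.0
            conv.bias[victims] = 0.0
```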