LR-CNN: Lightweight Row-centric Convolutional Neural Network Training
for Memory Reduction
- URL: http://arxiv.org/abs/2401.11471v1
- Date: Sun, 21 Jan 2024 12:19:13 GMT
- Title: LR-CNN: Lightweight Row-centric Convolutional Neural Network Training
for Memory Reduction
- Authors: Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang
Wei, Yu Gu, Ge Yu
- Abstract summary: Convolutional Neural Networks with multi-layer architectures have advanced rapidly, but training them is memory-intensive because large amounts of intermediate data are preserved across layers.
Existing efforts mitigate this bottleneck either with external auxiliary solutions that add hardware cost or with internal modifications that risk an accuracy penalty.
We break the traditional layer-by-layer (column) dataflow rule and re-organize operations into rows that span all convolution layers.
This lightweight design allows the majority of intermediate data to be discarded without any loss of accuracy.
- Score: 21.388549904063538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, Convolutional Neural Networks with multi-layer
architectures have advanced rapidly. However, training these complex networks is
very memory-consuming, since large amounts of intermediate data must be preserved
across layers, especially when processing high-dimensional inputs with a large
batch size. That poses great challenges to the limited memory capacity of current
accelerators (e.g., GPUs). Existing efforts mitigate this bottleneck either with
external auxiliary solutions that incur additional hardware cost, or with internal
modifications that risk an accuracy penalty. In contrast, our analysis reveals
that intra- and inter-layer computations exhibit weak spatial-temporal dependency,
and in some cases complete independence. That inspires us to break the traditional
layer-by-layer (column) dataflow rule and re-organize operations into rows that
span all convolution layers. This lightweight design allows the majority of
intermediate data to be discarded without any loss of accuracy. In particular, we
study the weak dependency between two consecutive rows. For the resulting skewed
memory consumption, we provide two solutions, each suited to different scenarios.
Evaluations on two representative networks confirm the effectiveness of our
approach. We also validate that our dataflow optimization can be smoothly
integrated with existing works for further memory reduction.
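To make the row-centric idea concrete, the following is a minimal NumPy sketch of such a dataflow for a stack of 3x3, stride-1, zero-padded convolutions with ReLU. The function names (`conv_row`, `feat_row`, `row_centric_forward`) and the naive recursive formulation are illustrative assumptions, not the authors' implementation; in particular, this sketch simply recomputes the overlapping rows that the paper's weak-dependency analysis between consecutive rows is designed to exploit.
```python
# Minimal sketch (not the paper's code): row-centric dataflow for a stack of
# 3x3, stride-1, zero-padded convolutions with ReLU. Only a handful of rows
# per layer are ever alive, instead of full intermediate feature maps.
import numpy as np

def conv_row(prev_rows, weight, bias):
    """One output row of a 3x3 convolution.

    prev_rows: (3, C_in, W) -- the three input rows the kernel touches
               (zero rows stand in for padding at the top/bottom border).
    weight:    (C_out, C_in, 3, 3); bias: (C_out,). Returns (C_out, W).
    """
    _, c_in, w = prev_rows.shape
    padded = np.zeros((3, c_in, w + 2), dtype=prev_rows.dtype)
    padded[:, :, 1:-1] = prev_rows                      # zero-pad the width
    out = np.empty((weight.shape[0], w), dtype=prev_rows.dtype)
    for col in range(w):
        patch = padded[:, :, col:col + 3]               # (3, C_in, 3) window
        out[:, col] = np.einsum('kcx,ockx->o', patch, weight) + bias
    return out

def feat_row(layer, r, layers, x):
    """Row r of the feature map after layers[0..layer], computed recursively
    so that no full intermediate feature map is ever materialised."""
    c_out = x.shape[0] if layer < 0 else layers[layer][0].shape[0]
    h, w = x.shape[1], x.shape[2]
    if not 0 <= r < h:                                  # border: a padding row
        return np.zeros((c_out, w), dtype=x.dtype)
    if layer < 0:                                       # base case: input row
        return x[:, r, :]
    prev = np.stack([feat_row(layer - 1, r + dr, layers, x) for dr in (-1, 0, 1)])
    w8, b = layers[layer]
    return np.maximum(conv_row(prev, w8, b), 0.0)       # conv + ReLU

def row_centric_forward(x, layers):
    """Assemble the final feature map one row at a time (row, not column, order).
    Consecutive output rows share most upstream rows -- the weak dependency the
    paper studies; this naive sketch recomputes them for clarity."""
    h = x.shape[1]
    return np.stack([feat_row(len(layers) - 1, r, layers, x) for r in range(h)],
                    axis=1)

# Usage: two hypothetical conv layers on a small (C=3, H=W=8) input.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8)).astype(np.float32)
layers = [
    (0.1 * rng.standard_normal((4, 3, 3, 3)).astype(np.float32), np.zeros(4, np.float32)),
    (0.1 * rng.standard_normal((4, 4, 3, 3)).astype(np.float32), np.zeros(4, np.float32)),
]
y = row_centric_forward(x, layers)                      # shape (4, 8, 8)
```
Under these assumptions the live working set is bounded by a few rows per convolution layer rather than full feature maps, which is the source of the memory reduction; handling the recomputation overhead and the skewed per-row memory footprint is what the paper's two solutions address.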
Related papers
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and utilized, putting a strain on both advanced and limited devices.
Our solution is to implement weight block sparsity, a structured, hardware-friendly form of sparsity.
We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16.
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
- SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
arXiv Detail & Related papers (2022-05-23T12:35:18Z)
- EcoFlow: Efficient Convolutional Dataflows for Low-Power Neural Network Accelerators [12.223778147172107]
Dilated and transposed convolutions are widely used in modern convolutional neural networks (CNNs).
These kernels stress current compute systems due to their high memory intensity, exascale compute demands, and large energy consumption.
We propose EcoFlow, a new set of dataflows and mapping algorithms for dilated and transposed convolutions.
arXiv Detail & Related papers (2022-02-04T18:48:36Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation [10.563649948220371]
Deep neural networks (DNNs) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z)
- MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z)
- Non-Gradient Manifold Neural Network [79.44066256794187]
Deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)