Improving Memory Utilization in Convolutional Neural Network
Accelerators
- URL: http://arxiv.org/abs/2007.09963v2
- Date: Tue, 6 Apr 2021 15:45:49 GMT
- Title: Improving Memory Utilization in Convolutional Neural Network
Accelerators
- Authors: Petar Jokic, Stephane Emery, Luca Benini
- Abstract summary: We propose a mapping method that allows activation layers to overlap and thus utilize the memory more efficiently.
Experiments with various real-world object detector networks show that the proposed mapping technique can decrease the activation memory by up to 32.9%.
For higher-resolution de-noising networks, we achieve activation memory savings of 48.8%.
- Score: 16.340620299847384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the accuracy of convolutional neural networks has improved
vastly through larger and deeper network architectures, the memory footprint
for storing their parameters and activations has grown as well. This trend
especially challenges power- and resource-limited accelerator designs, which
are often restricted to storing all network data in on-chip memory to avoid
interfacing energy-hungry external memories. Maximizing the network size that
fits on a given accelerator therefore requires maximizing its memory
utilization. Whereas the traditional ping-pong buffering technique maps
subsequent activation layers to disjoint memory regions, we propose a mapping
method that allows these regions to overlap and thus utilizes the memory more
efficiently. This work presents a mathematical model to compute the maximum
activation memory overlap and thus the lower bound of on-chip memory needed
for layer-by-layer processing of convolutional neural networks on
memory-limited accelerators. Our experiments with various real-world object
detector networks show that the proposed mapping technique can decrease the
activation memory by up to 32.9%, reducing the overall memory for the entire
network by up to 23.9% compared to traditional ping-pong buffering. For
higher-resolution de-noising networks, we achieve activation memory savings
of 48.8%. Additionally, we implement a face detector network on an FPGA-based
camera to validate these memory savings on a complete end-to-end system.
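To make the comparison concrete, here is a minimal sketch (not the paper's model): it contrasts the activation memory required by classic ping-pong buffering with a mapping that only keeps the current layer's input and output resident. The layer sizes, buffer-sizing rule, and function names are illustrative assumptions; the paper's full overlap model goes further by letting a layer's input and output regions share memory based on kernel size and stride.

```python
# Hedged sketch: activation memory for layer-by-layer CNN inference under
# (a) classic ping-pong buffering and (b) a mapping that only keeps the
# current layer's input/output pair resident. Sizes below are made up.

def ping_pong_memory(act_sizes):
    """Ping-pong buffering alternates activations between two disjoint
    regions, so each region must fit the largest activation mapped to it
    (one common sizing; some designs simply use 2x the overall maximum)."""
    return max(act_sizes[0::2]) + max(act_sizes[1::2])

def per_layer_memory(act_sizes):
    """Keeping only the current layer's input and output resident needs the
    largest consecutive pair; the paper's overlap model reduces this further
    by letting a layer's input and output regions overlap."""
    return max(a + b for a, b in zip(act_sizes, act_sizes[1:]))

if __name__ == "__main__":
    acts_kb = [100, 400, 80, 60, 300, 50]  # illustrative per-layer activation sizes in kB
    pp = ping_pong_memory(acts_kb)
    pl = per_layer_memory(acts_kb)
    print(f"ping-pong: {pp} kB, per-layer pair: {pl} kB "
          f"({100 * (pp - pl) / pp:.1f}% less)")
```

On these made-up sizes the per-layer mapping already needs roughly 29% less activation memory than ping-pong buffering; the proposed overlap mapping widens this gap further.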
Related papers
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- Generalized Key-Value Memory to Flexibly Adjust Redundancy in Memory-Augmented Networks [6.03025980398201]
Memory-augmented neural networks enhance a neural network with an external key-value memory.
We propose a generalized key-value memory that decouples its dimension from the number of support vectors.
We show that adapting this parameter on demand effectively mitigates up to 44% nonidealities, at equal accuracy and number of devices.
arXiv Detail & Related papers (2022-03-11T19:59:43Z)
- Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers.
Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training.
Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can roughly halve the memory footprint during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation [10.563649948220371]
Deep neural networks (DNN) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions with ultra-fast on-chip computations.
arXiv Detail & Related papers (2021-08-25T18:50:24Z)
- MAFAT: Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference [1.7894377200944507]
Machine learning networks can easily exceed available memory, increasing latency due to excessive OS swapping.
We propose a memory usage predictor coupled with a search algorithm to provide optimized fusing and tiling configurations.
Results show that our approach can run in less than half the memory, and with a speedup of up to 2.78 under severe memory constraints.
arXiv Detail & Related papers (2021-07-14T19:45:49Z)
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training [68.63354877166756]
ActNN is a memory-efficient training framework that stores randomly quantized activations for backpropagation.
ActNN reduces the activation memory footprint by 12x and enables training with a 6.6x to 14x larger batch size (see the sketch after this list).
arXiv Detail & Related papers (2021-04-29T05:50:54Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
- DESCNet: Developing Efficient Scratchpad Memories for Capsule Network Hardware [12.26801463167931]
Capsule Networks (CapsNets) offer improved generalization ability compared to Deep Neural Networks (DNNs).
CapsNets pose significantly high computational and memory requirements, making their energy-efficient inference a challenging task.
This paper provides, for the first time, an in-depth analysis to highlight the design and management related challenges for the (on-chip) memories deployed in hardware accelerators executing fast CapsNets inference.
arXiv Detail & Related papers (2020-10-12T14:50:59Z)
- Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)
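The Mesa and ActNN entries above share one mechanism with the activation-memory theme of the main paper: keep only a compressed copy of each activation for the backward pass. Below is a minimal sketch of that idea, assuming a 2-bit code with a per-group scale and stochastic rounding; none of the names or parameters come from those papers, and a real implementation would pack four codes per byte and integrate with autograd.

```python
# Hedged sketch (not Mesa's or ActNN's code): store activations as 2-bit
# codes with per-group min/scale, then reconstruct an approximation when
# the backward pass needs them.
import numpy as np

def quantize_2bit(x, group_size=64, seed=0):
    """Stochastically round a float activation tensor to 2-bit codes
    (4 levels) with a per-group min/scale. Assumes the element count is a
    multiple of group_size. Codes are stored one per byte here; packing
    four codes per byte would give the actual 2-bit footprint."""
    rng = np.random.default_rng(seed)
    flat = x.reshape(-1, group_size).astype(np.float32)
    lo = flat.min(axis=1, keepdims=True)
    scale = (flat.max(axis=1, keepdims=True) - lo) / 3.0 + 1e-12
    levels = (flat - lo) / scale                   # values in [0, 3)
    # stochastic rounding keeps the quantizer unbiased in expectation
    codes = np.floor(levels + rng.random(levels.shape)).clip(0, 3)
    return codes.astype(np.uint8), lo, scale

def dequantize_2bit(codes, lo, scale, shape):
    """Reconstruct an approximate activation tensor for the backward pass."""
    return (codes.astype(np.float32) * scale + lo).reshape(shape)

if __name__ == "__main__":
    act = np.random.default_rng(1).standard_normal((8, 64)).astype(np.float32)
    codes, lo, scale = quantize_2bit(act)
    approx = dequantize_2bit(codes, lo, scale, act.shape)
    print("mean abs error:", float(np.abs(act - approx).mean()))
```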
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.