Efficient Memory Management for Deep Neural Net Inference
- URL: http://arxiv.org/abs/2001.03288v3
- Date: Sun, 16 Feb 2020 02:32:54 GMT
- Title: Efficient Memory Management for Deep Neural Net Inference
- Authors: Yury Pisarchyk and Juhyun Lee
- Abstract summary: Deep neural net inference can now be moved to mobile and embedded devices, which is desirable for reasons ranging from latency to privacy.
These devices are limited not only by their compute power and battery, but also by their inferior physical memory and cache; an efficient memory manager is therefore a crucial component for deep neural net inference at the edge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep neural net inference was long considered a task for servers only,
the latest advances in technology allow inference to be moved to mobile and embedded
devices, which is desirable for reasons ranging from latency to privacy. These devices
are limited not only by their compute power and battery, but also by their inferior
physical memory and cache; an efficient memory manager is therefore a crucial component
for deep neural net inference at the edge. We explore various strategies to smartly
share memory buffers among intermediate tensors in deep neural nets. Employing these
strategies can result in a memory footprint up to 11% smaller than the state of the art.
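The core idea lends itself to a short illustration. Below is a minimal sketch of one such buffer-sharing approach: intermediate tensors are assigned greedily, largest first, to shared buffers whose current residents' lifetimes (intervals of operator indices) do not overlap. The names and the exact policy are assumptions for illustration, not the authors' implementation.

```python
# Greedy "share by size" sketch: place each intermediate tensor into an
# existing shared buffer whose residents' lifetimes do not overlap,
# growing the buffer if needed; otherwise open a new buffer.
from dataclasses import dataclass, field

@dataclass
class Tensor:
    name: str
    size: int          # bytes
    first_use: int     # index of the producing op
    last_use: int      # index of the last consuming op

@dataclass
class Buffer:
    size: int = 0
    residents: list = field(default_factory=list)  # tensors placed here

    def is_free_during(self, t: Tensor) -> bool:
        # Reusable iff no resident's lifetime overlaps t's lifetime.
        return all(r.last_use < t.first_use or r.first_use > t.last_use
                   for r in self.residents)

def assign_buffers(tensors: list[Tensor]) -> list[Buffer]:
    buffers: list[Buffer] = []
    # Placing large tensors first tends to reduce the total footprint.
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        target = next((b for b in buffers if b.is_free_during(t)), None)
        if target is None:
            target = Buffer()
            buffers.append(target)
        target.residents.append(t)
        target.size = max(target.size, t.size)
    return buffers

ops = [Tensor("conv1_out", 4096, 0, 1), Tensor("conv2_out", 2048, 1, 2),
       Tensor("conv3_out", 4096, 2, 3)]
bufs = assign_buffers(ops)
print(f"{len(bufs)} shared buffers, {sum(b.size for b in bufs)} bytes total")
```

In this toy trace the three tensors fit into two shared buffers (6144 bytes) instead of three private ones (10240 bytes), because conv1_out and conv3_out never live at the same time.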
Related papers
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without significant computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
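As a rough illustration of the learnable memory tokens mentioned in the entry above, the sketch below prepends them to the input of a standard multi-head self-attention layer. The module name and sizes are hypothetical, not the paper's architecture.

```python
# Illustrative sketch: learnable memory tokens prepended to the input
# sequence so self-attention can read from and write to them.
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    def __init__(self, dim: int, num_mem_tokens: int, num_heads: int = 4):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(1, num_mem_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); concatenate memory tokens along the sequence.
        mem = self.mem.expand(x.size(0), -1, -1)
        h = torch.cat([mem, x], dim=1)
        out, _ = self.attn(h, h, h)
        return out[:, mem.size(1):]   # drop the memory slots from the output

x = torch.randn(2, 16, 64)
print(MemoryAugmentedBlock(64, num_mem_tokens=8)(x).shape)  # (2, 16, 64)
```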
- Generalized Key-Value Memory to Flexibly Adjust Redundancy in Memory-Augmented Networks [6.03025980398201]
Memory-augmented neural networks enhance a neural network with an external key-value memory.
We propose a generalized key-value memory that decouples its dimension from the number of support vectors.
We show that adapting this parameter on demand effectively mitigates up to 44% of nonidealities, at equal accuracy and number of devices.
arXiv Detail & Related papers (2022-03-11T19:59:43Z)
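A minimal sketch of the decoupling described above: in a key-value memory read, the key dimension d is a free parameter independent of the number of entries n, so it can be grown for redundancy or shrunk to save devices. Illustrative NumPy only, not the paper's implementation.

```python
# Key-value memory read with a tunable key dimension d (decoupled from n).
import numpy as np

rng = np.random.default_rng(0)
n, d, v_dim = 32, 256, 10          # entries, key dimension (tunable), values

keys = rng.standard_normal((n, d)) / np.sqrt(d)   # one key per support vector
values = rng.standard_normal((n, v_dim))          # payloads to retrieve

def read(query: np.ndarray, beta: float = 8.0) -> np.ndarray:
    # Soft attention over all entries: similarity -> softmax -> weighted sum.
    logits = beta * (keys @ query)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ values

# A noisy version of a stored key still retrieves (roughly) its value;
# larger d makes this retrieval more robust to such nonidealities.
noisy = keys[3] + 0.05 * rng.standard_normal(d)
print(np.allclose(read(noisy), values[3], atol=0.2))
```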
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling scheme, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
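To make the patch-based idea concrete, here is a toy sketch that evaluates a single 3x3 convolution patch by patch with a one-pixel halo, so only one tile's activations are live at a time. It illustrates the general principle, not MCUNetV2's actual scheduler.

```python
# Patch-by-patch inference sketch: same result as the full convolution,
# but peak activation memory is one tile (plus halo) instead of the map.
import numpy as np

def conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    # 'Valid' 3x3 convolution, single channel, stride 1.
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def conv3x3_by_patches(x: np.ndarray, w: np.ndarray, patch: int = 16):
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(0, H - 2, patch):
        for j in range(0, W - 2, patch):
            # Each tile needs a 1-pixel halo on each side for the 3x3 kernel.
            tile = x[i:min(i + patch, H - 2) + 2, j:min(j + patch, W - 2) + 2]
            out[i:i + patch, j:j + patch] = conv3x3(tile, w)
    return out

rng = np.random.default_rng(0)
img, w = rng.standard_normal((64, 64)), rng.standard_normal((3, 3))
print(np.allclose(conv3x3(img, w), conv3x3_by_patches(img, w)))  # True
```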
- Reservoir Stack Machines [77.12475691708838]
Memory-augmented neural networks equip a recurrent neural network with an explicit memory to support tasks that require information storage.
We introduce the reservoir stack machine, a model which can provably recognize all deterministic context-free languages.
Our results show that the reservoir stack machine achieves zero error, even on test sequences longer than the training data.
arXiv Detail & Related papers (2021-05-04T16:50:40Z)
- Binary Neural Network for Speaker Verification [13.472791713805762]
This paper focuses on how to apply binary neural networks to the task of speaker verification.
Experimental results show that, after binarizing the convolutional neural network, the ResNet34-based network achieves an equal error rate (EER) of around 5%.
arXiv Detail & Related papers (2021-04-06T06:04:57Z)
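For intuition about the binarization above, the sketch below quantizes a weight and an activation vector to {-1, +1} with an XNOR-Net-style scale and compares the binary dot product to the full-precision one. It illustrates binarization in general, not the paper's ResNet34 recipe.

```python
# Binarization sketch: approximate x by alpha * sign(x), where
# alpha = mean(|x|) minimizes the L2 approximation error (XNOR-Net-style).
# Dot products then reduce to counting sign agreements.
import numpy as np

def binarize(x: np.ndarray):
    alpha = np.mean(np.abs(x))      # per-tensor scaling factor
    return alpha, np.sign(x)        # one float plus 1-bit codes

rng = np.random.default_rng(0)
w = rng.standard_normal(512)                 # a filter's weights
a = 0.5 * w + rng.standard_normal(512)       # a correlated activation vector

aw, bw = binarize(w)
aa, ba = binarize(a)
approx = aw * aa * np.dot(bw, ba)   # binary dot product, rescaled
print(f"full precision: {np.dot(w, a):+.1f}, binarized: {approx:+.1f}")
```

The gap between the two printed numbers is the price of 1-bit codes; keeping that accuracy loss small for speaker verification is what the paper studies.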
- Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks enhance neural networks with an explicit memory to overcome these issues.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)
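A minimal sketch of the soft read and write operations mentioned above, where every memory row participates in each access in proportion to a softmax attention weight. The class and parameters are hypothetical, and the analog in-memory hardware aspect is not modeled.

```python
# Soft read/write over an explicit memory M: every row takes part in
# every access, weighted by a softmax over key similarities.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

class SoftMemory:
    def __init__(self, rows: int, dim: int, seed: int = 0):
        self.M = np.zeros((rows, dim))          # explicit memory contents
        self.keys = np.random.default_rng(seed).standard_normal((rows, dim))

    def write(self, key: np.ndarray, value: np.ndarray, beta: float = 20.0):
        w = softmax(beta * (self.keys @ key))   # one weight per memory row
        # Blend the new value into every row in proportion to its weight.
        self.M = (1 - w[:, None]) * self.M + w[:, None] * value

    def read(self, key: np.ndarray, beta: float = 20.0) -> np.ndarray:
        w = softmax(beta * (self.keys @ key))
        return w @ self.M                       # weighted sum over all rows

mem = SoftMemory(rows=64, dim=8)
k = np.random.default_rng(1).standard_normal(8)
mem.write(k, np.arange(8.0))
print(np.round(mem.read(k), 2))  # approximately the written value 0..7
```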
- Reservoir Memory Machines as Neural Computers [70.5993855765376]
Differentiable neural computers extend artificial neural networks with an explicit memory without interference.
We achieve some of the computational capabilities of differentiable neural computers with a model that can be trained very efficiently.
arXiv Detail & Related papers (2020-09-14T12:01:30Z)
- Low-Rank Training of Deep Neural Networks for Emerging Memory Technology [4.456122555367167]
We address two key challenges for training on edge devices with non-volatile memory: low write density and low auxiliary memory.
We present a low-rank training scheme that addresses these challenges while maintaining computational efficiency.
arXiv Detail & Related papers (2020-09-08T17:59:56Z)
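As a rough illustration of low-rank training in general (not the paper's specific scheme), the sketch below keeps a weight matrix factored as U @ V of rank r and updates only the small factors, which shrinks both the number of stored parameters and the number of memory writes per step.

```python
# Low-rank training sketch: store and update d_out*r + r*d_in numbers
# instead of d_out*d_in for a dense W.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lr = 64, 32, 4, 0.05

# A rank-r target mapping and data that it labels.
W_true = (rng.standard_normal((d_out, r)) @
          rng.standard_normal((r, d_in))) / np.sqrt(r * d_in)
X = rng.standard_normal((256, d_in))
Y = X @ W_true.T

U = 0.1 * rng.standard_normal((d_out, r))   # trainable low-rank factors
V = 0.1 * rng.standard_normal((r, d_in))

for _ in range(1000):
    E = X @ (U @ V).T - Y                   # prediction error
    G = E.T @ X / len(X)                    # dL/dW for L = mean ||E||^2 / 2
    gU, gV = G @ V.T, U.T @ G               # chain rule through W = U @ V
    U -= lr * gU
    V -= lr * gV

print("relative residual:",
      np.linalg.norm(X @ (U @ V).T - Y) / np.linalg.norm(Y))
```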
- TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices [71.68436132514542]
We introduce the concept of attention condensers for building low-footprint, highly-efficient deep neural networks for on-device speech recognition on the edge.
To illustrate its efficacy, we introduce TinySpeech, a family of low-precision deep neural networks tailored for on-device speech recognition.
arXiv Detail & Related papers (2020-08-10T16:34:52Z)
- Improving Memory Utilization in Convolutional Neural Network Accelerators [16.340620299847384]
We propose a mapping method that allows activation layers to overlap and thus utilize the memory more efficiently.
Experiments with various real-world object detector networks show that the proposed mapping technique can decrease the activation memory by up to 32.9%.
For higher-resolution de-noising networks, we achieve activation memory savings of 48.8%.
arXiv Detail & Related papers (2020-07-20T09:34:36Z)
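The overlap idea above can be shown in a few lines: when a layer is computed row by row, an input row becomes dead as soon as the output rows depending on it are written, so the output can reuse the input's buffer. Below is a toy sketch for a 3-tap vertical convolution; it is illustrative only, not the paper's mapping method.

```python
# Overlapping activations sketch: output row i reads input rows i..i+2,
# which are still intact when row i is overwritten, so input and output
# share one buffer instead of two full activation maps.
import numpy as np

def conv_rows_overlapping(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    H = x.shape[0]
    for i in range(H - 2):
        x[i] = k[0] * x[i] + k[1] * x[i + 1] + k[2] * x[i + 2]
    return x[:H - 2]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
k = np.array([0.25, 0.5, 0.25])
ref = np.stack([k[0] * x[i] + k[1] * x[i + 1] + k[2] * x[i + 2]
                for i in range(6)])
print(np.allclose(conv_rows_overlapping(x.copy(), k), ref))  # True
```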