Memory Planning for Deep Neural Networks
- URL: http://arxiv.org/abs/2203.00448v1
- Date: Wed, 23 Feb 2022 05:28:18 GMT
- Title: Memory Planning for Deep Neural Networks
- Authors: Maksim Levental
- Abstract summary: We study memory allocation patterns in DNNs during inference.
Latencies incurred due to such \texttt{mutex} contention produce undesirable bottlenecks in user-facing services.
We present an implementation of \texttt{MemoMalloc} in the PyTorch deep learning framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study memory allocation patterns in DNNs during inference, in the context
of large-scale systems. We observe that such memory allocation patterns, in the
context of multi-threading, are subject to high latencies, due to
\texttt{mutex} contention in the system memory allocator. Latencies incurred
due to such \texttt{mutex} contention produce undesirable bottlenecks in
user-facing services. Thus, we propose a "memoization"-based technique,
\texttt{MemoMalloc}, for optimizing overall latency, with only moderate
increases in peak memory usage. Specifically, our technique consists of a
runtime component, which captures all allocations and uniquely associates them
with their high-level source operation, and a static analysis component, which
constructs an efficient allocation "plan". We present an implementation of
\texttt{MemoMalloc} in the PyTorch deep learning framework and evaluate memory
consumption and execution performance on a wide range of DNN architectures. We
find that \texttt{MemoMalloc} outperforms state-of-the-art general purpose
memory allocators, with respect to DNN inference latency, by as much as 40\%.
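The capture-then-plan pipeline can be illustrated with a small sketch. Below is a minimal, hedged rendering of the memoization idea: a profiling pass records each op's allocation, a static pass lays buffers out in one arena, and steady-state runs never touch the system allocator. The class and method names are ours, and the naive back-to-back layout stands in for the paper's lifetime-aware planner.

```python
# A minimal sketch of the memoization idea, not the paper's implementation:
# profile once, plan offsets statically, then serve "allocations" as views
# into one preallocated arena so the system allocator's mutex is never hit.
import torch

class PlannedArena:
    def __init__(self):
        self.trace = []     # (op_name, nbytes) captured during the profiling run
        self.offsets = {}   # op_name -> byte offset into the arena
        self.arena = None

    def record(self, op_name, nbytes):
        """Runtime component: associate an allocation with its source op."""
        self.trace.append((op_name, nbytes))

    def plan(self):
        """Static component: lay buffers out back to back. (The real planner
        reuses buffers with non-overlapping lifetimes to keep peak memory low.)"""
        offset = 0
        for op_name, nbytes in self.trace:
            self.offsets[op_name] = offset
            offset += nbytes
        self.arena = torch.empty(offset, dtype=torch.uint8)

    def alloc(self, op_name, nbytes):
        """Steady state: a zero-malloc 'allocation' -- just a view."""
        off = self.offsets[op_name]
        return self.arena[off:off + nbytes]

arena = PlannedArena()
arena.record("conv1.out", 4096)
arena.record("relu1.out", 4096)
arena.plan()
buf = arena.alloc("conv1.out", 4096)   # no system allocator call on the hot path
```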
Related papers
- Host-Based Allocators for Device Memory [1.2289361708127877]
We pose a model where the allocation algorithm runs in host memory but allocates device memory, and so incurs the following constraint: the allocator cannot read the memory it is allocating.
This rules out boundary tags, a concept that has been ubiquitous in nearly every allocation algorithm.
In this paper, we propose alternate algorithms to work around this constraint, and discuss in general the implications of this system model.
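A minimal sketch of that system model, with all bookkeeping held host-side so the device buffer is never dereferenced; the first-fit policy and names here are illustrative assumptions, not the paper's algorithms.

```python
# All allocator metadata lives in host memory (a plain Python structure),
# standing in for boundary tags; device memory is tracked purely by offset
# and never read.
class HostSideAllocator:
    def __init__(self, device_capacity):
        self.free_list = [(0, device_capacity)]  # host-side (offset, size) pairs
        self.live = {}                            # offset -> size: out-of-band "tags"

    def malloc(self, size):
        for i, (off, sz) in enumerate(self.free_list):
            if sz >= size:                        # first fit, for illustration
                self.live[off] = size
                if sz - size:
                    self.free_list[i] = (off + size, sz - size)
                else:
                    self.free_list.pop(i)
                return off                        # device offset, never dereferenced
        raise MemoryError("out of device memory")

    def free(self, off):
        # Coalescing of adjacent free blocks is omitted for brevity.
        self.free_list.append((off, self.live.pop(off)))

a = HostSideAllocator(1 << 20)
p = a.malloc(256)
a.free(p)
```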
arXiv Detail & Related papers (2024-05-11T19:28:37Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
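As a hedged illustration of the embedding-replay idea (storing compact embeddings rather than whole subgraphs to bound memory), here is a generic reservoir-style buffer; the policy and names are ours, not TEM's design.

```python
# A generic embedding replay buffer: keeps a bounded set of detached node
# embeddings for later replay, instead of full computation subgraphs.
import random
import torch

class EmbeddingReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.n_seen = 0
        self.embeddings, self.labels = [], []

    def add(self, emb, label):
        """Reservoir sampling keeps memory bounded as the network expands."""
        self.n_seen += 1
        if len(self.embeddings) < self.capacity:
            self.embeddings.append(emb.detach())
            self.labels.append(label)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.embeddings[j] = emb.detach()
                self.labels[j] = label

    def sample(self, k):
        idx = random.sample(range(len(self.embeddings)), k)
        return (torch.stack([self.embeddings[i] for i in idx]),
                torch.tensor([self.labels[i] for i in idx]))

buf = EmbeddingReplayBuffer(capacity=1000)
buf.add(torch.randn(64), label=3)
```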
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- What Do You Mean by Memory? When Engineers Are Lost in the Maze of Complexity [0.0]
An accepted practice to decrease applications' memory usage is to reduce the amount and frequency of memory allocations.
The industry needs detailed guidelines for optimizing memory usage targeting specific operating systems (OS) and programming language types.
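A common instance of that guideline is buffer pooling, which amortizes one allocation across many calls; a generic sketch, not taken from the paper:

```python
# Reuse pooled buffers instead of allocating per call: fewer and less
# frequent allocations, at the cost of holding buffers between uses.
import torch

class BufferPool:
    def __init__(self):
        self.pool = {}   # (shape, dtype) -> list of free tensors

    def get(self, shape, dtype=torch.float32):
        free = self.pool.setdefault((tuple(shape), dtype), [])
        return free.pop() if free else torch.empty(shape, dtype=dtype)

    def put(self, t):
        self.pool.setdefault((tuple(t.shape), t.dtype), []).append(t)

pool = BufferPool()
buf = pool.get((1024, 1024))   # allocated once...
pool.put(buf)
buf = pool.get((1024, 1024))   # ...then reused, with no new allocation
```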
arXiv Detail & Related papers (2023-12-20T22:26:15Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
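One standard route to attention in constant memory is streaming (online) softmax accumulation, which keeps O(1) state per query regardless of sequence length; the sketch below illustrates that general principle, not CMAB's specific block.

```python
# Streaming attention: fold in one key/value pair at a time with a running
# max, denominator, and weighted-value sum; the (n,)-sized score row is
# never materialized.
import torch

def streaming_attention(q, keys, values):
    """q: (d,), keys/values: (n, d). Constant extra memory in n."""
    m = torch.tensor(float("-inf"))       # running max, for numerical stability
    denom = torch.tensor(0.0)             # running softmax denominator
    num = torch.zeros_like(values[0])     # running weighted sum of values
    for k, v in zip(keys, values):
        s = q @ k
        m_new = torch.maximum(m, s)
        scale = torch.exp(m - m_new)      # rescale old accumulators
        denom = denom * scale + torch.exp(s - m_new)
        num = num * scale + torch.exp(s - m_new) * v
        m = m_new
    return num / denom

q = torch.randn(8); K = torch.randn(16, 8); V = torch.randn(16, 8)
out = streaming_attention(q, K, V)
# Matches the naive quadratic-memory computation:
assert torch.allclose(out, torch.softmax(K @ q, 0) @ V, atol=1e-5)
```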
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Robust and Efficient Memory Network for Video Object Segmentation [6.7995672846437305]
This paper proposes a Robust and Efficient Memory Network, or REMN, for studying semi-supervised video object segmentation (VOS).
We introduce a local attention mechanism that tackles the background distraction by enhancing the features of foreground objects with the previous mask.
Experiments demonstrate that our REMN achieves state-of-the-art results on DAVIS 2017, with a $\mathcal{J}\&\mathcal{F}$ score of 86.3% and on YouTube-VOS 2018, with a $\mathcal{G}$ overall mean of 85.5%.
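As a rough illustration of mask-guided foreground enhancement, one simple form is to gate features with the upsampled previous mask; the gating below is our assumption about the spirit of the mechanism, not REMN's definition.

```python
# Boost features at locations the previous frame's mask marks as foreground,
# so background lookalikes contribute less to subsequent matching.
import torch
import torch.nn.functional as F

def foreground_enhance(feat, prev_mask, alpha=1.0):
    """feat: (C, H, W) backbone features; prev_mask: (1, h, w) in [0, 1]."""
    m = F.interpolate(prev_mask[None], size=feat.shape[-2:],
                      mode="bilinear", align_corners=False)[0]
    return feat * (1.0 + alpha * m)   # emphasize likely-foreground locations
```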
arXiv Detail & Related papers (2023-04-24T06:19:21Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
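The scheduling idea can be sketched as follows, assuming a spatially local first stage; the halo/overlap handling that real receptive fields require is omitted, so this shows the scheduling principle only.

```python
# Run the memory-heavy early stage tile by tile so only one tile's
# activations are live at a time, then finish on the stitched feature map.
import torch

def patch_inference(stage1, stage2, x, patches=2):
    """x: (1, C, H, W); stage1 must be spatially local (e.g., convolutions)."""
    rows = []
    for xr in x.chunk(patches, dim=2):            # split height-wise
        cols = [stage1(xc) for xc in xr.chunk(patches, dim=3)]
        rows.append(torch.cat(cols, dim=3))
    return stage2(torch.cat(rows, dim=2))         # stage1 peak ~ 1/patches^2

stage1 = torch.nn.Conv2d(3, 8, 1)                 # 1x1 conv: no halo needed
stage2 = torch.nn.AdaptiveAvgPool2d(1)
y = patch_inference(stage1, stage2, torch.randn(1, 3, 64, 64))
```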
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
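A minimal sketch of such an associative readout, with dot-product affinity and shapes chosen purely for illustration:

```python
# Each query location aggregates memory values weighted by feature affinity,
# so every memory node has a chance to contribute to the readout.
import torch

def memory_readout(q_key, m_key, m_val):
    """q_key: (C, Nq), m_key: (C, Nm), m_val: (Cv, Nm)."""
    affinity = torch.softmax(m_key.t() @ q_key, dim=0)   # (Nm, Nq)
    return m_val @ affinity                              # (Cv, Nq)

q = torch.randn(64, 100); mk = torch.randn(64, 400); mv = torch.randn(32, 400)
readout = memory_readout(q, mk, mv)   # (32, 100)
```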
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Pinpointing the Memory Behaviors of DNN Training [37.78973307051419]
Training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of accelerators.
In this work, we pinpoint the memory behaviors of each device memory block on the GPU during training by instrumenting the memory allocators of the runtime system.
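Forward hooks offer a lightweight proxy for this kind of per-op instrumentation (the paper instruments the runtime's allocator itself); a hedged sketch:

```python
# Log how much memory each module's outputs hold, attributed by module name,
# as a coarse stand-in for allocator-level tracing.
import torch

log = []

def track(name):
    def hook(module, inputs, output):
        outs = output if isinstance(output, tuple) else (output,)
        nbytes = sum(o.numel() * o.element_size()
                     for o in outs if torch.is_tensor(o))
        log.append((name, nbytes))
    return hook

model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU())
for name, mod in model.named_modules():
    if name:                                   # skip the root container
        mod.register_forward_hook(track(name))
model(torch.randn(4, 128))
print(log)                                     # e.g. [('0', 4096), ('1', 4096)]
```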
arXiv Detail & Related papers (2021-04-01T05:30:03Z)
- Efficient Regional Memory Network for Video Object Segmentation [56.587541750729045]
We propose a novel local-to-local matching solution for semi-supervised VOS, namely Regional Memory Network (RMNet).
The proposed RMNet effectively alleviates the ambiguity of similar objects in both memory and query frames.
Experimental results indicate that the proposed RMNet performs favorably against state-of-the-art methods on the DAVIS and YouTube-VOS datasets.
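A sketch of the local-to-local intuition: restrict memory-query matching to a padded bounding box around the previous mask, which is what suppresses similar-looking objects elsewhere in the frame. The box logic is illustrative, not RMNet's regional attention.

```python
# Match features only inside the region implied by the previous mask,
# rather than globally across the whole frame.
import torch

def region_of(mask, pad=8):
    """mask: (H, W) binary; returns a padded bounding box (y0, y1, x0, x1)."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return (max(int(ys.min()) - pad, 0), min(int(ys.max()) + pad + 1, mask.shape[0]),
            max(int(xs.min()) - pad, 0), min(int(xs.max()) + pad + 1, mask.shape[1]))

def regional_match(q_feat, m_feat, prev_mask):
    """q_feat/m_feat: (C, H, W); affinity is computed over the region only."""
    y0, y1, x0, x1 = region_of(prev_mask)
    q = q_feat[:, y0:y1, x0:x1].flatten(1)     # (C, n)
    m = m_feat[:, y0:y1, x0:x1].flatten(1)     # (C, n)
    return torch.softmax(m.t() @ q, dim=0)     # regional affinity only

mask = torch.zeros(32, 32); mask[10:20, 12:22] = 1
aff = regional_match(torch.randn(16, 32, 32), torch.randn(16, 32, 32), mask)
```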
arXiv Detail & Related papers (2021-03-24T02:08:46Z)
- Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)