Memory Planning for Deep Neural Networks
- URL: http://arxiv.org/abs/2203.00448v1
- Date: Wed, 23 Feb 2022 05:28:18 GMT
- Title: Memory Planning for Deep Neural Networks
- Authors: Maksim Levental
- Abstract summary: We study memory allocation patterns in DNNs during inference.
Latencies incurred due to such \texttt{mutex} contention produce undesirable bottlenecks in user-facing services.
We present an implementation of \texttt{MemoMalloc} in the PyTorch deep learning framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study memory allocation patterns in DNNs during inference, in the context
of large-scale systems. We observe that such memory allocation patterns, in the
context of multi-threading, are subject to high latencies, due to
\texttt{mutex} contention in the system memory allocator. Latencies incurred
due to such \texttt{mutex} contention produce undesirable bottlenecks in
user-facing services. Thus, we propose a "memoization"-based technique,
\texttt{MemoMalloc}, for optimizing overall latency, with only moderate
increases in peak memory usage. Specifically, our technique consists of a
runtime component, which captures all allocations and uniquely associates them
with their high-level source operation, and a static analysis component, which
constructs an efficient allocation "plan". We present an implementation of
\texttt{MemoMalloc} in the PyTorch deep learning framework and evaluate memory
consumption and execution performance on a wide range of DNN architectures. We
find that \texttt{MemoMalloc} outperforms state-of-the-art general purpose
memory allocators, with respect to DNN inference latency, by as much as 40\%.
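The capture-then-plan pipeline can be illustrated with a small sketch. Below is a minimal, hedged rendering of the memoization idea: a profiling pass records each op's allocation, a static pass lays buffers out in one arena, and steady-state runs never touch the system allocator. The class and method names are ours, and the naive back-to-back layout stands in for the paper's lifetime-aware planner.

```python
# A minimal sketch of the memoization idea, not the paper's implementation:
# profile once, plan offsets statically, then serve "allocations" as views
# into one preallocated arena so the system allocator's mutex is never hit.
import torch

class PlannedArena:
    def __init__(self):
        self.trace = []     # (op_name, nbytes) captured during the profiling run
        self.offsets = {}   # op_name -> byte offset into the arena
        self.arena = None

    def record(self, op_name, nbytes):
        """Runtime component: associate an allocation with its source op."""
        self.trace.append((op_name, nbytes))

    def plan(self):
        """Static component: lay buffers out back to back. (The real planner
        reuses buffers with non-overlapping lifetimes to keep peak memory low.)"""
        offset = 0
        for op_name, nbytes in self.trace:
            self.offsets[op_name] = offset
            offset += nbytes
        self.arena = torch.empty(offset, dtype=torch.uint8)

    def alloc(self, op_name, nbytes):
        """Steady state: a zero-malloc 'allocation' -- just a view."""
        off = self.offsets[op_name]
        return self.arena[off:off + nbytes]

arena = PlannedArena()
arena.record("conv1.out", 4096)
arena.record("relu1.out", 4096)
arena.plan()
buf = arena.alloc("conv1.out", 4096)   # no system allocator call on the hot path
```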
Related papers
- Host-Based Allocators for Device Memory [1.2289361708127877]
We pose a model where the allocation algorithm runs in host memory but allocates device memory, and so incurs the following constraint: the allocator cannot read the memory it is allocating.
This rules out boundary tags, a concept that has been ubiquitous in nearly every allocation algorithm.
In this paper, we propose alternate algorithms to work around this constraint, and discuss in general the implications of this system model.
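A minimal sketch of that system model, with all bookkeeping held host-side so the device buffer is never dereferenced; the first-fit policy and names here are illustrative assumptions, not the paper's algorithms.

```python
# All allocator metadata lives in host memory (a plain Python structure),
# standing in for boundary tags; device memory is tracked purely by offset
# and never read.
class HostSideAllocator:
    def __init__(self, device_capacity):
        self.free_list = [(0, device_capacity)]  # host-side (offset, size) pairs
        self.live = {}                            # offset -> size: out-of-band "tags"

    def malloc(self, size):
        for i, (off, sz) in enumerate(self.free_list):
            if sz >= size:                        # first fit, for illustration
                self.live[off] = size
                if sz - size:
                    self.free_list[i] = (off + size, sz - size)
                else:
                    self.free_list.pop(i)
                return off                        # device offset, never dereferenced
        raise MemoryError("out of device memory")

    def free(self, off):
        # Coalescing of adjacent free blocks is omitted for brevity.
        self.free_list.append((off, self.live.pop(off)))

a = HostSideAllocator(1 << 20)
p = a.malloc(256)
a.free(p)
```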
arXiv Detail & Related papers (2024-05-11T19:28:37Z)
- Topology-aware Embedding Memory for Continual Learning on Expanding Networks [63.35819388164267]
We present a framework to tackle the memory explosion problem using memory replay techniques.
PDGNNs with Topology-aware Embedding Memory (TEM) significantly outperform state-of-the-art techniques.
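As a hedged illustration of the embedding-replay idea (storing compact embeddings rather than whole subgraphs to bound memory), here is a generic reservoir-style buffer; the policy and names are ours, not TEM's design.

```python
# A generic embedding replay buffer: keeps a bounded set of detached node
# embeddings for later replay, instead of full computation subgraphs.
import random
import torch

class EmbeddingReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.n_seen = 0
        self.embeddings, self.labels = [], []

    def add(self, emb, label):
        """Reservoir sampling keeps memory bounded as the network expands."""
        self.n_seen += 1
        if len(self.embeddings) < self.capacity:
            self.embeddings.append(emb.detach())
            self.labels.append(label)
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.embeddings[j] = emb.detach()
                self.labels[j] = label

    def sample(self, k):
        idx = random.sample(range(len(self.embeddings)), k)
        return (torch.stack([self.embeddings[i] for i in idx]),
                torch.tensor([self.labels[i] for i in idx]))

buf = EmbeddingReplayBuffer(capacity=1000)
buf.add(torch.randn(64), label=3)
```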
arXiv Detail & Related papers (2024-01-24T03:03:17Z)
- What Do You Mean by Memory? When Engineers Are Lost in the Maze of Complexity [0.0]
An accepted practice to decrease applications' memory usage is to reduce the amount and frequency of memory allocations.
The industry needs detailed guidelines for optimizing memory usage targeting specific operating systems (OS) and programming language types.
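A common instance of that guideline is buffer pooling, which amortizes one allocation across many calls; a generic sketch, not taken from the paper:

```python
# Reuse pooled buffers instead of allocating per call: fewer and less
# frequent allocations, at the cost of holding buffers between uses.
import torch

class BufferPool:
    def __init__(self):
        self.pool = {}   # (shape, dtype) -> list of free tensors

    def get(self, shape, dtype=torch.float32):
        free = self.pool.setdefault((tuple(shape), dtype), [])
        return free.pop() if free else torch.empty(shape, dtype=dtype)

    def put(self, t):
        self.pool.setdefault((tuple(t.shape), t.dtype), []).append(t)

pool = BufferPool()
buf = pool.get((1024, 1024))   # allocated once...
pool.put(buf)
buf = pool.get((1024, 1024))   # ...then reused, with no new allocation
```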
arXiv Detail & Related papers (2023-12-20T22:26:15Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
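One standard route to attention in constant memory is streaming (online) softmax accumulation, which keeps O(1) state per query regardless of sequence length; the sketch below illustrates that general principle, not CMAB's specific block.

```python
# Streaming attention: fold in one key/value pair at a time with a running
# max, denominator, and weighted-value sum; the (n,)-sized score row is
# never materialized.
import torch

def streaming_attention(q, keys, values):
    """q: (d,), keys/values: (n, d). Constant extra memory in n."""
    m = torch.tensor(float("-inf"))       # running max, for numerical stability
    denom = torch.tensor(0.0)             # running softmax denominator
    num = torch.zeros_like(values[0])     # running weighted sum of values
    for k, v in zip(keys, values):
        s = q @ k
        m_new = torch.maximum(m, s)
        scale = torch.exp(m - m_new)      # rescale old accumulators
        denom = denom * scale + torch.exp(s - m_new)
        num = num * scale + torch.exp(s - m_new) * v
        m = m_new
    return num / denom

q = torch.randn(8); K = torch.randn(16, 8); V = torch.randn(16, 8)
out = streaming_attention(q, K, V)
# Matches the naive quadratic-memory computation:
assert torch.allclose(out, torch.softmax(K @ q, 0) @ V, atol=1e-5)
```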
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Robust and Efficient Memory Network for Video Object Segmentation [6.7995672846437305]
This paper proposes a Robust and Efficient Memory Network, or REMN, for studying semi-supervised video object segmentation (VOS).
We introduce a local attention mechanism that tackles the background distraction by enhancing the features of foreground objects with the previous mask.
Experiments demonstrate that our REMN achieves state-of-the-art results on DAVIS 2017, with a $\mathcal{J}\&\mathcal{F}$ score of 86.3% and on YouTube-VOS 2018, with a $\mathcal{G}$ overall mean of 85.5%.
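As a rough illustration of mask-guided foreground enhancement, one simple form is to gate features with the upsampled previous mask; the gating below is our assumption about the spirit of the mechanism, not REMN's definition.

```python
# Boost features at locations the previous frame's mask marks as foreground,
# so background lookalikes contribute less to subsequent matching.
import torch
import torch.nn.functional as F

def foreground_enhance(feat, prev_mask, alpha=1.0):
    """feat: (C, H, W) backbone features; prev_mask: (1, h, w) in [0, 1]."""
    m = F.interpolate(prev_mask[None], size=feat.shape[-2:],
                      mode="bilinear", align_corners=False)[0]
    return feat * (1.0 + alpha * m)   # emphasize likely-foreground locations
```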
arXiv Detail & Related papers (2023-04-24T06:19:21Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [72.80896338009579]
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
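The scheduling idea can be sketched as follows, assuming a spatially local first stage; the halo/overlap handling that real receptive fields require is omitted, so this shows the scheduling principle only.

```python
# Run the memory-heavy early stage tile by tile so only one tile's
# activations are live at a time, then finish on the stitched feature map.
import torch

def patch_inference(stage1, stage2, x, patches=2):
    """x: (1, C, H, W); stage1 must be spatially local (e.g., convolutions)."""
    rows = []
    for xr in x.chunk(patches, dim=2):            # split height-wise
        cols = [stage1(xc) for xc in xr.chunk(patches, dim=3)]
        rows.append(torch.cat(cols, dim=3))
    return stage2(torch.cat(rows, dim=2))         # stage1 peak ~ 1/patches^2

stage1 = torch.nn.Conv2d(3, 8, 1)                 # 1x1 conv: no halo needed
stage2 = torch.nn.AdaptiveAvgPool2d(1)
y = patch_inference(stage1, stage2, torch.randn(1, 3, 64, 64))
```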
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation [68.45737688496654]
We establish correspondences directly between frames without re-encoding the mask features for every object.
With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion.
We validated that every memory node now has a chance to contribute, and experimentally showed that such diversified voting is beneficial to both memory efficiency and inference accuracy.
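A minimal sketch of such an associative readout, with dot-product affinity and shapes chosen purely for illustration:

```python
# Each query location aggregates memory values weighted by feature affinity,
# so every memory node has a chance to contribute to the readout.
import torch

def memory_readout(q_key, m_key, m_val):
    """q_key: (C, Nq), m_key: (C, Nm), m_val: (Cv, Nm)."""
    affinity = torch.softmax(m_key.t() @ q_key, dim=0)   # (Nm, Nq)
    return m_val @ affinity                              # (Cv, Nq)

q = torch.randn(64, 100); mk = torch.randn(64, 400); mv = torch.randn(32, 400)
readout = memory_readout(q, mk, mv)   # (32, 100)
```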
arXiv Detail & Related papers (2021-06-09T16:50:57Z)
- Pinpointing the Memory Behaviors of DNN Training [37.78973307051419]
Training of deep neural networks (DNNs) is usually memory-hungry due to the limited device memory capacity of accelerators.
In this work, we pinpoint the memory behaviors of each device memory block on the GPU during training by instrumenting the memory allocators of the runtime system.
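Forward hooks offer a lightweight proxy for this kind of per-op instrumentation (the paper instruments the runtime's allocator itself); a hedged sketch:

```python
# Log how much memory each module's outputs hold, attributed by module name,
# as a coarse stand-in for allocator-level tracing.
import torch

log = []

def track(name):
    def hook(module, inputs, output):
        outs = output if isinstance(output, tuple) else (output,)
        nbytes = sum(o.numel() * o.element_size()
                     for o in outs if torch.is_tensor(o))
        log.append((name, nbytes))
    return hook

model = torch.nn.Sequential(torch.nn.Linear(128, 256), torch.nn.ReLU())
for name, mod in model.named_modules():
    if name:                                   # skip the root container
        mod.register_forward_hook(track(name))
model(torch.randn(4, 128))
print(log)                                     # e.g. [('0', 4096), ('1', 4096)]
```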
arXiv Detail & Related papers (2021-04-01T05:30:03Z)
- Efficient Regional Memory Network for Video Object Segmentation [56.587541750729045]
We propose a novel local-to-local matching solution for semi-supervised VOS, namely Regional Memory Network (RMNet).
The proposed RMNet effectively alleviates the ambiguity of similar objects in both memory and query frames.
Experimental results indicate that the proposed RMNet performs favorably against state-of-the-art methods on the DAVIS and YouTube-VOS datasets.
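A sketch of the local-to-local intuition: restrict memory-query matching to a padded bounding box around the previous mask, which is what suppresses similar-looking objects elsewhere in the frame. The box logic is illustrative, not RMNet's regional attention.

```python
# Match features only inside the region implied by the previous mask,
# rather than globally across the whole frame.
import torch

def region_of(mask, pad=8):
    """mask: (H, W) binary; returns a padded bounding box (y0, y1, x0, x1)."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return (max(int(ys.min()) - pad, 0), min(int(ys.max()) + pad + 1, mask.shape[0]),
            max(int(xs.min()) - pad, 0), min(int(xs.max()) + pad + 1, mask.shape[1]))

def regional_match(q_feat, m_feat, prev_mask):
    """q_feat/m_feat: (C, H, W); affinity is computed over the region only."""
    y0, y1, x0, x1 = region_of(prev_mask)
    q = q_feat[:, y0:y1, x0:x1].flatten(1)     # (C, n)
    m = m_feat[:, y0:y1, x0:x1].flatten(1)     # (C, n)
    return torch.softmax(m.t() @ q, dim=0)     # regional affinity only

mask = torch.zeros(32, 32); mask[10:20, 12:22] = 1
aff = regional_match(torch.randn(16, 32, 32), torch.randn(16, 32, 32), mask)
```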
arXiv Detail & Related papers (2021-03-24T02:08:46Z)
- Kanerva++: extending The Kanerva Machine with differentiable, locally block allocated latent memory [75.65949969000596]
Episodic and semantic memory are critical components of the human memory model.
We develop a new principled Bayesian memory allocation scheme that bridges the gap between episodic and semantic memory.
We demonstrate that this allocation scheme improves performance in memory conditional image generation.
arXiv Detail & Related papers (2021-02-20T18:40:40Z)