Related papers: Optimizing Memory Mapping Using Deep Reinforcement Learning

Optimizing Memory Mapping Using Deep Reinforcement Learning

URL: http://arxiv.org/abs/2305.07440v2
Date: Tue, 17 Oct 2023 09:53:45 GMT
Title: Optimizing Memory Mapping Using Deep Reinforcement Learning
Authors: Pengming Wang, Mikita Sazanovich, Berkin Ilbeyi, Phitchaya Mangpo Phothilimthana, Manish Purohit, Han Yang Tay, Ng\^an V\~u, Miaosen Wang, Cosmin Paduraru, Edouard Leurent, Anton Zhernov, Po-Sen Huang, Julian Schrittwieser, Thomas Hubert, Robert Tung, Paula Kurylowicz, Kieran Milan, Oriol Vinyals and Daniel J. Mankowitz
Abstract summary: This paper focuses on the memory mapping problem that occurs during compilation of machine learning programs. We introduce an approach for solving the memory mapping problem using Reinforcement Learning. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions.
Score: 29.48627805378257
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, reducing device wear-and-tear, and even potentially improving carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: That is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning. RL is a solution paradigm well-suited for sequential decision making problems that are amenable to planning, and combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.

Related papers

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference. We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization. We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
arXiv Detail & Related papers (2024-09-26T14:17:58Z)
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures [26.183960625493807]
Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Retrieval-Augmented Generation (RAG) is a resource-efficient LLM learning method. We propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures.
arXiv Detail & Related papers (2024-05-07T22:31:50Z)
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark [166.40879020706151]
This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during fine-tuning. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance.
arXiv Detail & Related papers (2024-02-18T14:08:48Z)
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [57.354304637367555]
We present EVEREST, a surprisingly efficient MVA approach for video representation learning. It finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning. Our method significantly reduces the computation and memory requirements of MVA.
arXiv Detail & Related papers (2022-11-19T09:57:01Z)
Improving information retention in large scale online continual learning [99.73847522194549]
Online continual learning aims to adapt efficiently to new data while retaining existing knowledge. Recent work suggests that information retention remains a problem in large scale OCL even when the replay buffer is unlimited. We propose using a moving average family of methods to improve optimization for non-stationary objectives.
arXiv Detail & Related papers (2022-10-12T16:59:43Z)
Memory Safe Computations with XLA Compiler [14.510796427699459]
XLA compiler extension adjusts the representation of an algorithm according to a user-specified memory limit. We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device.
arXiv Detail & Related papers (2022-06-28T16:59:28Z)
Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos. We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z)
Mesa: A Memory-saving Training Framework for Transformers [58.78933015299703]
We present Mesa, a memory-saving training framework for Transformers. Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training. Experiments on ImageNet, CIFAR-100 and ADE20K demonstrate that Mesa can reduce half of the memory footprints during training.
arXiv Detail & Related papers (2021-11-22T11:23:01Z)
Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling. Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
Diagonal Memory Optimisation for Machine Learning on Micro-controllers [21.222568055417717]
Micro controllers and low power CPUs are increasingly being used to perform inference with machine learning models. Small amounts of RAM available on these targets sets limits on size of models which can be executed. diagonal memory optimisation technique is described and shown to achieve memory savings of up to 34.5% when applied to eleven common models.
arXiv Detail & Related papers (2020-10-04T19:45:55Z)
Improving compute efficacy frontiers with SliceOut [31.864949424541344]
We introduce SliceOut -- a dropout-inspired scheme to train deep learning models faster without impacting final test accuracy. At test time, turning off SliceOut performs an implicit ensembling across a linear number of architectures that preserves test accuracy. This leads to faster processing of large computational workloads overall, and significantly reduce the resulting energy consumption and CO2emissions.
arXiv Detail & Related papers (2020-07-21T15:59:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.