Memory Safe Computations with XLA Compiler
- URL: http://arxiv.org/abs/2206.14148v1
- Date: Tue, 28 Jun 2022 16:59:28 GMT
- Title: Memory Safe Computations with XLA Compiler
- Authors: Artem Artemev, Tilman Roeder, Mark van der Wilk
- Abstract summary: An XLA compiler extension adjusts the representation of an algorithm according to a user-specified memory limit.
We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device.
- Score: 14.510796427699459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software packages like TensorFlow and PyTorch are designed to support linear
algebra operations, and their speed and usability determine their success.
However, by prioritising speed, they often neglect memory requirements. As a
consequence, the implementations of memory-intensive algorithms that are
convenient in terms of software design can often not be run for large problems
due to memory overflows. Memory-efficient solutions require complex programming
approaches with significant logic outside the computational framework. This
impairs the adoption and use of such algorithms. To address this, we developed
an XLA compiler extension that adjusts the computational data-flow
representation of an algorithm according to a user-specified memory limit. We
show that k-nearest neighbour and sparse Gaussian process regression methods
can be run at a much larger scale on a single device, where standard
implementations would have failed. Our approach leads to better use of hardware
resources. We believe that further focus on removing memory constraints at a
compiler level will widen the range of machine learning methods that can be
developed in the future.
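As an illustration of the kind of rewrite the compiler extension performs automatically, the sketch below splits the pairwise-distance computation of an exact k-nearest-neighbour search into row blocks sized from a user-chosen memory budget, so the largest live intermediate stays under that budget. This is a hand-written NumPy analogue rather than the XLA pass itself; the function name `knn_chunked`, the `memory_limit_bytes` parameter, and the block-size rule are assumptions made for the example.

```python
import numpy as np

def knn_chunked(queries, points, k, memory_limit_bytes=2**28):
    """Exact k-NN whose dominant temporary, a (block, n) distance tile,
    is kept roughly under memory_limit_bytes (names are illustrative)."""
    n = points.shape[0]
    bytes_per_query_row = n * queries.itemsize          # one row of the distance tile
    block = max(1, memory_limit_bytes // bytes_per_query_row)
    neighbours = np.empty((queries.shape[0], k), dtype=np.int64)
    for start in range(0, queries.shape[0], block):
        q = queries[start:start + block]
        # Squared distances for this block only: ||q||^2 + ||p||^2 - 2 q p^T.
        d2 = (q ** 2).sum(1)[:, None] + (points ** 2).sum(1)[None, :] - 2.0 * q @ points.T
        # Indices of the k smallest distances per query (unordered within the top k).
        neighbours[start:start + block] = np.argpartition(d2, k, axis=1)[:, :k]
    return neighbours

rng = np.random.default_rng(0)
idx = knn_chunked(rng.normal(size=(1000, 8)), rng.normal(size=(5000, 8)), k=5)
print(idx.shape)  # (1000, 5)
```

Deriving the block size from the budget rather than hard-coding it is what lets the same computation run on devices with different memory capacities; the paper's contribution is to have the compiler apply this kind of transformation without the manual rewriting shown here.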
Related papers
- SAGA: Synthesis Augmentation with Genetic Algorithms for In-Memory Sequence Optimization [0.0]
MAGIC, or Memristor Aided Logic, is an approach that uses memory circuits to physically perform computation through write operations to memory.
We detail the formation and implementation of these genetic algorithms and evaluate them over a number of open circuit implementations.
Over the 10 benchmark circuits evaluated, these modifications improve the efficiency of in-memory circuit evaluation by 128% in the best case and by 27.5% on average.
arXiv Detail & Related papers (2024-06-14T03:00:42Z)
- Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
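A minimal, generic sketch of the divide-and-conquer pattern summarised above: the input is split into chunks, a token-reduction step precedes each merge, and adjacent chunks are merged hierarchically so the working set stays bounded. This is not HOMER's actual algorithm; the length-based importance score and the names `reduce_tokens` and `hierarchical_merge` are assumptions made for illustration.

```python
def reduce_tokens(chunk, keep_ratio=0.5):
    """Keep the highest-scoring tokens; token length stands in for a learned
    importance score (an assumption for this sketch)."""
    keep = max(1, int(len(chunk) * keep_ratio))
    ranked = sorted(range(len(chunk)), key=lambda i: len(chunk[i]), reverse=True)[:keep]
    return [chunk[i] for i in sorted(ranked)]

def hierarchical_merge(tokens, chunk_size=8):
    """Divide-and-conquer: reduce each chunk, merge adjacent pairs, and repeat
    until one chunk remains, so the peak chunk size stays bounded."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    while len(chunks) > 1:
        merged = []
        for i in range(0, len(chunks), 2):
            reduced = [reduce_tokens(c) for c in chunks[i:i + 2]]
            merged.append([t for c in reduced for t in c])
        chunks = merged
    return chunks[0]

print(hierarchical_merge("a long document split into many small word tokens "
                         "that would not fit in the context window".split()))
```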
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Optimizing Memory Mapping Using Deep Reinforcement Learning [29.48627805378257]
This paper focuses on the memory mapping problem that occurs during compilation of machine learning programs.
We introduce an approach that formulates the memory mapping problem as a single-player game and solves it with Reinforcement Learning.
We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions.
arXiv Detail & Related papers (2023-05-11T11:55:16Z)
- Memory-Efficient Differentiable Programming for Quantum Optimal Control of Discrete Lattices [1.5012666537539614]
Quantum optimal control (QOC) problems are typically solved by gradient-based algorithms such as GRAPE.
Memory requirements are a barrier to simulating large models or long time spans.
We employ a nonstandard differentiable programming approach that significantly reduces the memory requirements at the cost of a reasonable amount of recomputation.
arXiv Detail & Related papers (2022-10-15T20:59:23Z)
- Reducing Memory Requirements of Quantum Optimal Control [0.0]
Gradient-based algorithms such as GRAPE suffer from exponential growth in storage with an increasing number of qubits and linear growth in memory requirements with an increasing number of time steps.
We have created a nonstandard automatic differentiation technique that can compute gradients needed by GRAPE by exploiting the fact that the inverse of a unitary matrix is its conjugate transpose.
Our approach significantly reduces the memory requirements for GRAPE, at the cost of a reasonable amount of recomputation.
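A small sketch of the principle behind this trick, separate from any GRAPE implementation: because each step propagator U_k is unitary, an earlier state can be re-derived during the backward sweep as psi_{k-1} = U_k^dagger psi_k, so the forward trajectory of states never has to be stored. The NumPy example below only demonstrates this reconstruction; the function and variable names are assumptions.

```python
import numpy as np

def random_unitary(n, rng):
    # The QR factorisation of a random complex matrix yields a unitary factor.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

rng = np.random.default_rng(0)
dim, steps = 4, 6
unitaries = [random_unitary(dim, rng) for _ in range(steps)]
psi0 = np.zeros(dim, dtype=complex)
psi0[0] = 1.0

# Forward sweep: keep only the final state rather than the whole trajectory.
psi = psi0
for U in unitaries:
    psi = U @ psi

# Backward sweep: reconstruct earlier states on the fly via the conjugate transpose.
reconstructed = psi
for U in reversed(unitaries):
    reconstructed = U.conj().T @ reconstructed

assert np.allclose(reconstructed, psi0)  # initial state recovered up to floating-point error
```

In a full gradient computation each reconstructed state would be consumed immediately by the corresponding backward-pass term, which is where the memory saving over storing every forward state comes from.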
arXiv Detail & Related papers (2022-03-23T20:42:54Z)
- Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra [53.13961454500934]
We show that the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware.
We show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms.
arXiv Detail & Related papers (2021-04-29T15:48:52Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.