Memory Safe Computations with XLA Compiler
- URL: http://arxiv.org/abs/2206.14148v1
- Date: Tue, 28 Jun 2022 16:59:28 GMT
- Title: Memory Safe Computations with XLA Compiler
- Authors: Artem Artemev, Tilman Roeder, Mark van der Wilk
- Abstract summary: An XLA compiler extension adjusts the representation of an algorithm according to a user-specified memory limit.
We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device.
- Score: 14.510796427699459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software packages like TensorFlow and PyTorch are designed to support linear
algebra operations, and their speed and usability determine their success.
However, by prioritising speed, they often neglect memory requirements. As a
consequence, the implementations of memory-intensive algorithms that are
convenient in terms of software design can often not be run for large problems
due to memory overflows. Memory-efficient solutions require complex programming
approaches with significant logic outside the computational framework. This
impairs the adoption and use of such algorithms. To address this, we developed
an XLA compiler extension that adjusts the computational data-flow
representation of an algorithm according to a user-specified memory limit. We
show that k-nearest neighbour and sparse Gaussian process regression methods
can be run at a much larger scale on a single device, where standard
implementations would have failed. Our approach leads to better use of hardware
resources. We believe that further focus on removing memory constraints at a
compiler level will widen the range of machine learning methods that can be
developed in the future.
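As an illustration of the kind of rewrite the compiler extension performs automatically, the sketch below splits the pairwise-distance computation of an exact k-nearest-neighbour search into row blocks sized from a user-chosen memory budget, so the largest live intermediate stays under that budget. This is a hand-written NumPy analogue rather than the XLA pass itself; the function name `knn_chunked`, the `memory_limit_bytes` parameter, and the block-size rule are assumptions made for the example.

```python
import numpy as np

def knn_chunked(queries, points, k, memory_limit_bytes=2**28):
    """Exact k-NN whose dominant temporary, a (block, n) distance tile,
    is kept roughly under memory_limit_bytes (names are illustrative)."""
    n = points.shape[0]
    bytes_per_query_row = n * queries.itemsize          # one row of the distance tile
    block = max(1, memory_limit_bytes // bytes_per_query_row)
    neighbours = np.empty((queries.shape[0], k), dtype=np.int64)
    for start in range(0, queries.shape[0], block):
        q = queries[start:start + block]
        # Squared distances for this block only: ||q||^2 + ||p||^2 - 2 q p^T.
        d2 = (q ** 2).sum(1)[:, None] + (points ** 2).sum(1)[None, :] - 2.0 * q @ points.T
        # Indices of the k smallest distances per query (unordered within the top k).
        neighbours[start:start + block] = np.argpartition(d2, k, axis=1)[:, :k]
    return neighbours

rng = np.random.default_rng(0)
idx = knn_chunked(rng.normal(size=(1000, 8)), rng.normal(size=(5000, 8)), k=5)
print(idx.shape)  # (1000, 5)
```

Deriving the block size from the budget rather than hard-coding it is what lets the same computation run on devices with different memory capacities; the paper's contribution is to have the compiler apply this kind of transformation without the manual rewriting shown here.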
Related papers
- SAGA: Synthesis Augmentation with Genetic Algorithms for In-Memory Sequence Optimization [0.0]
MAGIC, or Memristor Aided Logic, is an approach that uses memory circuits to physically perform computation through write operations to memory.
We detail the formation and implementation of these genetic algorithms and evaluate them over a number of open circuit implementations.
Over the 10 benchmark circuits evaluated, these modifications improve the efficiency of in-memory circuit evaluation by 128% in the best case and by 27.5% on average.
arXiv Detail & Related papers (2024-06-14T03:00:42Z)
- Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs [61.40047491337793]
We present Hierarchical cOntext MERging (HOMER), a new training-free scheme designed to overcome the context-length limitations of large language models.
HOMER uses a divide-and-conquer algorithm, dividing long inputs into manageable chunks.
A token reduction technique precedes each merging, ensuring memory usage efficiency.
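A minimal, generic sketch of the divide-and-conquer pattern summarised above: the input is split into chunks, a token-reduction step precedes each merge, and adjacent chunks are merged hierarchically so the working set stays bounded. This is not HOMER's actual algorithm; the length-based importance score and the names `reduce_tokens` and `hierarchical_merge` are assumptions made for illustration.

```python
def reduce_tokens(chunk, keep_ratio=0.5):
    """Keep the highest-scoring tokens; token length stands in for a learned
    importance score (an assumption for this sketch)."""
    keep = max(1, int(len(chunk) * keep_ratio))
    ranked = sorted(range(len(chunk)), key=lambda i: len(chunk[i]), reverse=True)[:keep]
    return [chunk[i] for i in sorted(ranked)]

def hierarchical_merge(tokens, chunk_size=8):
    """Divide-and-conquer: reduce each chunk, merge adjacent pairs, and repeat
    until one chunk remains, so the peak chunk size stays bounded."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    while len(chunks) > 1:
        merged = []
        for i in range(0, len(chunks), 2):
            reduced = [reduce_tokens(c) for c in chunks[i:i + 2]]
            merged.append([t for c in reduced for t in c])
        chunks = merged
    return chunks[0]

print(hierarchical_merge("a long document split into many small word tokens "
                         "that would not fit in the context window".split()))
```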
arXiv Detail & Related papers (2024-04-16T06:34:08Z)
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z)
- Optimizing Memory Mapping Using Deep Reinforcement Learning [29.48627805378257]
This paper focuses on the memory mapping problem that occurs during compilation of machine learning programs.
We introduce an approach that formulates the memory mapping problem as a single-player game and solves it with Reinforcement Learning.
We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions.
arXiv Detail & Related papers (2023-05-11T11:55:16Z)
- Memory-Efficient Differentiable Programming for Quantum Optimal Control of Discrete Lattices [1.5012666537539614]
Quantum optimal control (QOC) problems are typically solved by gradient-based algorithms such as GRAPE.
Memory requirements are a barrier to simulating large models or long time spans.
We employ a nonstandard differentiable programming approach that significantly reduces the memory requirements at the cost of a reasonable amount of recomputation.
arXiv Detail & Related papers (2022-10-15T20:59:23Z)
- Reducing Memory Requirements of Quantum Optimal Control [0.0]
Gradient-based algorithms such as GRAPE suffer from exponential growth in storage with an increasing number of qubits and linear growth in memory requirements with an increasing number of time steps.
We have created a nonstandard automatic differentiation technique that can compute gradients needed by GRAPE by exploiting the fact that the inverse of a unitary matrix is its conjugate transpose.
Our approach significantly reduces the memory requirements for GRAPE, at the cost of a reasonable amount of recomputation.
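A small sketch of the principle behind this trick, separate from any GRAPE implementation: because each step propagator U_k is unitary, an earlier state can be re-derived during the backward sweep as psi_{k-1} = U_k^dagger psi_k, so the forward trajectory of states never has to be stored. The NumPy example below only demonstrates this reconstruction; the function and variable names are assumptions.

```python
import numpy as np

def random_unitary(n, rng):
    # The QR factorisation of a random complex matrix yields a unitary factor.
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

rng = np.random.default_rng(0)
dim, steps = 4, 6
unitaries = [random_unitary(dim, rng) for _ in range(steps)]
psi0 = np.zeros(dim, dtype=complex)
psi0[0] = 1.0

# Forward sweep: keep only the final state rather than the whole trajectory.
psi = psi0
for U in unitaries:
    psi = U @ psi

# Backward sweep: reconstruct earlier states on the fly via the conjugate transpose.
reconstructed = psi
for U in reversed(unitaries):
    reconstructed = U.conj().T @ reconstructed

assert np.allclose(reconstructed, psi0)  # initial state recovered up to floating-point error
```

In a full gradient computation each reconstructed state would be consumed immediately by the corresponding backward-pass term, which is where the memory saving over storing every forward state comes from.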
arXiv Detail & Related papers (2022-03-23T20:42:54Z)
- Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra [53.13961454500934]
We show that the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware.
We show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms.
arXiv Detail & Related papers (2021-04-29T15:48:52Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [55.79741270235602]
We present compiler algorithms to automatically generate high performance implementations of Deep Learning primitives.
We develop novel data reuse analysis algorithms using the polyhedral model.
We also show that such a hybrid compiler plus a minimal library-use approach results in state-of-the-art performance.
arXiv Detail & Related papers (2020-06-02T06:44:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.