Memory Efficient Neural Processes via Constant Memory Attention Block
- URL: http://arxiv.org/abs/2305.14567v3
- Date: Mon, 27 May 2024 17:06:51 GMT
- Title: Memory Efficient Neural Processes via Constant Memory Attention Block
- Authors: Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
- Abstract summary: Constant Memory Attentive Neural Processes (CMANPs) are an NP variant that only requires constant memory. We show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.
- Score: 55.82269384896986
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural Processes (NPs) are popular meta-learning methods for efficiently modelling predictive uncertainty. Recent state-of-the-art methods, however, leverage expensive attention mechanisms, limiting their applications, particularly in low-resource settings. In this work, we propose Constant Memory Attentive Neural Processes (CMANPs), an NP variant that only requires constant memory. To do so, we first propose an efficient update operation for Cross Attention. Leveraging the update operation, we propose Constant Memory Attention Block (CMAB), a novel attention block that (i) is permutation invariant, (ii) computes its output in constant memory, and (iii) performs constant computation updates. Finally, building on CMAB, we detail Constant Memory Attentive Neural Processes. Empirically, we show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.
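The constant-memory property comes from the efficient Cross Attention update operation: a fixed set of latent queries attends to the context, and newly arriving context points only modify a small running state. Below is a minimal, hypothetical sketch of this idea (not the authors' implementation); the class name, shapes, and the NumPy-only single-head formulation are illustrative assumptions.

```python
# Hypothetical sketch: incremental cross-attention from a fixed set of latent queries
# to a streamed context set. Memory stays constant in the number of context points
# because only running numerators, denominators, and max scores are kept.
import numpy as np

class StreamingCrossAttention:
    def __init__(self, latents, d_k):
        # latents: (L, d_k) fixed query vectors; the state below is O(L * d_v), independent of N
        self.q = latents / np.sqrt(d_k)
        L = latents.shape[0]
        self.num = None                     # running sum of exp(score) * value, shape (L, d_v)
        self.den = np.zeros(L)              # running sum of exp(score)
        self.m = np.full(L, -np.inf)        # running max score, for numerical stability

    def update(self, keys, values):
        # keys: (N, d_k), values: (N, d_v) -- a new batch of context points
        scores = self.q @ keys.T                          # (L, N)
        m_new = np.maximum(self.m, scores.max(axis=1))
        scale = np.exp(self.m - m_new)                    # rescale previously accumulated state
        w = np.exp(scores - m_new[:, None])               # (L, N)
        if self.num is None:
            self.num = np.zeros((self.q.shape[0], values.shape[1]))
        self.num = self.num * scale[:, None] + w @ values
        self.den = self.den * scale + w.sum(axis=1)
        self.m = m_new

    def output(self):
        return self.num / self.den[:, None]               # (L, d_v) cross-attention output
```

However many context batches are streamed through update, the stored state remains a (L, d_v) numerator, an L-vector denominator, and an L-vector of running maxima.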
Related papers
- Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning [5.41530201129053]
This paper proposes a novel memorization-based inference (MBI) that is compute-free and only requires lookups.
Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM).
By leveraging the low dimensionality of glimpses, our inference procedure stores key-value pairs comprising glimpse location, patch vector, etc. in a table.
The computations are obviated during inference by utilizing the table to read out key-value pairs and performing compute-free inference by memorization.
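A minimal, hypothetical sketch of the lookup-based idea (the class name and the cosine-similarity lookup are assumptions, not the paper's exact procedure):

```python
# Hypothetical sketch of memorization-based inference: predictions come from a
# nearest-neighbour lookup in a table of stored (embedding, label) pairs rather
# than from a forward pass. The table layout is illustrative only.
import numpy as np

class LookupTable:
    def __init__(self):
        self.keys, self.values = [], []

    def store(self, embedding, label):
        # memorize a low-dimensional embedding (e.g. a glimpse descriptor) and its label
        self.keys.append(np.asarray(embedding))
        self.values.append(label)

    def infer(self, query):
        # cosine similarity against all stored keys; return the best match's label
        keys = np.stack(self.keys)
        sims = keys @ query / (np.linalg.norm(keys, axis=1) * np.linalg.norm(query) + 1e-8)
        return self.values[int(np.argmax(sims))]
```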
arXiv Detail & Related papers (2023-07-14T21:01:59Z) - Constant Memory Attention Block [74.38724530521277]
Constant Memory Attention Block (CMAB) is a novel general-purpose attention block that computes its output in constant memory and performs updates in constant computation.
We show our proposed methods achieve results competitive with state-of-the-art while being significantly more memory efficient.
arXiv Detail & Related papers (2023-06-21T22:41:58Z) - Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
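A rough, hypothetical sketch of the blockwise idea (the block size, the tiny fused feedforward network, and using keys as values are illustrative assumptions, not the BPT implementation):

```python
# Hypothetical sketch of blockwise attention with a fused feedforward pass: each
# query block attends to key/value blocks one at a time (online softmax), and the
# feedforward network is applied to that block before the next one is materialised.
import numpy as np

def blockwise_attention_ffn(x, w1, w2, block=128):
    n, d = x.shape
    out = np.empty_like(x)
    for qs in range(0, n, block):
        q = x[qs:qs + block] / np.sqrt(d)
        num = np.zeros((q.shape[0], d))
        den = np.zeros(q.shape[0])
        m = np.full(q.shape[0], -np.inf)
        for ks in range(0, n, block):
            k = x[ks:ks + block]
            s = q @ k.T
            m_new = np.maximum(m, s.max(axis=1))
            scale = np.exp(m - m_new)
            w = np.exp(s - m_new[:, None])
            num = num * scale[:, None] + w @ k         # values taken equal to keys for brevity
            den = den * scale + w.sum(axis=1)
            m = m_new
        attn = num / den[:, None]
        out[qs:qs + block] = np.maximum(attn @ w1, 0) @ w2   # fused two-layer FFN on this block
    return out
```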
arXiv Detail & Related papers (2023-05-30T19:25:51Z) - Neural Attention Memory [6.345523830122167]
We propose a novel perspective on the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM).
NAM is a memory structure that is both readable and writable via differentiable linear algebra operations.
We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention.
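A minimal, hypothetical sketch of a readable/writable linear-algebra memory in this spirit (the outer-product write and matrix-product read are illustrative assumptions, not NAM's exact operations):

```python
# Hypothetical sketch of a linear read/write attention memory: writes add an outer
# product of key and value, reads project a query against the accumulated matrix.
import numpy as np

class LinearMemory:
    def __init__(self, d_key, d_val):
        self.M = np.zeros((d_key, d_val))   # memory matrix, constant size

    def write(self, key, value):
        self.M += np.outer(key, value)      # differentiable write

    def read(self, query):
        return query @ self.M               # differentiable read
```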
arXiv Detail & Related papers (2023-02-18T21:19:21Z) - Versatile Neural Processes for Learning Implicit Neural Representations [57.090658265140384]
We propose Versatile Neural Processes (VNP), which greatly increase the capability of approximating functions.
Specifically, we introduce a bottleneck encoder that produces fewer but informative context tokens, relieving the high computational cost.
We demonstrate the effectiveness of the proposed VNP on a variety of tasks involving 1D, 2D and 3D signals.
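A minimal, hypothetical sketch of a bottleneck encoder in this spirit (single-head attention and the shapes are illustrative assumptions, not the VNP architecture):

```python
# Hypothetical sketch of a bottleneck encoder: M learned latent tokens cross-attend
# to N context tokens, so downstream attention costs scale with M << N.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_encode(latents, context):
    # latents: (M, d) learned tokens, context: (N, d) with N typically much larger than M
    scores = latents @ context.T / np.sqrt(latents.shape[-1])
    return softmax(scores) @ context        # (M, d) compressed context tokens
```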
arXiv Detail & Related papers (2023-01-21T04:08:46Z) - Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time.
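A minimal, hypothetical sketch of partial execution (the toy elementwise operators and tile size are illustrative assumptions, not the Pex system):

```python
# Hypothetical sketch of partial execution: a two-operator pipeline processed in
# tiles so the full intermediate buffer between operators is never materialised.
import numpy as np

def op1(x):                 # elementwise operator: can run on any slice of its input
    return np.maximum(x, 0)

def op2(x):                 # elementwise operator: can run on any slice of its input
    return x * 2.0

def partial_execute(x, tile=64):
    out = np.empty_like(x)
    for start in range(0, x.shape[0], tile):
        chunk = x[start:start + tile]
        out[start:start + tile] = op2(op1(chunk))   # only a tile-sized intermediate is live
    return out
```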
arXiv Detail & Related papers (2022-11-30T18:47:30Z) - ABC: Attention with Bounded-memory Control [67.40631793251997]
We show that a family of efficient attention methods can be subsumed into one abstraction, attention with bounded-memory control (ABC).
ABC reveals new, unexplored possibilities. First, it connects several efficient attention variants that would otherwise seem apart.
Last, we present a new instance of ABC, which draws inspiration from existing ABC approaches, but replaces their memory-organizing functions with a learned, contextualized one.
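A minimal, hypothetical sketch of attention with a bounded memory (the modulo slot assignment below is a crude stand-in for the learned, contextualized control function; all names are assumptions):

```python
# Hypothetical sketch: a control function maps N keys/values to a fixed number of
# memory slots, and queries then attend over those slots only, bounding memory.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bounded_memory_attention(q, k, v, n_slots=8):
    # control step: assign each of the N tokens to a slot and average (assumes N >= n_slots)
    n = k.shape[0]
    slot_ids = np.arange(n) % n_slots
    k_mem = np.stack([k[slot_ids == s].mean(axis=0) for s in range(n_slots)])
    v_mem = np.stack([v[slot_ids == s].mean(axis=0) for s in range(n_slots)])
    # attention step: queries only ever see the n_slots memory entries
    scores = q @ k_mem.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v_mem
```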
arXiv Detail & Related papers (2021-10-06T03:53:25Z) - Learning the Step-size Policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm [3.7470451129384825]
We consider the problem of how to learn a step-size policy for the Limited-Memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm.
We propose a neural network architecture with local information of the current gradient as the input.
The step-size policy is learned from data of similar optimization problems, avoids additional evaluations of the objective function, and guarantees that the output step remains inside a pre-defined interval.
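A minimal, hypothetical sketch of such a policy (the input features, network weights, and interval bounds are illustrative assumptions, not the paper's architecture):

```python
# Hypothetical sketch of a learned step-size policy: a tiny network maps local
# gradient information to a step size that is squashed into a pre-defined interval.
import numpy as np

def step_size_policy(grad, prev_grad, w1, w2, lo=1e-4, hi=1.0):
    # local features of the current and previous gradients (illustrative choice)
    feats = np.array([np.log1p(np.linalg.norm(grad)),
                      np.log1p(np.linalg.norm(prev_grad)),
                      float(grad @ prev_grad)])
    hidden = np.tanh(w1 @ feats)                      # small hidden layer, w1: (h, 3)
    raw = float(w2 @ hidden)                          # scalar output, w2: (h,)
    return lo + (hi - lo) / (1.0 + np.exp(-raw))      # sigmoid keeps the step inside [lo, hi]
```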
arXiv Detail & Related papers (2020-10-03T09:34:03Z) - Bootstrapping Neural Processes [114.97111530885093]
Neural Processes (NPs) implicitly define a broad class of processes with neural networks.
NPs still rely on the assumption that uncertainty in processes is modelled by a single latent variable.
We propose the Bootstrapping Neural Process (BNP), a novel extension of the NP family using the bootstrap.
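A minimal, hypothetical sketch of the bootstrap idea applied to an NP-style predictor (`predict_fn` stands in for any trained NP decoder; it is an assumption, not the paper's API):

```python
# Hypothetical sketch: resample the context set with replacement, run the predictor
# once per bootstrap sample, and use the spread of predictions as uncertainty.
import numpy as np

def bootstrap_predict(predict_fn, context_x, context_y, target_x, n_boot=10, seed=None):
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(context_x), size=len(context_x))  # resample with replacement
        preds.append(predict_fn(context_x[idx], context_y[idx], target_x))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)    # mean prediction and its spread
```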
arXiv Detail & Related papers (2020-08-07T02:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.