Neural Attention Memory
- URL: http://arxiv.org/abs/2302.09422v2
- Date: Sat, 14 Oct 2023 04:36:47 GMT
- Title: Neural Attention Memory
- Authors: Hyoungwook Nam, Seung Byum Seo
- Abstract summary: We propose a novel perspective on the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM).
NAM is a memory structure that is both readable and writable via differentiable linear algebra operations.
We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention.
- Score: 6.345523830122167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel perspective on the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM). NAM is a memory structure that is both readable and writable via differentiable linear algebra operations. We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention. First, we design two NAM-based MANNs, Long Short-term Attention Memory (LSAM) and the NAM Turing Machine (NAM-TM), which show greater computational power on algorithmic zero-shot generalization tasks than baselines such as the differentiable neural computer (DNC). Next, we apply NAM to the N-way K-shot learning task and show that it is more effective at reducing false positives than the baseline cosine classifier. Finally, we implement an efficient Transformer with NAM and evaluate it on Long Range Arena tasks to show that NAM can be an efficient and effective alternative to scaled dot-product attention.
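To make the abstract's description concrete, below is a minimal sketch of a memory that is read and written purely through differentiable linear-algebra operations (a rank-1 outer-product write and a matrix-vector read). The class name, dimensions, and normalization are illustrative assumptions; the exact NAM read, write, and erase operations are those defined in the paper.
```python
import numpy as np

class OuterProductMemory:
    """Illustrative associative memory, not the paper's exact NAM formulation:
    writes are rank-1 outer-product updates, reads are matrix-vector products,
    so both are differentiable linear-algebra operations."""

    def __init__(self, key_dim: int, value_dim: int):
        self.M = np.zeros((value_dim, key_dim))    # memory matrix, initially empty

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        k = key / (np.linalg.norm(key) + 1e-8)     # normalize the key
        self.M += np.outer(value, k)               # associate value with key

    def read(self, query: np.ndarray) -> np.ndarray:
        q = query / (np.linalg.norm(query) + 1e-8)
        return self.M @ q                          # linear read-out
```
Because both operations are differentiable, gradients can flow through every read and write, which is what allows such a memory to sit inside an LSTM-style cell, a Turing-machine-style controller, or an attention layer, matching the three use cases listed in the abstract.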
Related papers
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Memory Efficient Neural Processes via Constant Memory Attention Block [55.82269384896986]
Constant Memory Attentive Neural Processes (CMANPs) are an NP variant that only requires constant memory.
We show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.
arXiv Detail & Related papers (2023-05-23T23:10:19Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art performance in accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network-based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Neural Additive Models for Nowcasting [1.8275108630751844]
We propose neural additive models (NAMs) to provide explanatory power for neural network predictions.
We show that the proposed NAM-NC successfully explains each input value's importance for multiple variables and time steps.
We also examine parameter-sharing networks using NAM-NC to decrease their complexity, and NAM-NC's hard-tied feature net extracts explanations while maintaining good performance.
arXiv Detail & Related papers (2022-05-20T08:25:18Z)
- Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models [41.58529335439799]
We propose a general framework for understanding the operation of memory networks as a sequence of three operations: similarity, separation, and projection.
We derive existing memory models as instances of this general framework with differing similarity and separation functions.
arXiv Detail & Related papers (2022-02-09T16:48:06Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures of artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
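As a concrete illustration of the Neural Additive Models entry directly above, here is a minimal sketch of the architecture it describes: one small network per input feature whose scalar outputs are summed. The hidden size and activation are illustrative assumptions, not the configuration used in the cited paper.
```python
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    """Sketch of a NAM: each input feature gets its own small network,
    and the per-feature outputs are combined by a plain sum plus a bias."""

    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, num_features); column i feeds feature net i.
        contributions = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.cat(contributions, dim=-1).sum(dim=-1, keepdim=True) + self.bias
```
Because each feature is processed in isolation, the learned per-feature contribution can be plotted directly against that feature's value, which is where the model's intelligibility comes from.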
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.