Neural Attention Memory
- URL: http://arxiv.org/abs/2302.09422v2
- Date: Sat, 14 Oct 2023 04:36:47 GMT
- Title: Neural Attention Memory
- Authors: Hyoungwook Nam, Seung Byum Seo
- Abstract summary: We propose a novel perspective on the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM).
NAM is a memory structure that is both readable and writable via differentiable linear algebra operations.
We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention.
- Score: 6.345523830122167
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel perspective on the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM). NAM is a memory structure that is both readable and writable via differentiable linear algebra operations. We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention. First, we design two NAM-based MANNs, Long Short-term Attention Memory (LSAM) and the NAM Turing Machine (NAM-TM), which show greater computational power on algorithmic zero-shot generalization tasks than baselines such as the differentiable neural computer (DNC). Next, we apply NAM to the N-way K-shot learning task and show that it is more effective at reducing false positives than the baseline cosine classifier. Finally, we implement an efficient Transformer with NAM and evaluate it on Long Range Arena tasks to show that NAM can be an efficient and effective alternative to scaled dot-product attention.
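To make the abstract's description concrete, below is a minimal sketch of a memory that is read and written purely through differentiable linear-algebra operations (a rank-1 outer-product write and a matrix-vector read). The class name, dimensions, and normalization are illustrative assumptions; the exact NAM read, write, and erase operations are those defined in the paper.
```python
import numpy as np

class OuterProductMemory:
    """Illustrative associative memory, not the paper's exact NAM formulation:
    writes are rank-1 outer-product updates, reads are matrix-vector products,
    so both are differentiable linear-algebra operations."""

    def __init__(self, key_dim: int, value_dim: int):
        self.M = np.zeros((value_dim, key_dim))    # memory matrix, initially empty

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        k = key / (np.linalg.norm(key) + 1e-8)     # normalize the key
        self.M += np.outer(value, k)               # associate value with key

    def read(self, query: np.ndarray) -> np.ndarray:
        q = query / (np.linalg.norm(query) + 1e-8)
        return self.M @ q                          # linear read-out
```
Because both operations are differentiable, gradients can flow through every read and write, which is what allows such a memory to sit inside an LSTM-style cell, a Turing-machine-style controller, or an attention layer, matching the three use cases listed in the abstract.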
Related papers
- Gaussian Process Neural Additive Models [3.7969209746164325]
We propose a new subclass of Neural Additive Models (NAMs) that use a single-layer neural network construction of the Gaussian process via random Fourier features.
GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality.
We show that GP-NAM achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
arXiv Detail & Related papers (2024-02-19T20:29:34Z)
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We evaluate our approach on various image- and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Memory Efficient Neural Processes via Constant Memory Attention Block [55.82269384896986]
Constant Memory Attentive Neural Processes (CMANPs) are an NP variant that only requires constant memory.
We show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods.
arXiv Detail & Related papers (2023-05-23T23:10:19Z)
- Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training [23.536294640280087]
We propose nested forward automatic differentiation (Forward-AD) for element-wise activation functions to enable memory-efficient training.
Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model.
arXiv Detail & Related papers (2022-09-22T04:48:48Z)
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art performance in accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network-based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
- Neural Additive Models for Nowcasting [1.8275108630751844]
We propose neural additive models (NAMs) to provide explanatory power for neural network predictions.
We show that the proposed NAM-NC successfully explains each input value's importance for multiple variables and time steps.
We also examine parameter-sharing networks using NAM-NC to decrease their complexity, and NAM-NC's hard-tied feature net extracts explanations while maintaining good performance.
arXiv Detail & Related papers (2022-05-20T08:25:18Z)
- Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models [41.58529335439799]
We propose a general framework for understanding the operation of memory networks as a sequence of three operations: similarity, separation, and projection.
We derive existing memory models as instances of this general framework with differing similarity and separation functions.
arXiv Detail & Related papers (2022-02-09T16:48:06Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures of artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture that explicitly targets multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
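As a concrete illustration of the Neural Additive Models entry directly above, here is a minimal sketch of the architecture it describes: one small network per input feature whose scalar outputs are summed. The hidden size and activation are illustrative assumptions, not the configuration used in the cited paper.
```python
import torch
import torch.nn as nn

class NeuralAdditiveModel(nn.Module):
    """Sketch of a NAM: each input feature gets its own small network,
    and the per-feature outputs are combined by a plain sum plus a bias."""

    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(num_features)
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, num_features); column i feeds feature net i.
        contributions = [net(x[:, i:i + 1]) for i, net in enumerate(self.feature_nets)]
        return torch.cat(contributions, dim=-1).sum(dim=-1, keepdim=True) + self.bias
```
Because each feature is processed in isolation, the learned per-feature contribution can be plotted directly against that feature's value, which is where the model's intelligibility comes from.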
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.