Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
- URL: http://arxiv.org/abs/2404.11870v1
- Date: Thu, 18 Apr 2024 03:03:46 GMT
- Title: Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
- Authors: Hung Le, Dung Nguyen, Kien Do, Svetha Venkatesh, Truyen Tran
- Abstract summary: We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
- Score: 66.88278207591294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perform these operations through end-to-end training on sequence data, powering various sequential models. Our experiments demonstrate PANM's exceptional length extrapolation capabilities and improved performance in tasks that require symbol processing, such as algorithmic reasoning and Dyck language recognition. PANM helps the Transformer achieve up to 100% generalization accuracy in compositional learning tasks and significantly better results in mathematical reasoning, question answering, and machine translation tasks.
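To make the pointer operations named in the abstract (assignment, arithmetic, and dereference over physical addresses) more concrete, here is a minimal, purely illustrative Python sketch of a memory indexed by integer physical addresses. It is an assumption-laden toy, not the PANM architecture, which learns soft versions of these operations end-to-end with neural components; the class and method names (PointerMemory, assign, shift, deref) are hypothetical.

```python
import numpy as np

# Illustrative toy only: explicit pointer operations over a memory whose slots
# are indexed by physical addresses. PANM itself learns these operations
# end-to-end; the names below are hypothetical, not taken from the paper.
class PointerMemory:
    def __init__(self, sequence_embeddings: np.ndarray):
        # Row i holds the content stored at physical address i.
        self.slots = sequence_embeddings

    def assign(self, address: int) -> int:
        # Pointer assignment: a pointer is just a physical address.
        return address

    def shift(self, pointer: int, offset: int) -> int:
        # Pointer arithmetic: step to a neighbouring address (e.g. "next item").
        return (pointer + offset) % len(self.slots)

    def deref(self, pointer: int) -> np.ndarray:
        # Dereference: read the content stored at the pointed-to address.
        return self.slots[pointer]


# Usage: walk an input of arbitrary length by dereferencing and incrementing a
# pointer; the control logic depends only on addresses, never on the length
# itself, which is the intuition behind length extrapolation.
memory = PointerMemory(np.random.randn(12, 8))  # 12 items, 8-dim embeddings
p = memory.assign(0)
visited = []
for _ in range(len(memory.slots)):
    visited.append(memory.deref(p))
    p = memory.shift(p, 1)
```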
Related papers
- Spiking representation learning for associative memories [0.0]
We introduce a novel artificial spiking neural network (SNN) that performs unsupervised representation learning and associative memory operations.
The architecture of our model derives from the neocortical columnar organization and combines feedforward projections for learning hidden representations and recurrent projections for forming associative memories.
arXiv Detail & Related papers (2024-06-05T08:30:11Z)
- PARMESAN: Parameter-Free Memory Search and Transduction for Dense Prediction Tasks [5.5127111704068374]
This work addresses flexibility in deep learning by means of transductive reasoning.
We propose PARMESAN, a scalable method which leverages a memory module for solving dense prediction tasks.
Our method is compatible with commonly used architectures and canonically transfers to 1D, 2D, and 3D grid-based data.
arXiv Detail & Related papers (2024-03-18T12:55:40Z)
- Token Turing Machines [53.22971546637947]
The Token Turing Machine (TTM) is a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding.
Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history.
arXiv Detail & Related papers (2022-11-16T18:59:18Z)
- Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z)
- Graph Convolutional Memory for Deep Reinforcement Learning [8.229775890542967]
We present graph convolutional memory (GCM) for solving POMDPs using deep reinforcement learning.
Unlike recurrent neural networks (RNNs) or transformers, GCM embeds domain-specific priors into the memory recall process via a knowledge graph.
Using graph convolutions, GCM extracts hierarchical graph features, analogous to image features in a convolutional neural network (CNN).
arXiv Detail & Related papers (2021-06-27T00:22:51Z)
- Robust High-dimensional Memory-augmented Neural Networks [13.82206983716435]
Memory-augmented neural networks extend conventional neural networks with an explicit memory to overcome the limitations of purely parametric models.
Access to this explicit memory occurs via soft read and write operations involving every individual memory entry.
We propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors.
arXiv Detail & Related papers (2020-10-05T12:01:56Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory-augmented Transformers and demonstrate that the presence of memory positively correlates with model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z)
- Encoding-based Memory Modules for Recurrent Neural Networks [79.42778415729475]
We study the memorization subtask from the point of view of the design and training of recurrent neural networks.
We propose a new model, the Linear Memory Network, which features an encoding-based memorization component built with a linear autoencoder for sequences.
arXiv Detail & Related papers (2020-01-31T11:14:27Z)
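As a rough illustration of the encoding-based memorization component described in the entry above, the sketch below implements a linear sequence encoder in the spirit of a linear autoencoder for sequences, assuming an update of the form h_t = A h_{t-1} + B x_t. The matrices are random rather than trained, and none of the names come from the paper.

```python
import numpy as np

# Toy sketch of a linear sequence encoder (the memorization component of an
# encoding-based memory module). In the actual Linear Memory Network the
# matrices are trained so that past inputs can be decoded back from the state;
# here they are random and the example only shows the update rule.
rng = np.random.default_rng(0)
input_dim, state_dim, seq_len = 4, 64, 8

A = 0.1 * rng.standard_normal((state_dim, state_dim))  # state-to-state map
B = 0.1 * rng.standard_normal((state_dim, input_dim))  # input-to-state map

xs = rng.standard_normal((seq_len, input_dim))  # a short input sequence
h = np.zeros(state_dim)                         # memory state
for x in xs:
    h = A @ h + B @ x   # purely linear memory update: h_t = A h_{t-1} + B x_t

# A separate decoder would be trained to reconstruct recent inputs from h;
# that reconstruction objective is what makes the component an autoencoder.
print(h.shape)  # (64,)
```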