Token Turing Machines
- URL: http://arxiv.org/abs/2211.09119v2
- Date: Thu, 13 Apr 2023 15:23:10 GMT
- Title: Token Turing Machines
- Authors: Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted
Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
- Abstract summary: Token Turing Machines (TTM) is a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding.
Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history.
- Score: 53.22971546637947
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Token Turing Machines (TTM), a sequential, autoregressive
Transformer model with memory for real-world sequential visual understanding.
Our model is inspired by the seminal Neural Turing Machine, and has an external
memory consisting of a set of tokens which summarise the previous history
(i.e., frames). This memory is efficiently addressed, read and written using a
Transformer as the processing unit/controller at each step. The model's memory
module ensures that a new observation will only be processed with the contents
of the memory (and not the entire history), meaning that it can efficiently
process long sequences with a bounded computational cost at each step. We show
that TTM outperforms other alternatives, such as other Transformer models
designed for long sequences and recurrent neural networks, on two real-world
sequential visual understanding tasks: online temporal activity detection from
videos and vision-based robot action policy learning.
Code is publicly available at:
https://github.com/google-research/scenic/tree/main/scenic/projects/token_turing
Related papers
- HMT: Hierarchical Memory Transformer for Long Context Language Processing [35.730941605490194]
Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability.
We show that HMT steadily improves the long-context processing ability of context-constrained and long-context models.
arXiv Detail & Related papers (2024-05-09T19:32:49Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely textithidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
arXiv Detail & Related papers (2024-04-18T03:03:46Z) - Memory-and-Anticipation Transformer for Online Action Understanding [52.24561192781971]
We propose a novel memory-anticipation-based paradigm to model an entire temporal structure, including the past, present, and future.
We present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.
arXiv Detail & Related papers (2023-08-15T17:34:54Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Recurrent Memory Transformer [0.3529736140137003]
We study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer)
We implement a memory mechanism with no changes to Transformer model by adding special memory tokens to the input or output sequence.
Our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing.
arXiv Detail & Related papers (2022-07-14T13:00:22Z) - Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory augmented Transformers and demonstrate that presence of memory positively correlates with the model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z) - Reservoir memory machines [79.79659145328856]
We propose reservoir memory machines, which are able to solve some of the benchmark tests for Neural Turing Machines.
Our model can also be seen as an extension of echo state networks with an external memory, enabling arbitrarily long storage without interference.
arXiv Detail & Related papers (2020-02-12T01:45:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.