Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task
- URL: http://arxiv.org/abs/2406.14213v1
- Date: Thu, 20 Jun 2024 11:27:29 GMT
- Title: Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task
- Authors: Alsu Sagirova, Mikhail Burtsev
- Abstract summary: This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder.
Translated text keywords are stored in the working memory, pointing to the relevance of the memory content to the processed text.
The diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.
- Score: 3.1331371767476366
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts. This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder. Such working memory enhances the quality of model predictions in the machine translation task and works as a neural-symbolic representation of information that is important for the model to make correct translations. The study of memory content revealed that translated text keywords are stored in the working memory, pointing to the relevance of the memory content to the processed text. Also, the diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.
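The working-memory idea in the abstract can be pictured, under assumptions, as a standard Transformer decoder given a few extra memory positions that are decoded into ordinary vocabulary tokens. The sketch below is illustrative only: the class name SymbolicMemoryDecoder, the mem_len parameter, and the way memory slots are appended are assumptions made for exposition, not the authors' implementation; positional encodings and causal masks are omitted for brevity.

```python
# Illustrative sketch only (assumptions): a decoder whose input is extended
# with extra "working memory" positions that are decoded into ordinary
# vocabulary tokens. Positional encodings and causal masks are omitted.
import torch
import torch.nn as nn

class SymbolicMemoryDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6, mem_len=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learnable queries asking the decoder to fill mem_len memory slots.
        self.mem_queries = nn.Parameter(torch.randn(mem_len, d_model) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)  # one head for text and memory

    def forward(self, tgt_ids, encoder_out):
        # tgt_ids: (batch, tgt_len) target-side token ids
        # encoder_out: (batch, src_len, d_model) source encoder states
        batch, tgt_len = tgt_ids.shape
        tgt = self.embed(tgt_ids)
        mem = self.mem_queries.unsqueeze(0).expand(batch, -1, -1)
        # Append memory slots so self-attention can exchange information
        # between the translation being generated and the memory content.
        h = self.decoder(torch.cat([tgt, mem], dim=1), encoder_out)
        logits = self.out(h)
        text_logits = logits[:, :tgt_len]                   # translation predictions
        memory_tokens = logits[:, tgt_len:].argmax(dim=-1)  # symbolic memory readout
        return text_logits, memory_tokens
```

Inspecting memory_tokens after training is in the spirit of the analysis the abstract describes: checking whether the stored tokens coincide with keywords of the translated text and how diverse they are across corpora.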
Related papers
- Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory [66.88278207591294]
We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data.
PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities.
arXiv Detail & Related papers (2024-04-18T03:03:46Z)
- Cached Transformers: Improving Transformers with Differentiable Memory Cache [71.28188777209034]
This work introduces a new Transformer model called Cached Transformer.
It uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.
arXiv Detail & Related papers (2023-12-20T03:30:51Z)
- Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation [0.20305676256390928]
We propose an Implicit Memory Transformer that implicitly retains memory through a new left context method.
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass.
arXiv Detail & Related papers (2023-07-03T22:20:21Z)
- Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings are not able to reconstruct this information on the task.
arXiv Detail & Related papers (2022-11-08T18:14:04Z)
- Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling [69.31802246621963]
We propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models.
By incorporating a separate memory module alongside the pre-trained transformer, the model can effectively interchange information between the memory states and the current input context.
arXiv Detail & Related papers (2022-09-15T22:37:22Z)
- Memory-Driven Text-to-Image Generation [126.58244124144827]
We introduce a memory-driven semi-parametric approach to text-to-image generation.
The non-parametric component is a memory bank of image features constructed from a training set of images, and the parametric component is a generative adversarial network.
arXiv Detail & Related papers (2022-08-15T06:32:57Z)
- Recurrent Memory Transformer [0.3529736140137003]
We study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer).
We implement a memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence (a minimal sketch of this memory-token idea appears after the related-papers list below).
Our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing.
arXiv Detail & Related papers (2022-07-14T13:00:22Z)
- Entropic Associative Memory for Manuscript Symbols [0.0]
Manuscript symbols can be stored, recognized, and retrieved from an entropic digital memory that is associative and distributed, yet declarative.
We discuss the operational characteristics of the entropic associative memory for retrieving objects with both complete and incomplete information.
arXiv Detail & Related papers (2022-02-17T02:29:33Z)
- Remember What You have drawn: Semantic Image Manipulation with Memory [84.74585786082388]
We propose a memory-based Image Manipulation Network (MIM-Net) to generate realistic and text-conformed manipulated images.
To learn a robust memory, we propose a novel randomized memory training loss.
Experiments on four popular datasets show that our method performs better than existing ones.
arXiv Detail & Related papers (2021-07-27T03:41:59Z)
- Learning to Summarize Long Texts with Memory Compression and Transfer [3.5407857489235206]
We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network-based encoder-decoder architectures.
Our memory regularization compresses an encoded input article into a more compact set of sentence representations.
arXiv Detail & Related papers (2020-10-21T21:45:44Z)
- Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory-augmented Transformers and demonstrate that the presence of memory positively correlates with model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z)
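As noted in the Recurrent Memory Transformer entry above, the memory-token idea can be sketched with a handful of assumptions: learnable memory embeddings are prepended to each segment of a long input, an unmodified Transformer encoder processes the concatenation, and the resulting memory states are carried over to the next segment. Names such as MemoryTokenWrapper and num_mem are illustrative, not taken from the paper.

```python
# Illustrative sketch only (assumptions): memory realized purely as extra
# tokens prepended to the input of an unmodified Transformer encoder, with
# the updated memory states carried over to the next segment.
import torch
import torch.nn as nn

class MemoryTokenWrapper(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_layers=4, num_mem=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.init_mem = nn.Parameter(torch.randn(num_mem, d_model) * 0.02)
        self.num_mem = num_mem

    def forward(self, segment_emb, mem=None):
        # segment_emb: (batch, seg_len, d_model) already-embedded segment
        batch = segment_emb.size(0)
        if mem is None:
            mem = self.init_mem.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([mem, segment_emb], dim=1)  # memory tokens prepended
        h = self.encoder(x)
        new_mem = h[:, :self.num_mem]   # updated memory for the next segment
        seg_out = h[:, self.num_mem:]   # representations of the segment itself
        return seg_out, new_mem

# Usage: process a long sequence segment by segment, carrying memory across.
# model, mem = MemoryTokenWrapper(), None
# for seg in segments:   # each seg: (batch, seg_len, d_model)
#     out, mem = model(seg, None if mem is None else mem.detach())
```

Because the memory lives only in the input sequence, the backbone Transformer needs no architectural changes, which is the property the entry above highlights.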
This list is automatically generated from the titles and abstracts of the papers on this site.