Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2307.01381v1
- Date: Mon, 3 Jul 2023 22:20:21 GMT
- Title: Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
- Authors: Matthew Raffel, Lizhong Chen
- Abstract summary: We propose an Implicit Memory Transformer that implicitly retains memory through a new left context method.
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass.
- Score: 0.20305676256390928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous speech translation is an essential communication task difficult
for humans whereby a translation is generated concurrently with oncoming speech
inputs. For such a streaming task, transformers using block processing to break
an input sequence into segments have achieved state-of-the-art performance at a
reduced cost. Current methods to allow information to propagate across
segments, including left context and memory banks, have faltered as they are
both insufficient representations and unnecessarily expensive to compute. In
this paper, we propose an Implicit Memory Transformer that implicitly retains
memory through a new left context method, removing the need to explicitly
represent memory with memory banks. We generate the left context from the
attention output of the previous segment and include it in the keys and values
of the current segment's attention calculation. Experiments on the MuST-C
dataset show that the Implicit Memory Transformer provides a substantial
speedup on the encoder forward pass with nearly identical translation quality
when compared with the state-of-the-art approach that employs both left context
and memory banks.
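To make the described mechanism concrete, below is a minimal PyTorch sketch of segment-wise (block-processing) attention in which the left context for each segment is taken from the attention output of the previous segment. It is an illustrative reading of the abstract, not the authors' implementation; the class name ImplicitMemoryAttention, the left_context size, and the choice to detach the carried states are assumptions.

```python
import torch
import torch.nn as nn

class ImplicitMemoryAttention(nn.Module):
    """Illustrative sketch: left context carried implicitly from the previous
    segment's attention output (no explicit memory bank)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, left_context: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.left_context = left_context  # assumed number of carried-over positions

    def forward(self, segments):
        # segments: list of (batch, seg_len, d_model) tensors from block processing
        outputs, carry = [], None
        for x in segments:
            if carry is None:
                k = v = x  # first segment has no left context yet
            else:
                # Keys/values cover [carried left context ; current segment];
                # queries are only the current segment's positions.
                k = v = torch.cat([carry, x], dim=1)
            out, _ = self.attn(query=x, key=k, value=v, need_weights=False)
            # Carry the last positions of the attention OUTPUT forward, which is
            # what makes the memory "implicit" (detaching is an assumption here).
            carry = out[:, -self.left_context:, :].detach()
            outputs.append(out)
        return torch.cat(outputs, dim=1)
```

In use, the padded encoder input would simply be split into fixed-length segments and passed as a list; because no memory bank has to be built or attended to, the per-segment attention stays small, which is consistent with the encoder speedup reported in the abstract.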
Related papers
- UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs [111.12010207132204]
UIO-LLMs is an incremental optimization approach for memory-enhanced transformers under long-context settings.
We refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm.
UIO-LLMs successfully handle long contexts, for example extending the context window of Llama2-7b-chat from 4K to 100K tokens with a minimal 2% increase in parameters.
arXiv Detail & Related papers (2024-06-26T08:44:36Z)
- Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task [3.1331371767476366]
This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder.
Keywords from the translated text are stored in the working memory, pointing to the relevance of the memory content to the processed text.
The diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.
arXiv Detail & Related papers (2024-06-20T11:27:29Z)
- Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
- Recurrent Memory Transformer [0.3529736140137003]
We study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer).
We implement a memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence (see the sketch after this list).
Our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing.
arXiv Detail & Related papers (2022-07-14T13:00:22Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation.
MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation.
We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer.
arXiv Detail & Related papers (2022-03-23T18:10:18Z)
- Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation [79.1669476932147]
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to the goal position.
Recent Transformer-based VLN methods have made great progress benefiting from the direct connections between visual observations and the language instruction.
We introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation.
arXiv Detail & Related papers (2021-11-10T16:04:49Z)
- Streaming Simultaneous Speech Translation with Augmented Memory Transformer [29.248366441276662]
Transformer-based models have achieved state-of-the-art performance on speech translation tasks.
We propose an end-to-end transformer-based sequence-to-sequence model, equipped with an augmented memory transformer encoder.
arXiv Detail & Related papers (2020-10-30T18:28:42Z)
- Learning to Summarize Long Texts with Memory Compression and Transfer [3.5407857489235206]
We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network based encoder decoder architectures.
Our memory regularization compresses an encoded input article into a more compact set of sentence representations.
arXiv Detail & Related papers (2020-10-21T21:45:44Z)
- Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory-augmented Transformers and demonstrate that the presence of memory positively correlates with model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z)
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [128.36951818335046]
We propose a new approach called Memory-Augmented Recurrent Transformer (MART).
MART uses a memory module to augment the transformer architecture.
MART generates more coherent and less repetitive paragraph captions than baseline methods.
arXiv Detail & Related papers (2020-05-11T20:01:41Z)
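For the Recurrent Memory Transformer entry above, the sketch below illustrates how special memory tokens can be prepended to each segment so that an unmodified Transformer encoder passes state between segments. This is a simplified, hypothetical rendering (the names MemoryTokenWrapper and n_mem, and the single prepended memory block, are assumptions rather than the authors' code).

```python
import torch
import torch.nn as nn

class MemoryTokenWrapper(nn.Module):
    """Illustrative sketch: learned memory tokens prepended to each segment of a
    stock Transformer encoder, with the updated memory carried to the next segment."""

    def __init__(self, d_model: int = 256, n_mem: int = 8, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(1, n_mem, d_model) * 0.02)  # initial memory tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, segments):
        # segments: list of (batch, seg_len, d_model) tensors; memory is passed between them
        mem = self.mem.expand(segments[0].size(0), -1, -1)
        outs = []
        for x in segments:
            h = self.encoder(torch.cat([mem, x], dim=1))  # [memory tokens ; segment]
            mem, seg_out = h[:, :self.n_mem, :], h[:, self.n_mem:, :]  # updated memory, segment output
            outs.append(seg_out)
        return torch.cat(outs, dim=1)
```

The published model may place separate read and write memory positions around the sequence; prepending a single memory block here is purely for brevity.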
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.