Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
- URL: http://arxiv.org/abs/2307.01381v1
- Date: Mon, 3 Jul 2023 22:20:21 GMT
- Title: Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
- Authors: Matthew Raffel, Lizhong Chen
- Abstract summary: We propose an Implicit Memory Transformer that implicitly retains memory through a new left context method.
Experiments on the MuST-C dataset show that the Implicit Memory Transformer provides a substantial speedup on the encoder forward pass.
- Score: 0.20305676256390928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous speech translation is an essential communication task difficult
for humans whereby a translation is generated concurrently with oncoming speech
inputs. For such a streaming task, transformers using block processing to break
an input sequence into segments have achieved state-of-the-art performance at a
reduced cost. Current methods to allow information to propagate across
segments, including left context and memory banks, have faltered as they are
both insufficient representations and unnecessarily expensive to compute. In
this paper, we propose an Implicit Memory Transformer that implicitly retains
memory through a new left context method, removing the need to explicitly
represent memory with memory banks. We generate the left context from the
attention output of the previous segment and include it in the keys and values
of the current segment's attention calculation. Experiments on the MuST-C
dataset show that the Implicit Memory Transformer provides a substantial
speedup on the encoder forward pass with nearly identical translation quality
when compared with the state-of-the-art approach that employs both left context
and memory banks.
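To make the described mechanism concrete, below is a minimal PyTorch sketch of segment-wise (block-processing) attention in which the left context for each segment is taken from the attention output of the previous segment. It is an illustrative reading of the abstract, not the authors' implementation; the class name ImplicitMemoryAttention, the left_context size, and the choice to detach the carried states are assumptions.

```python
import torch
import torch.nn as nn

class ImplicitMemoryAttention(nn.Module):
    """Illustrative sketch: left context carried implicitly from the previous
    segment's attention output (no explicit memory bank)."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, left_context: int = 16):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.left_context = left_context  # assumed number of carried-over positions

    def forward(self, segments):
        # segments: list of (batch, seg_len, d_model) tensors from block processing
        outputs, carry = [], None
        for x in segments:
            if carry is None:
                k = v = x  # first segment has no left context yet
            else:
                # Keys/values cover [carried left context ; current segment];
                # queries are only the current segment's positions.
                k = v = torch.cat([carry, x], dim=1)
            out, _ = self.attn(query=x, key=k, value=v, need_weights=False)
            # Carry the last positions of the attention OUTPUT forward, which is
            # what makes the memory "implicit" (detaching is an assumption here).
            carry = out[:, -self.left_context:, :].detach()
            outputs.append(out)
        return torch.cat(outputs, dim=1)
```

In use, the padded encoder input would simply be split into fixed-length segments and passed as a list; because no memory bank has to be built or attended to, the per-segment attention stays small, which is consistent with the encoder speedup reported in the abstract.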
Related papers
- UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs [111.12010207132204]
UIO-LLMs is an incremental optimization approach for memory-enhanced transformers under long-context settings.
We refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm.
UIO-LLMs successfully handle long contexts, for example extending the context window of Llama2-7b-chat from 4K to 100K tokens with a minimal 2% increase in parameters.
arXiv Detail & Related papers (2024-06-26T08:44:36Z)
- Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task [3.1331371767476366]
This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder.
Keywords from the translated text are stored in the working memory, pointing to the relevance of the memory content to the processed text.
The diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.
arXiv Detail & Related papers (2024-06-20T11:27:29Z)
- Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) leverages blockwise computation of self-attention and feedforward network fusion to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
- Recurrent Memory Transformer [0.3529736140137003]
We study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer).
We implement a memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence (see the sketch after this list).
Our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing.
arXiv Detail & Related papers (2022-07-14T13:00:22Z)
- LaMemo: Language Modeling with Look-Ahead Memory [50.6248714811912]
We propose Look-Ahead Memory (LaMemo) that enhances the recurrence memory by incrementally attending to the right-side tokens.
LaMemo embraces bi-directional attention and segment recurrence with an additional overhead only linearly proportional to the memory length.
Experiments on widely used language modeling benchmarks demonstrate its superiority over the baselines equipped with different types of memory.
arXiv Detail & Related papers (2022-04-15T06:11:25Z)
- Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation.
MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation.
We demonstrate that MemSizer provides an improved tradeoff between efficiency and accuracy over the vanilla transformer.
arXiv Detail & Related papers (2022-03-23T18:10:18Z)
- Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation [79.1669476932147]
Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to the goal position.
Recent Transformer-based VLN methods have made great progress benefiting from the direct connections between visual observations and the language instruction.
We introduce Multimodal Transformer with Variable-length Memory (MTVM) for visually-grounded natural language navigation.
arXiv Detail & Related papers (2021-11-10T16:04:49Z)
- Streaming Simultaneous Speech Translation with Augmented Memory Transformer [29.248366441276662]
Transformer-based models have achieved state-of-the-art performance on speech translation tasks.
We propose an end-to-end transformer-based sequence-to-sequence model, equipped with an augmented memory transformer encoder.
arXiv Detail & Related papers (2020-10-30T18:28:42Z)
- Learning to Summarize Long Texts with Memory Compression and Transfer [3.5407857489235206]
We introduce Mem2Mem, a memory-to-memory mechanism for hierarchical recurrent neural network based encoder decoder architectures.
Our memory regularization compresses an encoded input article into a more compact set of sentence representations.
arXiv Detail & Related papers (2020-10-21T21:45:44Z)
- Memory Transformer [0.31406146587437894]
Transformer-based models have achieved state-of-the-art results in many natural language processing tasks.
Memory-augmented neural networks (MANNs) extend traditional neural architectures with general-purpose memory for representations.
We evaluate these memory-augmented Transformers and demonstrate that the presence of memory positively correlates with model performance.
arXiv Detail & Related papers (2020-06-20T09:06:27Z)
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning [128.36951818335046]
We propose a new approach called Memory-Augmented Recurrent Transformer (MART).
MART uses a memory module to augment the transformer architecture.
MART generates more coherent and less repetitive paragraph captions than baseline methods.
arXiv Detail & Related papers (2020-05-11T20:01:41Z)
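For the Recurrent Memory Transformer entry above, the sketch below illustrates how special memory tokens can be prepended to each segment so that an unmodified Transformer encoder passes state between segments. This is a simplified, hypothetical rendering (the names MemoryTokenWrapper and n_mem, and the single prepended memory block, are assumptions rather than the authors' code).

```python
import torch
import torch.nn as nn

class MemoryTokenWrapper(nn.Module):
    """Illustrative sketch: learned memory tokens prepended to each segment of a
    stock Transformer encoder, with the updated memory carried to the next segment."""

    def __init__(self, d_model: int = 256, n_mem: int = 8, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(1, n_mem, d_model) * 0.02)  # initial memory tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.n_mem = n_mem

    def forward(self, segments):
        # segments: list of (batch, seg_len, d_model) tensors; memory is passed between them
        mem = self.mem.expand(segments[0].size(0), -1, -1)
        outs = []
        for x in segments:
            h = self.encoder(torch.cat([mem, x], dim=1))  # [memory tokens ; segment]
            mem, seg_out = h[:, :self.n_mem, :], h[:, self.n_mem:, :]  # updated memory, segment output
            outs.append(seg_out)
        return torch.cat(outs, dim=1)
```

The published model may place separate read and write memory positions around the sequence; prepending a single memory block here is purely for brevity.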
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.