Learn To Remember: Transformer with Recurrent Memory for Document-Level
Machine Translation
- URL: http://arxiv.org/abs/2205.01546v1
- Date: Tue, 3 May 2022 14:55:53 GMT
- Title: Learn To Remember: Transformer with Recurrent Memory for Document-Level
Machine Translation
- Authors: Yukun Feng, Feng Li, Ziang Song, Boyuan Zheng, Philipp Koehn
- Abstract summary: We introduce a recurrent memory unit into the vanilla Transformer that supports information exchange between the current sentence and the previous context.
We conduct experiments on three popular datasets for document-level machine translation, and our model achieves an average improvement of 0.91 s-BLEU over the sentence-level baseline.
- Score: 14.135048254120615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture has led to significant gains in machine
translation. However, most studies focus only on sentence-level translation
without considering the context dependencies within a document, which leaves
document-level coherence inadequate. Some recent work has tried to mitigate this
issue by introducing an additional context encoder or by translating multiple
sentences or even the entire document at once. Such methods may lose information
on the target side or incur growing computational cost as documents get longer.
To address these problems, we introduce a recurrent memory unit into the vanilla
Transformer that supports information exchange between the current sentence and
the previous context. The memory unit is recurrently updated: it acquires
information from each sentence and passes the aggregated knowledge back to
subsequent sentence states. We follow a two-stage training strategy in which the
model is first trained at the sentence level and then finetuned for
document-level translation. We conduct experiments on three popular datasets for
document-level machine translation, and our model achieves an average
improvement of 0.91 s-BLEU over the sentence-level baseline. We also achieve
state-of-the-art results on TED and News, outperforming the previous work by
0.36 s-BLEU and 1.49 d-BLEU on average.
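The description above is enough to sketch the general shape of such a memory module. The following is a minimal, illustrative PyTorch sketch assuming an attention-based read/write and a gated update; the module name, slot count, and update rule are assumptions made for this example, not the authors' implementation.
```python
# Minimal sketch of a recurrent memory unit for document-level translation.
# All names, sizes, and the exact update rule are illustrative assumptions.
import torch
import torch.nn as nn


class RecurrentMemory(nn.Module):
    """Fixed-size memory updated after each sentence and read by the next one."""

    def __init__(self, d_model: int, num_slots: int = 8, num_heads: int = 4):
        super().__init__()
        # Learned initial memory, shared across documents.
        self.init_memory = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)
        # Write: memory slots attend over the current sentence states.
        self.write_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # Read: sentence states attend over the previous memory.
        self.read_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def initial_state(self, batch_size: int) -> torch.Tensor:
        return self.init_memory.unsqueeze(0).expand(batch_size, -1, -1)

    def read(self, sent_states: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Inject previous-context information into the current sentence states.
        ctx, _ = self.read_attn(sent_states, memory, memory)
        return sent_states + ctx  # residual fusion

    def write(self, sent_states: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Aggregate the current sentence into the memory with a gated update.
        update, _ = self.write_attn(memory, sent_states, sent_states)
        g = torch.sigmoid(self.gate(torch.cat([memory, update], dim=-1)))
        return g * update + (1.0 - g) * memory


# Usage: process the sentences of a document in order, reading from and then
# updating the memory (the random tensors stand in for encoder outputs of
# shape [batch, seq_len, d_model]).
d_model = 512
mem = RecurrentMemory(d_model)
state = mem.initial_state(batch_size=2)
for sent in [torch.randn(2, 20, d_model), torch.randn(2, 17, d_model)]:
    fused = mem.read(sent, state)    # context-aware states fed onward
    state = mem.write(fused, state)  # carry aggregated knowledge forward
```
Under the two-stage strategy described in the abstract, a module like this would presumably be engaged only in the document-level finetuning stage, after the model has first been trained at the sentence level.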
Related papers
- Recovering document annotations for sentence-level bitext [18.862295675088056]
We reconstruct document-level information for three datasets in German, French, Spanish, Italian, Polish, and Portuguese.
We introduce a document-level filtering technique as an alternative to traditional bitext filtering.
Last we train models on these longer contexts and demonstrate improvement in document-level translation without degradation of sentence-level translation.
arXiv Detail & Related papers (2024-06-06T08:58:14Z)
- Document-Level Language Models for Machine Translation [37.106125892770315]
We build context-aware translation systems utilizing document-level monolingual data instead.
We improve existing approaches by leveraging recent advancements in model combination.
In most scenarios, back-translation gives even better results, at the cost of having to re-train the translation system.
arXiv Detail & Related papers (2023-10-18T20:10:07Z)
- Improving Long Context Document-Level Machine Translation [51.359400776242786]
Document-level context for neural machine translation (NMT) is crucial to improve translation consistency and cohesion.
Many works have been published on the topic of document-level NMT, but most restrict the system to just local context.
We propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption.
arXiv Detail & Related papers (2023-06-08T13:28:48Z)
- On Search Strategies for Document-Level Neural Machine Translation [51.359400776242786]
Document-level neural machine translation (NMT) models produce a more consistent output across a document.
In this work, we aim to answer the question of how best to utilize a context-aware translation model during decoding.
arXiv Detail & Related papers (2023-06-08T11:30:43Z)
- Escaping the sentence-level paradigm in machine translation [9.676755606927435]
Much work on document-context machine translation exists, but for various reasons it has not caught on.
In contrast to work on specialized architectures, we show that the standard Transformer architecture is sufficient.
We propose generative variants of existing contrastive metrics that are better able to discriminate among document systems.
arXiv Detail & Related papers (2023-04-25T16:09:02Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
- Rethinking Document-level Neural Machine Translation [73.42052953710605]
We try to answer the question: Is the capacity of current models strong enough for document-level translation?
We observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words.
arXiv Detail & Related papers (2020-10-18T11:18:29Z)
- Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [28.94748226472447]
We study the pros and cons of the standard transformer in document-level translation.
We propose a surprisingly simple long-short term masking self-attention on top of the standard transformer (an illustrative mask construction is sketched after this list).
We achieve strong BLEU results and capture discourse phenomena.
arXiv Detail & Related papers (2020-09-19T00:29:51Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make a clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
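The long-short term masking entry above lends itself to a compact illustration. Below is one plausible way to build a sentence-local ("short-term") mask and a document-wide ("long-term") mask over a concatenated input; how such masks are assigned to heads or layers, and whether this matches the cited paper's exact scheme, is not specified here, so treat the helper as a hypothetical sketch.
```python
# Illustrative construction of "long-short term" attention masks for a
# concatenated document input. The masking rule and its use are assumptions
# for the example, not necessarily the scheme of the cited paper.
import torch


def long_short_masks(sent_ids: torch.Tensor):
    """sent_ids: [seq_len] tensor giving the sentence index of each token.

    Returns two boolean masks of shape [seq_len, seq_len] where True marks
    blocked positions, following the attn_mask convention of
    torch.nn.MultiheadAttention.
    """
    same_sentence = sent_ids.unsqueeze(0) == sent_ids.unsqueeze(1)
    short_term = ~same_sentence               # attend only within own sentence
    long_term = torch.zeros_like(short_term)  # attend everywhere (no blocking)
    return short_term, long_term


# Example: three sentences of lengths 4, 3, and 5 concatenated into one input.
sent_ids = torch.tensor([0] * 4 + [1] * 3 + [2] * 5)
short_mask, long_mask = long_short_masks(sent_ids)
# `short_mask` could be passed as attn_mask to some attention heads/layers and
# `long_mask` (i.e. no blocking) to others, so the model mixes sentence-local
# and document-wide views of the same concatenated sequence.
```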
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.