Modeling Context With Linear Attention for Scalable Document-Level
Translation
- URL: http://arxiv.org/abs/2210.08431v1
- Date: Sun, 16 Oct 2022 03:41:50 GMT
- Title: Modeling Context With Linear Attention for Scalable Document-Level
Translation
- Authors: Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith
- Abstract summary: We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
- Score: 72.41955536834702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document-level machine translation leverages inter-sentence dependencies to
produce more coherent and consistent translations. However, these models,
predominantly based on transformers, are difficult to scale to long documents
as their attention layers have quadratic complexity in the sequence length.
Recent efforts on efficient attention improve scalability, but their effect on
document translation remains unexplored. In this work, we investigate the
efficacy of a recent linear attention model by Peng et al. (2021) on document
translation and augment it with a sentential gate to promote a recency
inductive bias. We evaluate the model on IWSLT 2015 and OpenSubtitles 2018
against the transformer, demonstrating substantially increased decoding speed
on long sequences with similar or better BLEU scores. We show that sentential
gating further improves translation quality on IWSLT.
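To make the method concrete, below is a minimal sketch of sentential gating on top of causal linear attention. It is an illustration under simplifying assumptions rather than the authors' implementation: it uses a generic ELU-based feature map in place of the random-feature attention of Peng et al. (2021), and a fixed scalar gate where the paper's sentential gate is learned; the function and variable names (gated_linear_attention, sentence_start, gate) are hypothetical.
```python
# Illustrative sketch only: causal linear attention kept as a running state,
# with a scalar "sentential gate" that decays the state at sentence boundaries
# to favor recent sentences (a recency inductive bias).
import torch

def feature_map(x):
    # A common positive feature map for linear attention; the paper builds on
    # Peng et al. (2021), whose random-feature map differs from this choice.
    return torch.nn.functional.elu(x) + 1.0

def gated_linear_attention(q, k, v, sentence_start, gate=0.9, eps=1e-6):
    """q, k, v: (seq_len, d) tensors for one attention head.
    sentence_start: (seq_len,) bool tensor, True at each sentence's first token.
    gate: hypothetical fixed scalar in (0, 1); the paper learns its gate.
    Returns (seq_len, d) outputs; each step costs O(d^2), independent of seq_len."""
    d = q.shape[-1]
    S = torch.zeros(d, d)   # running sum of phi(k) v^T
    z = torch.zeros(d)      # running sum of phi(k), used for normalization
    outputs = []
    for t in range(q.shape[0]):
        if sentence_start[t]:
            # Sentential gating: shrink the accumulated context so that
            # earlier sentences contribute less than the current one.
            S, z = gate * S, gate * z
        phi_k = feature_map(k[t])
        S = S + torch.outer(phi_k, v[t])
        z = z + phi_k
        phi_q = feature_map(q[t])
        outputs.append((phi_q @ S) / (phi_q @ z + eps))
    return torch.stack(outputs)

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(12, 16) for _ in range(3))
    starts = torch.zeros(12, dtype=torch.bool)
    starts[0] = starts[6] = True   # two sentences of six tokens each
    print(gated_linear_attention(q, k, v, starts).shape)  # torch.Size([12, 16])
```
Because the whole prefix is summarized in the fixed-size state (S, z), decoding cost grows linearly with sequence length rather than quadratically, which is what makes this family of models attractive for long documents.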
Related papers
- Enhancing Document-level Translation of Large Language Model via
Translation Mixed-instructions [24.025242477280983]
Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions.
This creates a challenge for document-level translation, rooted in limited sentence-level coverage: subsequent sentences in the document remain untranslated.
We propose an approach that combines sentence-level and document-level translation instructions of varying lengths to fine-tune LLMs.
arXiv Detail & Related papers (2024-01-16T03:28:26Z)
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
- TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z)
- Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks (a sacreBLEU usage sketch follows the related-papers list below).
arXiv Detail & Related papers (2022-05-21T12:15:22Z)
- Do Long-Range Language Models Actually Use Long-Range Context? [27.084888397778823]
Language models are generally trained on short, truncated input sequences.
Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models.
arXiv Detail & Related papers (2021-09-19T12:49:43Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Rethinking Document-level Neural Machine Translation [73.42052953710605]
We try to answer the question: Is the capacity of current models strong enough for document-level translation?
We observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words.
arXiv Detail & Related papers (2020-10-18T11:18:29Z)
- Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [28.94748226472447]
We study the pros and cons of the standard transformer in document-level translation.
We propose a surprisingly simple long-short term masking self-attention on top of the standard transformer.
This simple approach achieves strong BLEU results and captures discourse phenomena.
arXiv Detail & Related papers (2020-09-19T00:29:51Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
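On the standardized metrics mentioned under "Non-Autoregressive Neural Machine Translation: A Call for Clarity", the sacreBLEU library exposes BLEU, chrF++, and TER through one interface; the snippet below is a minimal usage sketch with placeholder hypothesis and reference strings, not data from any paper above.
```python
# Minimal sacreBLEU usage sketch: standardized BLEU, chrF++, and TER.
# The hypothesis and reference strings are placeholders for illustration only.
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one inner list per reference stream

print(BLEU().corpus_score(hypotheses, references))              # BLEU
print(CHRF(word_order=2).corpus_score(hypotheses, references))  # chrF++ (word_order=2)
print(TER().corpus_score(hypotheses, references))               # TER
```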
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.