Modeling Context With Linear Attention for Scalable Document-Level
Translation
- URL: http://arxiv.org/abs/2210.08431v1
- Date: Sun, 16 Oct 2022 03:41:50 GMT
- Title: Modeling Context With Linear Attention for Scalable Document-Level
Translation
- Authors: Zhaofeng Wu, Hao Peng, Nikolaos Pappas, Noah A. Smith
- Abstract summary: We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
- Score: 72.41955536834702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document-level machine translation leverages inter-sentence dependencies to
produce more coherent and consistent translations. However, these models,
predominantly based on transformers, are difficult to scale to long documents
as their attention layers have quadratic complexity in the sequence length.
Recent efforts on efficient attention improve scalability, but their effect on
document translation remains unexplored. In this work, we investigate the
efficacy of a recent linear attention model by Peng et al. (2021) on document
translation and augment it with a sentential gate to promote a recency
inductive bias. We evaluate the model on IWSLT 2015 and OpenSubtitles 2018
against the transformer, demonstrating substantially increased decoding speed
on long sequences with similar or better BLEU scores. We show that sentential
gating further improves translation quality on IWSLT.
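To make the method concrete, below is a minimal sketch of sentential gating on top of causal linear attention. It is an illustration under simplifying assumptions rather than the authors' implementation: it uses a generic ELU-based feature map in place of the random-feature attention of Peng et al. (2021), and a fixed scalar gate where the paper's sentential gate is learned; the function and variable names (gated_linear_attention, sentence_start, gate) are hypothetical.
```python
# Illustrative sketch only: causal linear attention kept as a running state,
# with a scalar "sentential gate" that decays the state at sentence boundaries
# to favor recent sentences (a recency inductive bias).
import torch

def feature_map(x):
    # A common positive feature map for linear attention; the paper builds on
    # Peng et al. (2021), whose random-feature map differs from this choice.
    return torch.nn.functional.elu(x) + 1.0

def gated_linear_attention(q, k, v, sentence_start, gate=0.9, eps=1e-6):
    """q, k, v: (seq_len, d) tensors for one attention head.
    sentence_start: (seq_len,) bool tensor, True at each sentence's first token.
    gate: hypothetical fixed scalar in (0, 1); the paper learns its gate.
    Returns (seq_len, d) outputs; each step costs O(d^2), independent of seq_len."""
    d = q.shape[-1]
    S = torch.zeros(d, d)   # running sum of phi(k) v^T
    z = torch.zeros(d)      # running sum of phi(k), used for normalization
    outputs = []
    for t in range(q.shape[0]):
        if sentence_start[t]:
            # Sentential gating: shrink the accumulated context so that
            # earlier sentences contribute less than the current one.
            S, z = gate * S, gate * z
        phi_k = feature_map(k[t])
        S = S + torch.outer(phi_k, v[t])
        z = z + phi_k
        phi_q = feature_map(q[t])
        outputs.append((phi_q @ S) / (phi_q @ z + eps))
    return torch.stack(outputs)

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = (torch.randn(12, 16) for _ in range(3))
    starts = torch.zeros(12, dtype=torch.bool)
    starts[0] = starts[6] = True   # two sentences of six tokens each
    print(gated_linear_attention(q, k, v, starts).shape)  # torch.Size([12, 16])
```
Because the whole prefix is summarized in the fixed-size state (S, z), decoding cost grows linearly with sequence length rather than quadratically, which is what makes this family of models attractive for long documents.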
Related papers
- Enhancing Document-level Translation of Large Language Model via
Translation Mixed-instructions [24.025242477280983]
Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions.
This creates a challenge for document-level translation, rooted in limited sentence-level coverage: subsequent sentences in the document remain untranslated.
We propose an approach that combines sentence-level and document-level translation instructions of varying lengths to fine-tune LLMs.
arXiv Detail & Related papers (2024-01-16T03:28:26Z)
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
- TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z)
- Non-Autoregressive Neural Machine Translation: A Call for Clarity [3.1447111126465]
We take a step back and revisit several techniques that have been proposed for improving non-autoregressive translation models.
We provide novel insights for establishing strong baselines using length prediction or CTC-based architecture variants.
We contribute standardized BLEU, chrF++, and TER scores using sacreBLEU on four translation tasks (a sacreBLEU usage sketch follows the related-papers list below).
arXiv Detail & Related papers (2022-05-21T12:15:22Z)
- Do Long-Range Language Models Actually Use Long-Range Context? [27.084888397778823]
Language models are generally trained on short, truncated input sequences.
Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models.
arXiv Detail & Related papers (2021-09-19T12:49:43Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Rethinking Document-level Neural Machine Translation [73.42052953710605]
We try to answer the question: Is the capacity of current models strong enough for document-level translation?
We observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words.
arXiv Detail & Related papers (2020-10-18T11:18:29Z)
- Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation [28.94748226472447]
We study the pros and cons of the standard transformer in document-level translation.
We propose a surprisingly simple long-short term masking self-attention on top of the standard transformer.
This simple approach achieves strong BLEU results and captures discourse phenomena.
arXiv Detail & Related papers (2020-09-19T00:29:51Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
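On the standardized metrics mentioned under "Non-Autoregressive Neural Machine Translation: A Call for Clarity", the sacreBLEU library exposes BLEU, chrF++, and TER through one interface; the snippet below is a minimal usage sketch with placeholder hypothesis and reference strings, not data from any paper above.
```python
# Minimal sacreBLEU usage sketch: standardized BLEU, chrF++, and TER.
# The hypothesis and reference strings are placeholders for illustration only.
from sacrebleu.metrics import BLEU, CHRF, TER

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one inner list per reference stream

print(BLEU().corpus_score(hypotheses, references))              # BLEU
print(CHRF(word_order=2).corpus_score(hypotheses, references))  # chrF++ (word_order=2)
print(TER().corpus_score(hypotheses, references))               # TER
```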
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.