End-to-End Lexically Constrained Machine Translation for Morphologically
Rich Languages
- URL: http://arxiv.org/abs/2106.12398v2
- Date: Thu, 24 Jun 2021 14:21:48 GMT
- Title: End-to-End Lexically Constrained Machine Translation for Morphologically
Rich Languages
- Authors: Josef Jon and João Paulo Aires and Dušan Variš and Ondřej Bojar
- Abstract summary: We investigate mechanisms to allow neural machine translation to infer the correct word inflection given lemmatized constraints.
Our experiments on the English-Czech language pair show that this approach improves the translation of constrained terms in both automatic and manual evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Lexically constrained machine translation allows the user to manipulate the
output sentence by enforcing the presence or absence of certain words and
phrases. Although current approaches can enforce terms to appear in the
translation, they often struggle to make the constraint word form agree with
the rest of the generated output. Our manual analysis shows that 46% of the
errors in the output of a baseline constrained model for English to Czech
translation are related to agreement. We investigate mechanisms to allow neural
machine translation to infer the correct word inflection given lemmatized
constraints. In particular, we focus on methods based on training the model
with constraints provided as part of the input sequence. Our experiments on the
English-Czech language pair show that this approach improves the translation of
constrained terms in both automatic and manual evaluation by reducing errors in
agreement. Our approach thus eliminates inflection errors without introducing
new errors or decreasing the overall quality of the translation.
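The core idea of the paper, training the model with lemmatized constraints supplied as part of the input sequence, can be illustrated with a minimal sketch. The separator tokens `<sep>` and `<c>` below are illustrative placeholders, not necessarily the paper's exact markup:

```python
# Sketch of the constraints-as-input approach: lemmatized target-side
# constraints are appended to the source sentence behind separator tokens,
# and the model is trained to generate correctly inflected surface forms.

def build_constrained_input(source: str, lemma_constraints: list[str]) -> str:
    """Append lemmatized constraints to the source sequence."""
    if not lemma_constraints:
        return source
    suffix = " ".join(f"<c> {lemma}" for lemma in lemma_constraints)
    return f"{source} <sep> {suffix}"

print(build_constrained_input(
    "The contract was signed yesterday",
    ["smlouva", "podepsat"],  # Czech lemmas; the model must inflect them
))
# -> The contract was signed yesterday <sep> <c> smlouva <c> podepsat
```

Because the constraints are lemmas rather than surface forms, the decoder is free to produce the inflection that agrees with the rest of the output sentence.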
Related papers
- Translate-and-Revise: Boosting Large Language Models for Constrained Translation [42.37981028583618]
We leverage the capabilities of large language models (LLMs) for constrained translation.
LLMs can easily adapt to this task by taking translation instructions and constraints as prompts.
We show 15% improvement in constraint-based translation accuracy over standard LLMs.
arXiv Detail & Related papers (2024-07-18T05:08:09Z)
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and can even incur heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Negative Lexical Constraints in Neural Machine Translation [1.3124513975412255]
Negative lexical constraining is used to prohibit certain words or expressions in the translation produced by the neural translation model.
We compare various methods based on modifying either the decoding process or the training data.
We demonstrate that our method improves the constraining, although the problem still persists in many cases.
arXiv Detail & Related papers (2023-08-07T14:04:15Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict addition and omission errors.
Experiments show that our model identifies both error type and position concurrently and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation [51.06378042344563]
The recently proposed OAXE training loss has proven effective at mitigating the effect of multimodality in non-autoregressive translation (NAT).
We extend OAXE by allowing reordering only between ngram phrases, while still requiring a strict match of word order within each phrase.
Further analyses show that ngram-OAXE indeed improves the translation of ngram phrases and produces more fluent translations with better modeling of sentence structure.
arXiv Detail & Related papers (2022-10-08T11:39:15Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Improving Translation Robustness with Visual Cues and Error Correction [58.97421756225425]
We introduce the idea of visual context to improve translation robustness against noisy texts.
We also propose a novel error correction training regime by treating error correction as an auxiliary task.
arXiv Detail & Related papers (2021-03-12T15:31:34Z)
- Lexically Cohesive Neural Machine Translation with Copy Mechanism [21.43163704217968]
We employ a copy mechanism into a context-aware neural machine translation model to allow copying words from previous outputs.
We conduct experiments on Japanese to English translation using an evaluation dataset for discourse translation.
arXiv Detail & Related papers (2020-10-11T08:39:02Z)
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
- Machine Translation with Unsupervised Length-Constraints [12.376309678270275]
We focus on length constraints, which are essential if the translation should be displayed in a given format.
Unlike a traditional pipeline that first translates and then performs sentence compression, our text compression is learned completely unsupervised.
We are able to significantly improve the translation quality under constraints.
arXiv Detail & Related papers (2020-04-07T07:55:41Z)
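The negative lexical constraining described in the list above (prohibiting certain words in the output) is commonly realized by modifying the decoding process. A minimal single-token sketch, assuming the banned expressions have already been mapped to token ids; real systems must also handle multi-token expressions and subword or inflectional variants:

```python
import math

def mask_banned_tokens(logits: list[float], banned_ids: set[int]) -> list[float]:
    """Set the scores of banned token ids to -inf so that greedy or beam
    search can never select them at this decoding step."""
    return [-math.inf if i in banned_ids else score
            for i, score in enumerate(logits)]

# Token 1 has the highest raw score, but it is banned,
# so the argmax falls back to the next-best token.
scores = [0.1, 2.5, 0.3, 1.7]
masked = mask_banned_tokens(scores, banned_ids={1})
print(masked.index(max(masked)))  # prints 3
```

Masking logits handles each step independently; training-data approaches, by contrast, teach the model itself to avoid the constrained expressions.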
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.