Correct Me If You Can: Learning from Error Corrections and Markings
- URL: http://arxiv.org/abs/2004.11222v1
- Date: Thu, 23 Apr 2020 15:17:37 GMT
- Title: Correct Me If You Can: Learning from Error Corrections and Markings
- Authors: Julia Kreutzer, Nathaniel Berger, Stefan Riezler
- Abstract summary: We present the first user study on annotation cost and machine learnability for the less popular annotation mode of error markings.
We show that error markings for translations of TED talks from English to German allow precise credit assignment while requiring significantly less human effort than correcting/post-editing.
- Score: 20.808561880051148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence-to-sequence learning involves a trade-off between signal strength
and annotation cost of training data. For example, machine translation data
range from costly expert-generated translations that enable supervised
learning, to weak quality-judgment feedback that facilitates reinforcement
learning. We present the first user study on annotation cost and machine
learnability for the less popular annotation mode of error markings. We show
that error markings for translations of TED talks from English to German allow
precise credit assignment while requiring significantly less human effort than
correcting/post-editing, and that error-marked data can be used successfully to
fine-tune neural machine translation models.
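One plausible way to fine-tune on error-marked data, sketched below, is to weight each reference token's log-likelihood by its marking, so that tokens marked as errors are down-weighted or dropped from the loss. The function name `marked_nll` and the weighting scheme are illustrative assumptions, not the paper's exact objective.

```python
import math

def marked_nll(token_logprobs, error_marks, error_weight=0.0):
    """Token-weighted negative log-likelihood for error-marked data.

    token_logprobs: model log-probabilities of the reference tokens
    error_marks:    1 if the annotator marked the token as an error, else 0
    error_weight:   weight applied to marked tokens (0.0 drops them from
                    the loss entirely; unmarked tokens keep weight 1.0)
    """
    assert len(token_logprobs) == len(error_marks)
    loss = 0.0
    for lp, marked in zip(token_logprobs, error_marks):
        w = error_weight if marked else 1.0
        loss += -w * lp
    return loss

# Example: three reference tokens, the second marked as an error,
# so only the first and third contribute to the loss.
logprobs = [math.log(0.9), math.log(0.2), math.log(0.8)]
marks = [0, 1, 0]
print(marked_nll(logprobs, marks))
```

The point of the sketch is only the credit-assignment idea from the abstract: markings localize errors at the token level, so unmarked tokens can still provide a full supervised signal.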
Related papers
- Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training [10.498938255717066]
Supervised learning in Neural Machine Translation (NMT) typically follows a teacher forcing paradigm.
We present a simple extension of standard maximum likelihood estimation by a contrastive marking objective.
We show that training with contrastive markings yields improvements on top of supervised learning.
arXiv Detail & Related papers (2023-07-17T11:56:32Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback [90.20262941911027]
ParroT is a framework to enhance and regulate the translation abilities during chat.
Specifically, ParroT reformulates translation data into the instruction-following style.
We propose three instruction types for finetuning ParroT models, including translation instruction, contrastive instruction, and error-guided instruction.
arXiv Detail & Related papers (2023-04-05T13:12:00Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages [0.0]
We investigate mechanisms to allow neural machine translation to infer the correct word inflection given lemmatized constraints.
Our experiments on the English-Czech language pair show that this approach improves the translation of constrained terms in both automatic and manual evaluation.
arXiv Detail & Related papers (2021-06-23T13:40:13Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs; we specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
arXiv Detail & Related papers (2021-04-01T06:06:36Z)
- Improving Translation Robustness with Visual Cues and Error Correction [58.97421756225425]
We introduce the idea of visual context to improve translation robustness against noisy texts.
We also propose a novel error correction training regime by treating error correction as an auxiliary task.
arXiv Detail & Related papers (2021-03-12T15:31:34Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
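The XMI entry above can be sketched as follows: cross-mutual information measures how much the source sentence helps predict the target, as the drop in per-token cross-entropy when a translation model conditioned on the source replaces an unconditional target-side language model. This is a loose sketch under that reading of the abstract; the paper's exact estimator and normalization may differ.

```python
import math

def cross_entropy(token_logprobs):
    """Average negative log-probability per token, in bits."""
    return -sum(token_logprobs) / len(token_logprobs) / math.log(2)

def xmi(lm_logprobs, mt_logprobs):
    """Cross-mutual information of a translation direction.

    lm_logprobs: target-token log-probs from an unconditional target LM
    mt_logprobs: log-probs of the same tokens from the MT model,
                 conditioned on the source sentence

    A larger value means the source made the target easier to predict.
    """
    return cross_entropy(lm_logprobs) - cross_entropy(mt_logprobs)
```

Because the MT model conditions on the source while the LM does not, the measure is asymmetric: swapping the translation direction generally changes the result, which is what lets the paper compare difficulty across directions.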
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.