Automatic Correction of Human Translations
- URL: http://arxiv.org/abs/2206.08593v1
- Date: Fri, 17 Jun 2022 07:30:55 GMT
- Authors: Jessy Lin, Geza Kovacs, Aditya Shastry, Joern Wuebker, John DeNero
- Abstract summary: We introduce translation error correction (TEC), the task of automatically correcting human-generated translations.
We show that human errors in TEC span a more diverse range of error types and include far fewer translation fluency errors than the MT errors in automatic post-editing datasets.
- Score: 8.137198664755598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce translation error correction (TEC), the task of automatically
correcting human-generated translations. Imperfections in machine translations
(MT) have long motivated systems for improving translations post-hoc with
automatic post-editing. In contrast, little attention has been devoted to the
problem of automatically correcting human translations, despite the intuition
that humans make distinct errors that machines would be well-suited to assist
with, from typos to inconsistencies in translation conventions. To investigate
this, we build and release the Aced corpus with three TEC datasets. We show
that human errors in TEC exhibit a more diverse range of errors and far fewer
translation fluency errors than the MT errors in automatic post-editing
datasets, suggesting the need for dedicated TEC models that are specialized to
correct human errors. We show that pre-training instead on synthetic errors
based on human errors improves TEC F-score by as much as 5.1 points. We
conducted a human-in-the-loop user study with nine professional translation
editors and found that the assistance of our TEC system led them to produce
significantly higher quality revised translations.
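The abstract reports that pre-training on synthetic errors modeled on human errors improves TEC F-score. The paper's own generation procedure is not shown here; the sketch below is a minimal, hypothetical illustration of the idea, injecting the two human error types the abstract names (typos and inconsistent translation conventions) into clean translations to build (noised, clean) training pairs. All function names and the variant table are illustrative assumptions, not the authors' implementation.

```python
import random

def inject_typo(tokens, rng):
    """Swap two adjacent characters in one random token (a simple typo)."""
    i = rng.randrange(len(tokens))
    t = tokens[i]
    if len(t) > 1:
        j = rng.randrange(len(t) - 1)
        tokens[i] = t[:j] + t[j + 1] + t[j] + t[j + 2:]
    return tokens

def inject_term_inconsistency(tokens, term_variants, rng):
    """Replace a term with an inconsistent variant (e.g. 'email' -> 'e-mail'),
    mimicking inconsistent translation conventions."""
    for i, t in enumerate(tokens):
        if t in term_variants:
            tokens[i] = rng.choice(term_variants[t])
            break
    return tokens

def make_synthetic_example(clean_translation, term_variants, seed=0):
    """Produce a (noised, clean) pair for TEC pre-training."""
    rng = random.Random(seed)
    tokens = clean_translation.split()
    tokens = inject_typo(tokens, rng)
    tokens = inject_term_inconsistency(tokens, term_variants, rng)
    return " ".join(tokens), clean_translation

noised, target = make_synthetic_example(
    "Please send the email to the customer",
    {"email": ["e-mail", "Email"]},
)
```

A TEC model pre-trained on such pairs learns to map the noised side back to the clean side before being fine-tuned on real human corrections.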
Related papers
- Prompting Large Language Models with Human Error Markings for Self-Correcting Machine Translation [11.351365352611658]
Post-editing (PE) is still required to correct errors and to enhance term translation quality in specialized domains.
We present a pilot study of enhancing translation memories (TM) for the needs of correct and consistent term translation in technical domains.
arXiv Detail & Related papers (2024-06-04T12:43:47Z)
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
- HTEC: Human Transcription Error Correction [4.241671683889168]
High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models.
We propose HTEC for Human Transcription Error Correction.
HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills masked positions.
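The two-stage HTEC pipeline can be caricatured in a few lines. The real Trans-Checker and Trans-Filler are neural models; the checker and filler below are toy stand-ins (an out-of-vocabulary check and a positional lookup) used only to show the detect-mask-fill control flow.

```python
MASK = "[MASK]"

def trans_checker(tokens, vocabulary):
    """Stage 1 (toy): flag and mask words that look erroneous.
    Here 'erroneous' just means out-of-vocabulary."""
    return [t if t.lower() in vocabulary else MASK for t in tokens]

def trans_filler(masked_tokens, corrections):
    """Stage 2 (toy): fill each masked position in order.
    The real Trans-Filler is a sequence-to-sequence generator."""
    out, fixes = [], iter(corrections)
    for t in masked_tokens:
        out.append(next(fixes) if t == MASK else t)
    return out

vocab = {"the", "quick", "brown", "fox", "jumps"}
transcript = ["the", "quikc", "brown", "focks", "jumps"]
masked = trans_checker(transcript, vocab)
# masked == ['the', '[MASK]', 'brown', '[MASK]', 'jumps']
corrected = trans_filler(masked, ["quick", "fox"])
```

Splitting detection from generation lets each stage be trained and evaluated separately, which is the structural point of the HTEC design.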
arXiv Detail & Related papers (2023-09-18T19:03:21Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation [14.615314828955288]
We propose a novel constrained decoding algorithm, namely Prefix Suffix Guided Decoding (PSGD).
PSGD improves translation quality by an average of 10.87 BLEU and 8.62 BLEU on the WeTS and the WMT 2022 Translation Suggestion datasets.
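The exact PSGD algorithm is in the paper; as a hedged sketch of the general prefix/suffix-constrained decoding idea, the toy decoder below keeps a user-supplied prefix and suffix fixed and generates only the span between them. The "model" here is a hypothetical bigram lookup table standing in for a real NMT decoder.

```python
def psgd_style_decode(next_token, prefix, suffix, max_span=5):
    """Generate tokens between a fixed prefix and suffix.
    Decoding stops when the model would produce the first suffix
    token (the span connects to the suffix) or the budget runs out."""
    out = list(prefix)
    for _ in range(max_span):
        tok = next_token(out)
        if tok is None or tok == suffix[0]:
            break
        out.append(tok)
    return out + list(suffix)

# Toy bigram "model": maps the last token to the most likely next token.
BIGRAMS = {"please": "review", "review": "the", "the": "attached"}

def toy_next_token(context):
    return BIGRAMS.get(context[-1])

hyp = psgd_style_decode(toy_next_token, ["please"], ["attached", "file"])
# hyp == ['please', 'review', 'the', 'attached', 'file']
```

Constraining only the middle span is what makes this style of decoding cheap enough for interactive translation suggestion, where the prefix and suffix come from the translator's existing draft.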
arXiv Detail & Related papers (2022-11-14T03:40:02Z)
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs and specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
arXiv Detail & Related papers (2021-04-01T06:06:36Z)
- Improving Translation Robustness with Visual Cues and Error Correction [58.97421756225425]
We introduce the idea of visual context to improve translation robustness against noisy texts.
We also propose a novel error correction training regime by treating error correction as an auxiliary task.
arXiv Detail & Related papers (2021-03-12T15:31:34Z)
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
- On the Robustness of Language Encoders against Grammatical Errors [66.05648604987479]
We collect real grammatical errors from non-native speakers and conduct adversarial attacks to simulate these errors on clean text data.
Results confirm that the performance of all tested models is affected but the degree of impact varies.
arXiv Detail & Related papers (2020-05-12T11:01:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.