Efficient Machine Translation Corpus Generation
- URL: http://arxiv.org/abs/2306.11838v1
- Date: Tue, 20 Jun 2023 18:46:47 GMT
- Title: Efficient Machine Translation Corpus Generation
- Authors: Kamer Ali Yuksel, Ahmet Gunduz, Shreyas Sharma, Hassan Sawaf
- Abstract summary: The method is based on online training of a custom MT quality estimation metric as linguists perform post-edits.
The online estimator is used to prioritize the worst hypotheses for post-editing and to auto-close the best hypotheses without post-editing.
- Score: 3.441021278275805
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes an efficient and semi-automated method for
human-in-the-loop post-editing for machine translation (MT) corpus generation.
The method is based on online training of a custom MT quality estimation metric
on-the-fly as linguists perform post-edits. The online estimator is used to
prioritize worse hypotheses for post-editing, and auto-close best hypotheses
without post-editing. This way, significant improvements can be achieved in the
resulting quality of post-edits at a lower cost due to reduced human
involvement. The trained estimator can also provide an online sanity check
mechanism for post-edits and remove the need for additional linguists to review
them or work on the same hypotheses. In this paper, the effect of prioritizing
with the proposed method on the resulting MT corpus quality is presented versus
scheduling hypotheses randomly. As demonstrated by experiments, the proposed
method improves the lifecycle of MT models by focusing the linguist effort on
production samples and hypotheses, which matter most for expanding MT corpora
to be used for re-training them.
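The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy features, the linear estimator trained by stochastic gradient descent, the edit-similarity label, and the names `features`, `edit_similarity`, `OnlineEstimator`, `run_post_editing`, `budget`, and `auto_close_at` are all assumptions introduced here.

```python
import difflib


def features(source: str, hypothesis: str) -> list[float]:
    """Toy placeholder features (assumption): bias, length ratio, length mismatch."""
    ratio = len(hypothesis.split()) / max(len(source.split()), 1)
    return [1.0, ratio, abs(1.0 - ratio)]


def edit_similarity(hypothesis: str, post_edit: str) -> float:
    """Quality label in [0, 1]: 1.0 means the linguist changed nothing."""
    matcher = difflib.SequenceMatcher(None, hypothesis.split(), post_edit.split())
    return matcher.ratio()


class OnlineEstimator:
    """Linear quality estimator updated after every post-edit (assumed model)."""

    def __init__(self, n_features: int = 3, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x: list[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: list[float], target: float) -> None:
        # One gradient step on the squared error for a single example.
        error = self.predict(x) - target
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]


def run_post_editing(pairs, post_edit, budget, auto_close_at=0.9):
    """Send the predicted-worst hypotheses to the linguist first.

    pairs: list of (source, hypothesis) tuples.
    post_edit: callable standing in for the linguist, returns the corrected text.
    budget: maximum number of human post-edits.
    """
    estimator = OnlineEstimator()
    pending, edited, auto_closed, remaining = list(pairs), [], [], []
    while pending and len(edited) < budget:
        # Re-rank with the current estimator; lowest predicted quality first.
        pending.sort(key=lambda pair: estimator.predict(features(*pair)))
        source, hypothesis = pending.pop(0)
        fixed = post_edit(source, hypothesis)
        # The post-edit itself supplies the training label for the estimator.
        estimator.update(features(source, hypothesis), edit_similarity(hypothesis, fixed))
        edited.append((source, fixed))
    # Hypotheses predicted to be good enough are accepted without human review.
    for source, hypothesis in pending:
        if estimator.predict(features(source, hypothesis)) >= auto_close_at:
            auto_closed.append((source, hypothesis))
        else:
            remaining.append((source, hypothesis))
    return edited, auto_closed, remaining
```

In this sketch every hypothesis ends up in exactly one of three buckets: post-edited by a human, auto-closed, or left pending. The random-scheduling baseline mentioned in the abstract would correspond to shuffling `pending` instead of sorting it by predicted quality.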
Related papers
- Hindsight Quality Prediction Experiments in Multi-Candidate Human-Post-Edited Machine Translation [23.7663178803576]
This paper investigates two complementary paradigms for predicting machine translation (MT) quality: source-side difficulty prediction and candidate quality estimation (QE). The rapid adoption of Large Language Models (LLMs) into MT is reshaping the research landscape, yet its impact on established quality prediction paradigms remains underexplored.
arXiv Detail & Related papers (2026-03-04T13:54:58Z)
- Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model [4.750257527930005]
We propose a novel approach to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques.
arXiv Detail & Related papers (2025-11-04T19:59:25Z)
- Test-Time Scaling of Reasoning Models for Machine Translation [16.317481079574065]
Test-time scaling (TTS) has enhanced the performance of Reasoning Models (RMs) on various tasks such as math and coding. This paper investigates whether increased inference-time computation improves translation quality.
arXiv Detail & Related papers (2025-10-07T21:15:18Z)
- Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization [4.993565079216378]
We show that applying task-alignment to neural machine translation (NMT) addresses an existing task-data mismatch in NMT.
We introduce Direct Quality Optimization (DQO), a variant of DPO leveraging a pre-trained translation quality estimation model as a proxy for human preferences.
arXiv Detail & Related papers (2024-09-26T09:32:12Z)
- Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect incorrect translations and assigns a penalty term to their reward scores.
arXiv Detail & Related papers (2024-01-23T16:07:43Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z)
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) approach to low-resource accent adaptation for text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professional Post-Editing Towards More Effective MT Evaluation [0.0]
In this work, we introduce HOPE, a task-oriented and human-centric evaluation framework for machine translation output.
It contains only a limited number of commonly occurring error types and uses a scoring model with a geometric progression of error penalty points (EPPs), reflecting error severity level, for each translation unit.
The approach has several key advantages, such as the ability to measure and compare less-than-perfect MT output from different systems, the ability to indicate human perception of quality, immediate estimation of the labor effort required to bring MT output to premium quality, lower-cost and faster application, and higher IRR.
arXiv Detail & Related papers (2021-12-27T18:47:43Z)
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
- On the Inference Calibration of Neural Machine Translation [54.48932804996506]
We study the correlation between calibration and translation performance, as well as the linguistic properties of miscalibration.
We propose a new graduated label smoothing method that can improve both inference calibration and translation performance.
arXiv Detail & Related papers (2020-05-03T02:03:56Z)
- Revisiting Round-Trip Translation for Quality Estimation [0.0]
Quality estimation (QE) is the task of automatically evaluating the quality of translations without human-translated references.
In this paper, we apply semantic embeddings to round-trip translation (RTT)-based QE.
Our method achieves the highest correlations with human judgments, compared to previous WMT 2019 quality estimation metric task submissions.
arXiv Detail & Related papers (2020-04-29T03:20:22Z)