Quality Estimation without Human-labeled Data
- URL: http://arxiv.org/abs/2102.04020v1
- Date: Mon, 8 Feb 2021 06:25:46 GMT
- Title: Quality Estimation without Human-labeled Data
- Authors: Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav
Chaudhary, Francisco Guzmán, Lucia Specia
- Abstract summary: Quality estimation aims to measure the quality of translated content without access to a reference translation.
We propose a technique that does not rely on examples from human annotators and instead uses synthetic training data.
We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve comparable performance to models trained on human-annotated data.
- Score: 25.25993509174361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality estimation aims to measure the quality of translated content without
access to a reference translation. This is crucial for machine translation
systems in real-world scenarios where high-quality translation is needed. While
many approaches exist for quality estimation, they are based on supervised
machine learning and require costly human-labeled data. As an alternative, we
propose a technique that does not rely on examples from human annotators and
instead uses synthetic training data. We train off-the-shelf architectures for
supervised quality estimation on our synthetic data and show that the resulting
models achieve comparable performance to models trained on human-annotated
data, both for sentence- and word-level prediction.
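The abstract does not spell out the generation procedure here, but one common way to build such synthetic QE data is to score MT hypotheses against references and use the result as a pseudo-label. A minimal sketch, assuming parallel data, an arbitrary `translate` function as a stand-in for any MT system, and sacrebleu's TER metric:

```python
# Hypothetical sketch: derive synthetic sentence-level QE labels from
# parallel data by scoring MT hypotheses against references (HTER-style).
from sacrebleu.metrics import TER

ter = TER()

def synthetic_qe_labels(sources, references, translate):
    """Yield (source, hypothesis, pseudo_label) triples for QE training."""
    for src, ref in zip(sources, references):
        hyp = translate(src)                      # machine translation output
        score = ter.sentence_score(hyp, [ref]).score / 100.0
        yield src, hyp, min(score, 1.0)           # clip: TER can exceed 100
```

Word-level pseudo-labels could be derived analogously by aligning hypothesis and reference tokens, though that step is not shown here.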
Related papers
- Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content? [6.213698466889738]
This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC).
We employ an existing emotion-related dataset with human-annotated errors and calculate quality evaluation scores based on the Multi-dimensional Quality Metrics (MQM) framework.
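As a rough illustration of MQM-style scoring, the sketch below aggregates severity-weighted error counts into a single score; the weights and normalization are common defaults and an assumption here, not the paper's exact configuration:

```python
# Illustrative MQM-style scoring from annotated error spans.
# Severity weights are assumed defaults, not the paper's configuration.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}

def mqm_score(errors, max_penalty=25.0):
    """errors: list of dicts like {"severity": "major"}. Higher score = better."""
    penalty = sum(SEVERITY_WEIGHTS.get(e["severity"], 1.0) for e in errors)
    return max(0.0, 1.0 - penalty / max_penalty)  # normalize to [0, 1]
```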
arXiv Detail & Related papers (2024-10-08T20:16:59Z)
- Scaling Parameter-Constrained Language Models with Quality Data [32.35610029333478]
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters.
We extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation.
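One way to read this "microscopic view" into the standard Chinchilla-style formulation is to let a quality factor rescale the effective dataset size; the factor q below is an illustrative assumption, not the paper's exact form:

```latex
% Chinchilla-style parametric loss with an illustrative quality factor q.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{\left(q \cdot D\right)^{\beta}}
% N: model parameters; D: training tokens; q \in (0, 1]: a data-quality
% factor that rescales the effective dataset size.
```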
arXiv Detail & Related papers (2024-10-04T02:07:17Z)
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
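A minimal sketch of the standard preference-modeling objective such a reward model is typically trained with, a Bradley-Terry pairwise loss; this is the generic RLHF recipe, not necessarily the paper's exact setup:

```python
# Bradley-Terry pairwise reward-model loss, the standard RLHF objective;
# a sketch under assumed inputs, not the paper's exact recipe.
import torch
import torch.nn.functional as F

def preference_loss(r_chosen, r_rejected):
    """r_chosen/r_rejected: reward scores for preferred vs. dispreferred
    translations of the same source. Maximizes the margin between them."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Example: rewards for a batch of 4 translation pairs.
loss = preference_loss(torch.randn(4), torch.randn(4))
```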
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
- Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model [75.66013048128302]
In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training.
We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines.
To address the problem, we adopt a simple yet effective method that uses rules to detect incorrect translations and assigns a penalty term to their reward scores.
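A toy version of the rule-plus-penalty idea: keep the QE score as the reward, but subtract a fixed penalty when simple heuristics flag a degenerate translation. The specific rules and penalty value here are illustrative assumptions:

```python
# Sketch of the rule-plus-penalty idea; the rules and penalty are assumed.
def penalized_reward(qe_score, source, hypothesis, penalty=1.0):
    degenerate = (
        len(hypothesis.split()) == 0                          # empty output
        or hypothesis.strip() == source.strip()               # source copy
        or len(hypothesis.split()) > 3 * len(source.split())  # runaway length
    )
    return qe_score - penalty if degenerate else qe_score
```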
arXiv Detail & Related papers (2024-01-23T16:07:43Z)
- A Novel Metric for Measuring Data Quality in Classification Applications (extended version) [0.0]
We introduce and explain a novel metric to measure data quality.
This metric is based on the correlated evolution between classification performance and the progressive deterioration of the data.
We provide an interpretation of each criterion and examples of assessment levels.
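A hedged sketch of the underlying idea: progressively deteriorate the data (here by corrupting a growing fraction of labels), retrain, and measure how closely performance tracks the deterioration. The corruption scheme and model below are assumptions for illustration:

```python
# Progressively corrupt labels, retrain, and correlate corruption level
# with accuracy; the scheme is an illustrative assumption. X, y: arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def deterioration_curve(X, y, fractions=(0.0, 0.1, 0.2, 0.4), seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for frac in fractions:
        y_noisy = y.copy()
        idx = rng.choice(len(y), size=int(frac * len(y)), replace=False)
        y_noisy[idx] = rng.permutation(y_noisy[idx])       # corrupt labels
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              X, y_noisy, cv=3).mean()
        scores.append(acc)
    # Strength of the (negative) correlation between corruption and accuracy
    return np.corrcoef(fractions, scores)[0, 1]
```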
arXiv Detail & Related papers (2023-12-13T11:20:09Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words in the context may introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
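A toy version of selection-by-attention: keep only the context tokens that receive the most attention mass and drop the rest. The real model applies such selection layer-wise inside the encoder-decoder; this standalone function is an assumption for illustration:

```python
# Prune context tokens by the attention mass they receive; a toy sketch.
import torch

def select_context(attn, context_ids, keep_ratio=0.5):
    """attn: [tgt_len, ctx_len] attention weights from sentence to context.
    context_ids: tensor of context token ids, same length as ctx_len."""
    ctx_len = attn.size(1)
    k = max(1, int(keep_ratio * ctx_len))
    mass = attn.sum(dim=0)                  # total attention per context token
    keep = mass.topk(k).indices.sort().values
    return context_ids[keep]                # pruned context, original order
```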
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failing without any awareness of its own errors.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
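A minimal sketch of what a self-estimator head might look like: a small module over pooled decoder states that predicts the model's own translation quality. Layer sizes and pooling here are assumptions for illustration:

```python
# Assumed self-estimator head over decoder hidden states; a sketch only.
import torch
import torch.nn as nn

class SelfEstimator(nn.Module):
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),   # competency in [0, 1]
        )

    def forward(self, decoder_states):                # [batch, seq, hidden]
        pooled = decoder_states.mean(dim=1)           # mean-pool over time
        return self.head(pooled).squeeze(-1)
```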
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work investigates how to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology we apply draws on Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
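The two ingredients can be sketched directly: a Wilson interval for a binomial proportion (the interval family analyzed by Brown et al., 2001) and a Monte Carlo estimate of its width at a given sample size. The parameter values below are illustrative:

```python
# Wilson score interval plus a Monte Carlo check of its width vs. n.
import math
import random

def wilson_interval(successes, n, z=1.96):
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def mc_interval_width(true_quality=0.8, n=100, trials=2000, seed=0):
    random.seed(seed)
    widths = []
    for _ in range(trials):
        ok = sum(random.random() < true_quality for _ in range(n))
        lo, hi = wilson_interval(ok, n)
        widths.append(hi - lo)
    return sum(widths) / trials   # mean CI width at this sample size
```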
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
- Detecting over/under-translation errors for determining adequacy in human translations [0.0]
We present a novel approach to detecting over- and under-translations (OT/UT) as part of adequacy error checks in translation evaluation.
We do not restrict ourselves to machine translation (MT) outputs and specifically target applications with a human-generated translation pipeline.
The goal of our system is to identify OT/UT errors from human translated video subtitles with high error recall.
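A naive length-ratio baseline for flagging OT/UT candidates; the paper's system targets high recall with a more involved approach, so treat this purely as a toy illustration with assumed thresholds:

```python
# Length-ratio heuristic for over/under-translation flags; thresholds assumed.
def flag_ot_ut(source, translation, low=0.6, high=1.6):
    """Flag a subtitle pair whose length ratio falls outside typical bounds."""
    ratio = max(len(translation.split()), 1) / max(len(source.split()), 1)
    if ratio > high:
        return "possible over-translation"
    if ratio < low:
        return "possible under-translation"
    return "ok"
```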
arXiv Detail & Related papers (2021-04-01T06:06:36Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
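A hedged sketch of the Dynamic Blocking idea as described in the abstract: when decoding emits a token that appears in the source, block the token that immediately follows it in the source (with some probability), nudging the model away from verbatim copying. The function below is an assumed simplification, not the paper's implementation:

```python
# Toy Dynamic Blocking: forbid the source successor of the last generated
# token with probability p, to discourage verbatim copying.
import random

def blocked_ids(source_ids, generated_ids, p=0.5, seed=None):
    rng = random.Random(seed)
    nxt = {}                                  # source token -> its successor
    for a, b in zip(source_ids, source_ids[1:]):
        nxt.setdefault(a, b)
    last = generated_ids[-1] if generated_ids else None
    if last in nxt and rng.random() < p:
        return {nxt[last]}                    # forbid this id at the next step
    return set()
```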
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks [0.30458514384586394]
This paper explores the use of Deep Learning methods for automatic estimation of quality of human translations.
Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly.
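A minimal attention-pooled regressor of the kind the paper describes, mapping token representations to a sentence-level quality score; dimensions and pooling are illustrative assumptions, not the paper's exact model:

```python
# Attention-pooled sentence-level QE regressor; an assumed sketch.
import torch
import torch.nn as nn

class AttentionQE(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        self.attn = nn.Linear(hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, token_states):                  # [batch, seq, hidden]
        weights = torch.softmax(self.attn(token_states), dim=1)
        pooled = (weights * token_states).sum(dim=1)  # attention pooling
        return self.out(pooled).squeeze(-1)           # predicted quality
```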
arXiv Detail & Related papers (2020-03-13T16:57:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.