Quality Estimation of Machine Translated Texts based on Direct Evidence
from Training Data
- URL: http://arxiv.org/abs/2306.15399v1
- Date: Tue, 27 Jun 2023 11:52:28 GMT
- Title: Quality Estimation of Machine Translated Texts based on Direct Evidence
from Training Data
- Authors: Vibhuti Kumari, Narayana Murthy Kavi
- Abstract summary: We show that the parallel corpus used as training data for the MT system holds direct clues for estimating the quality of the translations the system produces.
Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data-driven machine translation system.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current Machine Translation systems achieve very good results on a growing
variety of language pairs and data sets. However, it is now well known that
they produce fluent translations that can nevertheless contain significant
meaning errors. The Quality Estimation task deals with estimating the quality
of translations produced by a Machine Translation system without relying on
Reference Translations. A number of approaches have been suggested over the
years. In this paper we show that the parallel corpus used as training data
for the MT system holds direct clues for estimating the quality of the
translations it produces. Our experiments show that this simple and direct
method holds promise for quality estimation of translations produced by any
purely data-driven machine translation system.
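The abstract does not spell out how the training corpus is mined for these clues, so the following is only a minimal sketch of one plausible instantiation, assuming an n-gram coverage heuristic: score each test source sentence by how well its n-grams are covered by the source side of the training corpus, and flag low-coverage inputs as likely to be translated poorly. All identifiers are illustrative, not from the paper.

```python
# Minimal sketch, assuming an n-gram coverage heuristic; the paper's
# actual method may differ. All identifiers are illustrative.

def ngram_set(tokens, n=2):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def training_coverage(test_source, training_sources, n=2):
    """Fraction of the test sentence's n-grams that occur anywhere in
    the source side of the training corpus."""
    test_ngrams = ngram_set(test_source.lower().split(), n)
    if not test_ngrams:
        return 0.0
    seen = set()
    for sent in training_sources:
        seen |= ngram_set(sent.lower().split(), n)
    return len(test_ngrams & seen) / len(test_ngrams)

# Low coverage -> flag the MT output as potentially unreliable.
corpus = ["the cat sat on the mat", "the dog barked at the cat"]
print(training_coverage("the cat barked", corpus))  # 0.5
```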
Related papers
- Evaluating Automatic Metrics with Incremental Machine Translation Systems [55.78547133890403]
We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions.
We assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations.
arXiv Detail & Related papers (2024-07-03T17:04:17Z)
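As a rough illustration of the evaluation protocol described above (the exact scoring scheme is not specified in the summary, so this is an assumed simplification): if systems improve over time, a good metric should prefer the newer of two translations of the same input more often than chance.

```python
def recency_preference_rate(pairs):
    """pairs: (score_older, score_newer) metric scores for the same
    source segment; returns the fraction of decided pairs where the
    metric prefers the newer translation."""
    wins = sum(1 for old, new in pairs if new > old)
    ties = sum(1 for old, new in pairs if new == old)
    decided = len(pairs) - ties
    return wins / decided if decided else 0.0

# Two decided pairs; the metric prefers the newer output in one of them:
print(recency_preference_rate([(0.71, 0.80), (0.65, 0.64), (0.50, 0.50)]))  # 0.5
```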
- Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference information in the machine translation evaluation task.
We find that reference information significantly enhances evaluation accuracy, while, surprisingly, source information is sometimes counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z)
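A hedged sketch of the experimental contrast described above: building evaluation prompts that include or omit the source sentence and the reference translation. The prompt wording here is invented for illustration; the paper's actual prompts may differ.

```python
def build_eval_prompt(hypothesis, source=None, reference=None):
    """Build a quality-rating prompt, optionally including the source
    sentence and/or the reference translation. Wording is illustrative."""
    parts = ["Rate the quality of the following translation from 0 to 100."]
    if source is not None:
        parts.append(f"Source: {source}")
    if reference is not None:
        parts.append(f"Reference: {reference}")
    parts.append(f"Translation: {hypothesis}")
    return "\n".join(parts)

# Reference-only condition, which the summary suggests can outperform
# the condition that also includes the source:
print(build_eval_prompt("The dog is barking.", reference="The dog barks."))
```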
- Machine Translation Impact in E-commerce Multilingual Search [0.0]
Cross-lingual information retrieval performance correlates highly with the quality of Machine Translation.
There may be a threshold beyond which further improving query translation quality yields little or no gain in retrieval performance.
arXiv Detail & Related papers (2023-01-31T21:59:35Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
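In the spirit of the segment-level analysis above, here is a minimal sketch (with fabricated data) of correlating per-segment metric scores with downstream task outcomes using Kendall's tau:

```python
from scipy.stats import kendalltau

# Fabricated per-segment data: a metric's scores and binary downstream
# task outcomes (e.g. whether a cross-lingual system succeeded when
# fed that translation).
metric_scores = [0.91, 0.42, 0.77, 0.30, 0.88]
task_success = [1, 0, 1, 0, 0]

tau, p_value = kendalltau(metric_scores, task_success)
print(f"Kendall tau = {tau:.2f} (p = {p_value:.2f})")
```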
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that happen without awareness.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- A Bayesian approach to translators' reliability assessment [0.0]
We treat the Translation Quality Assessment (TQA) process as a complex process, viewing it from the perspective of the physics of complex systems.
We build two Bayesian models that parameterise the features involved in the TQA process, namely the translation difficulty and the characteristics of the translators involved in producing the translation and assessing its quality.
We show that reviewers' reliability cannot be taken for granted even if they are expert translators.
arXiv Detail & Related papers (2022-03-14T14:29:45Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work carries out motivated research to correctly estimate confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology we applied for this work is from Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
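As a concrete example of the kind of calculation the work above motivates, here is a Wilson score interval, one of the intervals recommended by Brown et al. (2001), for the proportion of acceptable translations in a sample of n sentences treated as Bernoulli trials:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion
    (Brown et al., 2001 recommend it over the naive Wald interval)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - half, centre + half)

# 80 of 100 sampled translations judged acceptable:
lo, hi = wilson_interval(80, 100)
print(f"[{lo:.3f}, {hi:.3f}]")  # roughly [0.711, 0.867]
```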
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models, and provides quality estimation to inform users of translation reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Robust Neural Machine Translation: Modeling Orthographic and Interpunctual Variation [3.3194866396158]
We propose a simple generative noise model to generate adversarial examples of ten different types.
We show that, when tested on noisy data, systems trained using adversarial examples perform almost as well as when translating clean data.
arXiv Detail & Related papers (2020-09-11T14:12:54Z)
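A toy version of two of the noise types such a generative model might cover (the paper lists ten; these two are illustrative guesses): swapping adjacent characters and dropping punctuation.

```python
import random
import string

def swap_adjacent(text, rate=0.05, seed=0):
    """Randomly swap adjacent characters with the given probability."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a character is swapped at most once
        else:
            i += 1
    return "".join(chars)

def drop_punctuation(text):
    """Remove all ASCII punctuation marks."""
    return text.translate(str.maketrans("", "", string.punctuation))

print(swap_adjacent("robust translation", rate=0.3))
print(drop_punctuation("Hello, world!"))  # Hello world
```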
- Can Your Context-Aware MT System Pass the DiP Benchmark Tests?: Evaluation Benchmarks for Discourse Phenomena in Machine Translation [7.993547048820065]
We introduce first-of-their-kind MT benchmark datasets that aim to track and hail improvements across four main discourse phenomena.
Surprisingly, we find that existing context-aware models do not improve discourse-related translations consistently across languages and phenomena.
arXiv Detail & Related papers (2020-04-30T07:15:36Z)