Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods
- URL: http://arxiv.org/abs/2105.03311v1
- Date: Wed, 5 May 2021 18:28:10 GMT
- Title: Translation Quality Assessment: A Brief Survey on Manual and Automatic Methods
- Authors: Lifeng Han, Gareth J. F. Jones and Alan F. Smeaton
- Abstract summary: We present a high-level and concise survey of translation quality assessment (TQA) methods, including both manual judgement criteria and automated evaluation metrics.
We hope that this work will be an asset for both translation model researchers and quality assessment researchers.
- Score: 9.210509295803243
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: To facilitate effective translation modeling and translation studies, one of
the crucial questions to address is how to assess translation quality. From the
perspectives of accuracy, reliability, repeatability and cost, translation
quality assessment (TQA) itself is a rich and challenging task. In this work,
we present a high-level and concise survey of TQA methods, including both
manual judgement criteria and automated evaluation metrics, which we classify
into further detailed sub-categories. We hope that this work will be an asset
for both translation model researchers and quality assessment researchers. In
addition, we hope that it will enable practitioners to quickly develop a better
understanding of the conventional TQA field, and to find corresponding closely
relevant evaluation solutions for their own needs. This work may also serve to
inspire further development of quality assessment and evaluation methodologies
for other natural language processing (NLP) tasks in addition to machine
translation (MT), such as automatic text summarization (ATS), natural language
understanding (NLU) and natural language generation (NLG).
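As a concrete illustration of the automated-metric side of the survey, the sketch below computes a clipped (modified) n-gram precision between a machine translation hypothesis and a human reference, in the spirit of the BLEU family of metrics the survey covers; the function name, tokenisation, and example sentence pair are illustrative choices, not code from the paper.
```python
# Illustrative sketch of a BLEU-style modified n-gram precision; names and the
# example sentence pair are hypothetical, not taken from the surveyed paper.
from collections import Counter

def ngram_precision(hypothesis: str, reference: str, n: int = 2) -> float:
    hyp_tokens, ref_tokens = hypothesis.split(), reference.split()
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n]) for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n]) for i in range(len(ref_tokens) - n + 1))
    if not hyp_ngrams:
        return 0.0
    # Clip each hypothesis n-gram count at its count in the reference ("modified" precision).
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return overlap / sum(hyp_ngrams.values())

print(ngram_precision("the cat sat on the mat", "there is a cat on the mat"))  # -> 0.4
```
Full metrics in this family add brevity penalties, multiple n-gram orders, and smoothing, which are omitted here for brevity.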
Related papers
- SpeechQE: Estimating the Quality of Direct Speech Translation [23.83384136789891]
We formulate the task of quality estimation for speech translation (SpeechQE), construct a benchmark, and evaluate a family of systems based on cascaded and end-to-end architectures.
Results suggest end-to-end approaches are better suited to estimating the quality of direct speech translation than using quality estimation systems designed for text in cascaded systems.
arXiv Detail & Related papers (2024-10-28T19:50:04Z)
- Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z)
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- Questionnaires for Everyone: Streamlining Cross-Cultural Questionnaire Adaptation with GPT-Based Translation Quality Evaluation [6.8731197511363415]
This work presents a prototype tool that can expedite the questionnaire translation process.
The tool incorporates forward-backward translation using DeepL alongside GPT-4-generated translation quality evaluations and improvement suggestions.
arXiv Detail & Related papers (2024-07-30T07:34:40Z)
- Competency-Aware Neural Machine Translation: Can Machine Translation Know its Own Translation Quality? [61.866103154161884]
Neural machine translation (NMT) is often criticized for failures that occur without the model itself being aware of them.
We propose a novel competency-aware NMT by extending conventional NMT with a self-estimator.
We show that the proposed method delivers outstanding performance on quality estimation.
arXiv Detail & Related papers (2022-11-25T02:39:41Z)
- A Bayesian approach to translators' reliability assessment [0.0]
We consider the Translation Quality Assessment process to be a complex process, approaching it from the point of view of the physics of complex systems.
We build two Bayesian models that parameterise the features involved in the TQA process, namely the translation difficulty and the characteristics of the translators involved in producing the translation and assessing its quality.
We show that reviewers' reliability cannot be taken for granted even if they are expert translators.
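As a rough, hypothetical sketch of what such a Bayesian treatment can look like (not the paper's actual model), the snippet below generates review scores from latent translation difficulty, translator skill, and per-reviewer noise, then recovers each reviewer's noise level with a simple grid posterior; every variable name, prior, and constant is an assumption made for illustration.
```python
# Hypothetical generative sketch (not the paper's model): review scores arise from
# latent difficulty, translator skill, and per-reviewer noise; a grid posterior over
# each reviewer's noise level gives a crude reliability estimate.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_reviewers = 200, 3

difficulty = rng.normal(0.0, 1.0, n_items)            # latent translation difficulty
translator_skill = rng.normal(1.0, 0.5, n_items)      # latent translator characteristic
true_quality = translator_skill - 0.5 * difficulty    # latent quality of each translation

reviewer_sigma = np.array([0.2, 0.6, 1.2])            # per-reviewer noise ("unreliability")
scores = true_quality[:, None] + rng.normal(0.0, reviewer_sigma, (n_items, n_reviewers))

# Crude consensus proxy for the latent quality: the mean score across reviewers.
consensus = scores.mean(axis=1)
sigmas = np.linspace(0.05, 2.0, 200)                  # candidate noise levels (flat prior)

for r in range(n_reviewers):
    resid = scores[:, r] - consensus
    # Gaussian log-likelihood of this reviewer's residuals under each candidate sigma.
    loglik = -n_items * np.log(sigmas) - (resid ** 2).sum() / (2 * sigmas ** 2)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    print(f"reviewer {r}: true sigma {reviewer_sigma[r]:.2f}, "
          f"posterior mean sigma {float((sigmas * post).sum()):.2f}")
```
The differing posterior noise estimates are the sense in which reviewer reliability cannot simply be assumed, even among experts.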
arXiv Detail & Related papers (2022-03-14T14:29:45Z)
- An Overview on Machine Translation Evaluation [6.85316573653194]
Machine translation (MT) has become one of the important tasks in AI research and development.
The evaluation task of MT is not only to evaluate the quality of machine translation, but also to give timely feedback to machine translation researchers.
This report mainly includes a brief history of machine translation evaluation (MTE), the classification of research methods on MTE, and the cutting-edge progress.
arXiv Detail & Related papers (2022-02-22T16:58:28Z)
- Measuring Uncertainty in Translation Quality Evaluation (TQE) [62.997667081978825]
This work carries out motivated research to correctly estimate the confidence intervals (Brown et al., 2001) depending on the sample size of the translated text.
The methodology we applied for this work is from Bernoulli Statistical Distribution Modelling (BSDM) and Monte Carlo Sampling Analysis (MCSA).
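A minimal sketch of the underlying idea, assuming each judged segment is treated as a Bernoulli trial (acceptable or not): Monte Carlo resampling shows how the width of a 95% interval for an observed error rate depends on sample size. The error rate, sample sizes, and percentile interval below are illustrative choices, not the paper's exact BSDM/MCSA procedure.
```python
# Illustrative sketch: judged segments as Bernoulli trials, with Monte Carlo
# resampling used to see how interval width depends on sample size. The error
# rate and sample sizes are made-up numbers.
import numpy as np

rng = np.random.default_rng(42)
true_error_rate = 0.15                         # assumed underlying Bernoulli parameter

for sample_size in (30, 100, 500, 2000):
    # Repeatedly draw a sample of judged segments and record the observed error rate.
    observed = rng.binomial(sample_size, true_error_rate, size=10_000) / sample_size
    lo, hi = np.percentile(observed, [2.5, 97.5])
    print(f"n={sample_size:4d}: 95% interval ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```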
arXiv Detail & Related papers (2021-11-15T12:09:08Z)
- TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint).
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z)
- Unsupervised Quality Estimation for Neural Machine Translation [63.38918378182266]
Existing approaches require large amounts of expert annotated data, computation and time for training.
We devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required.
We achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models.
arXiv Detail & Related papers (2020-05-21T12:38:06Z)
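One of the simplest signals in this unsupervised family is the decoder's own confidence in its output. The sketch below, in which hard-coded log-probabilities stand in for whatever a real decoder would emit, length-normalises token log-probabilities as a sentence-level quality proxy; it illustrates the general idea rather than the paper's full set of indicators.
```python
# Illustrative sketch of confidence-based, unsupervised quality estimation; the
# hard-coded log-probabilities stand in for real decoder output.
def sentence_confidence(token_logprobs):
    """Length-normalised log-probability: higher means the model was more confident."""
    return sum(token_logprobs) / max(len(token_logprobs), 1)

# Two hypothetical translations: one the decoder found easy, one it struggled with.
fluent_hypothesis = [-0.10, -0.30, -0.20, -0.15]
dubious_hypothesis = [-1.80, -0.40, -2.50, -3.10, -0.90]

for name, logprobs in [("fluent", fluent_hypothesis), ("dubious", dubious_hypothesis)]:
    print(f"{name}: confidence {sentence_confidence(logprobs):.3f}")
```
In practice such scores are validated by how well they correlate with human quality judgements, as in the summary above.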