Beyond General Purpose Machine Translation: The Need for
Context-specific Empirical Research to Design for Appropriate User Trust
- URL: http://arxiv.org/abs/2205.06920v1
- Date: Fri, 13 May 2022 23:04:22 GMT
- Title: Beyond General Purpose Machine Translation: The Need for
Context-specific Empirical Research to Design for Appropriate User Trust
- Authors: Wesley Hanwen Deng, Nikita Mehandru, Samantha Robertson, Niloufar
Salehi
- Abstract summary: We discuss research directions to support users in calibrating trust in Machine Translation (MT) systems.
Based on our findings, we advocate for empirical research on how MT systems are used in practice.
- Score: 8.539683760001573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Translation (MT) has the potential to help people overcome language
barriers and is widely used in high-stakes scenarios, such as in hospitals.
However, in order to use MT reliably and safely, users need to understand when
to trust MT outputs and how to assess the quality of often imperfect
translation results. In this paper, we discuss research directions to support
users in calibrating trust in MT systems. We share findings from an empirical
study in which we conducted semi-structured interviews with 20 clinicians to
understand how they communicate with patients across language barriers, and if
and how they use MT systems. Based on our findings, we advocate for empirical
research on how MT systems are used in practice as an important first step to
addressing the challenges in building appropriate trust between users and MT
tools.
Related papers
- An Interdisciplinary Approach to Human-Centered Machine Translation [67.70453480427132]
Machine Translation (MT) tools are widely used today, often in contexts where professional translators are not present.
Despite progress in MT technology, a gap persists between system development and real-world usage.
This paper advocates for a human-centered approach to MT, emphasizing the alignment of system design with diverse communicative goals.
arXiv Detail & Related papers (2025-06-16T13:27:44Z)
- Comparing Large Language Models and Traditional Machine Translation Tools for Translating Medical Consultation Summaries: A Pilot Study [9.136745540182423]
This study evaluates how well large language models (LLMs) and traditional machine translation (MT) tools translate medical consultation summaries from English into Arabic, Chinese, and Vietnamese.
Results showed that traditional MT tools generally performed better, especially for complex texts.
Arabic translations improved with complexity due to the language's morphology.
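A minimal sketch of how such a comparison might be scored with a surface metric (sacrebleu's chrF is a real library, but the placeholder texts and the choice of metric are assumptions, not the study's exact setup):

    # Sketch: score two systems' outputs against one human reference with chrF.
    from sacrebleu.metrics import CHRF

    chrf = CHRF()
    reference = ["<human reference translation of the summary>"]
    hypotheses = {
        "LLM": "<LLM translation of the same summary>",
        "traditional MT": "<traditional MT translation>",
    }
    for name, hyp in hypotheses.items():
        print(f"{name}: chrF = {chrf.sentence_score(hyp, reference).score:.1f}")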
arXiv Detail & Related papers (2025-04-23T10:31:33Z)
- Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions [17.694939962332914]
Machine Translation (MT) has become a global tool, with cross-lingual services now also supported by dialogue systems powered by multilingual Large Language Models (LLMs).
This paper traces the shift in MT user profiles, focusing on non-expert users.
We identify three key factors -- usability, trust, and literacy -- that shape these interactions and must be addressed to align MT with user needs.
arXiv Detail & Related papers (2025-02-19T14:45:17Z)
- Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT.
This study systematically investigates how each resource and its quality affect translation performance, using Manchu as a case study.
Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help.
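A minimal sketch of how such an in-context prompt could be assembled from dictionary entries and parallel examples (the prompt format, helper name, and sample entries are illustrative assumptions, not the paper's exact setup):

    # Sketch: build an in-context MT prompt for a low-resource language.
    def build_prompt(source, dictionary, examples):
        lines = ["Translate from Manchu to English.", "Dictionary entries:"]
        for word, gloss in dictionary.items():
            lines.append(f"  {word} = {gloss}")
        lines.append("Examples:")
        for src, tgt in examples:
            lines.append(f"  Manchu: {src}")
            lines.append(f"  English: {tgt}")
        lines.append(f"Manchu: {source}")
        lines.append("English:")
        return "\n".join(lines)

    # Hypothetical entries; real ones would come from a Manchu lexicon
    # and a small parallel corpus.
    prompt = build_prompt("bi muke omimbi",
                          {"bi": "I", "muke": "water", "omimbi": "to drink"},
                          [("morin muke omimbi", "the horse drinks water")])
    print(prompt)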
arXiv Detail & Related papers (2025-02-17T14:53:49Z)
- MT-LENS: An all-in-one Toolkit for Better Machine Translation Evaluation [1.7775825387442485]
MT-LENS is a framework designed to evaluate Machine Translation (MT) systems across a variety of tasks.
It offers a user-friendly platform to compare systems and analyze translations with interactive visualizations.
arXiv Detail & Related papers (2024-12-16T09:57:28Z)
- Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service [31.883641424813245]
We present an observational analysis of real-world MT usage for Tetun, the lingua franca of Timor-Leste.
Our analysis of 100,000 translation requests reveals patterns that challenge assumptions based on existing corpora.
arXiv Detail & Related papers (2024-11-19T06:21:51Z)
- Evaluating Automatic Metrics with Incremental Machine Translation Systems [55.78547133890403]
We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions.
We assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations.
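A minimal sketch of the resulting check, assuming per-segment scores from some metric are available for an older and a newer snapshot of the same commercial system (names and numbers are invented):

    # Sketch: a good metric should prefer the newer (presumably better)
    # system's output on most segments.
    def preference_rate(scores_old, scores_new):
        wins = sum(new > old for old, new in zip(scores_old, scores_new))
        return wins / len(scores_old)

    scores_2018 = [0.61, 0.55, 0.70, 0.48]   # hypothetical metric scores
    scores_2024 = [0.66, 0.59, 0.68, 0.57]
    rate = preference_rate(scores_2018, scores_2024)
    print(f"metric prefers the newer system on {rate:.0%} of segments")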
arXiv Detail & Related papers (2024-07-03T17:04:17Z)
- Cyber Risks of Machine Translation Critical Errors: Arabic Mental Health Tweets as a Case Study [3.8779763612314637]
We introduce an authentic dataset of machine translation critical errors to point to the ethical and safety issues involved in the common use of MT.
The dataset comprises mistranslations of Arabic mental health postings manually annotated with critical error types.
We also show how the commonly used quality metrics do not penalise critical errors and highlight this as a critical issue that merits further attention from researchers.
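The failure mode is easy to illustrate: a dropped negation is safety-critical, yet a surface-overlap metric such as chrF scores it close to a harmless paraphrase. A minimal sketch with sacrebleu (the sentences are invented, and the paper's own metric choices may differ):

    # Sketch: a surface metric barely separates a critical mistranslation
    # (lost negation) from a harmless paraphrase.
    from sacrebleu.metrics import CHRF

    chrf = CHRF()
    reference = ["I do not want to hurt myself."]
    benign = "I don't want to hurt myself."    # paraphrase, meaning preserved
    critical = "I do want to hurt myself."     # negation lost: critical error
    for label, hyp in [("benign paraphrase", benign), ("critical error", critical)]:
        print(f"{label}: chrF = {chrf.sentence_score(hyp, reference).score:.1f}")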
arXiv Detail & Related papers (2024-05-19T20:24:51Z)
- IMTLab: An Open-Source Platform for Building, Evaluating, and Diagnosing Interactive Machine Translation Systems [94.39110258587887]
We present IMTLab, an open-source end-to-end interactive machine translation (IMT) system platform.
IMTLab treats the whole interactive translation process as a task-oriented dialogue with a human-in-the-loop setting.
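A generic sketch of the human-in-the-loop pattern this framing implies (placeholder names throughout; this is not IMTLab's actual API):

    # Sketch: the system proposes a translation; the human either accepts it
    # or supplies a correction that constrains the next attempt.
    def interactive_translate(source, mt_system, get_feedback, max_turns=5):
        constraints = []
        hypothesis = mt_system(source, constraints)
        for _ in range(max_turns):
            feedback = get_feedback(hypothesis)   # human-in-the-loop step
            if feedback is None:                  # user accepts the output
                break
            constraints.append(feedback)          # e.g., a corrected span
            hypothesis = mt_system(source, constraints)
        return hypothesis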
arXiv Detail & Related papers (2023-10-17T11:29:04Z)
- Automating Behavioral Testing in Machine Translation [9.151054827967933]
We propose to use Large Language Models to generate source sentences tailored to test the behavior of Machine Translation models.
We can then verify whether the MT model exhibits the expected behavior through matching candidate sets.
Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort.
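A minimal sketch of the candidate-set check, assuming the source sentence has already been generated by an LLM and paired with a set of acceptable renderings (all names and data are invented):

    # Sketch: a behavioral test passes if the MT output contains any
    # acceptable rendering of the phenomenon under test.
    def behavioral_test(mt_system, source, candidates):
        output = mt_system(source)
        return any(c in output for c in candidates)

    # Hypothetical test: is the time expression preserved in French?
    source = "The meeting starts at 9:30 a.m."
    candidates = {"9:30", "09:30", "9 h 30"}
    # passed = behavioral_test(translate_to_french, source, candidates)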
arXiv Detail & Related papers (2023-09-05T19:40:45Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Extrinsic Evaluation of Machine Translation Metrics [78.75776477562087]
It is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level.
We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks.
Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes.
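A minimal sketch of such a segment-level extrinsic check, assuming per-segment metric scores and binary downstream outcomes are available (scipy's pearsonr is real; the numbers are invented):

    # Sketch: correlate segment-level metric scores with the success of a
    # downstream task that consumes the translations.
    from scipy.stats import pearsonr

    metric_scores = [0.81, 0.45, 0.90, 0.30, 0.72]   # e.g., per-segment COMET
    task_success  = [1, 1, 1, 0, 0]                  # downstream outcome
    r, p = pearsonr(metric_scores, task_success)
    print(f"correlation r = {r:.2f} (p = {p:.3f})")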
arXiv Detail & Related papers (2022-12-20T14:39:58Z)
- Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer [63.72621204057025]
Expert-layman text style transfer technologies have the potential to improve communication between scientific communities and the general public.
High-quality information produced by experts is often filled with difficult jargon laypeople struggle to understand.
This is a particularly notable issue in the medical domain, where laypeople are often confused by medical text online.
arXiv Detail & Related papers (2021-10-06T17:57:22Z)
- Difficulty-Aware Machine Translation Evaluation [19.973201669851626]
We propose a novel difficulty-aware machine translation evaluation metric.
A translation that fails to be predicted by most MT systems will be treated as a difficult one and assigned a large weight in the final score function.
Our proposed method performs well even when all the MT systems are very competitive.
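The weighting idea can be sketched directly: a segment's weight grows as the average score across systems falls, and a system's final score is the weight-normalized average (the paper's exact weight function may differ from this illustrative form):

    # Sketch of difficulty-aware scoring: segments that most systems
    # translate poorly receive larger weights.
    def difficulty_weighted_score(scores_by_system, target):
        n = len(scores_by_system[target])
        weights = [1.0 - sum(s[i] for s in scores_by_system.values())
                         / len(scores_by_system)
                   for i in range(n)]   # harder segment -> larger weight
        return (sum(w * s for w, s in zip(weights, scores_by_system[target]))
                / sum(weights))

    # Hypothetical per-segment scores in [0, 1] for three systems.
    scores = {"A": [0.90, 0.40, 0.80],
              "B": [0.85, 0.35, 0.82],
              "C": [0.88, 0.50, 0.79]}
    print(f"A: {difficulty_weighted_score(scores, 'A'):.3f}")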
arXiv Detail & Related papers (2021-07-30T02:45:36Z)
- Unsupervised Quality Estimation for Neural Machine Translation [63.38918378182266]
Existing approaches require large amounts of expert-annotated data, computation, and time for training.
We devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required.
We achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models.
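A common signal in this family is the decoder's own token probabilities; a minimal sketch, assuming the log-probabilities of the generated tokens can be read off the MT system:

    # Sketch: unsupervised QE features from the MT system's own token
    # log-probabilities -- no labeled data or external resources needed.
    import math

    def qe_features(token_logprobs):
        avg = sum(token_logprobs) / len(token_logprobs)
        return {"mean_logprob": avg,
                "min_logprob": min(token_logprobs),
                "perplexity": math.exp(-avg)}

    # Hypothetical log-probabilities for one generated translation.
    print(qe_features([-0.10, -0.30, -2.20, -0.05, -0.40]))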
arXiv Detail & Related papers (2020-05-21T12:38:06Z)