Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
- URL: http://arxiv.org/abs/2511.02958v1
- Date: Tue, 04 Nov 2025 19:59:25 GMT
- Title: Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model
- Authors: Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez,
- Abstract summary: We propose a novel approach to distinguish between human and machine-translated sentences.<n> Experimental results show that our method outperforms current state-of-the-art techniques.
- Score: 4.750257527930005
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a novel approach that directly exploits the internal representations of a surrogate multilingual MT model to distinguish between human and machine-translated sentences. Experimental results show that our method outperforms current state-of-the-art techniques, particularly for non-English language pairs, achieving gains of at least 5 percentage points of accuracy.
Related papers
- Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation [57.11989521509119]
We propose a novel agentic translation evaluation framework, centered by a reflective Core Agent that invokes specialized sub-agents.<n> Experimental results indicate the efficacy of RATE, achieving an improvement of at least 3.2 meta score compared with current metrics.
arXiv Detail & Related papers (2026-01-12T09:03:42Z) - Towards Effective Disambiguation for Machine Translation with Large
Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences"
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z) - Quality Estimation of Machine Translated Texts based on Direct Evidence
from Training Data [0.0]
We show that the parallel corpus used as training data for training the MT system holds direct clues for estimating the quality of translations produced by the MT system.
Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data driven machine translation system.
arXiv Detail & Related papers (2023-06-27T11:52:28Z) - Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z) - HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric
Looking into Multi-Word Expressions [6.85316573653194]
We describe the design and implementation of a linguistically motivated human-in-the-loop evaluation metric looking into idiomatic and terminological Multi-word Expressions (MWEs)
MWEs can be used as one of the main factors to distinguish different MT systems by looking into their capabilities in recognising and translating MWEs in an accurate and meaning equivalent manner.
arXiv Detail & Related papers (2022-11-09T21:15:40Z) - Rethinking Round-Trip Translation for Machine Translation Evaluation [44.83568796515321]
We report the surprising finding that round-trip translation can be used for automatic evaluation without the references.
We demonstrate the rectification is overdue as round-trip translation could benefit multiple machine translation evaluation tasks.
arXiv Detail & Related papers (2022-09-15T15:06:20Z) - Supervised Visual Attention for Simultaneous Multimodal Machine
Translation [47.18251159303909]
We propose the first Transformer-based simultaneous machine translation (MMT) architecture.
We extend this model with an auxiliary supervision signal that guides its visual attention mechanism using labelled phrase-region alignments.
Our results show that supervised visual attention consistently improves the translation quality of the MMT models.
arXiv Detail & Related papers (2022-01-23T17:25:57Z) - Non-Parametric Online Learning from Human Feedback for Neural Machine
Translation [54.96594148572804]
We study the problem of online learning with human feedback in the human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z) - When Does Translation Require Context? A Data-driven, Multilingual
Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT)
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z) - PheMT: A Phenomenon-wise Dataset for Machine Translation Robustness on
User-Generated Contents [40.25277134147149]
We present a new dataset, PheMT, for evaluating the robustness of MT systems against specific linguistic phenomena in Japanese-English translation.
Our experiments with the created dataset revealed that not only our in-house models but even widely used off-the-shelf systems are greatly disturbed by the presence of certain phenomena.
arXiv Detail & Related papers (2020-11-04T04:44:47Z) - Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.