Related papers: Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach

Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach

URL: http://arxiv.org/abs/2312.10750v2
Date: Sat, 12 Oct 2024 10:58:29 GMT
Title: Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach
Authors: Zhaokun Jiang, Qianxi Lv, Ziyin Zhang, Lei Lei,
Abstract summary: This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
Score: 1.6982207802596105
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The growing popularity of neural machine translation (NMT) and LLMs represented by ChatGPT underscores the need for a deeper understanding of their distinct characteristics and relationships. Such understanding is crucial for language professionals and researchers to make informed decisions and tactful use of these cutting-edge translation technology, but remains underexplored. This study aims to fill this gap by investigating three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT. To achieve these objectives, we employ statistical testing, machine learning algorithms, and multidimensional analysis (MDA) to analyze Spokesperson's Remarks and their translations. After extracting a wide range of linguistic features, supervised classifiers demonstrate high accuracy in distinguishing the three translation types, whereas unsupervised clustering techniques do not yield satisfactory results. Another major finding is that ChatGPT-produced translations exhibit greater similarity with NMT than HT in most MDA dimensions, which is further corroborated by distance computing and visualization. These novel insights shed light on the interrelationships among the three translation types and have implications for the future advancements of NMT and generative AI.

Related papers

Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation [57.11989521509119]
We propose a novel agentic translation evaluation framework, centered by a reflective Core Agent that invokes specialized sub-agents.<n> Experimental results indicate the efficacy of RATE, achieving an improvement of at least 3.2 meta score compared with current metrics.
arXiv Detail & Related papers (2026-01-12T09:03:42Z)
Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation [70.43884512651668]
We formalize Genette's (1987) theory of paratexts from literary and translation studies to introduce the task of paratextual explicitation for machine translation.<n>We construct a dataset of 560 expert-aligned paratexts from four English translations of the classical Chinese short story collection Liaozhai.<n>Our findings demonstrate the potential of paratextual explicitation in advancing machine translation beyond linguistic equivalence.
arXiv Detail & Related papers (2025-09-27T16:27:36Z)
Can Peter Pan Survive MT? A Stylometric Study of LLMs, NMTs, and HTs in Children's Literature Translation [0.0]
The research constructs a Peter Pan corpus, comprising 21 translations: 7 human translations (HTs), 7 large language model translations (LLMs), and 7 neural machine translation outputs (NMTs)<n>The analysis employs a generic feature set (including lexical, syntactic readability, and n-gram features) and a creative text translation (CTT-specific) feature set, which captures repetition, rhythm, translatability, and miscellaneous levels, yielding 447 linguistic features in total.<n>Results reveal that in generic features, HTs and MTs exhibit significant differences in conjunction word distributions and the ratio of 1-word-gram-Yi
arXiv Detail & Related papers (2025-06-27T09:34:40Z)
The Comparison of Translationese in Machine Translation and Human Transation in terms of Translation Relations [7.776258153133857]
The research employs two parallel corpora, each spanning nine genres with the same source texts with one translated by NMT and the other by humans. The results indicate that NMT relies on literal translation significantly more than HT across genres.
arXiv Detail & Related papers (2024-03-27T19:12:20Z)
An Empirical study of Unsupervised Neural Machine Translation: analyzing NMT output, model's behavior and sentences' contribution [5.691028372215281]
Unsupervised Neural Machine Translation (UNMT) focuses on improving NMT results under the assumption there is no human translated parallel data. We focus on three very diverse languages, French, Gujarati, and Kazakh, and train bilingual NMT models, to and from English, with various levels of supervision.
arXiv Detail & Related papers (2023-12-19T20:35:08Z)
Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences" Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
Translation-Enhanced Multilingual Text-to-Image Generation [61.41730893884428]
Research on text-to-image generation (TTI) still predominantly focuses on the English language. In this work, we thus investigate multilingual TTI and the current potential of neural machine translation (NMT) to bootstrap mTTI systems. We propose Ensemble Adapter (EnsAd), a novel parameter-efficient approach that learns to weigh and consolidate the multilingual text knowledge within the mTTI framework.
arXiv Detail & Related papers (2023-05-30T17:03:52Z)
Do GPTs Produce Less Literal Translations? [20.095646048167612]
Large Language Models (LLMs) have emerged as general-purpose language models capable of addressing many natural language generation or understanding tasks. We find that translations out of English (E-X) from GPTs tend to be less literal, while exhibiting similar or better scores on Machine Translation quality metrics.
arXiv Detail & Related papers (2023-05-26T10:38:31Z)
Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. We investigate the similarities and differences between the discourse structures of source and target languages. We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios. This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection. We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation [2.0625936401496237]
We go beyond the study of gender in Machine Translation and investigate how bias amplification might affect language in a broader sense. We assess the linguistic richness (on a lexical and morphological level) of translations created by different data-driven MT paradigms.
arXiv Detail & Related papers (2021-01-30T18:49:11Z)
Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize differences between cost diversity paid for the BLEU scores enjoyed by NMT. Our study implicates search as a salient source of known bias when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
On the Integration of LinguisticFeatures into Statistical and Neural Machine Translation [2.132096006921048]
We investigate the discrepancies between the strengths of statistical approaches to machine translation and the way humans translate. We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations. We identify overgeneralization or 'algomic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
arXiv Detail & Related papers (2020-03-31T16:03:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.