Can postgraduate translation students identify machine-generated text?
- URL: http://arxiv.org/abs/2504.09164v2
- Date: Fri, 18 Apr 2025 13:42:13 GMT
- Title: Can postgraduate translation students identify machine-generated text?
- Authors: Michael Farrell
- Abstract summary: This study explores the ability of linguistically trained individuals to discern machine-generated output from human-written text. Twenty-three postgraduate translation students analysed excerpts of Italian prose and assigned likelihood scores to indicate whether they believed they were human-written or AI-generated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Given the growing use of generative artificial intelligence as a tool for creating multilingual content and bypassing both machine and traditional translation methods, this study explores the ability of linguistically trained individuals to discern machine-generated output from human-written text (HT). After brief training sessions on the textual anomalies typically found in synthetic text (ST), twenty-three postgraduate translation students analysed excerpts of Italian prose and assigned likelihood scores to indicate whether they believed they were human-written or AI-generated (ChatGPT-4o). The results show that, on average, the students struggled to distinguish between HT and ST, with only two participants achieving notable accuracy. Closer analysis revealed that the students often identified the same textual anomalies in both HT and ST, although features such as low burstiness and self-contradiction were more frequently associated with ST. These findings suggest the need for improvements in the preparatory training. Moreover, the study raises questions about the necessity of editing synthetic text to make it sound more human-like and recommends further research to determine whether AI-generated text is already sufficiently natural-sounding not to require further refinement.
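The abstract singles out low burstiness as a feature more often associated with synthetic text. A minimal sketch of one common way to operationalize burstiness, as the coefficient of variation of sentence lengths (this metric and its naive sentence splitter are illustrative assumptions, not the study's own instrument):

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, in words.

    Human writing tends to mix long and short sentences (high
    burstiness); LLM output is often more uniform (low burstiness).
    """
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.stdev(lengths) / mean if mean else 0.0
```

On this measure, a passage alternating very short and very long sentences scores well above a passage of uniformly sized sentences.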
Related papers
- Humans can learn to detect AI-generated texts, or at least learn when they can't [0.0]
This study investigates whether individuals can learn to accurately discriminate between human-written and AI-produced texts when provided with immediate feedback. We used GPT-4o to generate several hundred texts across various genres and text types. We presented randomized text pairs to 254 Czech native speakers who identified which text was human-written and which was AI-generated.
arXiv Detail & Related papers (2025-05-03T17:42:49Z)
- Approaching the Limits to EFL Writing Enhancement with AI-generated Text and Diverse Learners [3.2668433085737036]
Students can compose texts by integrating their own words with AI-generated text. This study investigated how 59 Hong Kong secondary school students interacted with AI-generated text to compose a feature article.
arXiv Detail & Related papers (2025-03-01T06:29:00Z)
- Detecting Machine-Generated Long-Form Content with Latent-Space Variables [54.07946647012579]
Existing zero-shot detectors primarily focus on token-level distributions, which are vulnerable to real-world domain shifts.
We propose a more robust method that incorporates abstract elements, such as event transitions, as key deciding factors to detect machine versus human texts.
arXiv Detail & Related papers (2024-10-04T18:42:09Z)
- Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool [0.0]
This study aims to investigate how various linguistic components are represented in both types of texts, assessing the ability of AI to emulate human writing.
Despite AI-generated texts appearing to mimic human speech, the results revealed significant differences across multiple linguistic features.
arXiv Detail & Related papers (2024-07-04T05:37:09Z)
- Text Injection for Neural Contextual Biasing [57.589903308622745]
This work proposes contextual text injection (CTI) to enhance contextual ASR.
CTI with 100 billion text sentences can achieve up to 43.3% relative WER reduction from a strong neural biasing model.
arXiv Detail & Related papers (2024-06-05T04:20:17Z)
- Deep dive into language traits of AI-generated Abstracts [5.209583971923267]
Generative language models, such as ChatGPT, have garnered attention for their ability to generate human-like writing.
In this work, we attempt to detect abstracts generated by ChatGPT, which are much shorter and more constrained in length than full texts.
We extract the texts' semantic and lexical properties and observe that traditional machine learning models can confidently detect these abstracts.
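The kind of lexical properties such classifiers rely on can be sketched as follows (illustrative only: the feature set and model names here are assumptions, since the summary above does not specify the paper's actual features):

```python
import re
from collections import Counter

def lexical_features(text: str) -> dict:
    """Compute a few simple lexical properties of a text.

    These are generic stylometric features, not the paper's
    exact feature set.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1
    return {
        # Vocabulary diversity: unique words / total words.
        "type_token_ratio": len(set(words)) / n_words,
        "avg_word_length": sum(map(len, words)) / n_words,
        "avg_sentence_length": n_words / (len(sentences) or 1),
        # Share of words that occur exactly once.
        "hapax_ratio": sum(1 for c in Counter(words).values() if c == 1) / n_words,
    }
```

Feature vectors like this would then be fed to a conventional classifier (e.g. logistic regression or a random forest) trained on labelled human and machine abstracts.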
arXiv Detail & Related papers (2023-12-17T06:03:33Z)
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios [87.12753459582116]
A wider range of tasks now face an increasing risk of containing factual errors when handled by generative models.
We propose FacTool, a task and domain agnostic framework for detecting factual errors of texts generated by large language models.
arXiv Detail & Related papers (2023-07-25T14:20:51Z)
- The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres.
We employ several machine learning models to classify the texts.
Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
- Testing of Detection Tools for AI-Generated Text [0.0]
The paper examines the functionality of detection tools for artificial intelligence generated text.
It evaluates them based on accuracy and error type analysis.
The research covers 12 publicly available tools and two commercial systems.
arXiv Detail & Related papers (2023-06-21T16:29:44Z)
- MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection.
We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs.
Despite these challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating its feasibility in real-world application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
- On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, larger sample sizes are needed for reliable detection.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
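The sample-size claim above can be made concrete with a standard likelihood-ratio-style bound (a sketch; the notation and exact form are a gloss on the summary, not quoted from the paper):

```latex
% Any detector D is capped by the total variation distance between
% the machine (M) and human (H) text distributions:
\[
  \mathrm{AUROC}(D) \;\le\; \tfrac{1}{2}
  + \mathrm{TV}(\mathcal{M},\mathcal{H})
  - \tfrac{1}{2}\,\mathrm{TV}(\mathcal{M},\mathcal{H})^{2}.
\]
% As generation quality improves, TV(M, H) -> 0 and the best
% achievable AUROC falls to 1/2 (chance). Over n independent samples,
% however, TV(M^{\otimes n}, H^{\otimes n}) grows toward 1, so
% detection can be recovered by increasing the sample size n.
```

Intuitively: a single near-human passage may be undetectable, but small distributional differences compound across many passages from the same source.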
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
- Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of Transformer-Transducer (T-T), a streaming model commonly used in industry.
We first propose a strategy to generate code-switching text data and then investigate injecting generated text into T-T model explicitly by Text-To-Speech (TTS) conversion or implicitly by tying speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.