Humans can learn to detect AI-generated texts, or at least learn when they can't
- URL: http://arxiv.org/abs/2505.01877v3
- Date: Wed, 07 May 2025 18:18:00 GMT
- Title: Humans can learn to detect AI-generated texts, or at least learn when they can't
- Authors: Jiří Milička, Anna Marklová, Ondřej Drobil, Eva Pospíšilová
- Abstract summary: This study investigates whether individuals can learn to accurately discriminate between human-written and AI-produced texts when provided with immediate feedback. We used GPT-4o to generate several hundred texts across various genres and text types. We presented randomized text pairs to 254 Czech native speakers who identified which text was human-written and which was AI-generated.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates whether individuals can learn to accurately discriminate between human-written and AI-produced texts when provided with immediate feedback, and if they can use this feedback to recalibrate their self-perceived competence. We also explore the specific criteria individuals rely upon when making these decisions, focusing on textual style and perceived readability. We used GPT-4o to generate several hundred texts across various genres and text types comparable to Koditex, a multi-register corpus of human-written texts. We then presented randomized text pairs to 254 Czech native speakers who identified which text was human-written and which was AI-generated. Participants were randomly assigned to two conditions: one receiving immediate feedback after each trial, the other receiving no feedback until experiment completion. We recorded accuracy in identification, confidence levels, response times, and judgments about text readability, along with demographic data and participants' engagement with AI technologies prior to the experiment. Participants receiving immediate feedback showed significant improvement in accuracy and confidence calibration. Participants initially held incorrect assumptions about AI-generated text features, including expectations about stylistic rigidity and readability. Notably, without feedback, participants made the most errors precisely when feeling most confident -- an issue largely resolved among the feedback group. The ability to differentiate between human- and AI-generated texts can be effectively learned through targeted training with explicit feedback, which helps correct misconceptions about AI stylistic features and readability (as well as other potentially relevant variables not explored here) while facilitating more accurate self-assessment. This finding might be particularly important in educational contexts.
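The calibration finding (confidence outpacing accuracy without feedback) can be illustrated with a minimal sketch; the function, confidence scaling, and data below are illustrative assumptions, not the paper's analysis code:

```python
import statistics

def calibration_gap(confidences, correct):
    """Mean self-reported confidence minus observed accuracy.

    confidences: per-trial confidence ratings, rescaled to [0, 1]
    correct: 1 if the texts were correctly identified on that trial, else 0
    A positive gap indicates overconfidence; the study found that feedback
    shrinks this gap, while the no-feedback group stayed miscalibrated.
    """
    return statistics.fmean(confidences) - statistics.fmean(correct)

# Hypothetical no-feedback participant: highly confident, near chance level.
print(calibration_gap([0.9, 0.8, 0.95, 0.85], [1, 0, 0, 1]))  # -> 0.375
```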
Related papers
- Can postgraduate translation students identify machine-generated text? [0.0]
This study explores the ability of linguistically trained individuals to discern machine-generated output from human-written text. Twenty-three postgraduate translation students analysed excerpts of Italian prose and assigned likelihood scores to indicate whether they believed they were human-written or AI-generated.
arXiv Detail & Related papers (2025-04-12T09:58:09Z) - ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) can cause grave mistakes. We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process. We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
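As the abstract describes it, ExaGPT classifies a text by comparing it against stored examples of human-written and LLM-generated spans and siding with whichever provides more similar evidence. The toy sketch below substitutes exact n-gram overlap for the paper's span-similarity retrieval, so it illustrates the example-based idea only, not the authors' method:

```python
def ngram_spans(text, n=4):
    """All word n-grams of a text, as a set of span strings."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def classify_by_examples(target, human_corpus, llm_corpus, n=4):
    """Label `target` by whichever datastore shares more of its spans.
    Exact-match overlap is a stand-in for ExaGPT's similarity retrieval."""
    spans = ngram_spans(target, n)
    human_spans = set().union(*(ngram_spans(doc, n) for doc in human_corpus))
    llm_spans = set().union(*(ngram_spans(doc, n) for doc in llm_corpus))
    human_hits = len(spans & human_spans)
    llm_hits = len(spans & llm_spans)
    return "ai-generated" if llm_hits > human_hits else "human-written"
```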
arXiv Detail & Related papers (2025-02-17T01:15:07Z) - Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [60.09665704993751]
We introduce FairOPT, an algorithm for group-specific threshold optimization in AI-generated content classifiers. Our approach partitions data into subgroups based on attributes (e.g., text length and writing style) and learns decision thresholds for each group. Our framework paves the way for more robust and fair classification criteria in AI-generated output detection.
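A rough sketch in the spirit of that group-adaptive idea (not the FairOPT implementation; the attribute groups, score scale, and target rate are assumed for illustration): per subgroup, pick the threshold on a detector's AI-probability score that holds the false positive rate on human-written texts near a target.

```python
import numpy as np

def fit_group_thresholds(scores, labels, groups, target_fpr=0.05):
    """Per-group thresholds holding the human-text FPR near target_fpr.

    scores: detector's AI-probability per text; labels: 0 = human, 1 = AI;
    groups: subgroup attribute per text (e.g., "short" vs. "long").
    """
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    thresholds = {}
    for g in np.unique(groups):
        human = scores[(groups == g) & (labels == 0)]
        # Only ~target_fpr of this group's human texts exceed the threshold.
        thresholds[str(g)] = float(np.quantile(human, 1.0 - target_fpr))
    return thresholds

# Hypothetical use: separate cutoffs for short and long texts.
thresholds = fit_group_thresholds(
    scores=[0.20, 0.90, 0.40, 0.70, 0.10, 0.80],
    labels=[0, 1, 0, 1, 0, 1],
    groups=["short", "short", "short", "long", "long", "long"],
)
```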
arXiv Detail & Related papers (2025-02-06T21:58:48Z) - People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text [37.36534911201806]
We hire annotators to read 300 non-fiction English articles and label them as either human-written or AI-generated. Experiments show that annotators who frequently use LLMs for writing tasks excel at detecting AI-generated text. We release our annotated dataset and code to spur future research into both human and automated detection of AI-generated text.
arXiv Detail & Related papers (2025-01-26T19:31:34Z) - Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation [48.70176791365903]
This study explores how bias shapes the perception of AI- versus human-generated content.
We investigated how human raters respond to labeled and unlabeled content.
arXiv Detail & Related papers (2024-09-29T04:31:45Z) - Who Writes the Review, Human or AI? [0.36498648388765503]
This study proposes a methodology to accurately distinguish AI-generated and human-written book reviews.
Our approach utilizes transfer learning, enabling the model to identify generated text across different topics.
The experimental results demonstrate that it is feasible to detect the original source of text, achieving an accuracy rate of 96.86%.
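The abstract specifies only "transfer learning"; one common realization (assumed here, not confirmed by the paper) is fine-tuning a pretrained encoder as a binary review classifier, e.g. via the Hugging Face transformers API. The model choice below is a placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = human-written, 1 = AI-generated
)

# After fine-tuning on labeled book reviews, classify a new one:
inputs = tokenizer("A gripping read from start to finish.",
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    label = model(**inputs).logits.argmax(dim=-1).item()
```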
arXiv Detail & Related papers (2024-05-30T17:38:44Z) - Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities [2.446971913303003]
We conducted an evaluation study of text comprehensibility including participants with and without intellectual disabilities reading German texts on a tablet computer.
We explored four different approaches to measuring comprehensibility: multiple-choice comprehension questions, perceived difficulty ratings, response time, and reading speed.
For the target group of persons with intellectual disabilities, comprehension questions emerged as the most reliable measure, while analyzing reading speed provided valuable insights into participants' reading behavior.
arXiv Detail & Related papers (2024-02-20T15:37:08Z) - Towards Possibilities & Impossibilities of AI-generated Text Detection:
A Survey [97.33926242130732]
Large Language Models (LLMs) have revolutionized the domain of natural language processing (NLP) with remarkable capabilities of generating human-like text responses.
Despite these advancements, several works in the existing literature have raised serious concerns about the potential misuse of LLMs.
To address these concerns, a consensus among the research community is to develop algorithmic solutions to detect AI-generated text.
arXiv Detail & Related papers (2023-10-23T18:11:32Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approaches human-like quality, the sample size needed for reliable detection increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors including RoBERTa-Large/Base-Detector and GPTZero.
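The blow-up alluded to above can be illustrated with a generic Hoeffding-style two-sample argument (not the paper's exact bound): if the total variation distance between the human and machine text distributions is delta, reliably distinguishing them needs on the order of 1/delta^2 samples.

```python
import math

def samples_needed(tv_distance, confidence=0.95):
    """Hoeffding-style estimate n >= ln(2/alpha) / (2 * tv^2): as machine
    text nears human quality (tv -> 0), required samples grow quadratically.
    Illustrative only; see the paper for the actual bounds."""
    alpha = 1.0 - confidence
    return math.ceil(math.log(2.0 / alpha) / (2.0 * tv_distance ** 2))

for tv in (0.5, 0.1, 0.01):
    print(f"TV distance {tv}: about {samples_needed(tv)} samples")
# -> roughly 8, 185, and 18445 samples, respectively
```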
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Real or Fake Text?: Investigating Human Ability to Detect Boundaries Between Human-Written and Machine-Generated Text [23.622347443796183]
We study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models.
We show that, while annotators often struggle at this task, there is substantial variance in annotator skill, and that given proper incentives, annotators can improve over time.
arXiv Detail & Related papers (2022-12-24T06:40:25Z) - NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experiment evaluations on popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z)
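As background for the NaturalSpeech entry above: a VAE is trained by maximizing the evidence lower bound (ELBO). The standard unconditional form is shown below, with x the waveform and z the latent variable; NaturalSpeech's actual objective conditions on the input text and adds further components not sketched here.

```latex
\mathcal{L}(\theta, \phi; x) =
  \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right]
  - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\Vert\, p(z) \right)
```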
This list is automatically generated from the titles and abstracts of the papers on this site.