Related papers: Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing

Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing

URL: http://arxiv.org/abs/2601.07368v1
Date: Mon, 12 Jan 2026 09:50:15 GMT
Title: Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing
Authors: Minerva Suvanto, Andrea McGlinchey, Mattias Wahde, Peter J Barclay,
Abstract summary: We consider the problem of distinguishing human-written creative fiction (excerpts from novels) from similar text generated by an LLM.<n>Our results show that, while human observers perform poorly (near chance levels) on this binary classification task, a variety of machine-learning models achieve accuracy in the range 0.93 - 0.98.
Score: 0.20999222360659608
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We consider the problem of distinguishing human-written creative fiction (excerpts from novels) from similar text generated by an LLM. Our results show that, while human observers perform poorly (near chance levels) on this binary classification task, a variety of machine-learning models achieve accuracy in the range 0.93 - 0.98 over a previously unseen test set, even using only short samples and single-token (unigram) features. We therefore employ an inherently interpretable (linear) classifier (with a test accuracy of 0.98), in order to elucidate the underlying reasons for this high accuracy. In our analysis, we identify specific unigram features indicative of LLM-generated text, one of the most important being that the LLM tends to use a larger variety of synonyms, thereby skewing the probability distributions in a manner that is easy to detect for a machine learning classifier, yet very difficult for a human observer. Four additional explanation categories were also identified, namely, temporal drift, Americanisms, foreign language usage, and colloquialisms. As identification of the AI-generated text depends on a constellation of such features, the classification appears robust, and therefore not easy to circumvent by malicious actors intent on misrepresenting AI-generated text as human work.

Related papers

Computational Turing Test Reveals Systematic Differences Between Human and AI Language [0.0]
Large language models (LLMs) are increasingly used in the social sciences to simulate human behavior.<n>Existing validation efforts rely heavily on human-judgment-based evaluations.<n>This paper introduces a computational Turing test to assess how closely LLMs approximate human language.
arXiv Detail & Related papers (2025-11-06T08:56:37Z)
Textual interpretation of transient image classifications from large language models [0.0]
Large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets.<n>Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolution and pixel scales.
arXiv Detail & Related papers (2025-10-08T12:12:46Z)
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection [71.59834293521074]
We develop a framework to distinguish between human-authored and machine-generated text.<n>Our method achieves 98.3% AUROC and AUPR with only 8.9% FPR95 on DeepFake dataset.<n>Code, pretrained weights, and demo will be released.
arXiv Detail & Related papers (2025-10-07T08:14:45Z)
Diversity Boosts AI-Generated Text Detection [51.56484100374058]
DivEye is a novel framework that captures how unpredictability fluctuates across a text using surprisal-based features.<n>Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines.
arXiv Detail & Related papers (2025-09-23T10:21:22Z)
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Large language models (LLMs) are crucial for preventing misuse and building trustworthy AI systems.<n>We propose RepreGuard, an efficient statistics-based detection method.<n> Experimental results show that RepreGuard outperforms all baselines with average 94.92% AUROC on both in-distribution (ID) and OOD scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z)
mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection [3.562613318511706]
An automated detection is able to assist humans to indicate the machine-generated texts.<n>This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification.<n>It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance (1st rank) in both.
arXiv Detail & Related papers (2025-06-02T14:07:32Z)
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Detecting texts generated by Large Language Models (LLMs) could cause grave mistakes due to incorrect decisions.<n>We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process.<n>We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLMs-Generated Text [1.1137087573421256]
This study aims to support efforts to detect and identify textual content generated using Generative AI Large Language Models.<n>We leverage several machine learning algorithms such as Random Forest (RF), and Recurrent Neural Networks (RNN) to understand the important features in attribution.<n>Our method is divided into 1) binary classification to differentiate between human-written and AI-text, and 2) multi classification, to differentiate between human-written text and the text generated by the five different LLM tools.
arXiv Detail & Related papers (2025-01-06T18:46:53Z)
Detecting Hallucinations in Large Language Model Generation: A Token Probability Approach [0.0]
Large Language Models (LLMs) produce inaccurate outputs, also known as hallucinations. This paper introduces a supervised learning approach employing only four numerical features derived from tokens and vocabulary probabilities obtained from other evaluators. The method yields promising results, surpassing state-of-the-art outcomes in multiple tasks across three different benchmarks.
arXiv Detail & Related papers (2024-05-30T03:00:47Z)
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore [51.65730053591696]
We propose a simple yet effective black-box zero-shot detection approach based on the observation that human-written texts typically contain more grammatical errors than LLM-generated texts.<n> Experimental results show that our method outperforms current state-of-the-art (SOTA) zero-shot and supervised methods.
arXiv Detail & Related papers (2024-05-07T12:57:01Z)
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text [98.28130949052313]
A score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. We propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data.
arXiv Detail & Related papers (2024-01-22T16:09:47Z)
Using Natural Language Explanations to Rescale Human Judgments [81.66697572357477]
We propose a method to rescale ordinal annotations and explanations using large language models (LLMs)<n>We feed annotators' Likert ratings and corresponding explanations into an LLM and prompt it to produce a numeric score anchored in a scoring rubric.<n>Our method rescales the raw judgments without impacting agreement and brings the scores closer to human judgments grounded in the same scoring rubric.
arXiv Detail & Related papers (2023-05-24T06:19:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.