Related papers: ChatGPT-generated texts show authorship traits that identify them as non-human

URL: http://arxiv.org/abs/2508.16385v1
Date: Fri, 22 Aug 2025 13:38:58 GMT
Title: ChatGPT-generated texts show authorship traits that identify them as non-human
Authors: Vittoria Dentella, Weihang Huang, Silvia Angela Mansi, Jack Grieve, Evelina Leivada
Abstract summary: This work examines whether a language model can also be linked to a specific fingerprint. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay. Our results suggest that the model prefers nouns to verbs, thus showing a distinct linguistic backbone from humans.
Score: 0.6741942263052466
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models can emulate different writing styles, ranging from composing poetry that appears indistinguishable from that of famous poets to using slang that can convince people that they are chatting with a human online. While differences in style may not always be visible to the untrained eye, we can generally distinguish the writing of different people, like a linguistic fingerprint. This work examines whether a language model can also be linked to a specific fingerprint. Through stylometric and multidimensional register analyses, we compare human-authored and model-authored texts from different registers. We find that the model can successfully adapt its style depending on whether it is prompted to produce a Wikipedia entry vs. a college essay, but not in a way that makes it indistinguishable from humans. Concretely, the model shows more limited variation when producing outputs in different registers. Our results suggest that the model prefers nouns to verbs, thus showing a distinct linguistic backbone from humans, who tend to anchor language in the highly grammaticalized dimensions of tense, aspect, and mood. It is possible that the more complex domains of grammar reflect a mode of thought unique to humans, thus acting as a litmus test for Artificial Intelligence.

Related papers

Language Modeling and Understanding Through Paraphrase Generation and Detection [4.080540555071174]
We can express the same thoughts in virtually infinite ways using different words and structures. Modeling paraphrases is a keystone to meaning in computational language models. I propose that decomposing paraphrases into their constituent linguistic aspects offers a more cognitively grounded view of semantic equivalence.
arXiv Detail & Related papers (2026-02-09T05:09:03Z)
Do language models accommodate their users? A study of linguistic convergence [15.958711524171362]
We find that models strongly converge to the conversation's style, often significantly overfitting relative to the human baseline. We observe consistent shifts in convergence across modeling settings, with instruction-tuned and larger models converging less than their pretrained counterparts.
arXiv Detail & Related papers (2025-08-05T09:55:40Z)
A Psycholinguistic Evaluation of Language Models' Sensitivity to Argument Roles [0.06554326244334868]
We evaluate large language models' sensitivity to argument roles by replicating psycholinguistic studies on human argument role processing. We find that language models are able to distinguish verbs that appear in plausible and implausible contexts, where plausibility is determined through the relation between the verb and its preceding arguments. This indicates that language models' capacity to detect verb plausibility does not arise from the same mechanism that underlies human real-time sentence processing.
arXiv Detail & Related papers (2024-10-21T16:05:58Z)
Detecting Mode Collapse in Language Models via Narration [0.0]
We study 4,374 stories sampled from three OpenAI language models. We show successive versions of GPT-3 suffer from increasing degrees of "mode collapse." Our method and results are significant for researchers seeking to employ language models in sociological simulations.
arXiv Detail & Related papers (2024-02-06T23:52:58Z)
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays [66.36541161082856]
ChatGPT and similar generative AI models have attracted hundreds of millions of users. This study compares human-written versus ChatGPT-generated argumentative student essays.
arXiv Detail & Related papers (2023-04-24T12:58:28Z)
Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse. It remains an open question to what extent modern language models can interpret nonliteral phrases. We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z)
Estimating the Personality of White-Box Language Models [0.589889361990138]
Large-scale language models, which are trained on large corpora of text, are being used in a wide range of applications everywhere. Existing research shows that these models can and do capture human biases. Many of these biases, especially those that could potentially cause harm, are being well-investigated. However, studies that infer and change human personality traits inherited by these models have been scarce or non-existent.
arXiv Detail & Related papers (2022-04-25T23:53:53Z)
It's not Rocket Science : Interpreting Figurative Language in Narratives [48.84507467131819]
We study the interpretation of two non-compositional figurative languages (idioms and similes). Our experiments show that models based solely on pre-trained language models perform substantially worse than humans on these tasks. We additionally propose knowledge-enhanced models, adopting human strategies for interpreting figurative language.
arXiv Detail & Related papers (2021-08-31T21:46:35Z)
Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning [9.391375268580806]
We show that competing linguistic processes within a language obscure underlying linguistic knowledge. While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior. Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
arXiv Detail & Related papers (2021-06-02T14:52:11Z)
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition. How to effectively model linguistic rules in end-to-end deep networks remains a research challenge. We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing. Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.