Related papers: Benchmark of stylistic variation in LLM-generated texts

URL: http://arxiv.org/abs/2509.10179v2
Date: Thu, 18 Sep 2025 23:31:43 GMT
Title: Benchmark of stylistic variation in LLM-generated texts
Authors: Jiří Milička, Anna Marklová, Václav Cvrček
Abstract summary: This study investigates the register variation in texts written by humans and comparable texts produced by large language models (LLMs). Similar analysis is replicated on Czech using AI-Koditex corpus and Czech multidimensional model.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study investigates the register variation in texts written by humans and comparable texts produced by large language models (LLMs). Biber's multidimensional analysis (MDA) is applied to a sample of human-written texts and AI-created texts generated to be their counterparts to find the dimensions of variation in which LLMs differ most significantly and most systematically from humans. As textual material, a new LLM-generated corpus AI-Brown is used, which is comparable to BE-21 (a Brown family corpus representing contemporary British English). Since all languages except English are underrepresented in the training data of frontier LLMs, similar analysis is replicated on Czech using AI-Koditex corpus and Czech multidimensional model. Examined were 16 frontier models in various settings and prompts, with emphasis placed on the difference between base models and instruction-tuned models. Based on this, a benchmark is created through which models can be compared with each other and ranked in interpretable dimensions.

Related papers

RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns [50.401907401444404]
Detecting LLM-generated text is crucial for preventing misuse and building trustworthy AI systems. We propose RepreGuard, an efficient statistics-based detection method. Experimental results show that RepreGuard outperforms all baselines with an average 94.92% AUROC in both in-distribution (ID) and out-of-distribution (OOD) scenarios.
arXiv Detail & Related papers (2025-08-18T17:59:15Z)
Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models [0.0]
We calculate different linguistic features such as dependency length and emotionality for characterizing human-written and machine-generated texts. Our statistical analysis reveals that human-written texts tend to exhibit simpler syntactic structures and more diverse semantic content. Both human and machine texts show stylistic diversity across domains, with humans displaying greater variation in our features.
arXiv Detail & Related papers (2025-07-18T02:46:55Z)
Style Extraction on Text Embeddings Using VAE and Parallel Dataset [1.8067835669244101]
The study aims to detect and analyze stylistic variations between translations using a Variational Autoencoder (VAE) model. The results demonstrate that each translation exhibits a unique stylistic distribution, which can be effectively identified using the VAE model. The study highlights the model's potential for broader applications in AI-based text generation and stylistic analysis.
arXiv Detail & Related papers (2025-02-12T00:24:28Z)
Examining the Robustness of Large Language Models across Language Complexity [19.184633713069353]
Large language models (LLMs) analyze textual artifacts generated by students to understand and evaluate their learning. This study examines the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving.
arXiv Detail & Related papers (2025-01-30T20:33:59Z)
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models [0.0]
We identify significant differences between human-written texts and those generated by large language models (LLMs). Our findings indicate that humans write texts that are less cognitively demanding, with higher semantic content, and richer emotional content compared to texts generated by LLMs.
arXiv Detail & Related papers (2024-12-04T04:38:35Z)
BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS). We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting. Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z)
The Imitation Game: Detecting Human and AI-Generated Texts in the Era of ChatGPT and BARD [3.2228025627337864]
We introduce a novel dataset of human-written and AI-generated texts in different genres. We employ several machine learning models to classify the texts. Results demonstrate the efficacy of these models in discerning between human and AI-generated text.
arXiv Detail & Related papers (2023-07-22T21:00:14Z)
MAGE: Machine-generated Text Detection in the Wild [82.70561073277801]
Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection. We build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Despite challenges, the top-performing detector can identify 86.54% of out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios.
arXiv Detail & Related papers (2023-05-22T17:13:29Z)
How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text. Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions? We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks. This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models. We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.