Do LLMs produce texts with "human-like" lexical diversity?
- URL: http://arxiv.org/abs/2508.00086v1
- Date: Thu, 31 Jul 2025 18:22:11 GMT
- Title: Do LLMs produce texts with "human-like" lexical diversity?
- Authors: Kelly Kendro, Jeffrey Maloney, Scott Jarvis
- Abstract summary: This study investigates patterns of lexical diversity in LLM-generated texts from four ChatGPT models. Six dimensions of lexical diversity were measured in each text: volume, abundance, variety-repetition, evenness, disparity, and dispersion. Results indicate that LLMs do not produce human-like texts in relation to lexical diversity, and that the newer LLMs produce less human-like texts than older models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The degree to which LLMs produce writing that is truly human-like remains unclear despite the extensive empirical attention that this question has received. The present study addresses this question from the perspective of lexical diversity. Specifically, the study investigates patterns of lexical diversity in LLM-generated texts from four ChatGPT models (-3.5, -4, -o4 mini, and -4.5) in comparison with texts written by L1 and L2 English participants (n = 240) across four education levels. Six dimensions of lexical diversity were measured in each text: volume, abundance, variety-repetition, evenness, disparity, and dispersion. Results from one-way MANOVAs, one-way ANOVAs, and Support Vector Machines revealed that the LLM-generated texts differed significantly from human-written texts for each variable, with ChatGPT-o4 mini and -4.5 differing the most. Within these two groups, ChatGPT-4.5 demonstrated higher levels of lexical diversity despite producing fewer tokens. The human writers' lexical diversity did not differ across subgroups (i.e., education, language status). Altogether, the results indicate that LLMs do not produce human-like texts in relation to lexical diversity, and the newer LLMs produce less human-like texts than older models. We discuss the implications of these results for language pedagogy and related applications.
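The six dimensions named in the abstract (volume, abundance, variety-repetition, evenness, disparity, dispersion) are specialized indices whose exact formulas are not given here. As a rough, hypothetical illustration of the simplest of these ideas only, and not the authors' instrument, the variety-repetition dimension is often proxied by a type-token ratio (TTR) or its moving-average variant:

```python
# Illustrative sketch (NOT the study's measures): two simple
# lexical-diversity indices commonly used as proxies for the
# "variety-repetition" dimension.

def type_token_ratio(tokens):
    """Distinct word types divided by total tokens.

    Sensitive to text length: longer texts tend to score lower.
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-Average TTR: mean TTR over fixed-size sliding windows.

    Averaging over windows reduces the length sensitivity of plain TTR.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the rug"
    tokens = text.lower().split()  # 13 tokens, 8 distinct types
    print(round(type_token_ratio(tokens), 3))  # → 0.615
```

This is a minimal sketch; the study's actual measures of evenness, disparity, and dispersion require richer frequency and semantic information than raw token counts.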
Related papers
- A Comprehensive Analysis of Large Language Model Outputs: Similarity, Diversity, and Bias [1.7109513360384465]
Large Language Models (LLMs) represent a major step toward artificial general intelligence.
Questions remain about their output similarity, variability, and ethical implications.
arXiv Detail & Related papers (2025-05-14T01:21:46Z)
- PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts [79.84059473102778]
PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels.
Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation.
arXiv Detail & Related papers (2025-04-25T15:39:04Z)
- Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Model (LLM) reasoning tasks.
We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs.
These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
- Uncovering inequalities in new knowledge learning by large language models across different languages [66.687369838071]
We show that low-resource languages consistently face disadvantages across all four dimensions.
We aim to raise awareness of linguistic inequalities in LLMs' new knowledge learning, fostering the development of more inclusive and equitable future LLMs.
arXiv Detail & Related papers (2025-03-06T03:41:47Z)
- Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models [0.0]
We identify significant differences between human-written texts and those generated by large language models (LLMs).
Our findings indicate that humans write texts that are less cognitively demanding, with higher semantic content, and richer emotional content compared to texts generated by LLMs.
arXiv Detail & Related papers (2024-12-04T04:38:35Z)
- Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Do LLMs write like humans? Variation in grammatical and rhetorical styles [0.7852714805965528]
We study the rhetorical styles of large language models (LLMs).
Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans.
This demonstrates that despite their advanced abilities, LLMs struggle to match human styles.
arXiv Detail & Related papers (2024-10-21T15:35:44Z)
- White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency in Large Language Model (LLM)-generated content.
We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs.
Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard [3.419330841031544]
Large Language Models (LLMs) are capable of generating text that is similar to or surpasses human quality.
We compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of the most popular LLMs to diverse inputs.
The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with 88% accuracy.
arXiv Detail & Related papers (2024-02-22T13:25:17Z)
- Contrasting Linguistic Patterns in Human and LLM-Generated News Text [20.127243508644984]
We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output.
The results reveal various measurable differences between human and AI-generated texts.
Human texts exhibit more scattered sentence length distributions, greater vocabulary variety, and distinct use of dependency and constituent types.
LLM outputs use more numbers, symbols and auxiliaries than human texts, as well as more pronouns.
arXiv Detail & Related papers (2023-08-17T15:54:38Z)
- Probing Pretrained Language Models for Lexical Semantics [76.73599166020307]
We present a systematic empirical analysis across six typologically diverse languages and five different lexical tasks.
Our results indicate patterns and best practices that hold universally, but also point to prominent variations across languages and tasks.
arXiv Detail & Related papers (2020-10-12T14:24:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.