Related papers: Contrasting Linguistic Patterns in Human and LLM-Generated News Text

Contrasting Linguistic Patterns in Human and LLM-Generated News Text

URL: http://arxiv.org/abs/2308.09067v3
Date: Mon, 2 Sep 2024 07:26:46 GMT
Title: Contrasting Linguistic Patterns in Human and LLM-Generated News Text
Authors: Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, David Vilares,
Abstract summary: We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output. The results reveal various measurable differences between human and AI-generated texts. Human texts exhibit more scattered sentence length distributions, more variety of vocabulary, a distinct use of dependency and constituent types. LLM outputs use more numbers, symbols and auxiliaries than human texts, as well as more pronouns.
Score: 20.127243508644984
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in total. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric, and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Human texts exhibit more scattered sentence length distributions, more variety of vocabulary, a distinct use of dependency and constituent types, shorter constituents, and more optimized dependency distances. Humans tend to exhibit stronger negative emotions (such as fear and disgust) and less joy compared to text generated by LLMs, with the toxicity of these models increasing as their size grows. LLM outputs use more numbers, symbols and auxiliaries (suggesting objective language) than human texts, as well as more pronouns. The sexist bias prevalent in human text is also expressed by LLMs, and even magnified in all of them but one. Differences between LLMs and humans are larger than between LLMs.

Related papers

Do LLMs produce texts with "human-like" lexical diversity? [0.0]
This study investigates patterns of lexical diversity in LLM-generated texts from four ChatGPT models.<n>Six dimensions of lexical diversity were measured in each text: volume, abundance, variety-repetition, evenness, disparity, and dispersion.<n>Results indicate that LLMs do not produce human-like texts in relation to lexical diversity, and the newer LLMs produce less human-like texts than older models.
arXiv Detail & Related papers (2025-07-31T18:22:11Z)
Linguistic and Embedding-Based Profiling of Texts generated by Humans and Large Language Models [0.0]
We calculate different linguistic features such as dependency length and emotionality for characterizing human-written and machine-generated texts.<n>Our statistical analysis reveals that human-written texts tend to exhibit simpler syntactic structures and more diverse semantic content.<n>Both human and machine texts show stylistic diversity across domains, with humans displaying greater variation in our features.
arXiv Detail & Related papers (2025-07-18T02:46:55Z)
XToM: Exploring the Multilingual Theory of Mind for Large Language Models [57.9821865189077]
Existing evaluations of Theory of Mind in LLMs are largely limited to English.<n>We present XToM, a rigorously validated multilingual benchmark that evaluates ToM across five languages.<n>Our findings expose limitations in LLMs' ability to replicate human-like mentalizing across linguistic contexts.
arXiv Detail & Related papers (2025-06-03T05:23:25Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size. Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding. Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
arXiv Detail & Related papers (2025-03-01T03:35:56Z)
When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models [41.929897900569905]
Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks. We compare the two on a sentence comprehension task using garden-path constructions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities.
arXiv Detail & Related papers (2025-02-13T13:19:33Z)
Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models [0.0]
We identify significant differences between human-written texts and those generated by large language models (LLMs) Our findings indicate that humans write texts that are less cognitively demanding, with higher semantic content, and richer emotional content compared to texts generated by LLMs.
arXiv Detail & Related papers (2024-12-04T04:38:35Z)
Dialectal Toxicity Detection: Evaluating LLM-as-a-Judge Consistency Across Language Varieties [23.777874316083984]
There has been little systematic study on how dialectal differences affect toxicity detection by modern LLMs. We create a multi-dialect dataset through synthetic transformations and human-assisted translations, covering 10 language clusters and 60 varieties. We then evaluated three LLMs on their ability to assess toxicity across multilingual, dialectal, and LLM-human consistency.
arXiv Detail & Related papers (2024-11-17T03:53:24Z)
Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language. We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
LMLPA: Language Model Linguistic Personality Assessment [11.599282127259736]
Large Language Models (LLMs) are increasingly used in everyday life and research. measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs.
arXiv Detail & Related papers (2024-10-23T07:48:51Z)
Do LLMs write like humans? Variation in grammatical and rhetorical styles [0.7852714805965528]
We study the rhetorical styles of large language models (LLMs) Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans. This demonstrates that despite their advanced abilities, LLMs struggle to match human styles.
arXiv Detail & Related papers (2024-10-21T15:35:44Z)
Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear. By including additional context in prompts, we analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency. We introduce the novel Language Agency Bias Evaluation benchmark. We unveil language agency social biases in 3 recent Large Language Model (LLM)-generated content.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
Large Language Models are as persuasive as humans, but how? About the cognitive effort and moral-emotional language of LLM arguments [0.0]
Large Language Models (LLMs) are already as persuasive as humans. This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments.
arXiv Detail & Related papers (2024-04-14T19:01:20Z)
High-Dimension Human Value Representation in Large Language Models [60.33033114185092]
We propose UniVaR, a high-dimensional representation of human value distributions in Large Language Models (LLMs) We show that UniVaR is a powerful tool to compare the distribution of human values embedded in different LLMs with different langauge sources.
arXiv Detail & Related papers (2024-04-11T16:39:00Z)
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content. We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models [3.974379576408554]
Large Language Models (LLMs) are trained primarily on minimally processed web text. LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community.
arXiv Detail & Related papers (2023-06-30T19:39:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.