Do LLMs write like humans? Variation in grammatical and rhetorical styles
- URL: http://arxiv.org/abs/2410.16107v1
- Date: Mon, 21 Oct 2024 15:35:44 GMT
- Title: Do LLMs write like humans? Variation in grammatical and rhetorical styles
- Authors: Alex Reinhart, David West Brown, Ben Markey, Michael Laudenbach, Kachatad Pantusen, Ronald Yurko, Gordon Weinberg
- Abstract summary: We study the rhetorical styles of large language models (LLMs).
Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans.
This demonstrates that despite their advanced abilities, LLMs struggle to match human styles.
- Abstract: Large language models (LLMs) are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in surface features such as word choice and punctuation, and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber's set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones, and are larger for instruction-tuned models than base models. This demonstrates that despite their advanced abilities, LLMs struggle to match human styles, and hence more advanced linguistic features can detect patterns in their behavior not previously recognized.
Related papers
- Large Language Models Reflect the Ideology of their Creators (2024-10-24)
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
- Reverse Modeling in Large Language Models (2024-10-13)
Humans are accustomed to reading and writing in a forward manner.
This paper investigates whether auto-regressive large language models (LLMs) struggle with reverse modeling.
- CUTE: Measuring LLMs' Understanding of Their Tokens (2024-09-23)
Large language models (LLMs) show remarkable performance on a wide variety of tasks.
This raises the question: to what extent can LLMs learn orthographic information?
We propose a new benchmark featuring a collection of tasks designed to test the orthographic knowledge of LLMs.
- LLMs' Understanding of Natural Language Revealed (2024-07-29)
Large language models (LLMs) are the result of a massive experiment in bottom-up, data-driven reverse engineering of language at scale.
We focus on testing LLMs for their language understanding capabilities, their supposed forte.
- Beware of Words: Evaluating the Lexical Diversity of Conversational LLMs using ChatGPT as Case Study (2024-02-11)
We consider the evaluation of the lexical richness of text generated by conversational large language models (LLMs) and how it depends on the model parameters.
The results show how lexical richness depends on the version of ChatGPT, on some of its parameters such as the presence penalty, and on the role assigned to the model.
- Large Language Models: A Survey (2024-02-09)
Large language models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data.
- Language models align with human judgments on key grammatical constructions (2024-01-19)
We re-evaluate large language models' (LLMs') performance using well-established practices.
We find that models achieve high accuracy overall, but also capture fine-grained variation in human linguistic judgments.
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering (2024-01-11)
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
- AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations (2023-11-22)
AlignedCoT is an in-context learning technique for invoking large language models.
It achieves consistent and correct step-wise prompts in zero-shot scenarios.
We conduct experiments on mathematical reasoning and commonsense reasoning.
- Contrasting Linguistic Patterns in Human and LLM-Generated News Text (2023-08-17)
We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output.
The results reveal various measurable differences between human and AI-generated texts.
Human texts exhibit more scattered sentence length distributions, more varied vocabulary, and a distinct use of dependency and constituent types.
LLM outputs use more numbers, symbols, and auxiliaries than human texts, as well as more pronouns.
- In-Context Impersonation Reveals Large Language Models' Strengths and Biases (2023-05-24)
We ask LLMs to assume different personas before solving vision and language tasks.
We find that LLMs pretending to be children of different ages recover human-like developmental stages.
In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts.