Do large language models resemble humans in language use?
- URL: http://arxiv.org/abs/2303.08014v2
- Date: Tue, 26 Mar 2024 01:46:50 GMT
- Title: Do large language models resemble humans in language use?
- Authors: Zhenguang G. Cai, Xufeng Duan, David A. Haslett, Shuqi Wang, Martin J. Pickering
- Abstract summary: Large language models (LLMs) such as ChatGPT and Vicuna have shown remarkable capacities in comprehending and producing language.
We subjected ChatGPT and Vicuna to 12 experiments ranging from sounds to dialogue, preregistered and with 1000 runs (i.e., iterations) per experiment.
ChatGPT and Vicuna replicated the human pattern of language use in 10 and 7 out of the 12 experiments, respectively.
- Score: 1.8524806794216748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) such as ChatGPT and Vicuna have shown remarkable capacities in comprehending and producing language. However, their internal workings remain a black box, and it is unclear whether LLMs and chatbots can develop humanlike characteristics in language use. Cognitive scientists have devised many experiments that probe, and have made great progress in explaining, how people comprehend and produce language. We subjected ChatGPT and Vicuna to 12 of these experiments ranging from sounds to dialogue, preregistered and with 1000 runs (i.e., iterations) per experiment. ChatGPT and Vicuna replicated the human pattern of language use in 10 and 7 out of the 12 experiments, respectively. The models associated unfamiliar words with different meanings depending on their forms, continued to access recently encountered meanings of ambiguous words, reused recent sentence structures, attributed causality as a function of verb semantics, and accessed different meanings and retrieved different words depending on an interlocutor's identity. In addition, ChatGPT, but not Vicuna, nonliterally interpreted implausible sentences that were likely to have been corrupted by noise, drew reasonable inferences, and overlooked semantic fallacies in a sentence. Finally, unlike humans, neither model preferred using shorter words to convey less informative content, nor did they use context to resolve syntactic ambiguities. We discuss how these convergences and divergences may result from the transformer architecture. Overall, these experiments demonstrate that LLMs such as ChatGPT (and Vicuna to a lesser extent) are humanlike in many aspects of human language processing.
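The paper's method hinges on treating each experiment as 1000 independent runs of the same prompt and comparing the resulting response distribution with the human pattern. Below is a minimal, hypothetical sketch of that protocol; the prompt, the query_model placeholder, and the two response codes are illustrative assumptions, not the authors' materials:

```python
# Hypothetical sketch of the run-based protocol described in the abstract:
# repeat one experiment for 1000 independent runs (iterations), code each
# response, and tally the distribution for comparison with human data.
import random
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a single chatbot call (e.g., ChatGPT or Vicuna);
    here it just samples a coded response at random."""
    return random.choice(["double-object", "prepositional-object"])

def run_experiment(prompt: str, n_runs: int = 1000) -> Counter:
    # One run = one independent iteration, analogous to one human trial.
    return Counter(query_model(prompt) for _ in range(n_runs))

counts = run_experiment("Describe the picture: a chef, a book, a swimmer.")
print(counts)  # compare these proportions against the human baseline
```

A per-experiment comparison of this kind is what underlies the 10-of-12 and 7-of-12 replication counts reported above.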
Related papers
- Language in Vivo vs. in Silico: Size Matters but Larger Language Models Still Do Not Comprehend Language on a Par with Humans [1.8434042562191815]
This work investigates the role of model scaling, asking whether differences between humans and models shrink as model size grows.
We test three Large Language Models (LLMs) on a grammaticality judgment task featuring anaphora, center embedding, comparatives, and negative polarity.
We find that humans are overall less accurate than ChatGPT-4 (76% vs. 80% accuracy, respectively), but that this is due to ChatGPT-4 outperforming humans only in one task condition, namely on grammatical sentences.
arXiv Detail & Related papers (2024-04-23T10:09:46Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability of language models to reason about unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests that LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z) - The language of sounds unheard: Exploring musical timbre semantics of large language models [0.0]
Given the recent proliferation of large language models (LLMs), we asked whether such models exhibit an organisation of perceptual semantics similar to those observed in humans.
We elicited multiple responses in separate chats, analogous to having multiple human raters.
ChatGPT generated semantic profiles that only partially correlated with human ratings, yet showed robust agreement along well-known psychophysical dimensions of musical sounds.
arXiv Detail & Related papers (2023-04-16T16:50:25Z) - Collateral facilitation in humans and language models [0.6091702876917281]
We show that humans display a processing advantage for highly anomalous words, similar to the facilitation seen in language models.
We discuss the implications for our understanding of both human language comprehension and the predictions made by language models.
arXiv Detail & Related papers (2022-11-09T21:08:08Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon, referential opacity, add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT [64.40111510974957]
We test whether meaning interferes with subject-verb number agreement in English.
We generate semantically well-formed and nonsensical items.
We find that BERT and humans are both sensitive to our semantic manipulation.
arXiv Detail & Related papers (2022-09-21T17:57:23Z) - Do language models make human-like predictions about the coreferents of Italian anaphoric zero pronouns? [0.6091702876917281]
We test whether 12 contemporary language models display expectations that reflect human behavior when exposed to sentences with zero pronouns.
We find that three models (XGLM 2.9B, 4.5B, and 7.5B) capture human behavior in all of the experiments.
This result suggests that human expectations about coreference can be derived from exposure to language, and it indicates which features of language models allow them to better reflect human behavior.
arXiv Detail & Related papers (2022-08-30T22:06:07Z) - PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World [86.21137454228848]
We factorize PIGLeT into a physical dynamics model and a separate language model.
PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation.
It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%.
arXiv Detail & Related papers (2021-06-01T02:32:12Z) - Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
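The last entry above operationalises a word's lexical ambiguity as the entropy of the meanings it can take and relates that estimate to WordNet synonym counts. A minimal sketch of that idea, assuming WordNet senses with SemCor lemma counts as the meaning distribution (the paper's own estimator differs, and the word list here is illustrative):

```python
# Hypothetical sketch: ambiguity as sense entropy, H(w) = -sum_s p(s) log2 p(s),
# with p(s) estimated from smoothed SemCor lemma counts in WordNet, then
# correlated with each word's WordNet synonym count.
# Requires: nltk.download("wordnet") and nltk.download("omw-1.4") beforehand.
from math import log2

from nltk.corpus import wordnet as wn
from scipy.stats import spearmanr

def sense_entropy(word: str) -> float:
    """Entropy (bits) over the word's WordNet senses, add-one smoothed."""
    counts = []
    for synset in wn.synsets(word):
        lemmas = [l for l in synset.lemmas() if l.name().lower() == word]
        counts.append(1 + sum(l.count() for l in lemmas))  # add-one smoothing
    if not counts:
        return 0.0  # word unknown to WordNet: treat as unambiguous
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts)

def synonym_count(word: str) -> int:
    """Distinct WordNet lemmas that share at least one synset with the word."""
    synonyms = {l.name().lower() for s in wn.synsets(word) for l in s.lemmas()}
    synonyms.discard(word)
    return len(synonyms)

words = ["bank", "run", "table", "serve", "light", "oxygen", "cat"]
rho, p = spearmanr([sense_entropy(w) for w in words],
                   [synonym_count(w) for w in words])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```

With a realistic word sample, a positive correlation here would mirror the paper's finding that more ambiguous words tend to have more synonyms.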
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.