Event knowledge in large language models: the gap between the impossible
and the unlikely
- URL: http://arxiv.org/abs/2212.01488v4
- Date: Thu, 26 Oct 2023 13:27:31 GMT
- Title: Event knowledge in large language models: the gap between the impossible
and the unlikely
- Authors: Carina Kauf, Anna A. Ivanova, Giulia Rambelli, Emmanuele Chersoni,
Jingyuan Selena She, Zawad Chowdhury, Evelina Fedorenko, Alessandro Lenci
- Abstract summary: We show that pre-trained large language models (LLMs) possess substantial event knowledge.
They almost always assign higher likelihood to possible vs. impossible events.
However, they show less consistent preferences for likely vs. unlikely events.
- Score: 46.540380831486125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word co-occurrence patterns in language corpora contain a surprising amount
of conceptual knowledge. Large language models (LLMs), trained to predict words
in context, leverage these patterns to achieve impressive performance on
diverse semantic tasks requiring world knowledge. An important but understudied
question about LLMs' semantic abilities is whether they acquire generalized
knowledge of common events. Here, we test whether five pre-trained LLMs (from
2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions
of agent-patient interactions than to minimally different implausible versions
of the same event. Using three curated sets of minimal sentence pairs (total
n=1,215), we found that pre-trained LLMs possess substantial event knowledge,
outperforming other distributional language models. In particular, they almost
always assign higher likelihood to possible vs. impossible events (The teacher
bought the laptop vs. The laptop bought the teacher). However, LLMs show less
consistent preferences for likely vs. unlikely events (The nanny tutored the
boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM
scores are driven by both plausibility and surface-level sentence features,
(ii) LLM scores generalize well across syntactic variants (active vs. passive
constructions) but less well across semantic variants (synonymous sentences),
(iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence
plausibility serves as an organizing dimension in internal LLM representations.
Overall, our results show that important aspects of event knowledge naturally
emerge from distributional linguistic patterns, but also highlight a gap
between representations of possible/impossible and likely/unlikely events.
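The core evaluation described above, scoring each member of a minimal sentence pair and checking which one the model prefers, can be sketched with a causal language model. The snippet below is a minimal illustration only, not the authors' exact pipeline: it assumes GPT-2 via the Hugging Face transformers library and plain summed log-likelihood scoring, whereas the paper evaluates five different LLMs (masked models such as BERT are typically scored with pseudo-log-likelihood instead) and may apply additional normalizations.

```python
# Minimal sketch: compare LLM likelihoods for a plausible vs. implausible
# minimal sentence pair, in the spirit of the paper's evaluation.
# GPT-2 and causal-LM scoring are illustrative assumptions here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean cross-entropy
        # over the predicted (shifted) token positions.
        outputs = model(**inputs, labels=inputs["input_ids"])
    n_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * n_predicted

possible = "The teacher bought the laptop."
impossible = "The laptop bought the teacher."

print(sentence_log_likelihood(possible))
print(sentence_log_likelihood(impossible))
# Per the paper's main finding, the possible sentence should almost always
# receive the higher score in possible/impossible pairs; the preference is
# less reliable for likely/unlikely pairs.
```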
Related papers
- Delving into the Reversal Curse: How Far Can Large Language Models Generalize? [40.64539467276017]
A prime example is the recently debated "reversal curse", which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A"
In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs.
arXiv Detail & Related papers (2024-10-24T14:55:09Z)
- Structured Event Reasoning with Large Language Models [4.897267974042842]
Reasoning about real-life events is a unifying challenge in AI and NLP.
I show that end-to-end LLMs still systematically fail to reason about complex events.
I propose three general approaches to use LLMs in conjunction with a structured representation of events.
arXiv Detail & Related papers (2024-08-28T19:03:41Z)
- Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs [63.29737699997859]
Large Language Models (LLMs) have demonstrated impressive performance on multimodal tasks, without any multimodal finetuning.
In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representation.
arXiv Detail & Related papers (2024-05-26T21:31:59Z)
- The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting [87.30837365008931]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) prompting are able to make accurate incremental predictions even on unscalable tasks.
This study examines the impact of LLMs' step-by-step predictions on gender bias in unscalable tasks.
arXiv Detail & Related papers (2024-01-28T06:50:10Z)
- Are Large Language Models Temporally Grounded? [38.481606493496514]
We provide large language models (LLMs) with textual narratives.
We probe them with respect to their common-sense knowledge of the structure and duration of events.
We evaluate state-of-the-art LLMs on three tasks reflecting these abilities.
arXiv Detail & Related papers (2023-11-14T18:57:15Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Can Large Language Models Capture Dissenting Human Voices? [7.668954669688971]
Large language models (LLMs) have shown impressive achievements in solving a broad range of tasks.
We evaluate the performance and alignment of LLM distribution with humans using two different techniques.
We show LLMs exhibit limited ability in solving NLI tasks and simultaneously fail to capture human disagreement distribution.
arXiv Detail & Related papers (2023-05-23T07:55:34Z)