Language Models Represent Beliefs of Self and Others
- URL: http://arxiv.org/abs/2402.18496v3
- Date: Thu, 30 May 2024 12:43:01 GMT
- Title: Language Models Represent Beliefs of Self and Others
- Authors: Wentao Zhu, Zhining Zhang, Yizhou Wang,
- Abstract summary: We show that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models.
We observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process.
- Score: 14.630775330165529
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.
Related papers
- Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging [32.70038648928894]
Vision-Language Models (VLMs) combine visual perception with the general capabilities, such as reasoning, of Large Language Models (LLMs)<n>In this work, we explore to compose perception and reasoning through model merging that connects parameters of different models.<n>We find that perception capabilities are predominantly encoded in the early layers of the model, whereas reasoning is largely facilitated by the middle-to-late layers.
arXiv Detail & Related papers (2025-05-08T17:56:23Z) - Human-like conceptual representations emerge from language prediction [72.5875173689788]
Large language models (LLMs) trained exclusively through next-token prediction over language data exhibit remarkably human-like behaviors.
Are these models developing concepts akin to humans, and if so, how are such concepts represented and organized?
Our results demonstrate that LLMs can flexibly derive concepts from linguistic descriptions in relation to contextual cues about other concepts.
These findings establish that structured, human-like conceptual representations can naturally emerge from language prediction without real-world grounding.
arXiv Detail & Related papers (2025-01-21T23:54:17Z) - Emergence of human-like polarization among large language model agents [61.622596148368906]
We simulate a networked system involving thousands of large language model agents, discovering their social interactions, result in human-like polarization.
Similarities between humans and LLM agents raise concerns about their capacity to amplify societal polarization, but also hold the potential to serve as a valuable testbed for identifying plausible strategies to mitigate it.
arXiv Detail & Related papers (2025-01-09T11:45:05Z) - Exploring the Personality Traits of LLMs through Latent Features Steering [12.142248881876355]
We investigate how factors, such as cultural norms and environmental stressors, encoded within large language models (LLMs) shape their personality traits.
We propose a training-free approach to modify the model's behavior by extracting and steering latent features corresponding to factors within the model.
arXiv Detail & Related papers (2024-10-07T21:02:34Z) - Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models [51.91448005607405]
We evaluate key human ToM precursors by annotating characters' perceptions on ToMi and FANToM.
We present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference.
arXiv Detail & Related papers (2024-07-08T14:58:29Z) - Benchmarking Mental State Representations in Language Models [9.318796743761224]
Research into the models' internal representation of mental states remains limited.
Recent work has used probing to demonstrate that LMs can represent beliefs of themselves and others.
We report an extensive benchmark with various LM types with different model sizes.
We are the first to study how prompt variations impact probing performance on theory of mind tasks.
arXiv Detail & Related papers (2024-06-25T12:51:06Z) - LLMs as Models for Analogical Reasoning [14.412456982731467]
Analogical reasoning is fundamental to human cognition and learning.
Recent studies have shown that large language models can sometimes match humans in analogical reasoning tasks.
arXiv Detail & Related papers (2024-06-19T20:07:37Z) - Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective [2.61072980439312]
Devising formalisms to develop internal world models is a critical research challenge in the domains of artificial intelligence and machine learning.
This thesis identifies several limitations with the prevalent use of state space models as internal world models.
The structure of models in formalisms facilitates exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time.
These formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the nature of the real world and quantify the confidence in its predictions.
arXiv Detail & Related papers (2024-04-24T12:41:04Z) - PHAnToM: Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models [25.657579792829743]
We empirically evaluate how role-playing prompting influences Theory-of-Mind (ToM) reasoning capabilities.
We propose the mechanism that, beyond the inherent variance in the complexity of reasoning tasks, performance differences arise because of socially-motivated prompting differences.
arXiv Detail & Related papers (2024-03-04T17:34:34Z) - Large language models as linguistic simulators and cognitive models in human research [0.0]
The rise of large language models (LLMs) that generate human-like text has sparked debates over their potential to replace human participants in behavioral and cognitive research.
We critically evaluate this replacement perspective to appraise the fundamental utility of language models in psychology and social science.
This perspective reframes the role of language models in behavioral and cognitive science, serving as linguistic simulators and cognitive models that shed light on the similarities and differences between machine intelligence and human cognition and thoughts.
arXiv Detail & Related papers (2024-02-06T23:28:23Z) - Visual cognition in multimodal large language models [12.603212933816206]
Recent advancements have rekindled interest in the potential to emulate human-like cognitive abilities.
This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology.
arXiv Detail & Related papers (2023-11-27T18:58:34Z) - Think Twice: Perspective-Taking Improves Large Language Models'
Theory-of-Mind Capabilities [63.90227161974381]
SimToM is a novel prompting framework inspired by Simulation Theory's notion of perspective-taking.
Our approach, which requires no additional training and minimal prompt-tuning, shows substantial improvement over existing methods.
arXiv Detail & Related papers (2023-11-16T22:49:27Z) - Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play
Multi-Character Belief Tracker [72.09076317574238]
ToM is a plug-and-play approach to investigate the belief states of characters in reading comprehension.
We show that ToM enhances off-the-shelf neural network theory mind in a zero-order setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z) - Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z) - Learning Theory of Mind via Dynamic Traits Attribution [59.9781556714202]
We propose a new neural ToM architecture that learns to generate a latent trait vector of an actor from the past trajectories.
This trait vector then multiplicatively modulates the prediction mechanism via a fast weights' scheme in the prediction neural network.
We empirically show that the fast weights provide a good inductive bias to model the character traits of agents and hence improves mindreading ability.
arXiv Detail & Related papers (2022-04-17T11:21:18Z) - Properties from Mechanisms: An Equivariance Perspective on Identifiable
Representation Learning [79.4957965474334]
Key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties.
This paper asks, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?"
We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms.
arXiv Detail & Related papers (2021-10-29T14:04:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.