Benchmarking Mental State Representations in Language Models
- URL: http://arxiv.org/abs/2406.17513v2
- Date: Mon, 1 Jul 2024 06:48:34 GMT
- Title: Benchmarking Mental State Representations in Language Models
- Authors: Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling
- Abstract summary: Research into the models' internal representation of mental states remains limited.
Recent work has used probing to demonstrate that LMs can represent beliefs of themselves and others.
We report an extensive benchmark across various LM types, model sizes, fine-tuning approaches, and prompt designs.
We are the first to study how prompt variations impact probing performance on Theory of Mind tasks.
- Score: 9.318796743761224
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While numerous works have assessed the generative performance of language models (LMs) on tasks requiring Theory of Mind reasoning, research into the models' internal representation of mental states remains limited. Recent work has used probing to demonstrate that LMs can represent beliefs of themselves and others. However, these claims are accompanied by limited evaluation, making it difficult to assess how mental state representations are affected by model design and training choices. We report an extensive benchmark across various LM types, model sizes, fine-tuning approaches, and prompt designs to study the robustness of mental state representations and memorisation issues within the probes. Our results show that the quality of models' internal representations of the beliefs of others increases with model size and, more crucially, with fine-tuning. We are the first to study how prompt variations impact probing performance on Theory of Mind tasks. We demonstrate that models' representations are sensitive to prompt variations, even when such variations should be beneficial. Finally, we complement previous activation editing experiments on Theory of Mind tasks and show that it is possible to improve models' reasoning performance by steering their activations without the need to train any probe.
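As a rough illustration of the two techniques the abstract refers to, the sketch below (i) trains a linear probe on a model's hidden activations to decode a belief label and (ii) applies a probe-free steering vector built from a difference of class-mean activations. The model name, layer index, steering scale, and toy examples are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative sketch only: probing and steering LM activations for belief states.
# Model, layer, scale, and toy data are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # hypothetical choice of hidden layer to read from / steer

def last_token_activation(prompt: str) -> torch.Tensor:
    """Hidden-state activation of the final prompt token at LAYER."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = lm(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Hypothetical false-belief stories: label 1 = protagonist's belief is still true,
# label 0 = the object was moved while they were away, so their belief is false.
data = [
    ("Anna puts the ball in the box and leaves. Nothing else happens.", 1),
    ("Anna puts the ball in the box and leaves. Bob moves it to the drawer.", 0),
]
acts = torch.stack([last_token_activation(p) for p, _ in data])
labels = [y for _, y in data]

# (i) Probing: a linear classifier on frozen activations. If it generalises to
# held-out stories, belief information is linearly decodable at this layer.
probe = LogisticRegression(max_iter=1000).fit(acts.numpy(), labels)

# (ii) Probe-free steering: add a direction computed directly from activation
# statistics (difference of class means) to a block's output via a forward hook.
direction = acts[0] - acts[1]

def steer(module, inputs, output):
    return (output[0] + 4.0 * direction,) + output[1:]  # 4.0 is an arbitrary scale

# Hook the GPT-2 block whose output corresponds to hidden_states[LAYER].
handle = lm.transformer.h[LAYER - 1].register_forward_hook(steer)
# ... call lm.generate(...) here to observe the steered behaviour ...
handle.remove()
```

In practice the probe would be trained and evaluated on many held-out stories per layer, and the steering layer and scale would be tuned on a validation split.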
Related papers
- ThinkPatterns-21k: A Systematic Study on the Impact of Thinking Patterns in LLMs [15.798087244817134]
We conduct a comprehensive analysis of the impact of various thinking types on model performance.
We introduce ThinkPatterns-21k, a curated dataset comprising 21k instruction-response pairs.
We have two key findings: (1) smaller models (<30B parameters) can benefit from most structured thinking patterns, while larger models (32B) with structured thinking such as decomposition can suffer degraded performance.
arXiv Detail & Related papers (2025-03-17T08:29:04Z) - Beyond Pattern Recognition: Probing Mental Representations of LMs [9.461066161954077]
Language Models (LMs) have demonstrated impressive capabilities in solving complex reasoning tasks.
We propose to delve deeper into the mental model of various LMs.
arXiv Detail & Related papers (2025-02-23T21:20:28Z) - Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method, which combines concepts from Optimal Transport and Shapley Values, as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Estimating Knowledge in Large Language Models Without Generating a Single Token [12.913172023910203]
Current methods to evaluate knowledge in large language models (LLMs) query the model and then evaluate its generated responses.
In this work, we ask whether evaluation can be done before the model has generated any text.
Experiments with a variety of LLMs show that KEEN, a simple probe trained over internal subject representations, succeeds at both tasks.
arXiv Detail & Related papers (2024-06-18T14:45:50Z) - Understanding the Inner Workings of Language Models Through
Representation Dissimilarity [5.987278280211877]
Representation dissimilarity measures are functions that quantify the extent to which two models' internal representations differ.
Our results suggest that dissimilarity measures are a promising set of tools for shedding light on the inner workings of language models (a minimal sketch of one such measure appears after this list).
arXiv Detail & Related papers (2023-10-23T14:46:20Z) - A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check [53.152011258252315]
We show that making reasonable use of phonetic and graphic information is effective for Chinese Spelling Check.
Models are sensitive to the error distribution of the test set, which reveals their shortcomings.
The commonly used benchmark, SIGHAN, cannot reliably evaluate models' performance.
arXiv Detail & Related papers (2023-07-25T17:02:38Z) - Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z) - Task Formulation Matters When Learning Continually: A Case Study in
Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Rethinking Generalization of Neural Models: A Named Entity Recognition
Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)