MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic
- URL: http://arxiv.org/abs/2305.03353v2
- Date: Tue, 7 Nov 2023 08:53:59 GMT
- Title: MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic
- Authors: Damien Sileo and Antoine Lernould
- Abstract summary: Theory of Mind (ToM) is a critical component of intelligence but its assessment remains the subject of heated debates.
Here, we leverage dynamic epistemic logic to isolate a particular component of ToM and to generate controlled problems.
Our findings indicate that scaling language models does not consistently yield results better than random chance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theory of Mind (ToM) is a critical component of intelligence but its
assessment remains the subject of heated debates. Prior research applied human
ToM assessments to natural language processing models using either
human-created standardized tests or rule-based templates. However, these
methods primarily focus on simplistic reasoning and require further validation.
Here, we leverage dynamic epistemic logic to isolate a particular component of
ToM and to generate controlled problems. We also introduce new verbalization
techniques to express these problems in natural English. Our findings
indicate that scaling language models (from 70M to 6B and from 350M to 174B)
does not consistently yield results better than random chance. While GPT-4
demonstrates superior epistemic reasoning capabilities, there is still room for
demonstrates superior epistemic reasoning capabilities, there is still room for
improvement. Our code and datasets are publicly available
(https://huggingface.co/datasets/sileod/mindgames,
https://github.com/sileod/llm-theory-of-mind).
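The abstract's approach, generating ToM problems from dynamic epistemic logic, can be illustrated with a toy sketch. This is a minimal, hypothetical example (not the authors' actual generator): a possible-worlds model, an agent's indistinguishability relation, and a public announcement that updates the model, in the style of the classic muddy-children puzzle.

```python
from itertools import product

agents = ["Alice", "Bob"]

# Worlds: every assignment of muddy (True) / clean (False) to each agent.
worlds = [dict(zip(agents, bits)) for bits in product([True, False], repeat=len(agents))]

def indistinguishable(agent, w, v):
    # Each agent sees everyone's forehead except their own,
    # so two worlds are indistinguishable to an agent if they
    # agree on everyone else's state.
    return all(w[a] == v[a] for a in agents if a != agent)

def knows_own_state(agent, world, model):
    # An agent knows their own state iff it is constant across all
    # worlds they cannot distinguish from the actual one.
    reachable = [v for v in model if indistinguishable(agent, world, v)]
    return all(v[agent] == world[agent] for v in reachable)

# Actual world: Alice is muddy, Bob is clean.
actual = {"Alice": True, "Bob": False}

# A public announcement ("at least one of you is muddy") is a model
# update: it deletes every world where the announcement is false.
updated = [w for w in worlds if any(w.values())]

print(knows_own_state("Alice", actual, worlds))   # False: before the announcement
print(knows_own_state("Alice", actual, updated))  # True: Alice sees Bob is clean,
                                                  # so she must be the muddy one
```

Each such model state can then be verbalized as an English problem ("Alice and Bob each may or may not be muddy. It is announced that at least one is muddy. Does Alice know whether she is muddy?"), which is the kind of controlled question the dataset poses to language models.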
Related papers
- LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention, and many reasoning skills, such as commonsense, numerical, and qualitative reasoning, have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
- Views Are My Own, but Also Yours: Benchmarking Theory of Mind Using Common Ground [6.868969074841911]
We introduce the first ToM dataset based on naturally occurring spoken dialogs, Common-ToM, and show that LMs struggle to demonstrate ToM.
We then show that integrating a simple, explicit representation of beliefs improves LM performance on Common-ToM.
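The "simple, explicit representation of beliefs" idea can be sketched as a toy belief store: instead of asking a model to infer beliefs implicitly from a dialog, each grounded fact is recorded per agent as it is established. The class and event format below are illustrative assumptions, not Common-ToM's actual schema.

```python
class BeliefTracker:
    """Toy explicit belief store: maps (agent, proposition) -> believed value."""

    def __init__(self):
        self.beliefs = {}

    def observe(self, present_agents, proposition, value):
        # Every agent present when a fact is established comes to believe it;
        # absent agents keep their old (possibly stale) belief.
        for agent in present_agents:
            self.beliefs[(agent, proposition)] = value

    def believes(self, agent, proposition):
        return self.beliefs.get((agent, proposition))

tracker = BeliefTracker()
# Both speakers hear that the meeting is at 3pm: shared common ground.
tracker.observe(["Ann", "Ben"], "meeting_time", "3pm")
# Only Ann hears the update, so the two agents' beliefs diverge.
tracker.observe(["Ann"], "meeting_time", "4pm")

print(tracker.believes("Ann", "meeting_time"))  # 4pm
print(tracker.believes("Ben", "meeting_time"))  # 3pm (a false belief)
```

Feeding such an explicit table to a language model alongside the dialog removes the implicit belief-inference step, which is the kind of intervention the paper reports as improving performance.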
arXiv Detail & Related papers (2024-03-04T20:07:17Z)
- How Do Humans Write Code? Large Models Do It the Same Way Too [14.954886191356342]
Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in Large Language Models.
Using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT.
We propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT.
arXiv Detail & Related papers (2024-02-24T05:40:01Z)
- MMToM-QA: Multimodal Theory of Mind Question Answering [80.87550820953236]
Theory of Mind (ToM) is an essential ingredient for developing machines with human-level social intelligence.
Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding.
Human ToM, on the other hand, is more than video or text understanding.
People can flexibly reason about another person's mind based on conceptual representations extracted from any available data.
arXiv Detail & Related papers (2024-01-16T18:59:24Z)
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker [72.09076317574238]
The tracker is a plug-and-play approach to investigating the belief states of characters in reading comprehension.
We show that it enhances off-the-shelf neural networks' theory of mind in a zero-shot setting while showing robust out-of-distribution performance compared to supervised baselines.
arXiv Detail & Related papers (2023-06-01T17:24:35Z)
- ThoughtSource: A central hub for large language model reasoning data [13.185186859548326]
ThoughtSource is a meta-dataset and software library for chain-of-thought (CoT) reasoning.
The goal of ThoughtSource is to improve future artificial intelligence systems by facilitating qualitative understanding of CoTs.
arXiv Detail & Related papers (2023-01-27T08:45:53Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- Mind's Eye: Grounded Language Model Reasoning through Simulation [47.654525013443255]
We present Mind's Eye, a paradigm to ground language model reasoning in the physical world.
Experiments show Mind's Eye can improve reasoning ability by a large margin.
Smaller language models armed with Mind's Eye can obtain similar performance to models that are 100x larger.
arXiv Detail & Related papers (2022-10-11T11:39:23Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.