Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
- URL: http://arxiv.org/abs/2409.13490v1
- Date: Fri, 20 Sep 2024 13:27:11 GMT
- Title: Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
- Authors: Zizheng Lin, Chunkit Chan, Yangqiu Song, Xin Liu
- Abstract summary: The Theory-of-Mind (ToM) ability of Large Language Models (LLMs) has been shown to be limited.
We propose Constrained Chain-of-ToM (CCoToM), which leverages domain knowledge and causal relations between ToM dimensions to address these limitations.
CCoToM consistently outperforms previous state-of-the-art methods by large margins across all datasets used.
- Score: 39.81210971002642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Theory-of-Mind (ToM) ability of Large Language Models (LLMs) has been shown to be limited. Most existing methods for improving ToM in LLMs adopt zero-shot prompting, and they face challenges including poor performance in complex ToM reasoning tasks and an inability to handle non-narrative contexts. We propose a zero-shot prompting method named Constrained Chain-of-ToM (CCoToM) that leverages domain knowledge and the causal relations between ToM dimensions to address these limitations. Specifically, CCoToM guides LLMs to construct explicit reasoning chains by first prompting LLMs to infer related ToM dimensions (e.g., belief). Afterward, CCoToM prompts LLMs to infer the queried ToM dimension based on the generated related ToM dimensions and the corresponding causal relations. Additionally, CCoToM adaptively imposes constraints on prompts to introduce inductive biases and improve consistency between ToM dimensions. Besides narratives, CCoToM can also handle non-narrative contexts like conversations. Extensive experiments show that CCoToM consistently outperforms previous state-of-the-art methods by large margins across all LLMs and datasets used. We also conduct in-depth analyses to gain deeper insights into CCoToM. We have made our code publicly available.
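The two-stage prompting pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the `ask_llm` callback, the dimension names, the causal-relation table, and the prompt wording are all hypothetical placeholders.

```python
# Minimal sketch of a CCoToM-style constrained reasoning chain.
# The causal links between ToM dimensions and the prompt wording are
# illustrative assumptions, not the paper's exact prompts.

# Hypothetical causal relations: which related ToM dimension to infer
# first before answering a query about a given dimension.
CAUSAL_PREREQUISITE = {
    "action": "intention",
    "intention": "desire",
    "desire": "belief",
    "belief": None,  # treated here as a root dimension
}

def build_related_prompt(context: str, related_dim: str) -> str:
    """Stage 1: prompt the LLM to infer a related ToM dimension."""
    return (
        f"Context:\n{context}\n\n"
        f"Infer the character's {related_dim} from the context. "
        f"Answer in one sentence."
    )

def build_query_prompt(context: str, related_dim: str,
                       related_answer: str, queried_dim: str) -> str:
    """Stage 2: infer the queried dimension, constrained to stay
    consistent with the related dimension inferred in stage 1."""
    return (
        f"Context:\n{context}\n\n"
        f"The character's {related_dim} is: {related_answer}\n"
        f"Given that a {related_dim} causally influences a {queried_dim}, "
        f"infer the character's {queried_dim}. Your answer must be "
        f"consistent with the stated {related_dim}."
    )

def ccotom_chain(context: str, queried_dim: str, ask_llm) -> str:
    """Run the constrained chain: related dimension first, then the query."""
    related_dim = CAUSAL_PREREQUISITE.get(queried_dim)
    if related_dim is None:
        return ask_llm(build_related_prompt(context, queried_dim))
    related_answer = ask_llm(build_related_prompt(context, related_dim))
    return ask_llm(build_query_prompt(context, related_dim,
                                      related_answer, queried_dim))
```

With a real model, `ask_llm` would wrap an API call; here any `str -> str` callable works, which also makes the chain easy to unit-test with a stub.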
Related papers
- Understanding Chain-of-Thought in LLMs through Information Theory [16.78730663293352]
We formalize Chain-of-Thought (CoT) reasoning in Large Language Models (LLMs) through an information-theoretic lens.
Specifically, our framework quantifies the 'information gain' at each reasoning step, enabling the identification of failure modes.
We demonstrate the efficacy of our approach through extensive experiments on toy and GSM-8K data, where it significantly outperforms existing outcome-based methods.
arXiv Detail & Related papers (2024-11-18T19:14:36Z)
- LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints [86.59857711385833]
We introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions.
To address the performance gap between open-source and proprietary models, we propose the Decompose, Critique and Refine (DeCRIM) self-correction pipeline.
Our results show that DeCRIM improves Mistral's performance by 7.3% on RealInstruct and 8.0% on IFEval even with weak feedback.
arXiv Detail & Related papers (2024-10-09T01:25:10Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-based searching method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models [5.455744338342196]
Theory of Mind (ToM) refers to the ability of individuals to attribute mental states to others.
Large Language Models (LLMs) have shown some promise with ToM ability, but they still struggle with complex ToM reasoning.
arXiv Detail & Related papers (2024-04-23T20:59:03Z)
- Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning [68.83624133567213]
We show that most prevalent MLLMs can be easily fooled by the introduction of a presupposition into the question.
We also propose a simple yet effective method, Active Deduction (AD), to encourage the model to actively perform composite deduction.
arXiv Detail & Related papers (2024-04-19T15:53:27Z)
- Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate biases in Large Language Models (LLMs).
Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
arXiv Detail & Related papers (2024-03-05T07:47:34Z)
- Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals? [19.814974042343028]
We investigate the controllability of large language models (LLMs) on scientific summarization tasks.
We find that non-fine-tuned LLMs outperform humans in the MuP review generation task.
arXiv Detail & Related papers (2024-01-18T23:00:54Z)
- Compositional Chain-of-Thought Prompting for Large Multimodal Models [46.721769077885966]
Compositional Chain-of-Thought (CCoT) is a novel zero-shot Chain-of-Thought prompting method.
We first generate a scene graph (SG) using the Large Language Model (LLM) and then use that SG in the prompt to produce a response.
We find that the proposed CCoT approach improves the performance of several popular LMMs on both compositional and general multimodal benchmarks.
arXiv Detail & Related papers (2023-11-27T22:23:27Z)
- FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models [79.62191017182518]
FollowBench is a multi-level, fine-grained benchmark for evaluating constraint following in Large Language Models.
Its Multi-level mechanism incrementally adds a single constraint to the initial instruction at each successive level.
By evaluating 13 popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.
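The multi-level mechanism described above can be sketched as follows; the constraint texts and function name are illustrative assumptions, not FollowBench's actual data or code.

```python
# Sketch of FollowBench's multi-level idea: the instruction at level k
# carries the first k constraints, so difficulty grows by exactly one
# constraint per level. The example constraints are made up.

def build_levels(initial_instruction: str, constraints: list[str]) -> list[str]:
    """Return instructions for levels 1..len(constraints), each level
    adding exactly one constraint on top of the previous one."""
    levels = []
    for k in range(1, len(constraints) + 1):
        active = constraints[:k]
        constraint_text = " ".join(
            f"Constraint {i + 1}: {c}" for i, c in enumerate(active)
        )
        levels.append(f"{initial_instruction} {constraint_text}")
    return levels
```

Scoring a model then reduces to checking, per level, whether its response satisfies every active constraint, which localizes exactly where instruction following breaks down.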
arXiv Detail & Related papers (2023-10-31T12:32:38Z)
- FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions [94.61530480991627]
Theory of mind evaluations currently focus on testing models using passive narratives that inherently lack interactivity.
We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering.
arXiv Detail & Related papers (2023-10-24T00:24:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.