A Notion of Complexity for Theory of Mind via Discrete World Models
- URL: http://arxiv.org/abs/2406.11911v3
- Date: Wed, 09 Oct 2024 06:59:31 GMT
- Title: A Notion of Complexity for Theory of Mind via Discrete World Models
- Authors: X. Angelo Huang, Emanuele La Malfa, Samuele Marro, Andrea Asperti, Anthony Cohn, Michael Wooldridge
- Abstract summary: Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required.
This work proposes a framework inspired by cognitive load theory to measure the complexity of ToM tasks.
- Score: 2.487142846438629
- Abstract: Theory of Mind (ToM) can be used to assess the capabilities of Large Language Models (LLMs) in complex scenarios where social reasoning is required. While the research community has proposed many ToM benchmarks, their hardness varies greatly, and their complexity is not well defined. This work proposes a framework inspired by cognitive load theory to measure the complexity of ToM tasks. We quantify a problem's complexity as the number of states necessary to solve it correctly. Our complexity measure also accounts for spurious states of a ToM problem designed to make it apparently harder. We use our method to assess the complexity of five widely adopted ToM benchmarks. On top of this framework, we design a prompting technique that augments the information available to a model with a description of how the environment changes with the agents' interactions. We name this technique Discrete World Models (DWM) and show how it elicits superior performance on ToM tasks.
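The abstract describes DWM only at a high level. A minimal sketch of how such a prompting loop might look is given below, assuming a hypothetical `query_llm` helper and our own prompt wording, not the authors' template:
```python
# A minimal sketch of the DWM-style prompting loop described in the abstract.
# `query_llm` is a hypothetical stand-in for any chat-completion API; the
# prompt text is an assumption, not the paper's exact wording.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def dwm_answer(story_chunks: list[str], question: str) -> str:
    """Interleave story chunks with elicited world-state descriptions."""
    augmented: list[str] = []
    for chunk in story_chunks:
        augmented.append(chunk)
        # Ask the model to describe how the environment changes with the
        # agents' interactions, and append that description to the context.
        state = query_llm(
            "\n".join(augmented)
            + "\nDescribe how the environment and each agent's beliefs "
              "change after the events above."
        )
        augmented.append(f"[World state] {state}")
    # Answer the ToM question with the state-augmented context.
    return query_llm("\n".join(augmented) + f"\nQuestion: {question}")
```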
Related papers
- Supervised Chain of Thought [5.389461633686935]
Chain of Thought (CoT) prompting offers a promising approach to solving complex reasoning tasks.
The one-prompt-for-all approach poses significant challenges for models to generate correct reasoning steps.
We show how task-specific supervision is essential for navigating the prompt space accurately and achieving optimal performance.
arXiv Detail & Related papers (2024-10-18T06:25:27Z)
- Quantifying Generalization Complexity for Large Language Models [31.721781613271066]
We introduce Scylla, a dynamic evaluation framework that quantitatively measures the generalization abilities of large language models.
Scylla disentangles generalization from memorization via assessing model performance on both in-distribution (ID) and out-of-distribution (OOD) data.
We benchmark 28 LLMs, including open-source models such as the LLaMA and Qwen families and closed-source models like Claude and GPT.
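As a rough illustration of the ID/OOD comparison this summary describes (not Scylla's actual metric), one can compute a memorization-sensitive gap, with `evaluate` standing in for any accuracy harness:
```python
# Toy sketch: high ID accuracy paired with a large ID-OOD gap suggests
# memorization; a small gap suggests genuine generalization.
# `evaluate(model, tasks) -> float` is a hypothetical accuracy function.

def generalization_gap(evaluate, model, id_tasks, ood_tasks) -> float:
    acc_id = evaluate(model, id_tasks)    # in-distribution accuracy
    acc_ood = evaluate(model, ood_tasks)  # out-of-distribution accuracy
    return acc_id - acc_ood
```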
arXiv Detail & Related papers (2024-10-02T17:25:37Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present MR-Ben, a process-based benchmark that demands meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory [15.24542569393982]
Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition.
We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks.
We highlight the need for innovative solutions to achieve reliable multi-step reasoning and compositional task-solving.
arXiv Detail & Related papers (2024-05-26T19:33:23Z)
- MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time [51.5039731721706]
MindStar is a purely inference-time search method for large language models.
It formulates reasoning tasks as searching problems and proposes two search ideas to identify the optimal reasoning paths.
It significantly enhances the reasoning abilities of open-source models, such as Llama-2-13B and Mistral-7B, and achieves comparable performance to GPT-3.5 and Grok-1.
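The summary frames reasoning as a search problem; a hedged sketch of one plausible inference-time best-first search, with hypothetical `expand`, `score`, and `is_final` components rather than the paper's exact algorithms, might look like:
```python
# Best-first search over partial reasoning paths, in the spirit of
# inference-time search methods. All three callbacks are assumptions:
#   expand(path)  -> candidate next reasoning steps (e.g., LLM samples)
#   score(path)   -> quality estimate (e.g., a reward model)
#   is_final(path)-> whether the path ends in an answer
import heapq

def search_reasoning_path(question, expand, score, is_final, max_nodes=256):
    frontier = [(-score([question]), [question])]  # max-heap via negation
    for _ in range(max_nodes):
        if not frontier:
            break
        _, path = heapq.heappop(frontier)
        if is_final(path):
            return path  # highest-scoring complete reasoning chain found
        for step in expand(path):
            new_path = path + [step]
            heapq.heappush(frontier, (-score(new_path), new_path))
    return None
```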
arXiv Detail & Related papers (2024-05-25T15:07:33Z)
- Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks.
We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture.
Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv Detail & Related papers (2024-04-24T17:59:48Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose the complexity-impacted reasoning score (CIRS) to measure the correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
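As an illustration of AST-based complexity scoring in the spirit of CIRS (the node types and depth weighting below are our assumptions, not the paper's formula):
```python
# Rough sketch: score a snippet by counting branching/looping AST nodes,
# weighted by nesting depth, using Python's standard `ast` module.
import ast

def logical_complexity(code: str) -> float:
    tree = ast.parse(code)
    score = 0.0

    def visit(node: ast.AST, depth: int) -> None:
        nonlocal score
        if isinstance(node, (ast.If, ast.For, ast.While, ast.Try)):
            score += 1 + 0.5 * depth  # deeper nesting counts more
            depth += 1
        for child in ast.iter_child_nodes(node):
            visit(child, depth)

    visit(tree, 0)
    return score

# Example: a loop containing a conditional scores 1 + 1.5 = 2.5
print(logical_complexity("for i in range(3):\n    if i % 2:\n        print(i)"))
```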
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
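For concreteness, here is a toy compositional task of the kind described: multi-digit multiplication decomposed into sub-steps whose results must all be synthesized exactly (our illustration, not the paper's dataset):
```python
# Multi-digit multiplication as a chain of sub-steps: each partial product
# is one sub-step, and the final sum is the synthesis step. A single wrong
# sub-result corrupts the precise final answer.

def multiply_by_substeps(x: int, y: int) -> int:
    partials = []
    for position, digit in enumerate(reversed(str(y))):
        partials.append(x * int(digit) * 10**position)  # one sub-step
    return sum(partials)  # synthesis: all sub-results must be correct

assert multiply_by_substeps(123, 45) == 123 * 45
```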
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
- Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
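A minimal sketch of the controller/environment loop this summary describes, with the `env` and `controller` interfaces assumed rather than taken from the paper:
```python
# Hedged sketch of a discrete decision-making loop for faithful QA:
# a controller repeatedly selects a reasoning action, the environment
# applies it, and the loop stops when an answer is produced.
# `env.reset/step` and `controller.select_action` are assumed interfaces.

def run_episode(env, controller, max_steps=20):
    state = env.reset()
    for _ in range(max_steps):
        action = controller.select_action(state)  # e.g., via Monte-Carlo planning
        state, answer, done = env.step(action)
        if done:
            return answer  # answer entailed by the executed reasoning steps
    return None
```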
arXiv Detail & Related papers (2023-05-04T05:21:36Z)
- Model-agnostic Measure of Generalization Difficulty [7.183430740278161]
We propose the first model-agnostic measure of the inherent generalization difficulty of tasks.
Our measure quantifies the total information required to generalize well on a task minus the information provided by the data.
It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize, but only polynomially in the resolution per dimension.
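Schematically (our notation, not the paper's exact formulation), the measure and its scaling can be written as:
```latex
% Schematic only: our symbols for the quantities named in the summary.
% I_task : information required to generalize well on the task
% I_data : information provided by the training data
% d      : intrinsic dimensionality of the generalization space
% r      : resolution per dimension
\[
  I_{\text{bias}} \;=\; I_{\text{task}} - I_{\text{data}},
  \qquad
  I_{\text{bias}} \;\sim\;
  \underbrace{e^{\Theta(d)}}_{\text{exponential in } d}
  \cdot
  \underbrace{\mathrm{poly}(r)}_{\text{polynomial in } r}.
\]
```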
arXiv Detail & Related papers (2023-05-01T18:48:55Z)
- SMT-based Weighted Model Integration with Structure Awareness [18.615397594541665]
We develop an algorithm that combines SMT-based enumeration, an efficient technique in formal verification, with an effective encoding of the problem structure.
This allows our algorithm to avoid generating redundant models, resulting in substantial computational savings.
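As a toy illustration of the SMT-based enumeration this summary mentions, here is model enumeration with blocking clauses via Z3's Python API; the weights are illustrative, and real WMI additionally integrates over continuous variables, which this propositional sketch omits:
```python
# Enumerate models of a Boolean formula with Z3, blocking each found model
# so no redundant (duplicate) model is ever generated, and accumulate a
# toy weighted model count. Weights here are illustrative assumptions.
from z3 import Bool, Solver, Or, is_true, sat

a, b = Bool("a"), Bool("b")
solver = Solver()
solver.add(Or(a, b))  # formula whose models we enumerate

def weight(va: bool, vb: bool) -> float:
    return (0.7 if va else 0.3) * (0.6 if vb else 0.4)

total = 0.0
while solver.check() == sat:
    m = solver.model()
    va = is_true(m.eval(a, model_completion=True))
    vb = is_true(m.eval(b, model_completion=True))
    total += weight(va, vb)
    solver.add(Or(a != va, b != vb))  # block this model for the next check
print(total)  # weighted count over the three models of Or(a, b): 0.88
```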
arXiv Detail & Related papers (2022-06-28T09:46:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and accepts no responsibility for any consequences of its use.