Related papers: RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning

URL: http://arxiv.org/abs/2505.13307v1
Date: Mon, 19 May 2025 16:25:55 GMT
Title: RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Authors: Qiguang Chen, Libo Qin, Jinhao Liu, Yue Liao, Jiaqi Wang, Jingxuan Zhou, Wanxiang Che
Abstract summary: Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks. We introduce the Reasoning Boundary Framework++ (RBF++), a framework for evaluating and optimizing measurable boundaries of CoT capability.
Score: 60.84707424369494
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks, spurring research into its underlying mechanisms. However, two primary challenges remain for real-world applications: (1) the lack of quantitative metrics and actionable guidelines for evaluating and optimizing measurable boundaries of CoT capability, and (2) the absence of methods to assess boundaries of unmeasurable CoT capability, such as multimodal perception. To address these gaps, we introduce the Reasoning Boundary Framework++ (RBF++). To tackle the first challenge, we define the reasoning boundary (RB) as the maximum limit of CoT performance. We also propose a combination law for RBs, enabling quantitative analysis and offering actionable guidance across various CoT tasks. For the second challenge, particularly in multimodal scenarios, we introduce a constant assumption, which replaces unmeasurable RBs with scenario-specific constants. Additionally, we propose the reasoning boundary division mechanism, which divides unmeasurable RBs into two sub-boundaries, facilitating the quantification and optimization of both unmeasurable domain knowledge and multimodal perception capabilities. Extensive experiments involving 38 models across 13 tasks validate the feasibility of our framework in cross-modal settings. Additionally, we evaluate 10 CoT strategies, offer insights into optimization and decay from two complementary perspectives, and expand evaluation benchmarks for measuring RBs in LLM reasoning. We hope this work advances the understanding of RBs and optimization strategies in LLMs. Code and data are available at https://github.com/LightChen233/reasoning-boundary.
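The abstract defines the reasoning boundary (RB) only informally, as "the maximum limit of CoT performance". As a minimal sketch, assuming the RB is operationalized as the largest difficulty level at which a model's measured CoT accuracy still reaches a chosen threshold, an estimate from accuracy results bucketed by difficulty might look like the following. The function name, the default threshold, the integer difficulty scale, and the sample accuracies are all hypothetical; they are not taken from the paper or its repository.

```python
from typing import Mapping, Optional


def estimate_reasoning_boundary(
    accuracy_by_difficulty: Mapping[int, float],
    threshold: float = 0.9,
) -> Optional[int]:
    """Hypothetical helper: largest difficulty level whose accuracy >= threshold.

    `accuracy_by_difficulty` maps an assumed integer difficulty level
    (e.g. number of reasoning steps) to a model's measured CoT accuracy
    at that level. Returns None if no level reaches the threshold.
    """
    # Keep every difficulty level where the model still meets the threshold.
    qualifying = [d for d, acc in accuracy_by_difficulty.items() if acc >= threshold]
    # The boundary is the hardest such level (a supremum-style reading).
    return max(qualifying) if qualifying else None


# Illustrative, made-up accuracies (not results from the paper).
acc = {1: 0.98, 2: 0.95, 4: 0.91, 8: 0.72, 16: 0.40}
print(estimate_reasoning_boundary(acc, threshold=0.9))  # 4
print(estimate_reasoning_boundary(acc, threshold=0.5))  # 8
```

Taking the maximum over all qualifying levels follows a supremum-style definition. If accuracy is not monotone in difficulty, a stricter variant could instead stop at the first level that falls below the threshold.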

Related papers

Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models [4.064135211977999]
Large language models (LLMs) and vision-language models (LVLMs) struggle with complex, multi-step, cross-modal common sense reasoning tasks. We propose the Coherent Multimodal Reasoning Framework (CMRF), a novel approach that enhances LVLMs' common sense reasoning capabilities. CMRF mimics human problem-solving by decomposing complex queries, generating step-by-step inferences, and self-correcting errors.
arXiv (2025-08-04T20:33:58Z)
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL [32.67667242745463]
We propose a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks.
arXiv (2025-03-10T17:04:14Z)
LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems [7.379503137362718]
We introduce LR^2Bench, a novel benchmark designed to evaluate the Long-chain Reflective Reasoning capabilities of Large Language Models. Our evaluation reveals that even the most advanced large reasoning models (LRMs), such as DeepSeek-R1 and OpenAI o1-preview, struggle with tasks in LR^2Bench.
arXiv (2025-02-25T04:51:17Z)
Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for combinatorial multi-armed bandits (CMAB). Its core algorithm, combinatorial lower confidence bound (CLCB), combines pessimistic reward estimations with combinatorial solvers. Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
arXiv (2025-01-31T16:56:18Z)
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [61.588465852846646]
Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs). In this work, we introduce a novel reasoning boundary framework (RBF) to quantify and optimize CoT.
arXiv (2024-10-08T05:26:28Z)
Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
We motivate an end-to-end decision-making model based on a multimodality-augmented LLM. We propose a reasoning-decision alignment constraint between the paired CoTs and planning results. We dub our proposed large language planners with reasoning-decision alignment RDA-Driver.
arXiv (2024-08-25T16:43:47Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv (2024-06-20T03:50:23Z)
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. We propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Our experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance.
arXiv (2024-04-24T17:59:48Z)
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks [35.36615140853107]
This study evaluates Direct Preference Optimization (DPO) and its variants for aligning Large Language Models (LLMs) with human preferences. Our evaluation spans 13 benchmarks covering dialogue, reasoning, mathematical problem-solving, question answering, truthfulness, MT-Bench, Big Bench, and the Open LLM Leaderboard. We find that alignment methods often achieve near optimal performance even with smaller subsets of training data.
arXiv (2024-04-23T03:55:01Z)
Efficient Knowledge Compilation Beyond Weighted Model Counting [7.828647825246474]
We introduce Second Level Algebraic Model Counting (2AMC) as a generic framework for these kinds of problems. First level techniques based on Knowledge Compilation (KC) have been adapted for specific 2AMC instances by imposing variable order constraints. We show that we can exploit the logical structure of a 2AMC problem to omit parts of these constraints, thus limiting the negative effect.
arXiv (2022-05-16T08:10:40Z)