Fugu-MT Paper Translation (Abstract): RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning

Paper overview: RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning

arxiv url: http://arxiv.org/abs/2505.13307v1
Date: Mon, 19 May 2025 16:25:55 GMT
Status: Translation complete
Last updated in system: 2025-05-20 14:57:11.729224
Title: RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Title (reference translation): RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning
Authors: Qiguang Chen, Libo Qin, Jinhao Liu, Yue Liao, Jiaqi Wang, Jingxuan Zhou, Wanxiang Che
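Abstract summary: Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks. We introduce the Reasoning Boundary Framework++ (RBF++), a framework for evaluating and optimizing the measurable boundaries of CoT capability.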
Reference score (site-computed attention score): 60.84707424369494
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks, spurring research into its underlying mechanisms. However, two primary challenges remain for real-world applications: (1) the lack of quantitative metrics and actionable guidelines for evaluating and optimizing measurable boundaries of CoT capability, and (2) the absence of methods to assess boundaries of unmeasurable CoT capability, such as multimodal perception. To address these gaps, we introduce the Reasoning Boundary Framework++ (RBF++). To tackle the first challenge, we define the reasoning boundary (RB) as the maximum limit of CoT performance. We also propose a combination law for RBs, enabling quantitative analysis and offering actionable guidance across various CoT tasks. For the second challenge, particularly in multimodal scenarios, we introduce a constant assumption, which replaces unmeasurable RBs with scenario-specific constants. Additionally, we propose the reasoning boundary division mechanism, which divides unmeasurable RBs into two sub-boundaries, facilitating the quantification and optimization of both unmeasurable domain knowledge and multimodal perception capabilities. Extensive experiments involving 38 models across 13 tasks validate the feasibility of our framework in cross-modal settings. Additionally, we evaluate 10 CoT strategies, offer insights into optimization and decay from two complementary perspectives, and expand evaluation benchmarks for measuring RBs in LLM reasoning. We hope this work advances the understanding of RBs and optimization strategies in LLMs. Code and data are available at https://github.com/LightChen233/reasoning-boundary.
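Abstract (reference translation): Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks, and has stimulated research into its underlying mechanisms. However, two challenges remain for real-world applications: (1) the lack of quantitative metrics and actionable guidelines for evaluating and optimizing the measurable boundaries of CoT capability, and (2) the lack of methods for assessing the boundaries of unmeasurable CoT capabilities such as multimodal perception. To address these gaps, we introduce the Reasoning Boundary Framework++ (RBF++). To tackle the first challenge, we define the reasoning boundary (RB) as the maximum limit of CoT performance. We also propose a combination law for RBs, which enables quantitative analysis and provides actionable guidance for various CoT tasks. For the second challenge, particularly in multimodal scenarios, we introduce a constant assumption that replaces unmeasurable RBs with scenario-specific constants. In addition, we propose a reasoning boundary division mechanism that divides unmeasurable RBs into two sub-boundaries, facilitating the quantification and optimization of both unmeasurable domain knowledge and multimodal perception capabilities. Extensive experiments involving 38 models across 13 tasks validate the feasibility of the framework in cross-modal settings. Furthermore, we evaluate 10 CoT strategies, offer insights into optimization and decay from two complementary perspectives, and expand the evaluation benchmarks for measuring RBs in LLM reasoning. We hope this work advances the understanding of RBs and optimization strategies in LLMs. Code and data are available at https://github.com/LightChen233/reasoning-boundary.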

Related papers

Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models [4.064135211977999]
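Large language models (LLMs) and vision-language models (LVLMs) struggle with complex, multi-step, cross-modal common sense reasoning tasks. We propose the Coherent Multimodal Reasoning Framework (CMRF), a novel approach that enhances the common sense reasoning capabilities of LVLMs. CMRF mimics human problem-solving by decomposing complex queries, generating step-by-step inferences, and self-correcting errors.
Paper reference translation (metadata) (2025-08-04T20:33:58Z)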
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL [32.67667242745463]
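We propose a two-stage framework for rule-based multimodal reasoning, consisting of Foundational Reasoning Enhancement (FRE) and Multimodal Generalization Training (MGT). In experiments on Qwen2.5-VL-Instruct-3B, LMM-R1 achieves average improvements of 4.83% and 4.5% on multimodal and text-only benchmarks, respectively, and a 3.63% gain on complex football game tasks.
Paper reference translation (metadata) (2025-03-10T17:04:14Z)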
LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems [7.379503137362718]
LR^2Bench is a new benchmark designed to evaluate Long-chain Reflective Reasoning capabilities. The evaluation shows that even advanced LRMs such as DeepSeek-R1 and OpenAI o1-preview struggle with the tasks in LR^2Bench.
Paper reference translation (metadata) (2025-02-25T04:51:17Z)
Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
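Off-CMAB is the first offline learning framework for CMAB. Off-CMAB combines pessimistic reward estimation with a solver. Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
Paper reference translation (metadata) (2025-01-31T16:56:18Z)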
Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [61.588465852846646]
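Chain-of-Thought (CoT) reasoning has emerged as a promising approach for improving the performance of large language models (LLMs). This paper proposes a novel reasoning boundary framework (RBF) to address the challenges of quantifying and optimizing CoT.
Paper reference translation (metadata) (2024-10-08T05:26:28Z)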
Making Large Language Models Better Planners with Reasoning-Decision Alignment [70.5381163219608]
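We propose an end-to-end decision-making model based on a multimodality-augmented LLM. We also propose a reasoning-decision alignment constraint between the paired CoTs and planning results. We name the proposed large language planner with reasoning-decision alignment RDA-Driver.
Paper reference translation (metadata) (2024-08-25T16:43:47Z)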
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
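Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. This paper presents MR-Ben, a process-based benchmark that demands meta-reasoning skills. The meta-reasoning paradigm is particularly suited to System-2 slow thinking.
Paper reference translation (metadata) (2024-06-20T03:50:23Z)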
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM [83.6663322930814]
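We argue that aggregating visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. We propose an innovative multimodal CoT framework called Cantor, characterized by a perception-decision architecture. We demonstrate the effectiveness of the proposed method, showing significant improvements in multimodal CoT performance.
Paper reference translation (metadata) (2024-04-24T17:59:48Z)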
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks [35.36615140853107]
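This study examines DPO and its variants for aligning large language models (LLMs) with human preferences. The evaluation covers 13 benchmarks, including dialogue, reasoning, mathematical problem-solving, question answering, truthfulness, MT-Bench, Big Bench, and the Open LLM Leaderboard. We find that alignment methods can achieve near-optimal performance even with smaller subsets of the training data.
Paper reference translation (metadata) (2024-04-23T03:55:01Z)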
Efficient Knowledge Compilation Beyond Weighted Model Counting [7.828647825246474]
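We introduce second-level algebraic model counting (2AMC) as a general framework for such problems. First-level techniques based on Knowledge Compilation (KC) have been adapted to specific 2AMC instances by imposing variable order constraints. We show that the logical structure of a 2AMC problem can be exploited to omit some of these constraints, thereby limiting the negative effect.
Paper reference translation (metadata) (2022-05-16T08:10:40Z)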

The related papers list is generated automatically from the titles and abstracts of papers on this site.