Fugu-MT Paper Translation (Abstract): Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity

Paper overview: Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity

arxiv url: http://arxiv.org/abs/2509.24836v1
Date: Mon, 29 Sep 2025 14:20:04 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:20.042758
Title: Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
Title (reference translation): Pushing LLMs to the Logical Reasoning Boundary: The Role of Data Reasoning Intensity
Authors: Zhen Bi, Zhenlin Hu, Jinnan Yang, Mingyang Chen, Cheng Deng, Yida Xue, Zeyu Yang, Qing Shen, Zhenfang Liu, Kang Zhao, Ningyu Zhang, Jungang Lou
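Abstract summary: Data Reasoning Intensity (DRI) is a new metric that quantifies the latent logical reasoning complexity of samples. The authors then introduce a re-cognizing optimization strategy that systematically enhances the logical reasoning intensity of training data.
Reference score (independently calculated attention score): 59.27594125465172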
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) highlight the importance of training data structure and quality in shaping reasoning behavior. However, most existing approaches focus on transforming data formats while neglecting the internal reasoning complexity of training samples, leaving the reasoning potential of data under-explored and underutilized. In this work, we posit that LLM logical reasoning performance is jointly constrained by the potential of the training data and the cognitive capacity of the model. To make this relationship measurable, we introduce Data Reasoning Intensity (DRI), a novel metric that quantifies the latent logical reasoning complexity of samples by decomposing and aggregating their logical structures. This allows us to analyze how well current LLMs utilize logical reasoning signals and identify performance gaps relative to data potential. Based on this insight, we introduce a re-cognizing optimization strategy that systematically enhances the logical reasoning intensity of training data. Rather than increasing data volume, our method re-optimizes existing samples to better align with the LLM's logical reasoning boundary. Extensive experiments show that our approach significantly improves performance and generalization over data-centric strategies. We further validate our method under a reinforcement learning framework. Our results indicate that prioritizing reasoning complexity in data rather than sheer scale or superficial form is essential to realizing LLMs' full cognitive potential.
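Abstract (reference translation): Recent advances in large language models (LLMs) highlight the importance of the structure and quality of training data in shaping reasoning behavior. However, most existing approaches focus on transforming data formats while neglecting the internal reasoning complexity of training samples. In this work, we suggest that LLM logical reasoning performance is jointly constrained by the potential of the training data and the cognitive capacity of the model. To make this relationship measurable, we introduce Data Reasoning Intensity (DRI). This allows us to analyze how current LLMs utilize logical reasoning signals and to identify performance gaps relative to data potential. Based on this insight, we introduce a re-cognizing optimization strategy that systematically enhances the logical reasoning intensity of training data; rather than increasing data volume, it re-optimizes existing samples to fit the LLM's logical reasoning boundary. Large-scale experiments show that this method significantly improves performance and generalization over data-centric strategies. We further validate our method under a reinforcement learning framework. These results suggest that prioritizing reasoning complexity in data over scale or surface form is essential to realizing the full cognitive potential of LLMs.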

Related papers