Fugu-MT Paper Translation (Abstract): Probing How Scalable Table Data Enhances General Long-Context Reasoning

Paper overview: Probing How Scalable Table Data Enhances General Long-Context Reasoning

arxiv url: http://arxiv.org/abs/2603.21719v1
Date: Mon, 23 Mar 2026 09:05:46 GMT
Status: Translation complete
Last updated in system: 2026-03-24 19:11:39.579525
Title: Probing How Scalable Table Data Enhances General Long-Context Reasoning
Title (reference translation): Scalable table data enables general long-context reasoning
Authors: Huaibing Xie, Guoliang Zhao, Yang Liu, Shihan Dou, Siming Huang, Yanling Xiao, Shaolei Wang, Yiting Liu, Cheng Zhang, Shaofan Liu, Pluto Zhou
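Abstract summary: Structured table data with periodic structures shows potential for long-context reasoning. The authors propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured table data.
Reference score (independently computed attention score): 18.383487310920597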
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As real-world tasks grow increasingly complex, long-context reasoning has become a core capability for Large Language Models (LLMs). However, few studies explore which data types are effective for long-context reasoning and why. We find that structured table data with periodic structures shows strong potential for long-context reasoning. Motivated by this observation, we mathematically analyze tabular dependency structures using mutual information, revealing periodic non-vanishing dependencies in table data. Furthermore, we systematically analyze the capabilities of structured table data, conduct relevant scaling experiments, and validate its underlying mechanisms for enhancing long-context reasoning, yielding several meaningful insights. Leveraging these insights, we propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured table data to boost long-context reasoning via RL. Extensive experimental results demonstrate that table data significantly enhances the long-context reasoning capability of LLMs across multiple long-context benchmarks (+8.24% on average), and even improves performance on out-of-domain benchmarks (+8.06% on average). We hope that our insights provide practical guidance for effective post-training data to enhance long-context reasoning in LLMs.
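Abstract (reference translation): As real-world tasks grow more complex, long-context reasoning has become a core capability of large language models (LLMs). However, few studies have examined which data types are effective for long-context reasoning, or why. Structured table data with periodic structures shows potential for long-context reasoning. The study mathematically analyzes tabular dependency structures using mutual information and reveals periodic, non-vanishing dependencies in table data. It further analyzes the capabilities of structured table data systematically, conducts related scaling experiments, and validates the underlying mechanisms by which such data enhances long-context reasoning, yielding several meaningful insights. Building on these insights, the authors propose a simple yet scalable pipeline (TableLong) for synthesizing high-quality, diverse, and verifiable structured table data to boost long-context reasoning via RL. Extensive experiments show that table data significantly improves the long-context reasoning capability of LLMs across multiple long-context benchmarks (+8.24% on average) and also improves performance on out-of-domain benchmarks (+8.06% on average). The authors hope these insights offer practical guidance on effective post-training data for enhancing long-context reasoning in LLMs.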
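The abstract's claim about periodic, non-vanishing dependencies can be made concrete with a toy experiment. The sketch below is only an illustration under stated assumptions and is not the paper's analysis or the TableLong pipeline: the generator `make_serialized_tables`, the binary cell values, the per-column bias values 0.1 and 0.9, and the table sizes are all hypothetical choices. Each synthetic table gives every column its own hidden bias, the table is flattened row by row, and a plug-in estimate of the mutual information between tokens `lag` positions apart is computed. Under these assumptions, tokens whose distance is a multiple of the column count share a column and stay dependent no matter how many rows separate them, while other lags pair independent columns.

```python
import numpy as np


def make_serialized_tables(n_tables, n_rows, n_cols, rng):
    """Hypothetical toy generator: binary tables flattened in row-major order.

    Each (table, column) pair draws a hidden bias of 0.1 or 0.9, and every
    cell in that column is an independent Bernoulli draw with that bias.
    """
    theta = rng.choice([0.1, 0.9], size=(n_tables, 1, n_cols))
    cells = rng.random((n_tables, n_rows, n_cols)) < theta
    # Row-major flattening: position t holds row t // n_cols, column t % n_cols.
    return cells.astype(np.int64).reshape(n_tables, n_rows * n_cols)


def lagged_mutual_information(seqs, lag):
    """Plug-in estimate (in bits) of I(X_t; X_{t+lag}) for binary sequences.

    Pairs are pooled over all positions t and over all sequences.
    """
    x = seqs[:, :-lag].ravel()
    y = seqs[:, lag:].ravel()
    joint = np.zeros((2, 2))
    np.add.at(joint, (x, y), 1)          # 2x2 contingency table of (x, y) pairs
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of X_t, shape (2, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of X_{t+lag}, shape (1, 2)
    mask = joint > 0                       # skip empty cells to avoid log(0)
    return float((joint[mask] * np.log2(joint[mask] / (px @ py)[mask])).sum())


N_COLS = 5
rng = np.random.default_rng(0)
seqs = make_serialized_tables(n_tables=2000, n_rows=8, n_cols=N_COLS, rng=rng)

mi_by_lag = {
    lag: lagged_mutual_information(seqs, lag) for lag in range(1, 3 * N_COLS + 1)
}
for lag, mi in mi_by_lag.items():
    marker = "  <- multiple of the column count" if lag % N_COLS == 0 else ""
    print(f"lag {lag:2d}: MI = {mi:.4f} bits{marker}")
```

In this toy setting the estimate should stay clearly positive at lags 5, 10, and 15 and sit near zero elsewhere, which is one simple way to picture a dependency that recurs with the row length instead of decaying with distance. Real tables carry richer structure (headers, typed values, cross-column relations), so the numbers here say nothing about the benchmark gains reported in the abstract.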

Related papers