Fugu-MT Paper Translation (Abstract): $π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models

Paper overview: $π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models

arxiv url: http://arxiv.org/abs/2604.05114v1
Date: Mon, 06 Apr 2026 19:19:58 GMT
Status: Translation complete
System update date: 2026-04-08 17:42:09.460467
Title: $π^2$: Structure-Originated Reasoning Data Improves Long-Context Reasoning Ability of Large Language Models
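Title (reference translation): $π^2$: Improving the Long-Context Reasoning Ability of Large Language Models with Structure-Originated Reasoning Data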
Authors: Quyet V. Do, Thinh Pham, Nguyen Nguyen, Sha Li, Pratibha Zunjare, Tu Vu,
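Abstract summary: We study a pipeline that curates reasoning data from initial structured data to improve long-context reasoning in large language models (LLMs). Our approach constructs high-quality reasoning data through rigorous QA curation. Our dataset facilitates self-distillation, where gpt-oss-20b improves its average performance by +4.4%.
Reference score (independently computed attention score): 17.718858777963415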
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study a pipeline that curates reasoning data from initial structured data for improving long-context reasoning in large language models (LLMs). Our approach, $π^2$, constructs high-quality reasoning data through rigorous QA curation: 1) extracting and expanding tables from Wikipedia, 2) from the collected tables and relevant context, generating realistic and multi-hop analytical reasoning questions whose answers are automatically determined and verified through dual-path code execution, and 3) back-translating step-by-step structured reasoning traces as solutions of QA pairs given realistic web-search context. Supervised fine-tuning with gpt-oss-20b and Qwen3-4B-Instruct-2507 on $π^2$ yields consistent improvements across four long-context reasoning benchmarks and our alike $π^2$-Bench, with average absolute accuracy gains of +4.3% and +2.7% respectively. Notably, our dataset facilitates self-distillation, where gpt-oss-20b even improves its average performance by +4.4% with its own reasoning traces, demonstrating $π^2$'s usefulness. Our code, data, and models are open-source at https://github.com/vt-pi-squared/pi-squared.
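Abstract (reference translation): We study a pipeline that curates reasoning data from initial structured data to improve long-context reasoning in large language models (LLMs). Our approach, $π^2$, builds high-quality reasoning data through rigorous QA curation: 1) extracting and expanding tables from Wikipedia; 2) generating, from the collected tables and relevant context, realistic multi-hop analytical reasoning questions whose answers are determined automatically and verified through dual-path code execution; and 3) back-translating step-by-step structured reasoning traces as solutions to the QA pairs, given realistic web-search context. Supervised fine-tuning of gpt-oss-20b and Qwen3-4B-Instruct-2507 on $π^2$ yields consistent improvements on four long-context reasoning benchmarks and on our similar $π^2$-Bench, with average absolute accuracy gains of +4.3% and +2.7%, respectively. Notably, our dataset facilitates self-distillation: gpt-oss-20b improves its average performance by +4.4% using its own reasoning traces. Our code, data, and models are open-sourced at https://github.com/vt-pi-squared/pi-squared.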
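
The abstract says answers are "verified through dual-path code execution" but does not say how. One plausible reading is that two independently written programs compute the answer from the same table, and the QA pair is kept only when both agree. The sketch below illustrates that reading only. It is not the authors' implementation, and the function names, the toy table, and the agreement rule are all assumptions made for illustration.

```python
from typing import Callable, Optional

# A table is modeled as a list of row dictionaries (an assumption for this sketch).
Table = list[dict[str, object]]
SolutionPath = Callable[[Table], object]


def verify_dual_path(table: Table, path_a: SolutionPath, path_b: SolutionPath) -> Optional[object]:
    """Run two independent solution programs; return the answer only if they agree."""
    try:
        answer_a = path_a(table)
        answer_b = path_b(table)
    except Exception:
        # A path that crashes cannot confirm the answer, so the pair is rejected.
        return None
    return answer_a if answer_a == answer_b else None


# Hypothetical toy table and question: "Which city has the largest population?"
table: Table = [
    {"city": "A", "population": 120},
    {"city": "B", "population": 300},
    {"city": "C", "population": 180},
]


def path_sort(t: Table) -> object:
    # Path 1: sort rows by population in descending order and take the first.
    return sorted(t, key=lambda row: row["population"], reverse=True)[0]["city"]


def path_max(t: Table) -> object:
    # Path 2: select the row with the maximum population directly.
    return max(t, key=lambda row: row["population"])["city"]


def path_buggy(t: Table) -> object:
    # A deliberately wrong path (takes the minimum) to show a rejected pair.
    return min(t, key=lambda row: row["population"])["city"]


accepted = verify_dual_path(table, path_sort, path_max)
rejected = verify_dual_path(table, path_sort, path_buggy)
```

Here `accepted` holds the agreed answer and `rejected` is `None`, so a question whose two paths disagree would be dropped rather than given a possibly wrong label.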


Related papers