Fugu-MT Paper Translation (Abstract): PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

Paper summary: PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

arxiv url: http://arxiv.org/abs/2509.19894v1
Date: Wed, 24 Sep 2025 08:46:29 GMT
Status: Translation complete
Last updated in system: 2025-09-25 20:53:19.744615
Title: PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Title (reference translation): PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning
Authors: Xueliang Zhao, Wei Wu, Jian Guan, Zhuocheng Gong, Lingpeng Kong
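Abstract summary: Large language models (LLMs) are evolving from conversational systems into strong reasoners for tasks such as Olympiad mathematics and competitive programming. This paper presents PromptCoT 2.0, a scalable framework that replaces hand-crafted heuristics with an expectation-maximization loop. This produces problems that are both harder and more diverse than prior corpora.
Reference score (independently computed attention score): 55.78158607697319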
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) are evolving from conversational systems into strong reasoners for tasks such as Olympiad mathematics and competitive programming. While scaling parameters and test-time computation has driven progress, a key bottleneck is the lack of high-quality training problems: human-curated datasets are costly and limited, while existing synthetic corpora are often too easy or narrow. PromptCoT 1.0 showed that injecting rationales into prompt synthesis increases problem difficulty. Building on this, we present PromptCoT 2.0, a scalable framework that replaces hand-crafted heuristics with an expectation-maximization (EM) loop, where rationales are iteratively refined to guide prompt construction. This produces problems that are both harder and more diverse than prior corpora. The synthetic prompts support two post-training regimes: (1) Self-Play, where strong models improve autonomously via verifiable feedback without stronger teachers; and (2) Supervised Fine-Tuning (SFT), where weaker models learn from teacher-distilled traces. Extensive experiments demonstrate the effectiveness of this approach. In self-play, applying PromptCoT 2.0 to Qwen3-30B-A3B-Thinking-2507 sets new state-of-the-art results at the 30B scale, with +4.4, +4.8, and +5.3 on AIME 24/25 and HMMT 25, +6.1 and +5.0 on LiveCodeBench v5/v6, and +35 Elo on Codeforces. In SFT, training Qwen2.5-7B-Instruct solely on synthetic prompts boosts accuracy to 73.1 (AIME 24), 65.6 (AIME 25), and 53.4 (LiveCodeBench v5), surpassing models trained on human or hybrid data. Analyses further confirm that PromptCoT 2.0 yields fundamentally harder and distributionally distinct problems. These results establish prompt synthesis as a new axis for scaling reasoning and position PromptCoT 2.0 as a scalable foundation for future open-source models. The implementation is available at https://github.com/inclusionAI/PromptCoT.
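Abstract (reference translation): Large language models (LLMs) are evolving from conversational systems into strong reasoners for tasks such as Olympiad mathematics and competitive programming. Scaling parameters and test-time computation has driven progress, but a key bottleneck is the lack of high-quality training problems. PromptCoT 1.0 showed that injecting rationales into prompt synthesis increases problem difficulty. Building on this, we present PromptCoT 2.0, a scalable framework that replaces hand-crafted heuristics with an expectation-maximization (EM) loop. This produces problems that are both harder and more diverse than prior corpora. The synthetic prompts support two post-training regimes: (1) self-play, where strong models improve autonomously via verifiable feedback without stronger teachers; and (2) supervised fine-tuning (SFT), where weaker models learn from teacher-distilled traces. Extensive experiments demonstrate the effectiveness of this approach. In self-play, applying PromptCoT 2.0 to Qwen3-30B-A3B-Thinking-2507 sets new state-of-the-art results at the 30B scale, with +4.4, +4.8, and +5.3 on AIME 24/25 and HMMT 25, +6.1 and +5.0 on LiveCodeBench v5/v6, and +35 Elo on Codeforces. In SFT, training Qwen2.5-7B-Instruct solely on synthetic prompts boosts accuracy to 73.1 (AIME 24), 65.6 (AIME 25), and 53.4 (LiveCodeBench v5), surpassing models trained on human or hybrid data. Analyses further confirm that PromptCoT 2.0 yields fundamentally harder and distributionally distinct problems. These results establish prompt synthesis as a new axis for scaling reasoning and position PromptCoT 2.0 as a scalable foundation for future open-source models. The implementation is available at https://github.com/inclusionAI/PromptCoT.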

Related papers