Fugu-MT Paper Translation (Abstract): QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

Paper overview: QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

arxiv url: http://arxiv.org/abs/2604.04898v1
Date: Mon, 06 Apr 2026 17:44:25 GMT
Status: Translation complete
Last updated in system: 2026-04-07 15:49:19.318215
Title: QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
Authors: LM-Provers, Yuxiao Qu, Amrith Setlur, Jasper Dekoninck, Edward Beeching, Jia Li, Ian Wu, Lewis Tunstall, Aviral Kumar
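Abstract summary: We build QED-Nano, a 4B model for Olympiad-level proofs. We release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.
Reference score (attention metric computed independently by this site): 34.119608370222245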
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Proprietary AI systems have recently demonstrated impressive capabilities on complex proof-based problems, with gold-level performance reported at the 2025 International Mathematical Olympiad (IMO). However, the training pipelines behind these systems remain largely undisclosed, and their reliance on large "internal" models and scaffolds makes them expensive to run, difficult to reproduce, and hard to study or improve upon. This raises a central question: can small, open models also be trained to achieve competitive reasoning performance on difficult Olympiad-level math? In this paper, we answer this question by building QED-Nano, a 4B model post-trained for Olympiad-level proofs. Our training recipe has three stages: (1) supervised fine-tuning to imbue good proof-writing styles by distilling from DeepSeek-Math-V2, (2) reinforcement learning (RL) with rubric-based rewards, and (3) expanding RL with a reasoning cache, which decomposes long proofs into iterative summarize-and-refine cycles and enables stronger test-time reasoning. QED-Nano surpasses the proof-generation performance of much larger open models, including Nomos-1 and GPT-OSS-120B, and approaches the performance of proprietary models like Gemini 3 Pro, at a fraction of the inference cost. To support further research on open mathematical reasoning, we release the full QED-Nano pipeline, including the QED-Nano and QED-Nano-SFT models, the FineProofs-SFT and FineProofs-RL datasets, and the training and evaluation code.
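Illustration (not from the paper): the abstract describes the reasoning cache only as decomposing long proofs into iterative summarize-and-refine cycles. The sketch below shows one way such a loop could be shaped. The function name `summarize_and_refine`, the prompt wording, the `generate` callable, and the fixed cycle count are all assumptions made for illustration; the released QED-Nano code may work quite differently.

```python
from typing import Callable

# Hypothetical interface: takes a prompt string, returns generated text.
Generate = Callable[[str], str]


def summarize_and_refine(problem: str, generate: Generate, cycles: int = 3) -> str:
    """Run a fixed number of summarize-and-refine cycles (illustrative only)."""
    cache = ""    # short notes carried from one cycle to the next
    attempt = ""  # most recent proof attempt
    for i in range(cycles):
        # Refine: write a proof attempt from the problem plus the cached notes.
        attempt = generate(
            f"PROBLEM:\n{problem}\n\nNOTES:\n{cache}\n\nWrite a proof."
        )
        # Summarize: compress the attempt into notes, except after the last cycle.
        if i < cycles - 1:
            cache = generate(f"SUMMARIZE:\n{attempt}")
    return attempt
```

Any text-generation backend can be passed as `generate`. Because only the short notes are carried between cycles, each prompt stays bounded in length even when the overall reasoning is long, which is the general idea the abstract's wording suggests.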
