Fugu-MT Paper Translation (Abstract): Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Paper overview: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

arxiv url: http://arxiv.org/abs/2605.13301v1
Date: Wed, 13 May 2026 10:13:26 GMT
Status: Translation complete
Last updated in system: 2026-05-14 23:30:27.975192
Title: Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
Authors: Yafu Li, Runzhe Zhan, Haoran Zhang, Shunkai Zhang, Yizhuo Li, Zhilin Wang, Jiacheng Chen, Futing Wang, Xuyang Hu, Yuchen Fan, Bangjie Xu, Yucheng Su, Xinmiao Han, Chenxi Li, Haodi Lei, Yufeng Zhao, Zejin Lin, Qianjia Cheng, Tong Zhu, Xiaoye Qu, Ganqu Cui, Peng Ye, Yun Luo, Zhouchen Lin, Yu Qiao, Bowen Zhou, Ning Ding, Yu Cheng
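Abstract summary: The paper introduces a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. A 30B-A3B backbone is trained with SFT on around 340K sub-8K-token trajectories, followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens.
Reference score (attention score computed independently by the site): 108.48818215929494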
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
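The abstract names a reverse-perplexity curriculum for SFT but does not define it. The following is a minimal sketch of one possible reading, assuming that training samples are ordered from highest to lowest perplexity under the backbone. The ordering direction, the function names, and the toy log-probabilities are all hypothetical and do not come from the paper. Only the perplexity formula itself, exp of the negative mean token log-probability, is standard.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Standard perplexity: exp(-mean of per-token natural-log probabilities)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def order_by_perplexity(
    samples: List[Tuple[str, Sequence[float]]],
    descending: bool = True,
) -> List[str]:
    """Return sample ids sorted by perplexity.

    ASSUMPTION: descending=True (highest perplexity first) is only one
    guess at what "reverse-perplexity" means; the abstract does not say.
    """
    scored = [(perplexity(logprobs), sample_id) for sample_id, logprobs in samples]
    scored.sort(key=lambda pair: pair[0], reverse=descending)
    return [sample_id for _, sample_id in scored]


# Hypothetical toy data: (sample id, per-token log-probs from the backbone).
toy_samples = [
    ("easy", [-0.1, -0.2, -0.1]),
    ("hard", [-2.0, -1.5, -2.5]),
    ("medium", [-0.8, -1.0, -0.6]),
]

curriculum = order_by_perplexity(toy_samples)
print(curriculum)
```

In practice the per-token log-probabilities would come from scoring each trajectory with the backbone model; passing `descending=False` gives the opposite ordering if that is what the paper intends.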
