Fugu-MT Paper Translation (Abstract): Compass-Thinker-7B Technical Report

Paper overview: Compass-Thinker-7B Technical Report

arxiv url: http://arxiv.org/abs/2508.08909v1
Date: Tue, 12 Aug 2025 12:58:12 GMT
Status: Translation complete
Last updated in system: 2025-08-13 21:07:34.42694
Title: Compass-Thinker-7B Technical Report
Authors: Anxiang Zeng, Haibo Zhang, Kaixiang Mo, Long Zhang, Shuman Liu, Yanhui Huang, Yawen Liu, Yuepeng Sheng, Yuwei Huang
Abstract summary: We propose the Compass-Thinker-7B model to explore the potential of Reinforcement Learning with fewer computational resources and lower cost. Compass-Thinker-7B is trained from an open-source model through a specially designed Reinforcement Learning Pipeline.
Reference score (independently computed attention score): 8.496143273813718
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent R1-Zero-like research further demonstrates that reasoning extension has given large language models (LLMs) unprecedented reasoning capabilities, and that Reinforcement Learning is the core technology for eliciting their complex reasoning. However, conducting RL experiments directly on hyperscale models involves high computational costs and resource demands, posing significant risks. We propose the Compass-Thinker-7B model, which aims to explore the potential of Reinforcement Learning with fewer computational resources and lower costs, and provides insights for further research into RL recipes for larger models. Compass-Thinker-7B is trained from an open-source model through a specially designed Reinforcement Learning Pipeline. We curate a dataset of 30k verifiable mathematics problems for the Reinforcement Learning Pipeline. By configuring data and training settings with different difficulty distributions for different stages, the potential of the model is gradually released and the training efficiency is improved. Extensive evaluations show that Compass-Thinker-7B possesses exceptional reasoning potential and achieves superior performance on mathematics compared to the same-sized RL model. In particular, on the challenging AIME2024 evaluation, Compass-Thinker-7B achieves 40% accuracy.

Related papers