Fugu-MT Paper Translation (Summary): Compass-Thinker-7B Technical Report

Paper summary: Compass-Thinker-7B Technical Report

arXiv URL: http://arxiv.org/abs/2508.08909v2
Date: Thu, 14 Aug 2025 07:12:38 GMT
Status: Translation complete
System update date: 2025-08-15 13:42:23.647242
Title: Compass-Thinker-7B Technical Report
Authors: Anxiang Zeng, Haibo Zhang, Kaixiang Mo, Long Zhang, Shuman Liu, Yanhui Huang, Yawen Liu, Yuepeng Sheng, Yuwei Huang,
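Abstract summary: To explore the potential of reinforcement learning with fewer computational resources and lower cost, we propose the Compass-Thinker-7B model. Compass-Thinker-7B is trained from an open-source model through a specially designed Reinforcement Learning Pipeline. We show that Compass-Thinker-7B has exceptional reasoning capabilities and achieves superior performance in mathematics compared with RL models of the same size.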
Reference score (attention score computed by the site): 8.496143273813718
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent R1-Zero-like research further demonstrates that extending the reasoning process has given large language models (LLMs) unprecedented reasoning capabilities, and that Reinforcement Learning (RL) is the core technology for eliciting such complex reasoning. However, conducting RL experiments directly on hyperscale models involves high computational costs and resource demands, which poses significant risks. We propose the Compass-Thinker-7B model, which aims to explore the potential of Reinforcement Learning with fewer computational resources and lower costs, and to provide insights for further research into RL recipes for larger models. Compass-Thinker-7B is trained from an open-source model through a specially designed Reinforcement Learning Pipeline. We curate a dataset of 30k verifiable mathematics problems for this pipeline. By configuring data and training settings with different difficulty distributions at different stages, the model's potential is gradually unlocked and training efficiency is improved. Extensive evaluations show that Compass-Thinker-7B possesses exceptional reasoning potential and achieves superior performance on mathematics compared with RL models of the same size. In particular, on the challenging AIME2024 evaluation, Compass-Thinker-7B achieves 40% accuracy.
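The staged idea in the abstract (different difficulty distributions at different stages, trained on problems with verifiable answers) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The difficulty proxy `pass_rate` (the fraction of baseline samples that solve a problem), the easy/medium/hard thresholds, the per-stage weights in `STAGE_MIXTURES`, and the `\boxed{}` exact-match reward are all hypothetical assumptions chosen only for illustration.

```python
import random
import re
from dataclasses import dataclass


@dataclass
class Problem:
    question: str
    answer: str
    # Hypothetical difficulty proxy: fraction of baseline samples that solved it.
    pass_rate: float


def difficulty_bucket(p: Problem) -> str:
    """Map a problem to a bucket; a lower pass rate means a harder problem (hypothetical thresholds)."""
    if p.pass_rate >= 0.7:
        return "easy"
    if p.pass_rate >= 0.3:
        return "medium"
    return "hard"


# Hypothetical stage mixtures: later stages shift sampling weight toward harder problems.
STAGE_MIXTURES = {
    1: {"easy": 0.5, "medium": 0.4, "hard": 0.1},
    2: {"easy": 0.2, "medium": 0.5, "hard": 0.3},
    3: {"easy": 0.0, "medium": 0.4, "hard": 0.6},
}


def sample_stage_batch(problems, stage, batch_size, rng):
    """Draw a training batch whose difficulty distribution follows the given stage's mixture."""
    buckets = {"easy": [], "medium": [], "hard": []}
    for p in problems:
        buckets[difficulty_bucket(p)].append(p)
    mixture = STAGE_MIXTURES[stage]
    # Keep only buckets that have positive weight and at least one problem.
    names = [b for b in mixture if mixture[b] > 0 and buckets[b]]
    if not names:
        raise ValueError("no problems available for this stage's mixture")
    weights = [mixture[b] for b in names]
    batch = []
    for _ in range(batch_size):
        bucket = rng.choices(names, weights=weights, k=1)[0]
        batch.append(rng.choice(buckets[bucket]))
    return batch


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in the text (nested braces not handled)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion, reference):
    """Binary rule-based reward: 1.0 if the boxed answer exactly matches the reference, else 0.0."""
    pred = extract_boxed(completion)
    return 1.0 if pred is not None and pred == reference.strip() else 0.0
```

One motivation for such a schedule is that early stages keep rewards from being mostly zero, while later stages concentrate on problems the model still fails. In practice the thresholds, mixtures, and answer matching (e.g. equivalence of differently written numbers) would need tuning beyond this exact-string sketch.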


Related Papers