Fugu-MT Paper Translation (Abstract): Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models

Paper overview: Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models

arxiv url: http://arxiv.org/abs/2512.06266v1
Date: Sat, 06 Dec 2025 03:36:27 GMT
Status: Translation complete
Last updated in system: 2025-12-09 22:03:54.274748
Title: Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models
Title (reference translation): Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models
Authors: Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Wei Ruan, Xiaoqi Liu, Xiaoxue Cheng, Xiyun Xu, Yang Song, Yanzipeng Gao, Yiming Jia, Yun Xing, Yuntao Wen, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
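Abstract summary: Nanbeige4-3B is a small but high-performing language model. Pretrained on 23T high-quality tokens and fine-tuned on over 30 million diverse instructions, it extends the boundary of the scaling law for small language models.
Reference score (independently computed attention metric): 23.832817775138675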
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Nanbeige4-3B, a family of small-scale but high-performing language models. Pretrained on 23T high-quality tokens and finetuned on over 30 million diverse instructions, we extend the boundary of the scaling law for small language models. In pre-training, we design a Fine-Grained Warmup-Stable-Decay (FG-WSD) training scheduler, which progressively refines data mixtures across stages to boost model performance. In post-training, to improve the quality of the SFT data, we design a joint mechanism that integrates deliberative generation refinement and chain-of-thought reconstruction, yielding substantial gains on complex tasks. Following SFT, we employ our flagship reasoning model to distill Nanbeige4-3B through our proposed Dual Preference Distillation (DPD) method, which leads to further performance gains. Finally, a multi-stage reinforcement learning phase was applied, leveraging verifiable rewards and preference modeling to strengthen abilities on both reasoning and human alignment. Extensive evaluations show that Nanbeige4-3B not only significantly outperforms models of comparable parameter scale but also rivals much larger models across a wide range of benchmarks. The model checkpoints are available at https://huggingface.co/Nanbeige.
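Abstract (reference translation): This paper presents Nanbeige4-3B, a small but high-performing language model. Pretrained on 23T high-quality tokens and fine-tuned on over 30 million diverse instructions, it extends the boundary of the scaling law for small language models. In pre-training, the authors design a Fine-Grained Warmup-Stable-Decay (FG-WSD) training scheduler that progressively refines the data mixture across stages to improve model performance. In post-training, to improve the quality of the SFT data, they design a joint mechanism that integrates deliberative generation refinement with chain-of-thought reconstruction, yielding substantial gains on complex tasks. After SFT, the flagship reasoning model is used to distill Nanbeige4-3B through the proposed Dual Preference Distillation (DPD) method, giving further performance gains. Finally, a multi-stage reinforcement learning phase is applied, using verifiable rewards and preference modeling to strengthen both reasoning and human alignment. Extensive evaluations show that Nanbeige4-3B not only outperforms models of comparable parameter scale but also rivals much larger models across a wide range of benchmarks. The model checkpoints are available at https://huggingface.co/Nanbeige.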

Related papers