Fugu-MT Paper Translation (Abstract): L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling

Paper abstract: L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling

arxiv url: http://arxiv.org/abs/2606.22189v1
Date: Sat, 20 Jun 2026 18:42:37 GMT
Status: retrieving information
Last updated in system: 2026-06-23 14:58:28.850303
Title: L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling
Title (reference translation): L20-Edu-135M: An Auditable Single-GPU Study of Data-Efficient Small Language Modeling
Authors: Yin Li,
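Abstract summary: Small language models are cheap to serve and feasible on local hardware. Strong public 135M-class systems are commonly trained on hundreds of billions to trillions of tokens.
Reference score (independently computed attention score): 6.901585308625979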
License:
Abstract: Small language models are cheap to serve and feasible on local hardware, but strong public 135M-class systems are commonly trained with hundreds of billions to trillions of tokens on large clusters. We study a sharply resource-constrained regime: a complete 134.5M-parameter language-model pipeline executed on one NVIDIA L20 GPU. The released checkpoint, L20-Edu-135M, receives approximately 13B pretraining tokens: 10B FineWeb-Edu tokens followed by a 3B-token educational, mathematics, code, and reasoning mixture. We document the architecture, data gates, cross-source MinHash/LSH near-deduplication, segment deduplication, benchmark-overlap removal, throughput optimization, supervised fine-tuning (SFT) with weight interpolation, and reinforcement learning from verifiable rewards (RLVR) on GSM8K. In a self-run zero-shot six-task harness, L20-Edu-135M obtains a mean score of 0.4150. It trails SmolLM-135M (0.4767) and SmolLM2-135M (0.4917), but its mean is 87.1% of SmolLM-135M's while its nominal token count is 2.17% as large. This ratio is descriptive, not evidence of statistical equivalence or a controlled scaling law. The model exceeds several older 100M-160M public baselines under the same harness. Direct GRPO-style RLVR decreases GSM8K exact-match accuracy from 1.82% to 1.59% (192-token completions) and 1.21% (320-token completions). These single-run results identify a concrete failure mode rather than establishing a general lower bound on RLVR. The contribution is an auditable resource-constrained case study, not a state-of-the-art claim.
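Abstract (reference translation): Small language models are cheap to serve and feasible on local hardware, but strong public 135M-class systems are typically trained on hundreds of billions to trillions of tokens using large clusters. We study a sharply resource-constrained regime: a complete 134.5M-parameter language-model pipeline run on a single NVIDIA L20 GPU. The released checkpoint, L20-Edu-135M, receives approximately 13B pretraining tokens: 10B FineWeb-Edu tokens followed by a 3B-token mixture of educational, mathematics, code, and reasoning data. We document the architecture, data gates, cross-source MinHash/LSH near-deduplication, segment deduplication, benchmark-overlap removal, throughput optimization, supervised fine-tuning (SFT) with weight interpolation, and reinforcement learning from verifiable rewards (RLVR) on GSM8K. In a self-run zero-shot six-task harness, L20-Edu-135M obtains a mean score of 0.4150. It trails SmolLM-135M (0.4767) and SmolLM2-135M (0.4917), but its mean is 87.1% of SmolLM-135M's while its nominal token count is only 2.17% as large. This ratio is descriptive; it is not evidence of statistical equivalence or of a controlled scaling law. Under the same harness, the model exceeds several older public baselines in the 100M-160M range. Direct GRPO-style RLVR lowers GSM8K exact-match accuracy from 1.82% to 1.59% (192-token completions) and 1.21% (320-token completions). These single-run results identify a concrete failure mode rather than establishing a general lower bound on RLVR. The contribution is an auditable resource-constrained case study, not a state-of-the-art claim.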
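
The ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper. The scores and the 13B token count come from the abstract; the SmolLM-135M token count is not stated there, so the 600B figure used here is an assumption, chosen because it is consistent with the quoted 2.17%.

```python
# Figures quoted in the abstract.
L20_MEAN = 0.4150        # L20-Edu-135M six-task mean
SMOLLM_MEAN = 0.4767     # SmolLM-135M six-task mean
L20_TOKENS = 13e9        # approximately 13B pretraining tokens

# ASSUMPTION: the abstract does not state the SmolLM-135M token count.
# 600B is a hypothetical value used only to reproduce the quoted ratio.
SMOLLM_TOKENS_ASSUMED = 600e9

# GSM8K exact-match accuracy (percent) before and after RLVR.
GSM8K_BEFORE = 1.82
GSM8K_AFTER = {"192-token": 1.59, "320-token": 1.21}


def ratio_pct(numerator: float, denominator: float) -> float:
    """Return numerator / denominator as a percentage, rounded to 2 places."""
    return round(100.0 * numerator / denominator, 2)


# Mean score of L20-Edu-135M relative to SmolLM-135M.
score_ratio = ratio_pct(L20_MEAN, SMOLLM_MEAN)

# Nominal token count relative to the assumed SmolLM-135M budget.
token_ratio = ratio_pct(L20_TOKENS, SMOLLM_TOKENS_ASSUMED)

# Absolute drop in GSM8K accuracy, in percentage points, per completion length.
gsm8k_drop = {k: round(GSM8K_BEFORE - v, 2) for k, v in GSM8K_AFTER.items()}

if __name__ == "__main__":
    print(f"score ratio: {score_ratio}%")
    print(f"token ratio: {token_ratio}% (under the assumed 600B budget)")
    print(f"GSM8K drop (points): {gsm8k_drop}")
```

As the abstract itself notes, these ratios are descriptive: they compare single runs under one harness and do not establish statistical equivalence or a scaling law.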
