Fugu-MT paper translation (abstract): Scaling Laws in the Tiny Regime: How Small Models Change Their Mistakes

Paper summary: Scaling Laws in the Tiny Regime: How Small Models Change Their Mistakes

arxiv url: http://arxiv.org/abs/2603.07365v1
Date: Sat, 07 Mar 2026 22:25:45 GMT
Status: translation complete
System update date: 2026-03-10 15:13:14.321611
Title: Scaling Laws in the Tiny Regime: How Small Models Change Their Mistakes
Title (reference translation): How Smaller Models Make Mistakes
Authors: Mohammed Alnemari, Rizwan Qureshi, Nader Begrazadah,
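Abstract summary: We train 90 models (22K--19.8M parameters) across two architectures. Both follow approximate power laws in error rate. Small models concentrate capacity on easy classes. Validation must happen at the target model size.
Reference score (independently computed attention score): 5.7241115867191175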
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Neural scaling laws describe how model performance improves as a power law with size, but existing work focuses on models above 100M parameters. The sub-20M regime -- where TinyML and edge AI operate -- remains unexamined. We train 90 models (22K--19.8M parameters) across two architectures (plain ConvNet, MobileNetV2) on CIFAR-100, varying width while holding depth and training fixed. Both follow approximate power laws in error rate: $α= 0.156 \pm 0.002$ (ScaleCNN) and $α= 0.106 \pm 0.001$ (MobileNetV2) across five seeds. Since prior work fit cross-entropy loss rather than error rate, direct exponent comparison is approximate; with that caveat, these are 1.4--2x steeper than $α\approx 0.076$ for large language models. The power law does not hold uniformly: local exponents decay with scale, and MobileNetV2 saturates at 19.8M parameters ($α_{\mathrm{local}} = 0.006$). Error structure also changes. Jaccard overlap between error sets of the smallest and largest ScaleCNN is only 0.35 (25 seed pairs, $\pm 0.004$) -- compression changes which inputs are misclassified, not merely how many. Small models concentrate capacity on easy classes (Gini: 0.26 at 22K vs. 0.09 at 4.7M) while abandoning the hardest (bottom-5 accuracy: 10% vs. 53%). Counter to expectation, the smallest models are best calibrated (ECE = 0.013 vs. peak 0.110 at mid-size). Aggregate accuracy is therefore misleading for edge deployment; validation must happen at the target model size.
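The abstract reports a global exponent $α$ and "local" exponents that decay with scale. A minimal sketch of how such quantities could be estimated, assuming the usual form error ≈ a · N^(-α) fitted by least squares in log-log space. The function names and the synthetic data are illustrative assumptions, not the authors' code or measurements:

```python
import numpy as np


def fit_power_law(n_params, error_rate):
    """Fit error ~= a * N**(-alpha) by least squares in log-log space.

    Returns (alpha, a). Assumes all inputs are strictly positive.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_e = np.log(np.asarray(error_rate, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return -slope, np.exp(intercept)


def local_exponents(n_params, error_rate):
    """Negative log-log slope between each pair of adjacent model sizes.

    A value near zero at the largest sizes would indicate saturation.
    """
    log_n = np.log(np.asarray(n_params, dtype=float))
    log_e = np.log(np.asarray(error_rate, dtype=float))
    return -np.diff(log_e) / np.diff(log_n)


# Synthetic data for illustration only (NOT the paper's measurements):
# an exact power law with alpha = 0.15 over the 22K--19.8M size range.
sizes = np.array([22e3, 1e5, 1e6, 4.7e6, 19.8e6])
errors = 3.0 * sizes ** -0.15

alpha, a = fit_power_law(sizes, errors)
local = local_exponents(sizes, errors)
print(f"alpha = {alpha:.3f}, a = {a:.3f}")
print("local exponents:", np.round(local, 3))
```

On an exact power law the local exponents all equal the global one; a run of decaying values, as the abstract describes, signals that a single exponent is only an approximation.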
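Abstract (reference translation): Neural scaling laws describe how model performance improves as a power law in model size, but existing work focuses on models above 100M parameters. The sub-20M regime, where TinyML and edge AI operate, remains unexamined. We train 90 models (22K--19.8M parameters) with two architectures (plain ConvNet, MobileNetV2) on CIFAR-100, varying width while holding depth and training fixed. Both follow approximate power laws in error rate: $α = 0.156 \pm 0.002$ (ScaleCNN) and $α = 0.106 \pm 0.001$ (MobileNetV2) across five seeds. Because prior work fit cross-entropy loss rather than error rate, a direct comparison of exponents is only approximate; with that caveat, these are 1.4--2x steeper than $α \approx 0.076$ for large language models. The power law does not hold uniformly: local exponents decay with scale, and MobileNetV2 saturates at 19.8M parameters ($α_{\mathrm{local}} = 0.006$). The error structure changes as well. The Jaccard overlap between the error sets of the smallest and largest ScaleCNN is only 0.35 (25 seed pairs, $\pm 0.004$): compression changes which inputs are misclassified, not merely how many. Small models concentrate capacity on easy classes (Gini: 0.26 at 22K vs. 0.09 at 4.7M) while abandoning the hardest ones (bottom-5 accuracy: 10% vs. 53%). Contrary to expectation, the smallest models are the best calibrated (ECE = 0.013 vs. a peak of 0.110 at mid-size). Aggregate accuracy is therefore misleading for edge deployment; validation must happen at the target model size.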
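The error-structure claims in the abstract rest on three metrics: Jaccard overlap of error sets, a Gini coefficient, and expected calibration error (ECE). The abstract does not give their exact definitions, so the sketch below assumes the standard textbook forms: Jaccard as intersection over union of misclassified inputs, Gini over per-class accuracies, and ECE with 15 equal-width confidence bins. All function names and toy inputs are hypothetical:

```python
import numpy as np


def jaccard_overlap(errors_a, errors_b):
    """|A & B| / |A | B| for two boolean masks marking misclassified inputs."""
    a = np.asarray(errors_a, dtype=bool)
    b = np.asarray(errors_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # neither model makes any error: treat the sets as identical
    return float(np.logical_and(a, b).sum() / union)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    v = np.sort(np.asarray(values, dtype=float))
    total = v.sum()
    if total == 0:
        return 0.0
    n = v.size
    ranks = np.arange(1, n + 1)
    return float(((2 * ranks - n - 1) * v).sum() / (n * total))


def expected_calibration_error(confidence, correct, n_bins=15):
    """ECE with equal-width bins: weighted mean of |accuracy - confidence|."""
    conf = np.asarray(confidence, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so bin indices run from 0 to n_bins - 1.
    idx = np.digitize(conf, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return float(ece)


# Toy inputs for illustration only (not data from the paper).
small_model_errors = [True, True, False, False]
large_model_errors = [False, True, True, False]
print("Jaccard:", jaccard_overlap(small_model_errors, large_model_errors))
print("Gini:", gini([0.1, 0.5, 0.9]))
print("ECE:", expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A low Jaccard value at similar error counts is what separates "different mistakes" from "more mistakes", which is why the abstract argues for validating at the target model size instead of relying on aggregate accuracy.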

Related papers