Fugu-MT Paper Translation (Abstract): Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding

Paper summary: Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding

arxiv url: http://arxiv.org/abs/2511.00874v1
Date: Sun, 02 Nov 2025 09:49:34 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:26.974452
Title: Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
Title (reference translation): Training with Low Bits: Unlocking Edge LLMs with Stochastic Rounding
Authors: Taowen Liu, Marta Andronic, Deniz Gündüz, George A. Constantinides
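Abstract summary: Quantized training improves computational and memory efficiency but introduces quantization noise. The paper shows that increased batch sizes can compensate for reduced precision during back-propagation, and that quantizing weights and quantizing activations affect gradient variance in distinct ways.
Reference score (independently computed attention score): 37.30928503608494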
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding, offering unbiased gradient estimates. However, its interaction with other training factors -- especially batch size -- remains underexplored. In this paper, we present a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR, showing that increased batch sizes can compensate for reduced precision during back-propagation. Furthermore, we show that quantizing weights and activations impacts gradient variance in distinct ways. Our experiments validate these theoretical insights.
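Abstract (reference translation): LLM training is resource-intensive. Quantized training improves computational and memory efficiency, but it introduces quantization noise that can hinder convergence and degrade model accuracy. Stochastic rounding (SR) has emerged as a theoretically attractive alternative to deterministic rounding because it provides unbiased gradient estimates. However, its interaction with other training factors, especially batch size, remains underexplored. This paper presents a theoretical and empirical study of mini-batch stochastic gradient descent (SGD) with SR and shows that larger batch sizes can compensate for reduced precision during back-propagation. It further shows that quantizing weights and quantizing activations affect gradient variance in distinct ways. The experiments validate these theoretical insights.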
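
Illustration (not from the paper): the abstract's two claims about SR, unbiasedness and the compensating effect of batch size, can be seen in a toy scalar setting. The sketch below is an assumption-laden example, not the authors' method or analysis: it rounds a single value to a coarse grid rather than quantizing gradients during back-propagation, and the names `stochastic_round`, `nearest_round`, `batch_mean_sr`, and `empirical_variance` are hypothetical helpers introduced here.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step`, rounding up with probability equal to
    the fractional distance from the lower grid point, so E[result] == x."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # lies in [0, 1)
    return (lower + (1 if rng.random() < frac else 0)) * step


def nearest_round(x: float, step: float) -> float:
    """Deterministic round-to-nearest on the same grid (ties round up)."""
    return math.floor(x / step + 0.5) * step


def batch_mean_sr(x: float, step: float, batch_size: int, rng: random.Random) -> float:
    """Average of `batch_size` independent stochastic roundings of x."""
    total = sum(stochastic_round(x, step, rng) for _ in range(batch_size))
    return total / batch_size


def empirical_variance(samples: list) -> float:
    """Population variance of a list of floats."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
x, step = 0.3, 1.0      # x sits 30% of the way between grid points 0 and 1

# Round-to-nearest maps every copy of x to the same grid point (biased),
# while the average of many stochastic roundings approaches x itself.
det_value = nearest_round(x, step)
sr_mean = batch_mean_sr(x, step, 20000, rng)

# Averaging over a larger batch shrinks the variance of the rounded estimate.
var_b1 = empirical_variance([batch_mean_sr(x, step, 1, rng) for _ in range(2000)])
var_b16 = empirical_variance([batch_mean_sr(x, step, 16, rng) for _ in range(2000)])

print(f"round-to-nearest: {det_value}, SR mean: {sr_mean:.3f}")
print(f"variance with batch 1: {var_b1:.4f}, with batch 16: {var_b16:.4f}")
```

In this toy case each stochastically rounded sample has variance `frac * (1 - frac) * step**2`, and averaging `B` independent samples divides that by `B`, which is the intuition behind trading precision for batch size. How the effect plays out for weights versus activations in actual training is the subject of the paper and is not captured here.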

Related papers