Fugu-MT Paper Translation (Abstract): From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

Paper overview: From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

arxiv url: http://arxiv.org/abs/2605.24316v1
Date: Sat, 23 May 2026 00:48:36 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:17.927922
Title: From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression
Title (reference translation): From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression
Authors: Ziyan Chen, Ding-Xuan Zhou
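Abstract summary: This work analyzes one-pass batch SGD, multi-pass batch SGD with replacement, and multi-pass batch SGD without replacement. The stochastic risk term of one-pass batch SGD splits into bias and variance, whereas that of the two multi-pass methods splits into GD bias, GD variance, and a fluctuation term. The authors prove source-condition scaling laws for one-pass and multi-pass mini-batch methods.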
Reference score (independently computed attention score): 13.325673179579818
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scaling laws provide compact descriptions of how prediction error varies with compute, model size, and data, but existing theory mainly treats single-sample SGD or full data reuse, leaving the role of mini-batching unclear. We study batch scaling laws for sketched linear regression under a power-law covariance spectrum and a source condition on the target parameter. We analyze one-pass batch SGD, multi-pass batch SGD with replacement, and multi-pass batch SGD without replacement. Our first result is a risk decomposition: all three procedures share the same irreducible and approximation terms, while their stochastic terms depend on the sampling protocol. One-pass batch SGD splits into bias and variance, whereas the two multi-pass methods split into GD bias, GD variance, and a fluctuation term around a common GD reference trajectory. We then prove source-condition scaling laws for one-pass and multi-pass mini-batch methods. For one-pass batch SGD, mini-batching preserves the approximation and optimization-bias exponents, while the variance scales as $O(\min(M,(T_{\mathrm{eff}}γ)^{1/a})/(B T_{\mathrm{eff}}))$. Thus the usual $1/B$ covariance reduction holds at fixed update count $T$, but in the one-pass regime $T=N/B$ it is partly offset by the shorter optimization horizon. For multi-pass batch SGD, with- and without-replacement sampling have identical approximation and GD bias/variance terms; they differ only in the fluctuation covariance prefactor, which is $1/B$ with replacement and $ρ_{N,B}=(N-B)/(B(N-1))$ without replacement. Hence without-replacement sampling is less noisy for $B>1$, and when $B=N$ the fluctuation vanishes, recovering deterministic gradient descent. These results place batch size on the same theoretical footing as compute, data, and model dimension in sketched linear regression.
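The two fluctuation prefactors quoted in the abstract, $1/B$ with replacement and $ρ_{N,B}=(N-B)/(B(N-1))$ without replacement, match the classical variance factors for the mean of a batch drawn from a finite population. The following is a minimal sketch of that scalar analogue, not the paper's covariance analysis: the function names, the Gaussian toy data, and the Monte Carlo check are all illustrative assumptions.

```python
import random
import statistics


def prefactor_with_replacement(B):
    """Fluctuation prefactor for sampling with replacement: 1/B."""
    return 1.0 / B


def prefactor_without_replacement(N, B):
    """rho_{N,B} = (N - B) / (B (N - 1)), as quoted in the abstract."""
    return (N - B) / (B * (N - 1))


def batch_mean_variance(values, B, replace, trials=20000, seed=0):
    """Monte Carlo variance of the mean of a size-B batch (scalar stand-in)."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # choices() samples with replacement, sample() without replacement.
        batch = rng.choices(values, k=B) if replace else rng.sample(values, B)
        means.append(sum(batch) / B)
    return statistics.pvariance(means)


# Hypothetical toy population: N scalar "per-sample gradients".
data_rng = random.Random(1)
N, B = 50, 10
values = [data_rng.gauss(0.0, 1.0) for _ in range(N)]
sigma2 = statistics.pvariance(values)  # population variance (1/N normalization)

# Empirical variance of the batch mean, in units of the population variance.
ratio_with = batch_mean_variance(values, B, replace=True) / sigma2
ratio_without = batch_mean_variance(values, B, replace=False) / sigma2

print("with replacement:   ", ratio_with, "vs", prefactor_with_replacement(B))
print("without replacement:", ratio_without, "vs", prefactor_without_replacement(N, B))
```

In this toy setting the without-replacement factor is smaller than $1/B$ whenever $B>1$ and equals zero at $B=N$, which mirrors the abstract's statement that the fluctuation vanishes for full-batch updates.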
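Abstract (reference translation): Scaling laws give compact descriptions of how prediction error varies with compute, model size, and data, but existing theory mainly treats single-sample SGD or full data reuse, leaving the role of mini-batching unclear. This work studies batch scaling laws for sketched linear regression under a power-law covariance spectrum and a source condition on the target parameter. It analyzes one-pass batch SGD, multi-pass batch SGD with replacement, and multi-pass batch SGD without replacement. The first result is a risk decomposition: all three procedures share the same irreducible and approximation terms, while their stochastic terms depend on the sampling protocol. One-pass batch SGD splits into bias and variance, whereas the two multi-pass methods split into GD bias, GD variance, and a fluctuation term around a common GD reference trajectory. The authors then prove source-condition scaling laws for one-pass and multi-pass mini-batch methods. For one-pass batch SGD, mini-batching preserves the approximation and optimization-bias exponents, while the variance scales as $O(\min(M,(T_{\mathrm{eff}}γ)^{1/a})/(B T_{\mathrm{eff}}))$. Thus the usual $1/B$ covariance reduction holds at a fixed update count $T$, but in the one-pass regime $T=N/B$ it is partly offset by the shorter optimization horizon. For multi-pass batch SGD, with-replacement and without-replacement sampling have identical approximation and GD bias/variance terms; they differ only in the fluctuation covariance prefactor, which is $1/B$ with replacement and $ρ_{N,B}=(N-B)/(B(N-1))$ without replacement. Hence without-replacement sampling is less noisy for $B>1$, and at $B=N$ the fluctuation vanishes, recovering deterministic gradient descent. These results place batch size on the same theoretical footing as compute, data, and model dimension in sketched linear regression.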
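The trade-off in the one-pass variance bound can be checked numerically. The sketch below evaluates only the shape $\min(M,(Tγ)^{1/a})/(BT)$ with all constants dropped. It assumes $T_{\mathrm{eff}}=T$, which the abstract does not state, and the function name and parameter values are hypothetical choices for illustration.

```python
import math


def one_pass_variance_bound(M, T, gamma, a, B):
    """Shape of the variance bound, min(M, (T*gamma)**(1/a)) / (B*T).

    Assumption: T_eff is replaced by the update count T; constants are dropped.
    """
    return min(M, (T * gamma) ** (1.0 / a)) / (B * T)


# Hypothetical values, chosen so that (T*gamma)**(1/a) stays below M.
M, gamma, a = 10**6, 0.1, 2.0

# Fixed update count T: doubling B only changes the 1/B factor.
T_fixed = 1000
fixed_ratio = (one_pass_variance_bound(M, T_fixed, gamma, a, 32)
               / one_pass_variance_bound(M, T_fixed, gamma, a, 16))

# One-pass regime T = N / B: B*T = N stays constant, so only the
# numerator (T*gamma)**(1/a) reacts to the batch size.
N = 2**20


def one_pass(B):
    return one_pass_variance_bound(M, N // B, gamma, a, B)


one_pass_ratio = one_pass(32) / one_pass(16)

print("fixed-T ratio when B doubles: ", fixed_ratio)
print("one-pass ratio when B doubles:", one_pass_ratio)
```

Under these assumptions, doubling $B$ halves the bound at fixed $T$, while in the one-pass regime it shrinks the bound only by a factor $2^{-1/a}$. This is one way to read the abstract's remark that the $1/B$ reduction is partly offset by the shorter optimization horizon.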
