Fugu-MT Paper Translation (Abstract): Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions

Paper summary: Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions

arxiv url: http://arxiv.org/abs/2602.01777v1
Date: Mon, 02 Feb 2026 08:01:13 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:33.998723
Title: Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions
Title (reference translation): Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions
Authors: M. Arashi, M. Amintoosi
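Abstract summary: In high-dimensional settings, unbiased estimators are generally inadmissible under quadratic loss. We construct a shrinkage estimator that adaptively contracts noisy mini-batch gradients toward a stable restricted estimator. We show that this estimator uniformly dominates the standard stochastic gradient under squared error loss and is minimax-optimal in the classical decision-theoretic sense.
Reference score (independently computed attention measure): 0.0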
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stochastic gradient methods are central to large-scale learning, yet their analysis typically treats mini-batch gradients as unbiased estimators of the population gradient. In high-dimensional settings, however, classical results from statistical decision theory show that unbiased estimators are generally inadmissible under quadratic loss, suggesting that standard stochastic gradients may be suboptimal from a risk perspective. In this work, we formulate stochastic gradient computation as a high-dimensional estimation problem and introduce a decision-theoretic framework based on Stein-rule shrinkage. We construct a shrinkage gradient estimator that adaptively contracts noisy mini-batch gradients toward a stable restricted estimator derived from historical momentum. The shrinkage intensity is determined in a data-driven manner using an online estimate of gradient noise variance, leveraging second-moment statistics commonly maintained by adaptive optimization methods. Under a Gaussian noise model and for dimension p>=3, we show that the proposed estimator uniformly dominates the standard stochastic gradient under squared error loss and is minimax-optimal in the classical decision-theoretic sense. We further demonstrate how this estimator can be incorporated into the Adam optimizer, yielding a practical algorithm with negligible additional computational cost. Empirical evaluations on CIFAR10 and CIFAR100, across multiple levels of label noise, show consistent improvements over Adam in the large-batch regime. Ablation studies indicate that the gains arise primarily from selectively applying shrinkage to high-dimensional convolutional layers, while indiscriminate shrinkage across all parameters degrades performance. These results illustrate that classical shrinkage principles provide a principled and effective approach to improving stochastic gradient estimation in modern deep learning.
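Abstract (reference translation): Stochastic gradient methods are central to large-scale learning, but their analysis usually treats mini-batch gradients as unbiased estimators of the population gradient. In high-dimensional settings, however, classical results from statistical decision theory show that unbiased estimators are generally inadmissible under quadratic loss, which suggests that standard stochastic gradients may be suboptimal from a risk perspective. In this work, we formulate stochastic gradient computation as a high-dimensional estimation problem and introduce a decision-theoretic framework based on Stein-rule shrinkage. We construct a shrinkage gradient estimator that adaptively shrinks noisy mini-batch gradients toward a stable restricted estimator derived from historical momentum. The shrinkage intensity is determined in a data-driven way from an online estimate of the gradient noise variance, using second-moment statistics that adaptive optimization methods commonly maintain. Under a Gaussian noise model and for dimension p>=3, we show that the proposed estimator uniformly dominates the standard stochastic gradient under squared error loss and is minimax-optimal in the classical decision-theoretic sense. We further show how to incorporate this estimator into the Adam optimizer. Empirical evaluations on CIFAR10 and CIFAR100 across multiple levels of label noise show consistent improvements over Adam in the large-batch regime. Ablation studies indicate that the gains arise mainly from applying shrinkage selectively to high-dimensional convolutional layers, whereas indiscriminate shrinkage across all parameters degrades performance. These results show that classical shrinkage principles provide a principled and effective approach to improving stochastic gradient estimation in modern deep learning.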
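
As a rough illustration of the idea in the abstract, the sketch below applies the classical positive-part James-Stein rule to shrink a noisy gradient toward a fixed target vector. It is not the authors' algorithm. The names `stein_shrink` and `monte_carlo_risk` are hypothetical, the noise variance is assumed known rather than estimated online from Adam's second-moment statistics, and the target is a synthetic stand-in for the momentum-based restricted estimator.

```python
import numpy as np


def stein_shrink(grad, target, noise_var):
    """Positive-part Stein-rule shrinkage of `grad` toward `target`.

    Assumption: grad ~ N(true_grad, noise_var * I_p) with p >= 3, and
    `target` is treated as fixed (here a stand-in for a momentum estimate).
    """
    grad = np.asarray(grad, dtype=float)
    target = np.asarray(target, dtype=float)
    p = grad.size
    if p < 3:
        # The classical dominance result needs p >= 3; leave grad unchanged.
        return grad.copy()
    diff = grad - target
    dist_sq = float(np.sum(diff ** 2))
    if dist_sq == 0.0:
        return target.copy()
    # Shrinkage factor 1 - (p - 2) * sigma^2 / ||grad - target||^2, clipped at 0.
    factor = max(0.0, 1.0 - (p - 2) * noise_var / dist_sq)
    return target + factor * diff


def monte_carlo_risk(p=50, noise_var=1.0, trials=2000, seed=0):
    """Estimate squared-error risk of the raw and the shrunk gradient."""
    rng = np.random.default_rng(seed)
    true_grad = rng.normal(size=p)
    # Hypothetical target: close to, but not equal to, the true gradient.
    target = true_grad + 0.5 * rng.normal(size=p)
    raw_err = 0.0
    shrunk_err = 0.0
    for _ in range(trials):
        noisy = true_grad + np.sqrt(noise_var) * rng.normal(size=p)
        shrunk = stein_shrink(noisy, target, noise_var)
        raw_err += np.sum((noisy - true_grad) ** 2)
        shrunk_err += np.sum((shrunk - true_grad) ** 2)
    return raw_err / trials, shrunk_err / trials


if __name__ == "__main__":
    raw, shrunk = monte_carlo_risk()
    print(f"raw risk: {raw:.2f}, shrunk risk: {shrunk:.2f}")
```

In this toy Gaussian setting the shrunk estimate should show lower average squared error than the raw gradient, and the gap narrows as the target moves farther from the true gradient. How the paper estimates the noise variance and builds the restricted estimator inside Adam is not specified in the abstract, so those parts are left out here.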


Related papers