Fugu-MT Paper Translation (Abstract): Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Paper overview: Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

arxiv url: http://arxiv.org/abs/2605.19641v1
Date: Tue, 19 May 2026 10:24:58 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.279464
Title: Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data
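Title (reference translation): Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data
Authors: Ferdinand Genans, Erwan Scornet
Abstract summary: The paper proves that all parametric models exhibit similar gradient bias under various imputation procedures. It proposes a simple debiasing procedure for stochastic gradient descent (SGD) with missing values. It proves that the gradient bias is reduced from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios.
Reference score (independently computed attention score): 36.564317791867644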
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
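Abstract (reference translation): Stochastic gradient methods are central to modern large-scale learning, but using them with incomplete covariates remains delicate because, as shown for linear models, imputation schemes generally introduce systematic gradient biases. This work proves that all parametric models exhibit similar gradient bias under various imputation procedures, and it characterizes exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. Building on this analysis, it proposes a simple debiasing procedure for stochastic gradient descent (SGD) with missing values, based on Richardson extrapolation and using the exact expression of the gradient bias. The key idea is to deliberately add missingness: from an already incomplete observation, generate a further-thinned version at a higher, controlled missingness level, then combine the two resulting stochastic gradients to cancel the leading bias term. One Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. The method is computationally efficient, model-agnostic, and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when the missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ that depends only on the population gradient errors induced by declaring a single coordinate missing. In that case the method generalizes to a multi-step Richardson procedure that recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.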
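The "add missingness, then combine two gradients" idea in the abstract can be illustrated with a small sketch. Everything specific here is an assumption for illustration and is not taken from the paper: a linear model with squared loss, zero imputation, a known homogeneous MCAR rate p below 0.5, thinning to a rate of exactly 2p, and the textbook Richardson combination 2*g(p) - g(2p). If the gradient bias is c1*p + c2*p^2, this combination leaves only -2*c2*p^2. The function names `imputed_gradient` and `richardson_gradient` are hypothetical.

```python
import numpy as np


def imputed_gradient(X, mask, y, theta):
    """Mean squared-loss gradient after zero imputation (mask True = missing)."""
    X_imp = np.where(mask, 0.0, X)
    residual = X_imp @ theta - y
    return X_imp.T @ residual / len(y)


def richardson_gradient(X, mask, y, theta, p, rng):
    """One Richardson step combining gradients at missingness p and 2p.

    Assumes each entry is missing independently with known rate p < 0.5.
    """
    # Hide each entry with extra probability q so the total rate becomes
    # p + (1 - p) * q = 2p.
    q = p / (1.0 - p)
    extra = rng.random(X.shape) < q
    thinned_mask = mask | extra
    g_p = imputed_gradient(X, mask, y, theta)
    g_2p = imputed_gradient(X, thinned_mask, y, theta)
    # A bias of the form c1*p + c2*p^2 becomes -2*c2*p^2 after combining.
    return 2.0 * g_p - g_2p


# Synthetic check on correlated Gaussian covariates (illustrative values).
rng = np.random.default_rng(0)
n, d, p = 200_000, 3, 0.1
cov = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star
mask = rng.random((n, d)) < p

theta = np.full(d, 0.5)  # point at which the gradients are compared
g_full = imputed_gradient(X, np.zeros_like(mask), y, theta)  # no missing data
g_naive = imputed_gradient(X, mask, y, theta)
g_rich = richardson_gradient(X, mask, y, theta, p, rng)

naive_bias = np.linalg.norm(g_naive - g_full)
rich_bias = np.linalg.norm(g_rich - g_full)
print(f"naive bias norm:      {naive_bias:.4f}")
print(f"Richardson bias norm: {rich_bias:.4f}")
```

In this toy setting the combined gradient should sit much closer to the complete-data gradient than the plain imputed gradient does, at the cost of a second gradient evaluation and some extra variance. The paper's actual thinning levels, weights, and treatment of a vector-valued $p$ may differ from this sketch.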

Related papers