Fugu-MT 論文翻訳(概要): Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization

論文の概要: Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization

arxiv url: http://arxiv.org/abs/2606.08028v1
Date: Sat, 06 Jun 2026 07:40:55 GMT
ステータス: 翻訳完了
システム内更新日: 2026-06-09 14:42:05.672871
Title: Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization
Title（参考訳）: オンライン凸最適化のための雑音適応型高確率レギュレータ境界
Authors: Wentao Zhang, Yutong Zhang, Wentao Mo,
Abstract要約: オンライン凸最適化 (OCO) における高い確率的後悔境界について, 強い凸損失を伴って検討した。雑音適応性, フィードバック構造, 制約満足度の交点において, オープンな質問を解決するための3つの結果を確立する。
参考スコア（独自算出の注目度）: 12.903796669387809
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study high-probability regret bounds for online convex optimization (OCO) with strongly convex losses and establish three results that resolve open questions at the intersection of noise adaptivity, feedback structure, and constraint satisfaction. For the full-information setting with sub-Gaussian stochastic gradients, we prove a noise-adaptive high-probability regret bound in which the martingale deviation term scales with the noise level $σ$ rather than the gradient bound $G$, yielding a multiplicative improvement of $G/σ$ over the classical Azuma-Hoeffding baseline. Our analysis introduces an exponential supermartingale argument that bypasses the bounded-difference requirement of Freedman's inequality, enabling direct treatment of unbounded sub-Gaussian noise without truncation artifacts. For bandit feedback, we prove a minimax lower bound: the high-probability regret scales linearly in $\log(1/δ)$, in contrast to the $\sqrt{\log(1/δ)}$ confidence cost under full information. This constitutes a formal separation in the confidence cost of strongly convex OCO across feedback models. Regarding constrained OCO with stochastic constraints satisfying a Slater condition, we provide simultaneous high-probability guarantees for both cumulative regret and long-run constraint violation, achieving $\mathcal{O}(\sqrt{T\log(m/δ)})$ regret and $\mathcal{O}(\sqrt{T}/(ζδ) + m\sqrt{T\log(m/δ)})$ violation. Synthetic experiments corroborate all theoretical predictions.
Abstract（参考訳）: 我々は,オンライン凸最適化(OCO)における高確率後悔境界について,強い凸損失を伴って検討し,雑音適応性,フィードバック構造,制約満足度の交点におけるオープン質問を解決する3つの結果を確立する。ガウス以下の確率勾配を持つ完全情報集合に対して、マルティンゲール偏差項が勾配境界の$G$よりも$σ$でスケールし、古典的な東ホエーフディング基底線に対して$G/σ$の乗法的改善をもたらすような雑音適応的高確率的後悔境界を証明した。解析では,フリードマンの不等式の有界差要求を回避し,非ガウス雑音の非有界化を回避できるという指数関数的スーパーマーチンゲーレの議論を紹介した。バンドイットフィードバックでは、極小値の最小値が証明される: 高確率の後悔は、フル情報の下での信頼コストと対照的に$\log(1/δ)$で線形にスケールする。これは、フィードバックモデル間で強い凸OCOの信頼性コストを正式に分離する。 Slater 条件を満たす確率的制約を持つ制約付き OCO について、累積的後悔と長期的制約違反を同時に保証し、$\mathcal{O}(\sqrt{T\log(m/δ)})$ regret と $\mathcal{O}(\sqrt{T}/(\sqrt) + m\sqrt{T\log(m/δ)})$ violation を達成する。合成実験はすべての理論的予測を裏付ける。

論文の概要: Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization

関連論文リスト