Fugu-MT 論文翻訳(概要): Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond

論文の概要: Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond

arxiv url: http://arxiv.org/abs/2508.13679v1
Date: Tue, 19 Aug 2025 09:28:51 GMT
ステータス: 翻訳完了
システム内更新日: 2025-08-20 15:36:31.879278
Title: Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
Title（参考訳）: ヘビーテールのリニアバンド:敵対的ロバスト性、ベスト・オブ・ボトム・ワールド、そしてそれ以上
Authors: Canzhe Zhao, Shinji Ito, Shuai Li,
Abstract要約: 本稿では,逆方向の重み付きバンディット問題に対する一般的な枠組みを提案する。重み付きMABのためのFTRL-type Best-of-both-worlds (BOBW)アルゴリズムを考案した。また,HT-SPM(Hythitheavy-tailed noise aware stability-penalty matching)と呼ばれる一般的なデータ依存学習率を提案する。
参考スコア（独自算出の注目度）: 36.287082425317934
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Heavy-tailed bandits have been extensively studied since the seminal work of \citet{Bubeck2012BanditsWH}. In particular, heavy-tailed linear bandits, enabling efficient learning with both a large number of arms and heavy-tailed noises, have recently attracted significant attention \citep{ShaoYKL18,XueWWZ20,ZhongHYW21,Wang2025heavy,tajdini2025improved}. However, prior studies focus almost exclusively on stochastic regimes, with few exceptions limited to the special case of heavy-tailed multi-armed bandits (MABs) \citep{Huang0H22,ChengZ024,Chen2024uniINF}. In this work, we propose a general framework for adversarial heavy-tailed bandit problems, which performs follow-the-regularized-leader (FTRL) over the loss estimates shifted by a bonus function. Via a delicate setup of the bonus function, we devise the first FTRL-type best-of-both-worlds (BOBW) algorithm for heavy-tailed MABs, which does not require the truncated non-negativity assumption and achieves an $\widetilde{O}(T^{\frac{1}{\varepsilon}})$ worst-case regret in the adversarial regime as well as an $\widetilde{O}(\log T)$ gap-dependent regret in the stochastic regime. We then extend our framework to the linear case, proposing the first algorithm for adversarial heavy-tailed linear bandits with finite arm sets. This algorithm achieves an $\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{\varepsilon}})$ regret, matching the best-known worst-case regret bound in stochastic regimes. Moreover, we propose a general data-dependent learning rate, termed \textit{heavy-tailed noise aware stability-penalty matching} (HT-SPM). We prove that HT-SPM guarantees BOBW regret bounds for general heavy-tailed bandit problems once certain conditions are satisfied. By using HT-SPM and, in particular, a variance-reduced linear loss estimator, we obtain the first BOBW result for heavy-tailed linear bandits.
Abstract（参考訳）: 重尾の包帯は \citet{Bubeck2012BanditsWH} の精巧な研究から広く研究されている。特に、重尾の線形包帯は、多数の腕と重尾の騒音の両方で効率的な学習を可能にしており、最近、大きな注目を集めている。しかし、以前の研究は確率的レジームにのみ焦点をあてていたが、例外はヘビーテールの多腕バンディット (MABs) \citep{Huang0H22,ChengZ024,Chen2024uniINF} の特殊なケースに限られる。本研究では,ボーナス関数にシフトした損失推定値に対して,FTRL(Regularized-Leading-the-Regularized-Leading)を実行する逆重み付きバンディット問題に対する一般的な枠組みを提案する。ボーナス関数の微妙な設定により、重み付きMABに対する最初のFTRL型ベスト・オブ・ボス・ワールド(BOBW)アルゴリズムを考案するが、これは切り詰められた非負性仮定を必要とせず、対数的体制における最悪の後悔と、確率的体制におけるギャップ依存の後悔を$\widetilde{O}(T^{\frac{1}{\varepsilon}})を達成できる。次に、我々のフレームワークを線形ケースに拡張し、有限のアームセットを持つ逆重み付き線形バンドレットに対する最初のアルゴリズムを提案する。このアルゴリズムは$\widetilde{O}(d^{\frac{1}{2}}T^{\frac{1}{\varepsilon}})$ regretを達成する。さらに,HT-SPM(Hit-Heavy-tailed noise aware stability-penalty matching)と呼ばれる一般データ依存型学習率を提案する。 HT-SPMは、ある条件が満たされれば、一般的な重み付きバンディット問題に対してBOBWの後悔境界を保証することを証明している。 HT-SPM, 特に分散還元線形損失推定器を用いて, 重み付き線形包帯に対する最初のBOBW値を求める。

論文の概要: Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond

関連論文リスト