Fugu-MT Paper Translation (Abstract): Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Paper summary: Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

arxiv url: http://arxiv.org/abs/2606.09191v1
Date: Mon, 08 Jun 2026 08:26:37 GMT
Status: translation complete
Last updated in system: 2026-06-09 14:42:06.834063
Title: Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards
Title (reference translation): Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards
Authors: Joel Q. L. Chang
Abstract summary: $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$ is an anchor-free nonparametric Thompson Sampling algorithm. We prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$ matches the instance-dependent lower bound to leading order in $\log n$.
Reference score (independently computed attention measure): 1.370633147306388
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $ρ$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $ρ$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.
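Abstract (reference translation): We show that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, is asymptotically optimal for any continuous risk functional $ρ$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and others) on the class of distributions with bounded density and sub-Gaussian tails, which includes Gaussian arms. Both this result and its bounded-support counterpart need only the continuity of $ρ$. This is strictly weaker than the dominance condition of prior parametric Thompson Sampling results and strictly weaker than the Lipschitz condition of UCB-type algorithms, and it gives the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first, as a stepping stone that shares the same proof structure. The main technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails). Each projects the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, keeps all polynomial prefactors at a fixed degree independent of the sample size, and breaks the super-exponential barrier that blocked prior proofs.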
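The abstract relies on the Dirichlet aggregation property: summing the components of a Dirichlet vector over disjoint groups gives a Dirichlet vector whose parameters are the group sums. The following is a minimal sketch of that property only, not the paper's algorithm or its lemmas. The function names, the uniform rewards on [0, 1), the five-cell grid, and the all-ones Dirichlet parameters are assumptions chosen for illustration. It checks first moments only: the average aggregated weight in each cell should be close to the fraction of observations in that cell.

```python
import random
from bisect import bisect_right


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) vector by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def aggregate_to_grid(values, weights, edges):
    """Sum the weights of the values that fall in each grid cell.

    Cell i covers [edges[i], edges[i + 1]); values outside the grid are
    clamped to the first or last cell.
    """
    cells = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        i = bisect_right(edges, v) - 1
        i = min(max(i, 0), len(cells) - 1)
        cells[i] += w
    return cells


def demo(n_obs=200, n_cells=5, n_draws=500, seed=0):
    """Compare averaged aggregated weights with the aggregated Dirichlet mean."""
    rng = random.Random(seed)
    # Hypothetical rewards in [0, 1); the paper's reward classes are more general.
    rewards = [rng.random() for _ in range(n_obs)]
    edges = [i / n_cells for i in range(n_cells + 1)]  # fixed grid
    # Aggregated Dirichlet parameters: number of observations per cell.
    counts = aggregate_to_grid(rewards, [1.0] * n_obs, edges)
    mean_agg = [0.0] * n_cells
    for _ in range(n_draws):
        weights = sample_dirichlet([1.0] * n_obs, rng)  # one weight per observation
        agg = aggregate_to_grid(rewards, weights, edges)
        for i in range(n_cells):
            mean_agg[i] += agg[i] / n_draws
    expected = [c / n_obs for c in counts]  # mean of Dirichlet(counts)
    return mean_agg, expected


if __name__ == "__main__":
    observed, expected = demo()
    for o, e in zip(observed, expected):
        print(f"observed {o:.4f}  expected {e:.4f}")
```

The point of the sketch is that the weight vector has one entry per observation and so grows with the sample size, while the aggregated vector always has `n_cells` entries.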

Related papers