Fugu-MT Paper Translation (Abstract): Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification

Paper overview: Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification

arxiv url: http://arxiv.org/abs/2505.11985v2
Date: Tue, 20 May 2025 17:01:38 GMT
Status: Translation complete
System update date: 2025-05-21 14:49:52.256853
Title: Variance-Optimal Arm Selection: Regret Minimization and Best Arm Identification
Authors: Sabrina Khurshid, Gourab Ghatak, Mohammad Shahid Abdulla,
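Abstract summary: We develop an online algorithm called \texttt{UCB-VV} for the regret setting and show that its upper bound on regret for bounded rewards evolves as $\mathcal{O}\left(\log{n}\right)$. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance.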
Reference score (independently computed attention score): 3.5502600490147196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper focuses on selecting the arm with the highest variance from a set of $K$ independent arms. Specifically, we focus on two settings: (i) regret setting, that penalizes the number of pulls of suboptimal arms in terms of variance, and (ii) fixed-budget BAI setting, that evaluates the ability of an algorithm to determine the arm with the highest variance after a fixed number of pulls. We develop a novel online algorithm called \texttt{UCB-VV} for the regret setting and show that its upper bound on regret for bounded rewards evolves as $\mathcal{O}\left(\log{n}\right)$ where $n$ is the horizon. By deriving the lower bound on the regret, we show that \texttt{UCB-VV} is order optimal. For the fixed budget BAI setting, we propose the \texttt{SHVV} algorithm. We show that the upper bound of the error probability of \texttt{SHVV} evolves as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, and this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance. Leveraging the same, we derive a concentration inequality for the empirical Sharpe ratio (SR) for sub-Gaussian distributions, which was previously unknown in the literature. Empirical simulations show that \texttt{UCB-VV} consistently outperforms \texttt{$\epsilon$-greedy} across different sub-optimality gaps, though it is surpassed by \texttt{VTS}, which exhibits the lowest regret, albeit lacking in theoretical guarantees. We also illustrate the superior performance of \texttt{SHVV}, for a fixed budget setting under 6 different setups against uniform sampling. Finally, we conduct a case study to empirically evaluate the performance of the \texttt{UCB-VV} and \texttt{SHVV} in call option trading on $100$ stocks generated using geometric Brownian motion (GBM).
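The fixed-budget setting described in the abstract can be illustrated with a generic sequential-halving loop over sample variances. This is a minimal sketch and not the paper's \texttt{SHVV}: the function name, the even split of the budget across rounds, the fresh resampling in each round, and the Gaussian test arms are all assumptions made for illustration, since the abstract does not give the algorithm's details.

```python
import math
import random
import statistics


def sequential_halving_max_variance(arms, budget):
    """Return the index of the arm with the largest estimated variance.

    arms   : list of zero-argument callables, each returning one reward sample
    budget : total number of pulls to spread across all rounds (approximate)

    Hypothetical helper for illustration; not the SHVV algorithm from the paper.
    """
    survivors = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split the budget evenly over rounds and surviving arms.
        # At least 2 pulls are needed to compute a sample variance.
        pulls = max(2, budget // (len(survivors) * rounds))
        estimates = {
            i: statistics.variance([arms[i]() for _ in range(pulls)])
            for i in survivors
        }
        # Keep the half of the arms with the largest sample variance.
        survivors.sort(key=lambda i: estimates[i], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


if __name__ == "__main__":
    random.seed(0)
    # Assumed test arms: zero-mean Gaussians with different standard deviations.
    sigmas = (0.1, 0.5, 1.0, 3.0)
    arms = [lambda s=s: random.gauss(0.0, s) for s in sigmas]
    best = sequential_halving_max_variance(arms, budget=4000)
    print("selected arm:", best, "with sigma", sigmas[best])
```

Halving the candidate set each round concentrates later pulls on the arms that are hardest to tell apart, which is the intuition behind an error probability that decays exponentially in the budget $n$.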


Related papers