Fugu-MT 論文翻訳(概要): Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit

論文の概要: Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit

arxiv url: http://arxiv.org/abs/2109.11612v1
Date: Thu, 23 Sep 2021 19:35:38 GMT
ステータス: 翻訳完了
システム内更新日: 2021-09-28 05:57:12.768111
Title: Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit
Title（参考訳）: 高次元文脈線形バンディットに対する後悔の下限と最適アルゴリズム
Authors: Ke Li, Yun Yang, Naveen N. Narisetty
Abstract要約: 我々は、累積後悔に対して、$mathcalObig((log d)fracalpha+12Tfrac1-alpha2+log Tbig)$をミニマックス下界として証明する。第2に,汎用的なアッパー信頼境界(UCB)戦略に着想を得た,シンプルで効率的なアルゴリズムを提案する。
参考スコア（独自算出の注目度）: 10.604939762790517
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret, in terms of horizon $T$, dimension $d$ and a margin parameter $\alpha\in[0,1]$, which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on T due to the use of different values of margin parameter $\alpha$ explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered $\ell_1$-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
Abstract（参考訳）: 本稿では,高次元特徴を持つマルチアームバンディット問題を考察する。まず、最小値の下限である$\mathcal{o}\big((\log d)^{\frac{\alpha+1}{2}}t^{\frac{1-\alpha}{2}}+\log t\big)$ を証明し、累積後悔に対して、ホライズン$t$、次元$d$、マージンパラメータ$\alpha\in[0,1]$ を用いて、最適アームと準最適アームの分離を制御する。この新しい下限は、マージンパラメータの異なる値である$\alpha$の使用によって、t に異なる依存性を持つ既存の後悔の束縛結果を統一する。第2に,下界にマッチする後悔の上界を実現する汎用上界境界(UCB)戦略に着想を得た,単純で効率的なアルゴリズムを提案する。提案アルゴリズムは、一般的に使用される楕円体信頼セットとは対照的な信頼セットとして、適切な中心となる$\ell_1$-ballを使用する。さらに、このアルゴリズムは強制サンプリングステップを必要とせず、実質的に未知のマージンパラメータに適応する。提案手法を既存の文献と比較するためにシミュレーションと実データ解析を行った。

関連論文リスト

LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
本研究では、独立かつ同一に分散したコンテキストを持つ線形文脈帯域問題について考察する。提案アルゴリズムは、Tsallisエントロピーを持つFollow-The-Regularized-Leaderに基づいており、$alpha$-textual-Con (LC)-Tsallis-INFと呼ばれている。
論文参考訳（メタデータ） (2024-03-05T18:59:47Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
線形混合遷移カーネルを用いた最短経路(SSP)問題について検討する。エージェントは繰り返し環境と対話し、累積コストを最小化しながら特定の目標状態に到達する。既存の作業は、イテレーションコスト関数の厳密な下限や、最適ポリシーに対する期待長の上限を仮定することが多い。
論文参考訳（メタデータ） (2024-02-14T07:52:00Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
一般化線形モデル (GLM) フレームワークを用いて, citelu2021low で提案した一般化低ランク行列帯域問題について検討する。既存のアルゴリズムの計算不可能性と理論的制約を克服するため,まずG-ESTTフレームワークを提案する。 G-ESTT は $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret を達成でき、G-ESTS は $tildeO を達成できることを示す。
論文参考訳（メタデータ） (2024-01-14T14:14:19Z)
Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
我々は、高次元 $ell_infty$-approachability 問題を、低次元の擬ノルムアプローチ可能性問題に変換する。我々は、$ell$や他のノルムに対するアプローチ可能性に関する以前の研究に類似した疑似ノルムアプローチ可能性のアルゴリズム理論を開発する。
論文参考訳（メタデータ） (2023-02-03T03:19:14Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
縮退した線形マルコフ+デルタ決定における最適同定問題について, 生成モデルに基づく固定信頼度設定における検討を行った。複雑な非最適化プログラムの解としての下位境界は、そのようなアルゴリズムを考案する出発点として用いられる。
論文参考訳（メタデータ） (2022-08-11T04:12:50Z)
Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
有限腕バンディットアルゴリズムにインスパイアされたバッチアルゴリズムを導入する。 O(log T)$ batches in time horizon $T$.sqrtTgamma_T)$ using $O(log T)$ batches in time horizon。さらに,アルゴリズムの修正版を提案し,バッチ数によって後悔がどう影響するかを特徴付ける。
論文参考訳（メタデータ） (2021-10-15T00:54:04Z)
Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners [17.224805430291177]
線形パラメータ化マルチアームバンドにおけるベストアーム識別の問題について検討する。そこで本研究では,この問題を解決するために,明示的に実装可能かつ証明可能な順序-最適サンプル-複雑度アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-13T05:00:01Z)
Differentiable Linear Bandit Algorithm [6.849358422233866]
アッパー信頼境界は、線形多腕バンディット問題の最も一般的な方法である。勾配上昇による信頼度を学習できる勾配推定器を導入する。提案アルゴリズムは,腕の特徴の次元を$d$で,信頼度を$hatbeta$で学習したサイズを$tildemathcalO(hatbetasqrtdT)$上限とする。
論文参考訳（メタデータ） (2020-06-04T16:43:55Z)
Naive Exploration is Optimal for Online LQR [49.681825576239355]
最適後悔尺度は$widetildeTheta(sqrtd_mathbfu2 d_mathbfx T)$で、$T$は時間ステップの数、$d_mathbfu$は入力空間の次元、$d_mathbfx$はシステム状態の次元である。我々の下界は、かつての$mathrmpoly(logT)$-regretアルゴリズムの可能性を排除する。
論文参考訳（メタデータ） (2020-01-27T03:44:54Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。