Fugu-MT 論文翻訳(概要): LC-Tsalis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

論文の概要: LC-Tsalis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

arxiv url: http://arxiv.org/abs/2403.03219v1
Date: Tue, 5 Mar 2024 18:59:47 GMT
ステータス: 翻訳完了
システム内更新日: 2024-03-06 13:44:41.450638
Title: LC-Tsalis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Title（参考訳）: LC-Tsalis-INF:Best-of-Both-Worlds Linear Contextual Bandits
Authors: Masahiro Kato and Shinji Ito
Abstract要約: 本研究では、独立かつ同一に分散したコンテキストを持つ線形文脈帯域問題について考察する。提案アルゴリズムは、Tsallisエントロピーを持つFollow-The-Regularized-Leaderに基づいており、$alpha$-textual-Con (LC)-Tsallis-INFと呼ばれている。
参考スコア（独自算出の注目度）: 45.378265414553226
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study considers the linear contextual bandit problem with independent and identically distributed (i.i.d.) contexts. In this problem, existing studies have proposed Best-of-Both-Worlds (BoBW) algorithms whose regrets satisfy $O(\log^2(T))$ for the number of rounds $T$ in a stochastic regime with a suboptimality gap lower-bounded by a positive constant, while satisfying $O(\sqrt{T})$ in an adversarial regime. However, the dependency on $T$ has room for improvement, and the suboptimality-gap assumption can be relaxed. For this issue, this study proposes an algorithm whose regret satisfies $O(\log(T))$ in the setting when the suboptimality gap is lower-bounded. Furthermore, we introduce a margin condition, a milder assumption on the suboptimality gap. That condition characterizes the problem difficulty linked to the suboptimality gap using a parameter $\beta \in (0, \infty]$. We then show that the algorithm's regret satisfies $O\left(\left\{\log(T)\right\}^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$. Here, $\beta= \infty$ corresponds to the case in the existing studies where a lower bound exists in the suboptimality gap, and our regret satisfies $O(\log(T))$ in that case. Our proposed algorithm is based on the Follow-The-Regularized-Leader with the Tsallis entropy and referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
Abstract（参考訳）: 本研究は,独立かつ同一分散(i.i.d.)コンテキストを持つ線形文脈バンディット問題を考える。この問題において、既存の研究は、正の定数で下界の準最適ギャップを持つ確率的状態におけるラウンド数$T$に対して、後悔を満足する$O(\log^2(T))$に対して$O(\sqrt{T})$を満たすBest-of-Both-Worlds (BoBW)アルゴリズムを提案した。しかし、$t$への依存は改善の余地があり、準最適性-ガップ仮定は緩和できる。そこで本研究では,サブオプティビティギャップが低く設定された場合に,後悔が$o(\log(t))$を満たすアルゴリズムを提案する。さらに, 下位最適ギャップに対するより穏やかな仮定であるマージン条件を導入する。この条件は、パラメータ $\beta \in (0, \infty]$ を用いて、最適以下のギャップに関連する問題を特徴づける。次に、アルゴリズムの後悔は$O\left(\left\{\log(T)\right\}^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$を満たすことを示す。ここで、$\beta= \infty$ は、下限が最適性ギャップに存在する既存の研究のケースに対応しており、その場合の後悔は$O(\log(T))$ を満たす。提案するアルゴリズムは,tsallisエントロピーを伴うフォローザ・ザ・レギュラライズド・リーダに基づき,$\alpha$-linear-contextual (lc)-tsallis-inf と呼ばれる。

関連論文リスト

Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback [40.86836752251116]
本稿では,Best-Both-Worlds (BoBW) フレームワークにおいて,逆選択遅延を用いたマルチアームバンディット問題について検討する。我々の主な貢献は、各設定の既知の下界を個別にマッチングする新しいアルゴリズムである。これは$sum_i>0left(log T/Delta_iright) + frac1Ksum Delta_i sigma_max$で、$Delta_i$はarm $i$と$のサブ最適ギャップである。
論文参考訳（メタデータ） (2025-05-30T04:05:52Z)
Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
SignSGD with Majority Votingは,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappaka ppakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa -1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappappapa-1right,Kappaを用いて,複雑性の全範囲で堅牢に動作することを示す。
論文参考訳（メタデータ） (2025-02-11T19:54:11Z)
No-Regret Linear Bandits under Gap-Adjusted Misspecification [38.592043705502725]
既存の線形包帯の作用は通常、最良の線形近似のsup-norm誤差を測定する一様不特定パラメータ$epsilon$に依存する。そこで本研究では,各入力における近似誤差をx$で近似し,その差分をx$で比例する,より自然な不特定モデルを提案する。我々は,従来のLinUCBアルゴリズムが,そのような$rho$-gap-adjusted misspecificationに対して自動的に堅牢であることを示す。
論文参考訳（メタデータ） (2025-01-09T16:44:53Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
一般化線形モデル (GLM) フレームワークを用いて, citelu2021low で提案した一般化低ランク行列帯域問題について検討する。既存のアルゴリズムの計算不可能性と理論的制約を克服するため,まずG-ESTTフレームワークを提案する。 G-ESTT は $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret を達成でき、G-ESTS は $tildeO を達成できることを示す。
論文参考訳（メタデータ） (2024-01-14T14:14:19Z)
Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
本研究は, 対向汚職下での多武装盗賊問題の事例である$K$腕線形文脈盗賊の問題を考察する。我々は,理論的保証のもと,双方の敵環境に有効な戦略を開発する。両体制の理論的保証から,我々の戦略をBest-of-Both-Worlds (BoBW) RealFTRLと呼んでいる。
論文参考訳（メタデータ） (2023-12-27T09:32:18Z)
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
両世界のベスト・オブ・ワールドズ・アルゴリズムを$K$武器付き線形文脈包帯に対して検討する。我々のアルゴリズムは、敵対的体制と敵対的体制の両方において、ほぼ最適の後悔の限界を提供する。
論文参考訳（メタデータ） (2023-12-24T08:27:30Z)
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits [76.2262680277608]
本研究では,線形関数クラスによって期待される報酬関数を近似できるような,不特定条件下での線形文脈帯域について検討する。このアルゴリズムは, 対数的因子に比例した設定において, ギャップ依存の残差が$tilde O (d2/Delta)$と同じであることを示す。
論文参考訳（メタデータ） (2023-03-16T15:24:29Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
滑らかな凸凹凸結合型サドル点問題, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$ を考える。漸進的勾配指数(AG-EG)降下指数アルゴリズムについて述べる。
論文参考訳（メタデータ） (2022-06-17T06:10:20Z)
On Submodular Contextual Bandits [92.45432756301231]
作用が基底集合の部分集合であり、平均報酬が未知の単調部分モジュラ函数によってモデル化されるような文脈的包帯の問題を考える。 Inverse Gap Weighting 戦略により,提案アルゴリズムは推定関数の局所的最適度を効率よくランダム化することを示す。
論文参考訳（メタデータ） (2021-12-03T21:42:33Z)
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
バッチ線形文脈帯域に対する最適バッチ-regretトレードオフについて検討する。時間的地平線が成長するにつれて2相表現を特徴とする後悔の保証を証明します。また, 動的上界に依存した新しい行列不等式濃度を証明した。
論文参考訳（メタデータ） (2021-10-15T12:32:33Z)
Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
我々は、累積後悔に対して、$mathcalObig((log d)fracalpha+12Tfrac1-alpha2+log Tbig)$をミニマックス下界として証明する。第2に,汎用的なアッパー信頼境界(UCB)戦略に着想を得た,シンプルで効率的なアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-09-23T19:35:38Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
本稿では,決定論的システムにおける関数近似を用いたQ$学習の問題について検討する。もし$delta = Oleft(rho/sqrtdim_Eright)$なら、$Oleft(dim_Eright)$を使って最適なポリシーを見つけることができる。
論文参考訳（メタデータ） (2020-02-17T18:41:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。