Fugu-MT 論文翻訳(概要): LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

論文の概要: LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits

arxiv url: http://arxiv.org/abs/2403.03219v3
Date: Wed, 28 May 2025 09:30:28 GMT
ステータス: 翻訳完了
システム内更新日: 2025-05-29 17:35:49.969866
Title: LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits
Title（参考訳）: LC-Tsallis-INF:Best-of-Both-Worlds Linear Contextual Bandits
Authors: Masahiro Kato, Shinji Ito,
Abstract要約: 本研究では,両逆境における上界を後悔するEmphBest-of-Both-Worlds (BoBW) アルゴリズムを開発した。提案アルゴリズムは限界条件下で$Oleft(log(T)1+beta2+betaTfrac12+betaright)$ regretを達成していることを示す。
参考スコア（独自算出の注目度）: 38.41164102066483
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We investigate the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. In this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm with regret upper bounds in both stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, our regret analysis is extended to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves $O\left(\log(T)^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)$ regret under the margin condition.
Abstract（参考訳）: 独立かつ同一に分散した文脈(すなわちd.d.)を用いて,emph{linear contextual bandit problem} について検討する。そこで本研究では,確率的・対角的両方の上界を後悔させるような,ボベス・ワールドズ(BoBW)アルゴリズムを提案する。我々は,Tsallis entropy を用いた \emph{Follow-The-Regularized-Leader} (FTRL) に基づくアルゴリズムを開発し,これを $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF} と呼ぶ。我々は、その後悔は確率的状態における少なくとも$O(\log(T))$であり、その準最適性ギャップが下から一様有界であるという仮定の下で、対数的状態における少なくとも$O(\sqrt{T})$であることを示す。さらに、我々の後悔分析は、パラメータ $\beta \in (1, \infty]$ の \emph{margin 条件によって特徴づけられるより一般的な状態にまで拡張される。提案アルゴリズムは, マージン条件下での後悔を$O\left(\log(T)^{\frac{1+\beta}{2+\beta}}T^{\frac{1}{2+\beta}}\right)とする。

関連論文リスト

Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback [40.86836752251116]
本稿では,Best-Both-Worlds (BoBW) フレームワークにおいて,逆選択遅延を用いたマルチアームバンディット問題について検討する。我々の主な貢献は、各設定の既知の下界を個別にマッチングする新しいアルゴリズムである。これは$sum_i>0left(log T/Delta_iright) + frac1Ksum Delta_i sigma_max$で、$Delta_i$はarm $i$と$のサブ最適ギャップである。
論文参考訳（メタデータ） (2025-05-30T04:05:52Z)
Sign Operator for Coping with Heavy-Tailed Noise in Non-Convex Optimization: High Probability Bounds Under $(L_0, L_1)$-Smoothness [74.18546828528298]
SignSGD with Majority Votingは,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappaka ppakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappakappa -1right,Kappakappakappa-1right,Kappakappakappa-1right,Kappakappappapa-1right,Kappaを用いて,複雑性の全範囲で堅牢に動作することを示す。
論文参考訳（メタデータ） (2025-02-11T19:54:11Z)
No-Regret Linear Bandits under Gap-Adjusted Misspecification [38.592043705502725]
既存の線形包帯の作用は通常、最良の線形近似のsup-norm誤差を測定する一様不特定パラメータ$epsilon$に依存する。そこで本研究では,各入力における近似誤差をx$で近似し,その差分をx$で比例する,より自然な不特定モデルを提案する。我々は,従来のLinUCBアルゴリズムが,そのような$rho$-gap-adjusted misspecificationに対して自動的に堅牢であることを示す。
論文参考訳（メタデータ） (2025-01-09T16:44:53Z)
Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
一般化線形モデル (GLM) フレームワークを用いて, citelu2021low で提案した一般化低ランク行列帯域問題について検討する。既存のアルゴリズムの計算不可能性と理論的制約を克服するため,まずG-ESTTフレームワークを提案する。 G-ESTT は $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret を達成でき、G-ESTS は $tildeO を達成できることを示す。
論文参考訳（メタデータ） (2024-01-14T14:14:19Z)
Best-of-Both-Worlds Linear Contextual Bandits [45.378265414553226]
本研究は, 対向汚職下での多武装盗賊問題の事例である$K$腕線形文脈盗賊の問題を考察する。我々は,理論的保証のもと,双方の敵環境に有効な戦略を開発する。両体制の理論的保証から,我々の戦略をBest-of-Both-Worlds (BoBW) RealFTRLと呼んでいる。
論文参考訳（メタデータ） (2023-12-27T09:32:18Z)
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits [11.94312915280916]
両世界のベスト・オブ・ワールドズ・アルゴリズムを$K$武器付き線形文脈包帯に対して検討する。我々のアルゴリズムは、敵対的体制と敵対的体制の両方において、ほぼ最適の後悔の限界を提供する。
論文参考訳（メタデータ） (2023-12-24T08:27:30Z)
On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits [76.2262680277608]
本研究では,線形関数クラスによって期待される報酬関数を近似できるような,不特定条件下での線形文脈帯域について検討する。このアルゴリズムは, 対数的因子に比例した設定において, ギャップ依存の残差が$tilde O (d2/Delta)$と同じであることを示す。
論文参考訳（メタデータ） (2023-03-16T15:24:29Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
滑らかな凸凹凸結合型サドル点問題, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$ を考える。漸進的勾配指数(AG-EG)降下指数アルゴリズムについて述べる。
論文参考訳（メタデータ） (2022-06-17T06:10:20Z)
On Submodular Contextual Bandits [92.45432756301231]
作用が基底集合の部分集合であり、平均報酬が未知の単調部分モジュラ函数によってモデル化されるような文脈的包帯の問題を考える。 Inverse Gap Weighting 戦略により,提案アルゴリズムは推定関数の局所的最適度を効率よくランダム化することを示す。
論文参考訳（メタデータ） (2021-12-03T21:42:33Z)
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits [45.43968161616453]
バッチ線形文脈帯域に対する最適バッチ-regretトレードオフについて検討する。時間的地平線が成長するにつれて2相表現を特徴とする後悔の保証を証明します。また, 動的上界に依存した新しい行列不等式濃度を証明した。
論文参考訳（メタデータ） (2021-10-15T12:32:33Z)
Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit [10.604939762790517]
我々は、累積後悔に対して、$mathcalObig((log d)fracalpha+12Tfrac1-alpha2+log Tbig)$をミニマックス下界として証明する。第2に,汎用的なアッパー信頼境界(UCB)戦略に着想を得た,シンプルで効率的なアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-09-23T19:35:38Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)
$Q$-learning with Logarithmic Regret [60.24952657636464]
楽観的な$Q$は$mathcalOleft(fracSAcdot mathrmpolyleft(Hright)Delta_minlogleft(SATright)right)$ cumulative regret bound, where $S$ is the number of state, $A$ is the number of action, $H$ is the planning horizon, $T$ is the total number of steps, $Delta_min$ is the least sub-Optitimality gap。
論文参考訳（メタデータ） (2020-06-16T13:01:33Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
本稿では,決定論的システムにおける関数近似を用いたQ$学習の問題について検討する。もし$delta = Oleft(rho/sqrtdim_Eright)$なら、$Oleft(dim_Eright)$を使って最適なポリシーを見つけることができる。
論文参考訳（メタデータ） (2020-02-17T18:41:49Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。