Fugu-MT 論文翻訳(概要): Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching

論文の概要: Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching

arxiv url: http://arxiv.org/abs/2502.08143v1
Date: Wed, 12 Feb 2025 05:48:57 GMT
ステータス: 翻訳完了
システム内更新日: 2025-02-13 18:10:00.88312
Title: Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching
Title（参考訳）: 安定ペナルティマッチングを用いたマルチアーメッド帯域におけるT$-Optimal Best-of-Both-Worlds Guaranteesを用いたデータ依存境界
Authors: Quan Nguyen, Shinji Ito, Junpei Komiyama, Nishant A. Mehta,
Abstract要約: 既存の多武装バンディット問題に対する既存のデータ依存的および最良世界的後悔境界は適応性に制限がある。本稿では,データ依存,ベスト・オブ・ボス・ワールド,および$T$-Optimalを同時に獲得する新たな手法であるリアルタイム安定性-ペナルティマッチング(SPM)を提案する。
参考スコア（独自算出の注目度）: 34.36408457839709
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandits problems have limited adaptivity as they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent or have sub-optimal $O(\sqrt{T\ln{T}})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds and $T$-optimal for multi-armed bandits problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime and $O(\ln{T})$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the learning rates in the follow-the-regularized-leader (FTRL) framework, which further indicates that the combination of SPM and FTRL is a promising approach for proving new adaptive bounds in online learning problems.
Abstract（参考訳）: 既存の多武装バンディット問題に対する、データ依存的であり、最強世界(BOBW)でも、BOBWでも、データ依存ではない、あるいは準最適$O(\sqrt{T\ln{T}})があるため、既成の多武装バンディット問題に対する後悔境界は、適応性に制限がある。これらの制約を克服するために,データ依存のベスト・オブ・ボス・ワールドと,マルチアームバンディット問題に対する$T$-optimalを同時に獲得する新たな手法であるリアルタイム安定性-ペナルティマッチング(SPM)を提案する。特に, 実時間SPMは, 逆数系では$O(\sqrt{T})$, 確率系では$O(\ln{T})$と, 空間性, ばらつき, 損失などのデータ依存量に適応し, 最悪の場合の保証付き境界が得られることを示す。本研究は,オンライン学習問題において,学習率の調整を行うSPM手法を拡張した上で,FTRLとFPMの併用が新たな適応的境界を証明するための有望なアプローチであることを示すものである。

関連論文リスト

Influential Bandits: Pulling an Arm May Change the Environment [44.71145269686588]
現実世界のアプリケーションは、しばしば非定常環境と武器間の相互依存を含む。本稿では,未知の,対称な正の半定値相互作用行列による腕間相互作用をモデル化する,影響力のあるバンドイット問題を提案する。我々は,損失ダイナミクスの構造に合わせて,低信頼境界(LCB)推定器に基づく新しいアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-04-11T02:05:51Z)
Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems [18.74680577173648]
我々は、学習者が$m$の腕から$m$の腕を正確に選択する、$m$セット半帯域問題の一般的な場合を考える。また, Fr'echet 摂動を持つFTPL は, 対向的な設定で, $mathcalO(sqrtnm(sqrtdlog(d)+m5/6)$ をほぼ最適に再現できることを示す。
論文参考訳（メタデータ） (2025-04-09T22:07:01Z)
Semi-Parametric Batched Global Multi-Armed Bandits with Covariates [0.48342038441006807]
マルチアームバンディット(MAB)フレームワークは、シーケンシャルな意思決定に広く使われているアプローチである。本稿では,コパラメトリックと腕間の共有パラメータを持つバッチバンドの半パラメトリックフレームワークを提案する。 Batched Single-Index Dynamic binning and Successive arm elimination (BIDS) というアルゴリズムでは、バッチ化された逐次アームの除去戦略を採用している。
論文参考訳（メタデータ） (2025-03-01T17:23:55Z)
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs [33.262918224598614]
本稿では,HTMAB(Heavy-Tailed Multi-Armed Bandits)問題に対する新しいアルゴリズムを提案する。我々の新しいアルゴリズムユニは、Best-of-Both-Worlds(BoBW)特性を楽しみ、両環境とも最適に機能する。我々の知る限り、UniINFは重み付きMAB問題に対するBoBW特性を達成する最初のパラメータフリーアルゴリズムである。
論文参考訳（メタデータ） (2024-10-04T09:55:44Z)
Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds [46.30750729936261]
FTRL(Follow-the-regularized-leader)は近年,バンドイット問題における適応性獲得の最も有望なアプローチの1つである。我々は3種類の適応性を持ついくつかのアルゴリズムを確立する:空間性、ゲーム依存性、およびベスト・オブ・ボス・ワールド(BOBW)である。
論文参考訳（メタデータ） (2023-05-26T23:20:48Z)
Implicitly normalized forecaster with clipping for linear and non-linear heavy-tailed multi-armed bandits [85.27420062094086]
Implicitly Normalized Forecaster (INF) は、敵対的マルチアームバンディット(MAB)問題に対する最適解であると考えられている。重み付き設定のMAB問題に対するクリッピング(INFclip)を用いたINFの新バージョン"Implicitly Normalized Forecaster"を提案する。 INFclipは線形重み付きMAB問題に対して最適であり、非線形問題に対して有効であることを示す。
論文参考訳（メタデータ） (2023-05-11T12:00:43Z)
Contextual Combinatorial Bandits with Probabilistically Triggered Arms [55.9237004478033]
確率的に誘発される腕(C$2$MAB-T)を様々な滑らかさ条件下で検討した。トリガー変調 (TPM) 条件の下では、C$2$-UC-Tアルゴリズムを考案し、後悔すべき$tildeO(dsqrtT)$を導出する。
論文参考訳（メタデータ） (2023-03-30T02:51:00Z)
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms [41.06668954462585]
本研究では,適応型マルチアームバンディットアルゴリズムを設計する際の課題について検討する。 FTRLには多種多様な正規化要因と新たな学習率スケジュールが不要であることを示す。
論文参考訳（メタデータ） (2023-02-27T06:09:10Z)
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
本稿では,不整合雑音を持つ線形帯域に対する計算効率のよい最初のアルゴリズムを提案する。我々のアルゴリズムは未知のノイズの分散に適応し、$tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regretを達成する。また、強化学習において、線形混合マルコフ決定過程(MDP)に対する分散適応アルゴリズムを提案する。
論文参考訳（メタデータ） (2023-02-21T00:17:24Z)
Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [59.8188496313214]
半帯域 (CMAB) について検討し, 半帯域 (CMAB) におけるバッチサイズ (K$) の依存性の低減に着目した。まず,確率的に引き起こされるアーム(CMAB-T)を用いたCMABの設定に対して,分散を考慮した信頼区間を持つBCUCB-Tアルゴリズムを提案する。次に,独立アームを用いた非トリガ型CMABの設定に対して,TPVM条件の非トリガ型を利用したSESCBアルゴリズムを提案する。
論文参考訳（メタデータ） (2022-08-31T13:09:39Z)
Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds [34.37963000493442]
本稿では,マルチアーム・バンディット(MAB)問題について考察し,両対向的条件下でほぼ最適に機能する,新たなベスト・オブ・ボス・ワールド(BOBW)アルゴリズムを提案する。
論文参考訳（メタデータ） (2022-06-14T12:58:46Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。