Fugu-MT 論文翻訳(概要): Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

論文の概要: Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

arxiv url: http://arxiv.org/abs/2402.07062v1
Date: Sat, 10 Feb 2024 22:38:21 GMT
ステータス: 翻訳完了
システム内更新日: 2024-02-13 17:42:43.478423
Title: Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise
Title（参考訳）: 重度・超重度対称雑音をもつ確率帯域に対する高速UCB型アルゴリズム
Authors: Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov
Abstract要約: マルチアームバンディットのためのUCB型アルゴリズムを構築するための新しいアルゴリズムを提案する。報酬の対称雑音の場合、$O(log TsqrtKTlog T)$ regret bound を $Oleft の代わりに達成できることを示す。
参考スコア（独自算出の注目度）: 45.60098988395789
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left (T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$ ($\alpha \in (0, 1])$, i.e. perform better than it is assumed by the general lower bound for bandits with heavy-tails. Moreover, the same bound holds even when the reward distribution does not have the expectation, that is, when $\alpha<0$.
Abstract（参考訳）: 本研究では,不正確なオラクルを用いた一般凸最適化法に基づいて,確率的マルチアームバンディットのためのUCB型アルゴリズムを構築する手法を提案する。我々は最適化手法の収束率に対応する後悔境界を導出する。 We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left (T^{\frac{1}{1+\alpha}} K^{\frac{\alpha}{1+\alpha}} \right)$ for the case when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+\alpha}] \leq \sigma^{1+\alpha}$ ($\alpha \in (0, 1])$, i.e. perform better than it is assumed by the general lower bound for bandits with heavy-tails. さらに、報酬分布が期待値を持っていない場合、すなわち$\alpha<0$であるときでも、同じバウンドが成り立つ。

関連論文リスト

Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability [49.96531901205305]
我々は,オフラインの文脈的包帯に対する単一政治中心性の下でのサンプル複雑性を$tildeO(epsilon-1)$とするemphfirstアルゴリズムを提案する。我々の証明は、KL正則化の強い凸性と、真の報酬と悲観的推定子のギャップの条件的非負性を利用する。我々は,このアルゴリズムを文脈的デュエル帯域に拡張し,ほぼ最適なサンプル複雑性を実現する。
論文参考訳（メタデータ） (2025-02-09T22:14:45Z)
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping [21.865728815935665]
重み付き雑音下での最初の収束を提供するが、切断はしない。また、テールインデックス$mathfrakp$が事前に不明な場合には、最初の$mathcalO(Tfrac1-mathfrakp3mathfrakp-2)$収束率も設定する。
論文参考訳（メタデータ） (2024-12-27T08:46:46Z)
p-Mean Regret for Stochastic Bandits [52.828710025519996]
単純で統一された UCB ベースのアルゴリズムを導入し、新しい$p$-mean の後悔境界を実現する。我々の枠組みは、特別な場合として、平均的な累積的後悔とナッシュ後悔の両方を包含する。
論文参考訳（メタデータ） (2024-12-14T08:38:26Z)
Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
我々は,選択した$n$のアームセットのジョイント報酬以外の余分な情報が観測されない場合に,マルチアームのバンディット問題に対する新規グリーディ・バンディット(SGB)アルゴリズムを提案する。 SGBは最適化された拡張型コミットアプローチを採用しており、ベースアームの大きなセットを持つシナリオ用に特別に設計されている。
論文参考訳（メタデータ） (2023-12-13T11:08:25Z)
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors [12.43000662545423]
トンプソンサンプリング(Thompson sample, TS)は、マルチアームバンディット問題を解くアルゴリズムの1つである。 TSの変種である$alpha$-TSを考え、標準的な後続分布の代わりに$alpha$-posteriorまたは$alpha$-posteriorを使用する。
論文参考訳（メタデータ） (2023-09-12T16:15:33Z)
An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
相補的な合成条件に基づく2つの難解なミラー降下アルゴリズムを導入する。注目すべきは、どちらのアルゴリズムも、目的関数のリプシッツ定数や滑らかさに関する事前の知識なしで機能する。本稿では,大規模半確定プログラム上での手法の効率性とロバスト性を示す。
論文参考訳（メタデータ） (2023-06-30T08:34:29Z)
Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
効率的なプライベート推定アルゴリズムを設計するための一般的なツール。最初の効率的な$(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery。
論文参考訳（メタデータ） (2023-01-11T09:12:28Z)
Robust Sparse Mean Estimation via Sum of Squares [42.526664955704746]
本研究では,高次元スパース平均推定の問題点を,逆数外乱の$epsilon$-fractionの存在下で検討する。我々のアルゴリズムは、サム・オブ・スクエア(Sum-of-Squares)ベースのアルゴリズムアプローチに従う。
論文参考訳（メタデータ） (2022-06-07T16:49:54Z)
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits [22.18577284185939]
我々は、重尾の多腕バンディットのためのロバストなベスト・オブ・ザ・ワールドスアルゴリズムを開発した。 textttHTINFは、両方の環境と敵環境に最適な後悔を同時に達成する。 textttAdaTINFは、alpha$と$sigma$の両方に適応できる最初のアルゴリズムであり、最適のギャップを逸脱した後悔境界を達成する。
論文参考訳（メタデータ） (2022-01-28T03:53:39Z)
Gaussian Process Bandit Optimization with Few Batches [49.896920704012395]
有限腕バンディットアルゴリズムにインスパイアされたバッチアルゴリズムを導入する。 O(log T)$ batches in time horizon $T$.sqrtTgamma_T)$ using $O(log T)$ batches in time horizon。さらに,アルゴリズムの修正版を提案し,バッチ数によって後悔がどう影響するかを特徴付ける。
論文参考訳（メタデータ） (2021-10-15T00:54:04Z)
Smooth Bandit Optimization: Generalization to H\"older Space [37.15553727896912]
我々は、目標が累積後悔である滑らかな報酬関数のバンディット最適化を考える。私たちの主な結果は、指数 $alpha>1$ を持つ H"older space への報酬関数の一般化です。私たちは、$alphaleq 1$サブセット内で、既存の下限に適合する後悔率を達成することを示します。
論文参考訳（メタデータ） (2020-12-11T01:43:25Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。