Fugu-MT 論文翻訳(概要): Optimal Strategies for Graph-Structured Bandits

論文の概要: Optimal Strategies for Graph-Structured Bandits

arxiv url: http://arxiv.org/abs/2007.03224v2
Date: Fri, 10 Jul 2020 09:17:09 GMT
ステータス: 翻訳完了
システム内更新日: 2022-11-12 20:37:47.904998
Title: Optimal Strategies for Graph-Structured Bandits
Title（参考訳）: グラフ構造バンディットの最適戦略
Authors: Hassan Saber (SEQUEL), Pierre M\'enard (SEQUEL), Odalric-Ambrym Maillard (SEQUEL)
Abstract要約: Bernoulli $!=!(nu_a,b)_a in mathcalA, b in mathcalB$ with means $(mu_a,b)_a in mathcalA, b in mathcalB$ with means $(mu_a,b)_a
参考スコア（独自算出の注目度）: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $ \nu \!= \!(\nu\_{a,b})\_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu\_{a,b})\_{a \in \mathcal{A}, b \in \mathcal{B}}\!\in\![0,1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega\!=\! (\omega\_{b,b'})\_{b,b' \in \mathcal{B}}$, where $ \mathcal{A}$ is a finite set of arms and $ \mathcal{B} $ is a finite set of users. The weight matrix $\omega$ is such that for any two users $b,b'\!\in\!\mathcal{B}, \text{max}\_{a\in\mathcal{A}}|\mu\_{a,b} \!-\! \mu\_{a,b'}| \!\leq\! \omega\_{b,b'} $. This formulation is flexible enough to capture various situations, from highly-structured scenarios ($\omega\!\in\!\{0,1\}^{\mathcal{B}\times\mathcal{B}}$) to fully unstructured setups ($\omega\!\equiv\! 1$).We consider two scenarios depending on whether the learner chooses only the actions to sample rewards from or both users and actions. We first derive problem-dependent lower bounds on the regret for this generic graph-structure that involves a structure dependent linear programming problem. Second, we adapt to this setting the Indexed Minimum Empirical Divergence (IMED) algorithm introduced by Honda and Takemura (2015), and introduce the IMED-GS$^\star$ algorithm. Interestingly, IMED-GS$^\star$ does not require computing the solution of the linear programming problem more than about $\log(T)$ times after $T$ steps, while being provably asymptotically optimal. Also, unlike existing bandit strategies designed for other popular structures, IMED-GS$^\star$ does not resort to an explicit forced exploration scheme and only makes use of local counts of empirical events. We finally provide numerical illustration of our results that confirm the performance of IMED-GS$^\star$.
Abstract（参考訳）: ベルヌーイ分布の集合によって定義される多武装バンディット問題の構造化変種について研究する。 = \! (\nu\_{a,b})\_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu\_{a,b})\_{a \in \mathcal{A}, b \in \mathcal{B}}\! イン! [0,1]^{\mathcal{A}\times\mathcal{B}}$ そして与えられた重み行列$\omega\! =\! (\omega\_{b,b'})\_{b,b' \in \mathcal{B}}$, ここで $ \mathcal{A}$ は有限個のアームの集合、 $ \mathcal{B} $ は有限個のユーザ集合である。重み行列 $\omega$ は、任意の 2 ユーザに対して $b,b'\! イン! \mathcal{B}, \text{max}\_{a\in\mathcal{A}}|\mu\_{a,b} \! -\! b' {\displaystyle b'}|\,b'} である。くたばれ! \omega\_{b,b'} $。この定式化は、高度に構造化されたシナリオ($\omega\! イン! \{0,1\}^{\mathcal{b}\times\mathcal{b}}$) 完全に構造化されていないセットアップ ($\omega\! \equiv\! 1$). 我々は,学習者が報酬をサンプリングする行動のみを選択するか,ユーザと行動の両方を選択するかの2つのシナリオを考察する。まず、構造依存線形計画問題を含むこの一般的なグラフ構造に対する後悔に関する問題依存下限を導出する。第2に本田と竹村(2015)が導入したindexed minimum empirical divergence(imed)アルゴリズムの設定に適応し,imed-gs$^\star$アルゴリズムを導入する。興味深いことに、imed-gs$^\star$は、t$ステップの後に約$\log(t)$倍の線形計画問題の解を計算する必要はないが、漸近的に最適である。また、他の一般的な構造向けに設計された既存のバンディット戦略とは異なり、imed-gs$^\star$は明示的な強制探査スキームを頼らず、経験的な事象の局所的なカウントのみを使用する。最終的に IMED-GS$^\star$ の性能を確認する数値図式を提供する。

関連論文リスト

Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs [0.0]
2つの問題に対して計算硬度を示す。我々の証明の主な要素の1つは、2つの確率測度の間の近位関係を導出することである。このフレームワークは、異なるタスク間のリダクションを実行するための便利なツールを提供する。
論文参考訳（メタデータ） (2025-02-14T00:24:51Z)
Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
本研究では, 対向分布シフトの存在下でのL2$損失に対して, 単一ニューロンを学習する問題について検討した。ベクトルベクトル二乗損失を$chi2$divergenceから$mathcalp_0$に近似するアルゴリズムを開発した。
論文参考訳（メタデータ） (2024-11-11T03:43:52Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
この問題は通信複雑性のランダム化を$Omega(frac1kcdot n2log|mathbbF|)$とする。アプリケーションとして、$k$パスを持つ任意のストリーミングアルゴリズムに対して、$Omega(frac1kcdot n2log|mathbbF|)$スペースローバウンドを得る。
論文参考訳（メタデータ） (2024-10-26T06:21:42Z)
Efficient Continual Finite-Sum Minimization [52.5238287567572]
連続有限サム最小化(continuous finite-sum minimization)と呼ばれる有限サム最小化の鍵となるツイストを提案する。我々のアプローチは$mathcalO(n/epsilon)$ FOs that $mathrmStochasticGradientDescent$で大幅に改善されます。また、$mathcalOleft(n/epsilonalpharight)$ complexity gradient for $alpha 1/4$という自然な一階法は存在しないことを証明し、この方法の第一階法がほぼ密であることを示す。
論文参考訳（メタデータ） (2024-06-07T08:26:31Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity [28.068960555415014]
非定常環境下での線形包帯の固定予算ベストアーム識別問題について検討する。アルゴリズムは、可能な限り高い確率で最適な腕 $x* := argmax_xinmathcalXxtopsum_t=1Ttheta_t$ を正しく識別することを目的としている。
論文参考訳（メタデータ） (2023-07-27T19:03:36Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
本稿では, 2次行列分解(BMF)問題に対する効率的な$(1+varepsilon)$-approximationアルゴリズムを提案する。目標は、低ランク因子の積として$mathbfA$を近似することである。我々の手法はBMF問題の他の一般的な変種に一般化する。
論文参考訳（メタデータ） (2023-06-02T18:55:27Z)
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
我々は注意問題のスパシフィケーションを考慮する。超大規模特徴量の場合、文の長さをほぼ線形に縮めることができる。
論文参考訳（メタデータ） (2023-04-10T05:52:38Z)
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
行列 $Ain mathbbRntimes d$ とテンソル $bin mathbbRn$ が与えられたとき、 $ell_infty$ の回帰問題を考える。このような$ell_infty$レグレッションの保証を得るためには、濃密なスケッチ行列を使わなければならない。我々はまた、OCE(Oblivious Coordinate-wise Embedding)特性を利用した $ell_infty$ guarantee regression のための新しい分析フレームワークを開発した。
論文参考訳（メタデータ） (2023-02-01T05:22:40Z)
Linear Bandits on Uniformly Convex Sets [88.3673525964507]
線形バンディットアルゴリズムはコンパクト凸作用集合上の $tildemathcalo(nsqrtt)$ pseudo-regret 境界を与える。 2種類の構造的仮定は、より良い擬似回帰境界をもたらす。
論文参考訳（メタデータ） (2021-03-10T07:33:03Z)
Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits [20.207989166682832]
2つの一般的な線形バンディット設定におけるモデル選択の問題について考察する。まず、[K]$におけるarm $iの平均的な報酬は、$mu_i+ langle alpha_i,t,theta*|$である。我々は、ALBが$O(|theta*|sqrtT)$の後悔のスケーリングを達成することを示す。
論文参考訳（メタデータ） (2020-06-04T02:19:00Z)
Taking a hint: How to leverage loss predictors in contextual bandits? [63.546913998407405]
我々は,損失予測の助けを借りて,文脈的包帯における学習を研究する。最適な後悔は$mathcalO(minsqrtT, sqrtmathcalETfrac13)$である。
論文参考訳（メタデータ） (2020-03-04T07:36:38Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。