Fugu-MT 論文翻訳(概要): Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

論文の概要: Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference

arxiv url: http://arxiv.org/abs/2207.11597v1
Date: Sat, 23 Jul 2022 20:25:07 GMT
ステータス: 翻訳完了
システム内更新日: 2022-07-26 13:04:52.694643
Title: Exploration in Linear Bandits with Rich Action Sets and its Implications for Inference
Title（参考訳）: リッチな作用集合をもつ線形バンディットの探索とその推論への応用
Authors: Debangshu Banerjee, Avishek Ghosh, Sayak Ray Chowdhury, Aditya Gopalan
Abstract要約: 期待行列の最小固有値は、アルゴリズムの累積後悔が$sqrtn)$であるときに、$Omega(sqrtn)として増加することを示す。本研究は, 線形帯域におけるEmphmodel selectionとEmphclusteringの2つの実践シナリオに適用する。
参考スコア（独自算出の注目度）: 23.364534479492715
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a non-asymptotic lower bound on the eigenspectrum of the design matrix generated by any linear bandit algorithm with sub-linear regret when the action set has well-behaved curvature. Specifically, we show that the minimum eigenvalue of the expected design matrix grows as $\Omega(\sqrt{n})$ whenever the expected cumulative regret of the algorithm is $O(\sqrt{n})$, where $n$ is the learning horizon, and the action-space has a constant Hessian around the optimal arm. This shows that such action-spaces force a polynomial lower bound rather than a logarithmic lower bound, as shown by \cite{lattimore2017end}, in discrete (i.e., well-separated) action spaces. Furthermore, while the previous result is shown to hold only in the asymptotic regime (as $n \to \infty$), our result for these ``locally rich" action spaces is any-time. Additionally, under a mild technical assumption, we obtain a similar lower bound on the minimum eigen value holding with high probability. We apply our result to two practical scenarios -- \emph{model selection} and \emph{clustering} in linear bandits. For model selection, we show that an epoch-based linear bandit algorithm adapts to the true model complexity at a rate exponential in the number of epochs, by virtue of our novel spectral bound. For clustering, we consider a multi agent framework where we show, by leveraging the spectral result, that no forced exploration is necessary -- the agents can run a linear bandit algorithm and estimate their underlying parameters at once, and hence incur a low regret.
Abstract（参考訳）: 本稿では,任意の線形バンディットアルゴリズムが生成する設計行列の固有スペクトル上の非漸近下界について,作用集合の曲率が良好であるとき,その部分線形後悔を伴うことを述べる。具体的には、期待される設計行列の最小固有値は、アルゴリズムの期待される累積後悔が$O(\sqrt{n})$であるときに$\Omega(\sqrt{n})$として成長することを示す。このことは、そのような作用空間が、離散(すなわち、十分に分離された)作用空間において、対数下界ではなく多項式下界を強制することを示している。さらに、前回の結果は漸近的な状態($n \to \infty$ として)でのみ成立することが示されるが、これらの ``locally rich' な作用空間に対する我々の結果はいつでも成り立つ。さらに、軽度の技術的仮定の下では、確率の高い最小固有値保持に関する同様の下限が得られる。我々は,線形包帯におけるemph{model selection} と \emph{clustering} の2つの実践シナリオに適用する。モデル選択については,新しいスペクトル境界を生かして,エポックベース線形バンディットアルゴリズムがエポック数に指数関数的な速度での真のモデル複雑性に適応することを示す。クラスタリングでは、スペクトル結果を活用することで、強制的な探索は不要であることを示すマルチエージェントフレームワークを検討する。エージェントは線形バンディットアルゴリズムを実行し、その基礎となるパラメータを一度に見積もることで、後悔を少なくすることができる。

関連論文リスト

Linear Bandits with Partially Observable Features [35.08645010112184]
本稿では,部分的に観測可能な特徴を持つ線形帯域問題を導入し,部分的な報奨情報と突発的な推定を行う。遅延部分に対する適切なアドレスがなければ、後悔は、報酬に対する影響が不明であるため、決定の地平線上で線形に増加する可能性がある。本稿では,潜在特徴を扱うための新しい解析法と,サブ線形後悔を実現するアルゴリズムを提案する。
論文参考訳（メタデータ） (2025-02-10T04:15:38Z)
Tangential Randomization in Linear Bandits (TRAiL): Guaranteed Inference and Regret Bounds [1.03590082373586]
本稿では,線形帯域探索アルゴリズムTRAiLの提案と解析を行う。 TraiLは、設計(回帰器)行列の最小固有値によって測定された推論品質の$Omega(sqrtT)$成長を保証する。我々は,期待された後悔に対して,任意のアルゴリズムに対して$Omega(sqrtT)$ minimax小境界を特徴付ける。
論文参考訳（メタデータ） (2024-11-19T01:08:13Z)
Linear bandits with polylogarithmic minimax regret [8.97780713904412]
本研究では,未知ベクトルに近づいた単位球上での動作を選択すると,サブガウス雑音パラメータが線形に消滅する線形帯域の雑音モデルについて検討する。我々は,この問題に対するアルゴリズムを導入し,この最小限の後悔のスケーリングを,時間軸で$log3(T)$,時間軸で$T$として示す。
論文参考訳（メタデータ） (2024-02-19T10:56:47Z)
Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
我々は、次元が$d$で、それぞれ$T$のラウンドで$M$リニアバンディットをプレイする設定を考え、これらの$M$リニアバンディットタスクは共通の$k(ll d)$次元リニア表現を共有する。我々は$widetildeOleft(dsqrtkMT + kMsqrtTright)$ regret boundsを達成する新しいアルゴリズムを考案した。
論文参考訳（メタデータ） (2022-03-29T15:27:13Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
測定から学習した線形モデルの定量化の問題を考える。この設定の下では、ミニマックスリスクに対する情報理論の下限を導出する。本稿では,2層ReLUニューラルネットワークに対して,提案手法と上界を拡張可能であることを示す。
論文参考訳（メタデータ） (2022-02-23T02:39:04Z)
High-Dimensional Sparse Linear Bandits [67.9378546011416]
データ・ポーア・システマティクスにおける疎線形包帯に対して、新しい$Omega(n2/3)$ dimension-free minimax regret lower boundを導出する。また、関連する特徴に対する信号の大きさに関する追加の仮定の下で、次元のない$O(sqrtn)$ regret上界も証明する。
論文参考訳（メタデータ） (2020-11-08T16:48:11Z)
Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
線形目的関数を最適化するが、報酬は未知の部分空間にのみ得られる線形帯域問題の変種について検討する。特に、各ラウンドでは、学習者は、目的または保護されたサブスペースを、アクションの選択とともにクエリするかどうかを選択する必要がある。提案アルゴリズムはOFULの原理から導かれるもので,保護された空間を推定するためにクエリのいくつかを利用する。
論文参考訳（メタデータ） (2020-11-02T14:59:39Z)
Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection [71.9765117768556]
本研究では,高次元線形モデルの下での文脈的帯域問題について考察する。この設定は、パーソナライズされたレコメンデーション、オンライン広告、パーソナライズされた医療など、不可欠な応用を見出す。本稿では,最適部分集合選択法を用いて2重成長エポックを推定する手法を提案する。
論文参考訳（メタデータ） (2020-09-04T04:10:39Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
制約付き文脈線形帯域設定について検討し、エージェントの目標は一連のポリシーを作成することである。楽観的悲観的線形帯域(OPLB)と呼ばれる,この問題に対する高信頼束縛アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-17T22:32:19Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。