Fugu-MT 論文翻訳(概要): Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

論文の概要: Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits

arxiv url: http://arxiv.org/abs/2006.02612v2
Date: Mon, 15 Jun 2020 18:55:38 GMT
ステータス: 翻訳完了
システム内更新日: 2022-11-25 09:15:59.791712
Title: Problem-Complexity Adaptive Model Selection for Stochastic Linear Bandits
Title（参考訳）: 確率線形バンディットに対する問題複雑性適応モデル選択
Authors: Avishek Ghosh, Abishek Sankararaman and Kannan Ramchandran
Abstract要約: 2つの一般的な線形バンディット設定におけるモデル選択の問題について考察する。まず、[K]$におけるarm $iの平均的な報酬は、$mu_i+ langle alpha_i,t,theta*|$である。我々は、ALBが$O(|theta*|sqrtT)$の後悔のスケーリングを達成することを示す。
参考スコア（独自算出の注目度）: 20.207989166682832
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the problem of model selection for two popular stochastic linear bandit settings, and propose algorithms that adapts to the unknown problem complexity. In the first setting, we consider the $K$ armed mixture bandits, where the mean reward of arm $i \in [K]$, is $\mu_i+ \langle \alpha_{i,t},\theta^* \rangle $, with $\alpha_{i,t} \in \mathbb{R}^d$ being the known context vector and $\mu_i \in [-1,1]$ and $\theta^*$ are unknown parameters. We define $\|\theta^*\|$ as the problem complexity and consider a sequence of nested hypothesis classes, each positing a different upper bound on $\|\theta^*\|$. Exploiting this, we propose Adaptive Linear Bandit (ALB), a novel phase based algorithm that adapts to the true problem complexity, $\|\theta^*\|$. We show that ALB achieves regret scaling of $O(\|\theta^*\|\sqrt{T})$, where $\|\theta^*\|$ is apriori unknown. As a corollary, when $\theta^*=0$, ALB recovers the minimax regret for the simple bandit algorithm without such knowledge of $\theta^*$. ALB is the first algorithm that uses parameter norm as model section criteria for linear bandits. Prior state of art algorithms \cite{osom} achieve a regret of $O(L\sqrt{T})$, where $L$ is the upper bound on $\|\theta^*\|$, fed as an input to the problem. In the second setting, we consider the standard linear bandit problem (with possibly an infinite number of arms) where the sparsity of $\theta^*$, denoted by $d^* \leq d$, is unknown to the algorithm. Defining $d^*$ as the problem complexity, we show that ALB achieves $O(d^*\sqrt{T})$ regret, matching that of an oracle who knew the true sparsity level. This methodology is then extended to the case of finitely many arms and similar results are proven. This is the first algorithm that achieves such model selection guarantees. We further verify our results via synthetic and real-data experiments.
Abstract（参考訳）: 2つの一般的な確率線形バンディット設定に対するモデル選択の問題を検討し,未知問題に適応するアルゴリズムを提案する。最初の設定では、腕$i \in [K]$の平均的な報酬が$\mu_i+ \langle \alpha_{i,t},\theta^* \rangle $, with $\alpha_{i,t} \in \mathbb{R}^d$が既知のコンテキストベクトルであり、$\mu_i \in [-1,1]$と$\theta^*$が未知のパラメータであるような、K$の武装混合包帯を考える。問題複雑性として$\|\theta^*\|$ と定義し、ネスト化された仮説クラスの列を考え、それぞれ$\|\theta^*\|$ 上の異なる上限を仮定する。そこで我々は,真の問題複雑性に対応する新しい位相ベースのアルゴリズムであるAdaptive Linear Bandit (ALB) を提案する。 alb は $o(\|\theta^*\|\sqrt{t})$ の後悔のスケーリングを達成できることを示し、ここで $\|\theta^*\|$ は apriori 未知であることを示した。結論として、$\theta^*=0$ のとき、ALB は$\theta^*$ の知識のない単純なバンドイットアルゴリズムに対するミニマックスの後悔を回復する。 ALBは、パラメータノルムを線形帯域のモデルセクション基準として利用する最初のアルゴリズムである。以前の状態のアルゴリズム \cite{osom} は$o(l\sqrt{t})$ の後悔を達成し、ここで $l$ は問題への入力として与えられる$\|\theta^*\|$ の上限である。第2の設定では、標準線形バンディット問題(おそらく無限個のアームを持つ)を考える。ここでは、$d^* \leq d$ で表される$\theta^*$ のスパース性はアルゴリズムには未知である。問題複雑性として $d^*$ を定義すると、alb が$o(d^*\sqrt{t})$ を達成でき、真のスパーシティレベルを知っているオラクルのそれと一致することが分かる。この方法論は有限個の腕の場合に拡張され、同様の結果が証明される。このようなモデル選択を保証する最初のアルゴリズムである。さらに、合成および実データ実験により結果を検証する。

関連論文リスト

Distribution-Independent Regression for Generalized Linear Models with Oblivious Corruptions [49.69852011882769]
一般化線形モデル (GLMs) の重畳雑音の存在下での回帰問題に対する最初のアルゴリズムを示す。本稿では,この問題に最も一般的な分布非依存設定で対処するアルゴリズムを提案する。これは、サンプルの半分以上を任意に破損させる難聴ノイズを持つGLMレグレッションに対する最初の新しいアルゴリズムによる結果である。
論文参考訳（メタデータ） (2023-09-20T21:41:59Z)
A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity [28.068960555415014]
非定常環境下での線形包帯の固定予算ベストアーム識別問題について検討する。アルゴリズムは、可能な限り高い確率で最適な腕 $x* := argmax_xinmathcalXxtopsum_t=1Ttheta_t$ を正しく識別することを目的としている。
論文参考訳（メタデータ） (2023-07-27T19:03:36Z)
Variance-Aware Sparse Linear Bandits [64.70681598741417]
余分な線形包帯に対する最悪のミニマックスは$widetildeThetaleft(sqrtdTright)$である。ノイズがなく、アクションセットが単位球面である良性設定では、ディビジョン・アンド・コンカーを使用して、$widetildemathcal O(1)$ regretを達成することができる。我々は,任意の分散対応線形帯域幅アルゴリズムを分散対応線形帯域幅アルゴリズムに変換する汎用フレームワークを開発した。
論文参考訳（メタデータ） (2022-05-26T15:55:44Z)
Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits [99.86860277006318]
本稿では,一組の逆アルゴリズムを組み合わせ,学習することの問題点について考察する。 Agarwal et al. の CORRAL はこの目標を、$widetildeO(sqrtd S T)$ の残酷なオーバーヘッドで達成している。この問題に触発されて、後悔のオーバーヘッドが百万ドルにしか依存しない大規模バンディットアルゴリズムのバンドを囲む新しいレシピを提案する。
論文参考訳（メタデータ） (2022-02-12T21:55:44Z)
Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
本稿では、ナビゲーションエンジンやレコメンデーションシステムにおけるルーティングアプリケーションによって動機付けられた、コンテキスト線形帯域の次の変種について考察する。我々は、真の点$w*$と分離オラクルが返す超平面の間の全距離を、低い「回帰」を持つ新しい切断平面アルゴリズムを設計する。
論文参考訳（メタデータ） (2021-06-09T05:39:05Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
我々は、$AinmathbbRdtimes n$へのアクセスを考えると、潜入$k$-vertex simplex $KsubsetmathbbRdtimes n$を学習する問題を考える。実行時間における$k$への依存は、トップ$k$特異値の質量が$a$であるという自然な仮定から不要であることを示す。
論文参考訳（メタデータ） (2021-05-17T16:40:48Z)
Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization [51.23789922123412]
我々は,バンディットフィードバックを用いてオンライン学習を学習する。 learnerは、コスト/リワード関数が"pseudo-1d"構造を許可するゼロ次オラクルのみにアクセスできる。我々は、$T$がラウンドの数である任意のアルゴリズムの後悔のために$min(sqrtdT、T3/4)$の下限を示しています。ランダム化オンライングラデーション下降とカーネル化指数重み法を組み合わせた新しいアルゴリズムsbcalgを提案し,疑似-1d構造を効果的に活用する。
論文参考訳（メタデータ） (2021-02-15T08:16:51Z)
Non-Convex Compressed Sensing with Training Data [0.0]
我々は、行列$A$上の比較的少数の仮定を持つ1層の線形ニューラルネットワークの範囲において高い確率で元の問題$Ax = b$の解決策を見つける。本稿では、適切な初期値の代わりに、圧縮センシング問題に関連する追加のトレーニング問題 $Ax = B_l$, $l=1, dots, p$ が提供される代替案を検討する。
論文参考訳（メタデータ） (2021-01-20T20:30:59Z)
Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits with Linear Payoff Functions [53.77572276969548]
我々は、C$2$UCBアルゴリズムが分割マトロイド制約に対して最適な後悔結合$tildeO(dsqrtkT + dk)$を有することを示した。一般的な制約に対して,C$2$UCBアルゴリズムで腕の報酬推定値を変更するアルゴリズムを提案する。
論文参考訳（メタデータ） (2021-01-20T04:29:18Z)
Thresholded Lasso Bandit [70.17389393497125]
Thresholded Lasso banditは、報酬関数を定義するベクトルとスパースサポートを推定するアルゴリズムである。一般には $mathcalO( log d + sqrtT )$ や $mathcalO( log d + sqrtT )$ としてスケールする非漸近的後悔の上界を確立する。
論文参考訳（メタデータ） (2020-10-22T19:14:37Z)
Optimal Strategies for Graph-Structured Bandits [0.0]
Bernoulli $!=!(nu_a,b)_a in mathcalA, b in mathcalB$ with means $(mu_a,b)_a in mathcalA, b in mathcalB$ with means $(mu_a,b)_a
論文参考訳（メタデータ） (2020-07-07T06:27:51Z)
Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners [17.224805430291177]
線形パラメータ化マルチアームバンドにおけるベストアーム識別の問題について検討する。そこで本研究では,この問題を解決するために,明示的に実装可能かつ証明可能な順序-最適サンプル-複雑度アルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-13T05:00:01Z)
Low-Rank Generalized Linear Bandit Problems [19.052901817304743]
低ランク線型バンドイット問題において、作用の報酬は、作用と未知の低ランク行列$Theta*$の間の内部積である。低ランク行列の被覆によって構築される指数重み付き平均予測器と、オンラインから信頼度セットへの変換パバシ2012onlineの新たな組み合わせに基づくアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-06-04T15:30:14Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。