Fugu-MT Paper Translation (Summary): Optimal Gradient-based Algorithms for Non-concave Bandit Optimization


arXiv URL: http://arxiv.org/abs/2107.04518v1
Date: Fri, 9 Jul 2021 16:04:24 GMT
Status: Translation complete
Last updated in system: 2021-07-12 17:37:40.293080
Title: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization
Authors: Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, Jiaqi Yang
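Abstract summary: This work considers a large family of bandit problems in which the unknown reward function is non-concave. The algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality. The work shows that standard optimistic algorithms are sub-optimal by dimension factors.
Reference score (site-computed attention score): 76.57464214864756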
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Bandit problems with linear or concave reward have been extensively studied, but relatively few works have studied bandits with non-concave reward. This work considers a large family of bandit problems where the unknown underlying reward function is non-concave, including the low-rank generalized linear bandit problems and two-layer neural network with polynomial activation bandit problem. For the low-rank generalized linear bandit problem, we provide a minimax-optimal algorithm in the dimension, refuting both conjectures in [LMT21, JWWN19]. Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality and attains optimal rates in several structured polynomial settings (in the dimension). We further demonstrate the applicability of our algorithms in RL in the generative model setting, resulting in improved sample complexity over prior approaches. Finally, we show that the standard optimistic algorithms (e.g., UCB) are sub-optimal by dimension factors. In the neural net setting (with polynomial activation functions) with noiseless reward, we provide a bandit algorithm with sample complexity equal to the intrinsic algebraic dimension. Again, we show that optimistic approaches have worse sample complexity, polynomial in the extrinsic dimension (which could be exponentially worse in the polynomial degree).
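The abstract says the algorithms rest on a zeroth-order optimization paradigm: the learner only observes reward values, never gradients. As a generic illustration (not the authors' algorithm), the sketch below shows the standard two-point random-direction gradient estimator and a plain ascent loop built on it. It assumes a smooth, noiseless reward; the function names, step size, smoothing radius `delta` and number of directions are hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def two_point_gradient_estimate(
    f: Callable[[Vector], float],
    x: Vector,
    delta: float = 1e-2,
    num_directions: int = 2000,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Estimate grad f(x) from function values only (zeroth-order feedback).

    For a Gaussian direction u, the central difference
    (f(x + delta*u) - f(x - delta*u)) / (2*delta) approximates <grad f(x), u>.
    Multiplying by u and averaging estimates grad f(x), since E[u u^T] = I.
    """
    rng = rng or random.Random(0)
    d = len(x)
    g = [0.0] * d
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
        x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * delta)  # directional derivative
        for i in range(d):
            g[i] += slope * u[i]
    return [gi / num_directions for gi in g]


def zeroth_order_ascent(
    f: Callable[[Vector], float],
    x0: Vector,
    step: float = 0.05,
    iters: int = 200,
    num_directions: int = 50,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Gradient ascent that queries only reward values (hypothetical helper)."""
    rng = rng or random.Random(0)
    x = list(x0)
    for _ in range(iters):
        g = two_point_gradient_estimate(f, x, num_directions=num_directions, rng=rng)
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Toy concave reward with maximizer c (illustrative only).
    c = [1.0, -2.0, 0.5]

    def reward(x: Vector) -> float:
        return -sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    print(zeroth_order_ascent(reward, [0.0, 0.0, 0.0]))
```

On a quadratic reward the central difference is exact, so the iterates approach the maximizer. The non-concave and noisy settings studied in the paper are exactly where such a naive loop needs the more careful analysis the abstract refers to.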

Related papers

Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits [55.938644481736446]
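Indexed Minimum Empirical Divergence (IMED) is a highly effective approach to the multi-armed bandit problem. It has been observed to empirically outperform UCB-based algorithms and Thompson sampling. We propose a novel linear version of the IMED algorithm, which we call the LinIMED family of algorithms.
Reference translation (metadata) (2024-05-24T04:11:58Z)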
Second Order Methods for Bandit Optimization and Control [34.51425758864638]
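We show that the algorithm is optimal for a large class of convex functions, which we call $\kappa$-convex. We also study the application of second-order bandit algorithms to online convex optimization with memory.
Reference translation (metadata) (2024-02-14T04:03:38Z)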
Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
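We propose a new algorithm for the min-player against smooth algorithms deployed by the adversary. The algorithm is guaranteed to make monotonic progress (and thus has no limit cycles) and finds an appropriate number of gradient-ascent steps.
Reference translation (metadata) (2021-06-02T22:03:36Z)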
A Simple Unified Framework for High Dimensional Bandit Problems [33.139925285802825]
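This paper proposes a general analysis framework for upper-bounding the regret of the algorithm. We show that the algorithm applies to a range of high-dimensional bandit problems.
Reference translation (metadata) (2021-02-18T21:35:32Z)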
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
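We propose a new algorithm that improves the state of the art along multiple dimensions. In the special case of non-contextual linear bandits, we establish minimax optimality with respect to the learning horizon.
Reference translation (metadata) (2020-10-23T09:12:47Z)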
An Empirical Process Approach to the Union Bound: Practical Algorithms for Combinatorial and Linear Bandits [34.06611065493047]
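This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed-confidence and fixed-budget settings. We provide algorithms whose sample complexity scales with the geometry of the instance and that avoid an explicit union bound over the number of arms. We also propose the first algorithm for linear bandits in the fixed-budget setting.
Reference translation (metadata) (2020-06-21T00:56:33Z)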
Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
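We develop a meta-algorithm that can select among base algorithms. Even when one of the base algorithms has $O(\sqrt{T})$ regret, in general one cannot obtain regret better than $\Omega(\sqrt{T})$.
Reference translation (metadata) (2020-03-03T18:46:34Z)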
Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
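This class of algorithms includes several randomized methods that are among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely Gaussian projections and the subsampled Hadamard transform. The resulting algorithm achieves the best complexity among least-squares solvers whose cost does not depend on the condition number.
Reference translation (metadata) (2020-02-21T17:45:32Z)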
Bandit algorithms to emulate human decision making using probabilistic distortions [20.422725678982726]
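We formulate two multi-armed bandit problems with distorted probabilities on the reward distributions. We consider both the classic regret-minimization problem and the best-arm identification framework for multi-armed bandits.
Reference translation (metadata) (2016-11-30T17:37:51Z)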

The related-paper list is generated automatically from the titles and abstracts of papers on this site.

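The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).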