Fugu-MT 論文翻訳(概要): Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits

論文の概要: Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits

arxiv url: http://arxiv.org/abs/2505.17400v1
Date: Fri, 23 May 2025 02:20:00 GMT
ステータス: 翻訳完了
システム内更新日: 2025-05-26 18:08:33.770449
Title: Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits
Title（参考訳）: 高次元確率線形帯域に対する最小速度最適アルゴリズム
Authors: Jingyu Liu, Yanglei Song,
Abstract要約: 我々は、T$のラウンドで複数のアームで線形バンディット問題を研究した。ラッソ推定器はシーケンシャルセッティングにおいて確実に準最適であることを示す。しきい値付きラッソを主推定法として用いた3段アーム選択アルゴリズムを提案する。
参考スコア（独自算出の注目度）: 1.2010968598596632
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the stochastic linear bandit problem with multiple arms over $T$ rounds, where the covariate dimension $d$ may exceed $T$, but each arm-specific parameter vector is $s$-sparse. We begin by analyzing the sequential estimation problem in the single-arm setting, focusing on cumulative mean-squared error. We show that Lasso estimators are provably suboptimal in the sequential setting, exhibiting suboptimal dependence on $d$ and $T$, whereas thresholded Lasso estimators -- obtained by applying least squares to the support selected by thresholding an initial Lasso estimator -- achieve the minimax rate. Building on these insights, we consider the full linear contextual bandit problem and propose a three-stage arm selection algorithm that uses thresholded Lasso as the main estimation method. We derive an upper bound on the cumulative regret of order $s(\log s)(\log d + \log T)$, and establish a matching lower bound up to a $\log s$ factor, thereby characterizing the minimax regret rate up to a logarithmic term in $s$. Moreover, when a short initial period is excluded from the regret, the proposed algorithm achieves exact minimax optimality.
Abstract（参考訳）: 共変次元$d$が$T$を超える場合、各アーム固有のパラメータベクトルは$s$-sparseである。まず, 累積平均二乗誤差に着目し, 単腕設定における逐次推定問題を解析することから始める。一方、初期ラッソ推定器をしきい値にすることで選択した支持体に最小二乗を施すことにより得られる閾値のラッソ推定器は、最小の最大値を得る。これらの知見に基づいて,完全線形文脈帯域幅問題について考察し,しきい値ラッソを主推定法として用いた3段アーム選択アルゴリズムを提案する。位数 $s(\log s)(\log d + \log T)$ の累積後悔の上限を導出し、一致する下限を $\log s$ factor に設定することで、最小最大後悔率を $s$ の対数項まで特徴づける。さらに,後悔から短い初期期間を除外した場合,提案アルゴリズムは最小限の最適性を正確に達成する。

関連論文リスト

Linear bandits with polylogarithmic minimax regret [8.97780713904412]
本研究では,未知ベクトルに近づいた単位球上での動作を選択すると,サブガウス雑音パラメータが線形に消滅する線形帯域の雑音モデルについて検討する。我々は,この問題に対するアルゴリズムを導入し,この最小限の後悔のスケーリングを,時間軸で$log3(T)$,時間軸で$T$として示す。
論文参考訳（メタデータ） (2024-02-19T10:56:47Z)
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path [80.60592344361073]
線形混合遷移カーネルを用いた最短経路(SSP)問題について検討する。エージェントは繰り返し環境と対話し、累積コストを最小化しながら特定の目標状態に到達する。既存の作業は、イテレーションコスト関数の厳密な下限や、最適ポリシーに対する期待長の上限を仮定することが多い。
論文参考訳（メタデータ） (2024-02-14T07:52:00Z)
Fixed-Budget Best-Arm Identification in Sparse Linear Bandits [69.6194614504832]
固定予算設定下での疎線形包帯のベストアーム識別問題について検討した。我々は2相アルゴリズムであるLassoとOptimal-Design-(Lasso-OD)をベースとした線形ベストアーム識別を設計する。我々はラッソODが指数においてほぼ極小であることを示す。
論文参考訳（メタデータ） (2023-11-01T12:32:17Z)
Best Policy Identification in Linear MDPs [70.57916977441262]
縮退した線形マルコフ+デルタ決定における最適同定問題について, 生成モデルに基づく固定信頼度設定における検討を行った。複雑な非最適化プログラムの解としての下位境界は、そのようなアルゴリズムを考案する出発点として用いられる。
論文参考訳（メタデータ） (2022-08-11T04:12:50Z)
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
標準スキームの最後の繰り返しの2乗誤差に対して、$t_mathrmmix tfracdn$の非漸近境界を示す。マルコフ雑音による政策評価について,これらの結果のまとめを導出する。
論文参考訳（メタデータ） (2021-12-23T18:47:50Z)
Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
漸進的強化学習における後悔を最小限に抑えるために,新しいモデルフリーアルゴリズムを提案する。提案アルゴリズムは、2つのQ-ラーニングシーケンスの助けを借りて、初期設定された参照更新ルールを用いる。初期の分散還元法の設計原理は、他のRL設定とは独立した関心を持つかもしれない。
論文参考訳（メタデータ） (2021-10-09T21:13:48Z)
Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
マルコフ決定過程における純粋探索問題について検討する。エージェントはアクションを逐次選択し、結果のシステム軌道から可能な限り早くベストを目標とする。
論文参考訳（メタデータ） (2021-06-05T09:16:28Z)
Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards [24.983866845065926]
我々は、重い尾の報酬を持つマルチアームのバンディットを考えており、そのp$-thのモーメントは、定数$nu_p$が1pleq2$である。本稿では,従来の情報として$nu_p$を必要としない新しいロバストな推定器を提案する。提案した推定器の誤差確率は指数関数的に高速に減衰することを示す。
論文参考訳（メタデータ） (2020-10-24T10:44:02Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。