Fugu-MT Paper Translation (Summary): Meta Representation Learning with Contextual Linear Bandits

Paper summary: Meta Representation Learning with Contextual Linear Bandits

arxiv url: http://arxiv.org/abs/2205.15100v1
Date: Mon, 30 May 2022 13:43:53 GMT
Status: Translation complete
Last updated in system: 2022-05-31 14:50:08.321625
Title: Meta Representation Learning with Contextual Linear Bandits
Title (reference translation): Meta Representation Learning with Contextual Linear Bandits
Authors: Leonardo Cella, Karim Lounici, Massimiliano Pontil
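Abstract summary: We study meta-learning in the setting of linear bandit tasks. We show that if the learned representation estimates the unknown representation well, the downstream task can be learned efficiently.
Reference score (independently computed attention score): 34.77618818693938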
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Meta-learning seeks to build algorithms that rapidly learn how to solve new learning problems based on previous experience. In this paper we investigate meta-learning in the setting of stochastic linear bandit tasks. We assume that the tasks share a low-dimensional representation, which has been partially acquired from previous learning tasks. We aim to leverage this information in order to learn a new downstream bandit task, which shares the same representation. Our principal contribution is to show that if the learned representation estimates the unknown one well, then the downstream task can be efficiently learned by a greedy policy that we propose in this work. We derive an upper bound on the regret of this policy, which is, up to logarithmic factors, of order $r\sqrt{N}(1\vee \sqrt{d/T})$, where $N$ is the horizon of the downstream task, $T$ is the number of training tasks, $d$ is the ambient dimension and $r \ll d$ is the dimension of the representation. We highlight that our strategy does not need to know $r$. We note that if $T> d$ our bound achieves the same rate as optimal minimax bandit algorithms using the true underlying representation. Our analysis is inspired by and builds in part upon previous work on meta-learning in the i.i.d. full information setting \citep{tripuraneni2021provable,boursier2022trace}. As a separate contribution we show how to relax certain assumptions in those works, thereby improving their representation learning and risk analysis.
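Abstract (reference translation): Meta-learning aims to build algorithms that quickly learn how to solve new learning problems based on previous experience. In this paper, we study meta-learning in the setting of stochastic linear bandit tasks. We assume that the tasks share a low-dimensional representation that has been partially acquired from previous learning tasks. We aim to use this information to learn a new downstream bandit task that shares the same representation. Our main contribution is to show that if the learned representation estimates the unknown one well, the downstream task can be learned efficiently by a greedy policy proposed in this work. We derive an upper bound on the regret of this policy which, up to logarithmic factors, is of order $r\sqrt{N}(1\vee \sqrt{d/T})$, where $N$ is the horizon of the downstream task, $T$ is the number of training tasks, $d$ is the ambient dimension, and $r \ll d$ is the dimension of the representation. We emphasize that our strategy does not need to know $r$. We note that if $T > d$, our bound achieves the same rate as optimal minimax bandit algorithms that use the true underlying representation. Our analysis builds in part on previous work on meta-learning in the i.i.d. full information setting \citep{tripuraneni2021provable,boursier2022trace}. As a separate contribution, we show how to relax certain assumptions in those works, thereby improving their representation learning and risk analysis.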
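
To make the stated rate concrete, the following is a minimal sketch that evaluates the order $r\sqrt{N}(1\vee \sqrt{d/T})$ for a few values of $T$. It is only an illustration of the formula in the abstract: the function name `regret_order` and the numeric values are assumptions chosen for this example, and constants and logarithmic factors are ignored.

```python
import math


def regret_order(r: int, N: int, d: int, T: int) -> float:
    """Evaluate r * sqrt(N) * max(1, sqrt(d / T)).

    This is the order of the regret bound quoted in the abstract,
    with constants and logarithmic factors dropped. The function
    name is hypothetical and not taken from the paper.
    """
    if min(r, N, d, T) <= 0:
        raise ValueError("r, N, d and T must all be positive")
    # (1 v x) in the abstract denotes max(1, x).
    return r * math.sqrt(N) * max(1.0, math.sqrt(d / T))


if __name__ == "__main__":
    # Illustrative values only (not from the paper).
    r, d, N = 5, 100, 10_000
    for T in (25, 100, 1000):
        print(f"T={T:5d}  order={regret_order(r, N, d, T):.1f}")
```

Once $T \ge d$ the factor $1\vee\sqrt{d/T}$ equals 1, so the expression reduces to $r\sqrt{N}$, which is the regime the abstract describes for $T > d$.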

Related papers

Beyond Task Diversity: Provable Representation Transfer for Sequential Multi-Task Linear Bandits [17.970177214029473]
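This work studies lifelong learning in linear bandits, where a learner interacts with a sequence of linear bandit tasks. The current literature assumes that these tasks are diverse, for example that their parameters uniformly span an $m$-dimensional subspace. We present the first nontrivial result for sequential multi-task linear bandits without assuming task diversity.
Paper reference translation (metadata) (2025-01-23T05:21:27Z)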
Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
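We study representation learning for efficient imitation learning over linear systems. The imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x}{H N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$.
Paper reference translation (metadata) (2022-12-01T00:14:35Z)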
Nearly Minimax Algorithms for Linear Bandits with Shared Representation [86.79657561369397]
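We consider the setting where we play $M$ linear bandits of dimension $d$, each for $T$ rounds, and these $M$ linear bandit tasks share a common $k (\ll d)$-dimensional linear representation. We devise a new algorithm that achieves $\widetilde{O}\left(d\sqrt{kMT} + kM\sqrt{T}\right)$ regret bounds.
Paper reference translation (metadata) (2022-03-29T15:27:13Z)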
Multi-task Representation Learning with Stochastic Linear Bandits [29.8208189270894]
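We study the problem of transfer learning in the setting of linear bandit tasks. We consider that a low-dimensional linear representation is shared across the tasks, and we study the benefit of learning this representation in the multi-task learning setting.
Paper reference translation (metadata) (2022-02-21T09:26:34Z)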
Provable Lifelong Learning of Representations [21.440845049501778]
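We propose a provable lifelong learning algorithm that maintains and refines an internal feature representation. We show that, for any desired accuracy on all tasks, the dimension of the representation stays close to that of the underlying representation.
Paper reference translation (metadata) (2021-10-27T00:41:23Z)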
Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
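We propose a model-free reinforcement learning algorithm inspired by the randomized least-squares value iteration (RLSVI) algorithm. The proposed algorithm drives exploration by simply perturbing the training data with scalar noise. We complement the theory with an empirical evaluation across known hard exploration tasks.
Paper reference translation (metadata) (2021-06-15T02:23:07Z)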
Impact of Representation Learning in Linear Bandits [83.17684841392754]
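We study how representation learning can improve the efficiency of bandit problems. We propose a new algorithm that achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret.
Paper reference translation (metadata) (2020-10-13T16:35:30Z)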
Meta-learning with Stochastic Linear Bandits [120.43000970418939]
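We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm. We show, both theoretically and experimentally, that our strategy has a large advantage over learning the tasks in isolation as the number of tasks grows and the variance of the task distribution shrinks.
Paper reference translation (metadata) (2020-05-18T08:41:39Z)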

The related papers list is generated automatically from the titles and abstracts of papers on this site.

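This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any results arising from its use.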