Fugu-MT Paper Translation (Abstract): Leveraging Good Representations in Linear Contextual Bandits

Paper overview: Leveraging Good Representations in Linear Contextual Bandits

arxiv url: http://arxiv.org/abs/2104.03781v1
Date: Thu, 8 Apr 2021 14:05:31 GMT
Status: Translation complete
Last updated in system: 2021-04-09 12:58:41.207874
Title: Leveraging Good Representations in Linear Contextual Bandits
Title (reference translation): Leveraging Good Representations in Linear Contextual Bandits
Authors: Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric and Matteo Pirotta
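Abstract summary: A contextual bandit problem may admit multiple linear representations. Recent work has shown that there exist "good" representations for which constant problem-dependent regret can be achieved. The regret of the proposed algorithm is never worse than the regret obtained by running LinUCB on the best representation.
Reference score (independently computed attention score): 131.91060536108301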
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation. However, a contextual bandit problem may admit multiple linear representations, each one with different characteristics that directly impact the regret of the learning algorithm. In particular, recent works showed that there exist "good" representations for which constant problem-dependent regret can be achieved. In this paper, we first provide a systematic analysis of the different definitions of "good" representations proposed in the literature. We then propose a novel selection algorithm able to adapt to the best representation in a set of $M$ candidates. We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor). As a result, our algorithm achieves constant regret whenever a "good" representation is available in the set. Furthermore, we show that the algorithm may still achieve constant regret by implicitly constructing a "good" representation, even when none of the initial representations is "good". Finally, we empirically validate our theoretical findings in a number of standard contextual bandit problems.
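Abstract (reference translation): The literature on linear contextual bandits focuses mainly on designing efficient learning algorithms for a given representation. A contextual bandit problem may, however, admit multiple linear representations, each with different characteristics that directly affect the regret of the learning algorithm. In particular, recent work has shown that there exist "good" representations for which constant problem-dependent regret can be achieved. The paper first gives a systematic analysis of the different definitions of "good" representations proposed in the literature. It then proposes a novel selection algorithm that can adapt to the best representation in a set of $M$ candidates. The regret is shown to be never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor). As a result, the algorithm achieves constant regret whenever a "good" representation is available in the set. Furthermore, the algorithm may still achieve constant regret by implicitly constructing a "good" representation, even when none of the initial representations is "good". Finally, the theoretical findings are validated empirically on a number of standard contextual bandit problems.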
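
For readers unfamiliar with the LinUCB baseline named in the abstract, the following is a minimal sketch of LinUCB on a single fixed representation, run on a synthetic linear problem. It is not the paper's representation-selection algorithm. The class name `LinUCB`, the constant exploration width `beta`, the Gaussian features, and the `run_demo` helper are all assumptions made for illustration.

```python
import numpy as np


class LinUCB:
    """Minimal LinUCB for one d-dimensional representation phi(x, a)."""

    def __init__(self, dim, reg=1.0, beta=1.0):
        self.A = reg * np.eye(dim)  # regularized design matrix
        self.b = np.zeros(dim)      # sum of reward-weighted features
        self.beta = beta            # exploration width (assumed constant here)

    def select(self, features):
        """features has shape (n_arms, dim); return the arm with the highest UCB."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        means = features @ theta
        # Per-arm confidence width sqrt(phi^T A^{-1} phi).
        widths = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return int(np.argmax(means + self.beta * widths))

    def update(self, feature, reward):
        self.A += np.outer(feature, feature)
        self.b += reward * feature

    def estimate(self):
        """Ridge-regression estimate of the unknown parameter vector."""
        return np.linalg.solve(self.A, self.b)


def run_demo(horizon=2000, n_arms=5, dim=3, noise=0.1, seed=0):
    """Run LinUCB on a synthetic problem with random Gaussian features."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)  # hidden true parameter
    agent = LinUCB(dim)
    regret = 0.0
    for _ in range(horizon):
        features = rng.normal(size=(n_arms, dim))  # fresh context each round
        arm = agent.select(features)
        expected = features @ theta_star
        reward = expected[arm] + noise * rng.normal()
        agent.update(features[arm], reward)
        regret += expected.max() - expected[arm]  # pseudo-regret of this round
    return regret, agent.estimate(), theta_star


if __name__ == "__main__":
    total_regret, theta_hat, theta_true = run_demo()
    print("cumulative pseudo-regret:", total_regret)
    print("estimation error:", np.linalg.norm(theta_hat - theta_true))
```

A selection scheme over $M$ candidate representations, as described in the abstract, would maintain one such estimator per representation; how the paper combines them is not specified here and is not reproduced by this sketch.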

Related papers