Fugu-MT Paper Translation (Summary): Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

Paper overview: Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

arxiv url: http://arxiv.org/abs/2301.08215v1
Date: Thu, 19 Jan 2023 18:24:08 GMT
Status: Translation complete
Last updated in system: 2023-01-20 14:24:07.140974
Title: Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient
Title (reference translation): Tight guarantees for interactive decision making with the Decision-Estimation Coefficient
Authors: Dylan J. Foster, Noah Golowich, Yanjun Han
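Abstract summary: We introduce a new variant of the Decision-Estimation Coefficient and use it to derive new lower bounds that improve upon prior work on three fronts. We give upper bounds on regret that scale with the same quantity, closing all but one of the gaps between the upper and lower bounds in Foster et al. (2021). The results apply to both the regret framework and the PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
Reference score (attention score computed in-house): 51.37720227675476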
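The summary above names the Decision-Estimation Coefficient and its new constrained variant without stating either. For orientation, here is a sketch of both as recalled from Foster et al. (2021) and from this paper rather than taken from this page, so the paper's exact notation may differ. Assumed notation: Π is the decision space, each model M maps a decision π to an observation distribution M(π) with mean reward f^M(π), π_M is an optimal decision for M, Δ(Π) is the set of distributions over Π, \bar{M} is the reference model, and D²_H is the squared Hellinger distance.

```latex
% Offset DEC (Foster et al., 2021): regret traded off against information via gamma
\operatorname{dec}_{\gamma}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}}
    \mathbb{E}_{\pi \sim p}\Big[ f^{M}(\pi_{M}) - f^{M}(\pi)
      - \gamma \, D^{2}_{\mathrm{H}}\big(M(\pi), \bar{M}(\pi)\big) \Big]

% Constrained DEC: the supremum only ranges over models that are
% epsilon-close to the reference model in expected squared Hellinger distance
\operatorname{dec}^{\mathrm{c}}_{\varepsilon}(\mathcal{M}, \bar{M})
  = \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}}
    \Big\{ \mathbb{E}_{\pi \sim p}\big[ f^{M}(\pi_{M}) - f^{M}(\pi) \big]
      \;:\; \mathbb{E}_{\pi \sim p}\big[ D^{2}_{\mathrm{H}}\big(M(\pi), \bar{M}(\pi)\big) \big] \le \varepsilon^{2} \Big\}
```

In the offset form, γ prices information; in the constrained form, ε caps it, and the value is taken to be 0 when no model satisfies the constraint (an assumed convention). Per the abstract, the new lower bounds allow \bar{M} to be improper, that is, to lie outside \mathcal{M}.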
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation. In this paper, we introduce a new variant of the DEC, the Constrained Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts:
- They hold in expectation, with no restrictions on the class of algorithms under consideration.
- They hold globally, and do not rely on the notion of localization used by Foster et al. (2021).
- Most interestingly, they allow the reference model with respect to which the DEC is defined to be improper, establishing that improper reference models play a fundamental role.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al. (2021). Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
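Abstract (reference translation): A fundamental problem in reinforcement learning and interactive decision making is to understand which modeling assumptions lead to sample-efficient learning guarantees, and which algorithm design principles achieve optimal sample complexity. Foster et al. (2021) recently introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity that yields upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation. In this paper, we introduce a new variant of the DEC, the Constrained Decision-Estimation Coefficient, and use it to derive new lower bounds that improve on prior work on three fronts:
- They hold in expectation, with no restrictions on the class of algorithms under consideration.
- They hold globally, and do not rely on the notion of localization used by Foster et al. (2021).
- Most interestingly, they allow the reference model with respect to which the DEC is defined to be improper, establishing that improper reference models play a fundamental role.
We provide upper bounds on regret that scale with the same quantity, closing all but one of the gaps between the upper and lower bounds in Foster et al. (2021). Our results apply to both the regret framework and the PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.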
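The abstract treats the DEC as a min-max quantity over decision distributions and models. To make that concrete, here is a minimal Python sketch under a hypothetical toy setting that is not from the paper: two arms with Bernoulli rewards, a finite model class given as tuples of arm means, and squared Hellinger distance without a factor of 1/2. The function names (hellinger_sq, offset_dec, constrained_dec, and so on) are illustrative only. The infimum over distributions p is approximated by a grid, so each result upper-bounds the true infimum, while the supremum over the finite model class is exact. The reference model ref may lie outside models, mirroring the improper reference models the abstract highlights.

```python
import math

# Hypothetical toy setting (not from the paper): two arms with Bernoulli rewards.
# A "model" is a tuple of arm means, so f^M(pi) = m[pi] and pi_M = argmax of m.


def hellinger_sq(a, b):
    """Squared Hellinger distance between Bernoulli(a) and Bernoulli(b).

    Assumes the convention without a factor of 1/2.
    """
    return (math.sqrt(a) - math.sqrt(b)) ** 2 + (math.sqrt(1 - a) - math.sqrt(1 - b)) ** 2


def decision_grid(n):
    """Distributions p over the two arms, parameterized by q = P(arm 1)."""
    for i in range(n + 1):
        q = i / n
        yield (1.0 - q, q)


def expected_gap(p, m):
    """E_{pi ~ p}[f^M(pi_M) - f^M(pi)] under model m."""
    best = max(m)
    return sum(w * (best - mu) for w, mu in zip(p, m))


def expected_div(p, m, ref):
    """E_{pi ~ p}[D_H^2(M(pi), ref(pi))]; ref need not belong to the model class."""
    return sum(w * hellinger_sq(mu, r) for w, mu, r in zip(p, m, ref))


def offset_dec(models, ref, gamma, n=100):
    """Grid approximation of the offset DEC (the grid minimum upper-bounds the infimum)."""
    return min(
        max(expected_gap(p, m) - gamma * expected_div(p, m, ref) for m in models)
        for p in decision_grid(n)
    )


def constrained_dec(models, ref, eps, n=100):
    """Grid approximation of the constrained DEC; an empty feasible set counts as 0."""
    return min(
        max(
            (expected_gap(p, m) for m in models if expected_div(p, m, ref) <= eps ** 2),
            default=0.0,
        )
        for p in decision_grid(n)
    )


models = [(0.6, 0.4), (0.4, 0.6)]
print(round(offset_dec(models, ref=(0.5, 0.5), gamma=0.0), 6))  # 0.1
print(constrained_dec(models, ref=(0.6, 0.4), eps=0.0))  # 0.0
```

With γ = 0 the offset DEC reduces to the minimax expected regret against the two models, which is 0.1 here (mix the arms equally). With ε = 0 and a reference equal to the first model, only models indistinguishable from the reference on the support of p remain feasible, so committing to arm 0 gives 0. A grid is used instead of a linear-programming solver only to keep the sketch dependency-free.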

List of related papers