Fugu-MT Paper Translation (Summary): Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

Paper summary: Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

arxiv url: http://arxiv.org/abs/2304.12466v1
Date: Mon, 24 Apr 2023 21:51:58 GMT
Status: Translation complete
Last updated in system: 2023-04-26 22:36:58.148070
Title: Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory
Title (reference translation): Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory
Authors: Andrew Wagenmaker, Dylan J. Foster
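Abstract summary: We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation.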
Reference score (independently computed attention score): 30.061707627742766
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance. We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of \citet{lai1985asymptotically,graves1997asymptotically}, but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and provide a new algorithm, $\mathsf{AE}^2$, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, improving over prior work exponentially. We complement these results with lower bounds that show that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.
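Abstract (reference translation): We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond). We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms. Instance-optimality enjoys a rich asymptotic theory originating from the work of \citet{lai1985asymptotically,graves1997asymptotically}, but non-asymptotic guarantees have remained elusive outside of certain special cases. Even for problems as simple as tabular reinforcement learning, existing algorithms do not attain instance-optimal performance until the number of rounds of interaction is doubly exponential in the number of states. In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation. We introduce a new complexity measure, the Allocation-Estimation Coefficient (AEC), and propose a new algorithm, $\mathsf{AE}^2$, which attains non-asymptotic instance-optimal performance at a rate controlled by the AEC. Our results recover the best known guarantees for well-studied problems such as finite-armed and linear bandits and, when specialized to tabular reinforcement learning, attain the first instance-optimal regret bounds with polynomial dependence on all problem parameters, an exponential improvement over prior work. We complement these results with lower bounds showing that i) existing notions of statistical complexity are insufficient to derive non-asymptotic guarantees, and ii) under certain technical conditions, boundedness of the AEC is necessary to learn an instance-optimal allocation of decisions in finite time.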

Related papers