Related papers: Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

URL: http://arxiv.org/abs/2301.08215v1
Date: Thu, 19 Jan 2023 18:24:08 GMT
Title: Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient
Authors: Dylan J. Foster, Noah Golowich, Yanjun Han
Abstract summary: We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts. We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al. Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
Score: 51.37720227675476
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A foundational problem in reinforcement learning and interactive decision making is to understand what modeling assumptions lead to sample-efficient learning guarantees, and what algorithm design principles achieve optimal sample complexity. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation. In this paper, we introduce a new variant of the DEC, the Constrained Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts: - They hold in expectation, with no restrictions on the class of algorithms under consideration. - They hold globally, and do not rely on the notion of localization used by Foster et al. (2021). - Most interestingly, they allow the reference model with respect to which the DEC is defined to be improper, establishing that improper reference models play a fundamental role. We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al. (2021). Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.

Related papers

Decision Making in Hybrid Environments: A Model Aggregation Approach [26.993355411130505]
Recent work by Foster, Xu, and Zeevi characterizes the complexity of general online decision making problems. We propose a general extension of DEC that characterizes the hybrid regime. Our framework leads to a flexible algorithm design where the learner learns over subsets of the hypothesis set.
arXiv Detail & Related papers (2025-02-09T17:59:42Z)
Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability [71.82666334363174]
We develop a unified framework for lower bound methods in statistical estimation and interactive decision making. We introduce a novel measure, decision dimension, which facilitates the complexity of new lower bounds for interactive decision making.
arXiv Detail & Related papers (2024-10-07T15:14:58Z)
Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions. We propose a novel algorithm that yields an varepsilon-optimal randomized hypothesis with a sample complexity on the order of (d+k)/varepsilon2.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
Efficient Alternating Minimization Solvers for Wyner Multi-View Unsupervised Learning [0.0]
We propose two novel formulations that enable the development of computational efficient solvers based the alternating principle. The proposed solvers offer computational efficiency, theoretical convergence guarantees, local minima complexity with the number of views, and exceptional accuracy as compared with the state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-28T10:17:51Z)
Model-Free Reinforcement Learning with the Decision-Estimation Coefficient [79.30248422988409]
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
arXiv Detail & Related papers (2022-11-25T17:29:40Z)
On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making. We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z)
The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning. A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.