Model-Free Reinforcement Learning with the Decision-Estimation
Coefficient
- URL: http://arxiv.org/abs/2211.14250v2
- Date: Sat, 12 Aug 2023 20:43:21 GMT
- Title: Model-Free Reinforcement Learning with the Decision-Estimation
Coefficient
- Authors: Dylan J. Foster and Noah Golowich and Jian Qian and Alexander Rakhlin
and Ayush Sekhari
- Abstract summary: We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation.
We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
- Score: 79.30248422988409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of interactive decision making, encompassing
structured bandits and reinforcement learning with general function
approximation. Recently, Foster et al. (2021) introduced the
Decision-Estimation Coefficient, a measure of statistical complexity that lower
bounds the optimal regret for interactive decision making, as well as a
meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms
of the same quantity. Estimation-to-Decisions is a reduction, which lifts
algorithms for (supervised) online estimation into algorithms for decision
making. In this paper, we show that by combining Estimation-to-Decisions with a
specialized form of optimistic estimation introduced by Zhang (2022), it is
possible to obtain guarantees that improve upon those of Foster et al. (2021)
by accommodating more lenient notions of estimation error. We use this approach
to derive regret bounds for model-free reinforcement learning with value
function approximation, and give structural results showing when it can and
cannot help more generally.
Related papers
- Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability [71.82666334363174]
We develop a unified framework for lower bound methods in statistical estimation and interactive decision making.
We introduce a novel measure, decision dimension, which facilitates the complexity of new lower bounds for interactive decision making.
arXiv Detail & Related papers (2024-10-07T15:14:58Z) - Tight Guarantees for Interactive Decision Making with the
Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z) - Regret Bounds and Experimental Design for Estimate-then-Optimize [9.340611077939828]
In practical applications, data is used to make decisions in two steps: estimation and optimization.
Errors in the estimation step can lead estimate-then-optimize to sub-optimal decisions.
We provide a novel bound on this regret for smooth and unconstrained optimization problems.
arXiv Detail & Related papers (2022-10-27T16:13:48Z) - On the Complexity of Adversarial Decision Making [101.14158787665252]
We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
arXiv Detail & Related papers (2022-06-27T06:20:37Z) - The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z) - Efficient Ensemble Model Generation for Uncertainty Estimation with
Bayesian Approximation in Segmentation [74.06904875527556]
We propose a generic and efficient segmentation framework to construct ensemble segmentation models.
In the proposed method, ensemble models can be efficiently generated by using the layer selection method.
We also devise a new pixel-wise uncertainty loss, which improves the predictive performance.
arXiv Detail & Related papers (2020-05-21T16:08:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.