On the Complexity of Adversarial Decision Making
- URL: http://arxiv.org/abs/2206.13063v1
- Date: Mon, 27 Jun 2022 06:20:37 GMT
- Title: On the Complexity of Adversarial Decision Making
- Authors: Dylan J. Foster and Alexander Rakhlin and Ayush Sekhari and Karthik
Sridharan
- Abstract summary: We show that the Decision-Estimation Coefficient is necessary and sufficient to obtain low regret for adversarial decision making.
We provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures.
- Score: 101.14158787665252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central problem in online learning and decision making -- from bandits to
reinforcement learning -- is to understand what modeling assumptions lead to
sample-efficient learning guarantees. We consider a general adversarial
decision making framework that encompasses (structured) bandit problems with
adversarial rewards and reinforcement learning problems with adversarial
dynamics. Our main result is to show -- via new upper and lower bounds -- that
the Decision-Estimation Coefficient, a complexity measure introduced by Foster
et al. in the stochastic counterpart to our setting, is necessary and
sufficient to obtain low regret for adversarial decision making. However,
compared to the stochastic setting, one must apply the Decision-Estimation
Coefficient to the convex hull of the class of models (or, hypotheses) under
consideration. This establishes that the price of accommodating adversarial
rewards or dynamics is governed by the behavior of the model class under
convexification, and recovers a number of existing results -- both positive and
negative. En route to obtaining these guarantees, we provide new structural
results that connect the Decision-Estimation Coefficient to variants of other
well-known complexity measures, including the Information Ratio of Russo and
Van Roy and the Exploration-by-Optimization objective of Lattimore and
György.
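For context, a rough sketch of the quantity at the center of this result, following the stochastic definition of Foster et al. (the notation below is assumed from that line of work, not taken from this abstract): for a model class $\mathcal{M}$, decision space $\Pi$, and reference model $\bar{M}$,
\[
\mathsf{dec}_{\gamma}(\mathcal{M}, \bar{M}) \;=\; \inf_{p \in \Delta(\Pi)} \; \sup_{M \in \mathcal{M}} \; \mathbb{E}_{\pi \sim p}\!\left[ f^{M}(\pi_{M}) - f^{M}(\pi) \;-\; \gamma \cdot D_{\mathrm{H}}^{2}\big(M(\pi), \bar{M}(\pi)\big) \right],
\]
where $f^{M}(\pi)$ is the expected payoff of decision $\pi$ under model $M$, $\pi_{M}$ is the optimal decision for $M$, and $D_{\mathrm{H}}^{2}$ is the squared Hellinger distance. The abstract's headline result can then be read as: adversarial regret is governed by this quantity evaluated on the convex hull $\mathrm{co}(\mathcal{M})$ rather than on $\mathcal{M}$ itself.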
Related papers
- Decision Making in Changing Environments: Robustness, Query-Based Learning, and Differential Privacy [59.64384863882473]
We study the problem of interactive decision making in which the underlying environment changes over time subject to given constraints.
We propose a framework which provides an interpolation between the stochastic and adversarial settings of decision making.
arXiv Detail & Related papers (2025-01-24T21:31:50Z)
- Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability [71.82666334363174]
We develop a unifying framework for information-theoretic lower bounds in statistical estimation and interactive decision making.
We propose a new lower bound approach called the interactive Fano method.
arXiv Detail & Related papers (2024-10-07T15:14:58Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
- Model-Free Reinforcement Learning with the Decision-Estimation Coefficient [79.30248422988409]
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation.
We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
arXiv Detail & Related papers (2022-11-25T17:29:40Z)
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
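To make the Estimation-to-Decisions idea concrete, here is a minimal, hedged sketch (not the authors' code; all names and parameter choices are illustrative) of the per-round minimax problem E2D solves in a two-armed Bernoulli bandit: given a reference estimate of the model, choose the decision distribution that minimizes the worst-case DEC objective over candidate models. It uses a simple grid search over the probability simplex rather than a proper convex solver.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q).

    Convention assumed here: H^2(P, Q) = (1/2) * sum_x (sqrt(P(x)) - sqrt(Q(x)))^2.
    """
    return 0.5 * ((math.sqrt(p) - math.sqrt(q)) ** 2
                  + (math.sqrt(1 - p) - math.sqrt(1 - q)) ** 2)

def dec_objective(p, model, ref, gamma):
    """DEC objective for one candidate model M against the reference estimate:

    E_{pi ~ p}[ f^M(pi_M) - f^M(pi) - gamma * H^2(M(pi), ref(pi)) ].
    """
    star = max(model)  # payoff of the best arm under M, i.e. f^M(pi_M)
    return sum(p[a] * (star - model[a] - gamma * hellinger_sq(model[a], ref[a]))
               for a in range(len(model)))

def e2d_step(models, ref, gamma, grid=201):
    """One E2D round for a 2-armed bandit: grid-search the simplex for the
    distribution minimizing the worst case of the DEC objective over models."""
    best_val, best_q = float("inf"), None
    for i in range(grid):
        q = i / (grid - 1)
        p = (q, 1.0 - q)
        worst = max(dec_objective(p, m, ref, gamma) for m in models)
        if worst < best_val:
            best_val, best_q = worst, q
    return best_q, best_val

# Two candidate models (per-arm Bernoulli means) and a reference estimate.
models = [(0.6, 0.4), (0.4, 0.6)]
ref = (0.5, 0.5)
q_opt, value = e2d_step(models, ref, gamma=4.0)
print(f"play arm 0 w.p. {q_opt:.2f}; worst-case objective {value:.4f}")
```

Because the two candidate models are symmetric around the reference, the minimax distribution here is uniform; the resulting value is the DEC of this tiny class at the chosen gamma (up to the grid resolution).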
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty [15.572157454411533]
We introduce a novel framework to account for sensitivity to reward uncertainty in sequential decision-making problems.
We present a new decomposition of the randomness contained in the cumulative reward, based on the Doob decomposition of a stochastic process.
We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy and value gradient function based, and illustrate its relevance on grid world and portfolio optimization problems.
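For reference, the decomposition alluded to is presumably the standard Doob decomposition (sketched here from the textbook definition, not from the paper itself): an adapted, integrable process $(X_t)$ splits uniquely as
\[
X_t \;=\; X_0 \;+\; \underbrace{\sum_{s=1}^{t} \mathbb{E}\big[X_s - X_{s-1} \mid \mathcal{F}_{s-1}\big]}_{\text{predictable part } A_t} \;+\; \underbrace{\sum_{s=1}^{t} \Big(X_s - X_{s-1} - \mathbb{E}\big[X_s - X_{s-1} \mid \mathcal{F}_{s-1}\big]\Big)}_{\text{martingale part } M_t},
\]
with $X_t$ the cumulative reward, so that risk can be attributed separately to the predictable drift and the martingale fluctuations.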
arXiv Detail & Related papers (2020-06-23T01:20:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.