Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective
- URL: http://arxiv.org/abs/2010.03104v1
- Date: Wed, 7 Oct 2020 01:33:06 GMT
- Title: Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective
- Authors: Dylan J. Foster and Alexander Rakhlin and David Simchi-Levi and
Yunzong Xu
- Abstract summary: In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
- Score: 104.67295710363679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the classical multi-armed bandit problem, instance-dependent algorithms
attain improved performance on "easy" problems with a gap between the best and
second-best arm. Are similar guarantees possible for contextual bandits? While
positive results are known for certain special cases, there is no general
theory characterizing when and how instance-dependent regret bounds for
contextual bandits can be achieved for rich, general classes of policies. We
introduce a family of complexity measures that are both sufficient and
necessary to obtain instance-dependent regret bounds. We then introduce new
oracle-efficient algorithms which adapt to the gap whenever possible, while
also attaining the minimax rate in the worst case. Finally, we provide
structural results that tie together a number of complexity measures previously
proposed throughout contextual bandits, reinforcement learning, and active
learning and elucidate their role in determining the optimal instance-dependent
regret. In a large-scale empirical evaluation, we find that our approach often
gives superior results for challenging exploration problems.
Turning our focus to reinforcement learning with function approximation, we
develop new oracle-efficient algorithms for reinforcement learning with rich
observations that obtain optimal gap-dependent sample complexity.
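For intuition, here is the standard background contrast the abstract alludes to (a textbook fact about $K$-armed bandits, not a result of this paper): with gaps $\Delta_a = \max_b \mu_b - \mu_a$, instance-dependent algorithms such as UCB achieve
\[
\mathrm{Reg}(T) \;=\; O\Big(\sum_{a:\,\Delta_a>0} \frac{\log T}{\Delta_a}\Big),
\]
which improves on the minimax rate $\mathrm{Reg}(T) = \Theta(\sqrt{KT})$ whenever the gaps are large. The paper asks when and how an analogous improvement is possible for contextual bandits with rich, general policy classes.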
Related papers
- Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially-many perturbations that an adversary might make to a test-time natural example.
In contrast, the attacker only needs to find one successful perturbation.
We introduce a novel multi-group setting and a corresponding multi-robust learning problem.
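The asymmetry described above can be written as the adversarially robust risk (standard notation, ours rather than the paper's): the learner must be correct on every perturbation in a set $\mathcal{U}(x)$, while the attacker needs only one witness,
\[
R_{\mathrm{rob}}(f) \;=\; \mathbb{E}_{(x,y)}\Big[\max_{x' \in \mathcal{U}(x)} \mathbf{1}\{f(x') \neq y\}\Big].
\]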
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
- Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design [12.056495277232118]
This work seeks to understand the "instance-dependent" complexity of learning near-optimal policies.
We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of complexity.
We show that Pedel yields provable gains over low-regret, minimax-optimal algorithms.
arXiv Detail & Related papers (2022-07-06T10:42:57Z)
- Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded from the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially in the short length $m$ rather than in the full horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
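One way to write the decodability assumption (a sketch of the standard notion; the decoder $\phi$ and the exact history convention are our notation, not the paper's): there is a map from the length-$m$ suffix of the interaction history to the latent state,
\[
s_h \;=\; \phi\big(o_{h-m+1}, a_{h-m+1}, \dots, a_{h-1}, o_h\big),
\]
so an agent that plans over length-$m$ observation windows effectively faces an MDP.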
arXiv Detail & Related papers (2022-02-08T16:39:57Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
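Instance-dependent lower bounds for structured bandits are typically stated as the value of an optimization problem (the classical Graves-Lai form, given here as general background in our notation, not quoted from the paper):
\[
c(\theta) \;=\; \min_{\eta \,\ge\, 0} \;\sum_{a} \eta_a\, \Delta_a(\theta)
\quad\text{subject to}\quad
\sum_{a} \eta_a\, \mathrm{KL}\big(\theta_a, \lambda_a\big) \;\ge\; 1
\;\;\text{for every confusing instance } \lambda,
\]
where $\eta_a$ is the exploration rate allocated to arm $a$; solving this constrained problem and its dual online is what gives rise to iterative saddle-point solvers.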
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
- Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems.
We focus on instance-dependent complexity instead of worst-case complexity.
Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z)
- Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
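A minimal sketch of that two-component decomposition, with toy stand-ins of our own devising (a nearest-centroid decoder for the unsupervised component and epsilon-greedy Q-learning for the tabular component; the paper's framework is more general and comes with guarantees these stand-ins lack):

```python
import numpy as np

class NearestCentroidDecoder:
    """Toy unsupervised component: k-means-style clustering that maps
    rich observations to a small number of latent state ids."""
    def __init__(self, n_states, dim, seed=0):
        self.centroids = np.random.default_rng(seed).normal(size=(n_states, dim))

    def fit(self, observations, iters=10):
        X = np.asarray(observations, dtype=float)
        for _ in range(iters):                       # plain k-means updates
            labels = self.decode_batch(X)
            for s in range(len(self.centroids)):
                if np.any(labels == s):
                    self.centroids[s] = X[labels == s].mean(axis=0)

    def decode_batch(self, X):
        dists = ((X[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    def decode(self, obs):
        return int(self.decode_batch(np.asarray(obs, dtype=float)[None, :])[0])

class QLearningAgent:
    """Toy tabular component: epsilon-greedy Q-learning over decoded states
    (a stand-in for any no-regret tabular RL algorithm)."""
    def __init__(self, n_states, n_actions, eps=0.1, lr=0.1, gamma=0.99, seed=0):
        self.Q = np.zeros((n_states, n_actions))
        self.eps, self.lr, self.gamma = eps, lr, gamma
        self.rng = np.random.default_rng(seed)

    def act(self, s):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(self.Q.shape[1]))
        return int(self.Q[s].argmax())

    def update(self, s, a, r, s_next):
        target = r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.lr * (target - self.Q[s, a])
```

In an episode loop one would call `decoder.fit` on logged observations, then `decoder.decode` each new observation and feed the resulting state id to `agent.act` and `agent.update`, so the tabular algorithm never sees the raw observation space.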
arXiv Detail & Related papers (2020-03-15T19:23:59Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
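This reduction is known as SquareCB, and its action-selection rule is inverse-gap weighting over the regression oracle's predictions. A minimal sketch (the regression oracle itself is abstracted away, and the variable names are ours):

```python
import numpy as np

def inverse_gap_weighting(predictions, gamma):
    """SquareCB-style action distribution: play near-greedily, but give
    each suboptimal arm probability inversely proportional to its
    predicted gap. `gamma` is the exploration parameter, tuned from the
    regression oracle's regret bound in the paper's analysis."""
    y = np.asarray(predictions, dtype=float)
    K = len(y)
    best = int(y.argmax())
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (y[best] - y[a]))
    p[best] = 1.0 - p.sum()           # remaining mass goes to the greedy arm
    return p

# Example: the oracle predicts four arms' rewards; larger gamma means greedier play.
probs = inverse_gap_weighting([0.9, 0.7, 0.5, 0.1], gamma=10.0)
action = np.random.default_rng(0).choice(len(probs), p=probs)
```

After each round, the observed (context, action, reward) triple is fed back to the online regression oracle, which is the only learning component the reduction needs.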
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-$k$ problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-$k$ identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)