Near-optimal Policy Identification in Active Reinforcement Learning
- URL: http://arxiv.org/abs/2212.09510v1
- Date: Mon, 19 Dec 2022 14:46:57 GMT
- Title: Near-optimal Policy Identification in Active Reinforcement Learning
- Authors: Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie
Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic
- Abstract summary: AE-LSVI is a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
- Score: 84.27592560211909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world reinforcement learning tasks require control of complex
dynamical systems that involve both costly data acquisition processes and large
state spaces. In cases where the transition dynamics can be readily evaluated
at specified states (e.g., via a simulator), agents can operate in what is
often referred to as planning with a \emph{generative model}. We propose the
AE-LSVI algorithm for best-policy identification, a novel variant of the
kernelized least-squares value iteration (LSVI) algorithm that combines
optimism with pessimism for active exploration (AE). AE-LSVI provably
identifies a near-optimal policy \emph{uniformly} over an entire state space
and achieves polynomial sample complexity guarantees that are independent of
the number of states. When specialized to the recently introduced offline
contextual Bayesian optimization setting, our algorithm achieves improved
sample complexity bounds. Experimentally, we demonstrate that AE-LSVI
outperforms other RL algorithms in a variety of environments when robustness to
the initial state is required.
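As a rough illustration of the optimism-plus-pessimism idea, the sketch below shows how a kernelized method might maintain optimistic and pessimistic Q estimates and actively query the state whose optimal value is least certain. It is written against the abstract only, not the paper's pseudocode: the RBF kernel, the bonus scale `beta`, and helpers such as `kernel_q_bounds` and `select_query_state` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel between two sets of state-action features.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def kernel_q_bounds(Z, y, Z_query, beta=1.0, lam=1.0):
    """Kernel ridge regression on observed (state-action, target) pairs,
    returning optimistic and pessimistic Q estimates at the query points."""
    K = rbf_kernel(Z, Z) + lam * np.eye(len(Z))
    k_q = rbf_kernel(Z_query, Z)
    mean = k_q @ np.linalg.solve(K, y)
    # Posterior-style uncertainty, used as a confidence bonus.
    var = 1.0 - np.einsum("ij,ji->i", k_q, np.linalg.solve(K, k_q.T))
    bonus = beta * np.sqrt(np.maximum(var, 0.0))
    return mean + bonus, mean - bonus  # (optimistic, pessimistic)

def select_query_state(states, actions, Z_data, y_data, featurize):
    """Pick the state whose optimal value is least certain: the one with the
    largest gap between its optimistic and pessimistic value estimates."""
    gaps = []
    for s in states:
        Z_query = np.array([featurize(s, a) for a in actions])
        q_hi, q_lo = kernel_q_bounds(Z_data, y_data, Z_query)
        gaps.append(q_hi.max() - q_lo.max())  # uncertainty about V*(s)
    return states[int(np.argmax(gaps))]
```

In the generative-model setting described above, queries of this kind would go to the simulator at the selected state, and a natural choice is to read the final policy off the pessimistic estimate once the optimistic-pessimistic gap is small everywhere; the snippet covers only the state-selection step.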
Related papers
- RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model [15.596599935486534]
We introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator.
Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns $\epsilon$-CCE with a provably optimal accuracy bound of $O(\epsilon^{-2})$.
Our analysis generalizes the virtual policy iteration bounds from the single-agent local planning literature.
arXiv Detail & Related papers (2024-03-18T07:54:11Z) - Faster Stochastic Variance Reduction Methods for Compositional MiniMax
Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
arXiv Detail & Related papers (2023-08-18T14:57:21Z) - Sample Efficient Deep Reinforcement Learning via Local Planning [21.420851589712626]
This work focuses on sample-efficient deep reinforcement learning (RL) with a simulator.
We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of the simulator's ability to reset to a previously observed state.
We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks.
arXiv Detail & Related papers (2023-01-29T23:17:26Z) - Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency [105.17746223041954]
Reinforcement learning in partially observed Markov decision processes (POMDPs) faces two challenges; chief among them, it often takes the full history to predict the future, which induces a sample complexity that scales exponentially with the horizon.
We propose a reinforcement learning algorithm named Embed to Control (ETC), which learns the representation at two levels while optimizing the policy.
arXiv Detail & Related papers (2022-05-26T16:34:46Z) - Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - On the Sample Complexity of Reinforcement Learning with Policy Space
Generalization [21.879621917722613]
We study the optimal sample complexity in large-scale Reinforcement Learning (RL) problems with policy space generalization.
Existing results show that without a generalization model, the sample complexity of an RL algorithm will inevitably depend on the cardinalities of state space and action space.
This paper proposes a new notion of eluder dimension for the policy space, which characterizes the intrinsic complexity of policy learning.
arXiv Detail & Related papers (2020-08-17T14:26:18Z) - Deep Reinforcement Learning with Robust and Smooth Policy [90.78795857181727]
We propose to learn a policy that behaves smoothly with respect to states.
We develop a new framework, \textbf{S}mooth \textbf{R}egularized \textbf{R}einforcement \textbf{L}earning ($\textbf{SR}^2\textbf{L}$), where the policy is trained with smoothness-inducing regularization.
Such regularization effectively constrains the search space and enforces smoothness in the learned policy; a minimal sketch of this style of regularizer appears after this list.
arXiv Detail & Related papers (2020-03-21T00:10:29Z)
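Below is the minimal sketch of a smoothness-inducing regularizer referenced in the SR$^2$L entry above: it penalizes how much the policy's output changes under small perturbations of the input state. The `smoothness_penalty` helper and the use of random (rather than adversarial) perturbations are simplifications assumed for illustration; the paper's actual regularizer may differ.

```python
import numpy as np

def smoothness_penalty(policy, states, epsilon=0.05, n_perturb=8, rng=None):
    """Average squared change in the policy's output over perturbed states.

    policy: callable mapping a batch of states, shape (n, d), to actions (n, k).
    states: array of shape (n, d) drawn from the agent's experience.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = policy(states)
    penalty = 0.0
    for _ in range(n_perturb):
        # Perturb each state within an l_inf ball of radius epsilon.
        delta = rng.uniform(-epsilon, epsilon, size=states.shape)
        penalty += np.mean((policy(states + delta) - base) ** 2)
    return penalty / n_perturb

# Usage (schematic): total_loss = rl_objective + lam * smoothness_penalty(pi, batch)
```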