Provably Efficient Exploration for Reinforcement Learning Using
Unsupervised Learning
- URL: http://arxiv.org/abs/2003.06898v4
- Date: Tue, 1 Dec 2020 01:04:54 GMT
- Title: Provably Efficient Exploration for Reinforcement Learning Using
Unsupervised Learning
- Authors: Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang
- Abstract summary: Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
- Score: 96.78504087416654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by the prevailing paradigm of using unsupervised learning for
efficient exploration in reinforcement learning (RL) problems
[tang2017exploration,bellemare2016unifying], we investigate when this paradigm
is provably efficient. We study episodic Markov decision processes with rich
observations generated from a small number of latent states. We present a
general algorithmic framework that is built upon two components: an
unsupervised learning algorithm and a no-regret tabular RL algorithm.
Theoretically, we prove that as long as the unsupervised learning algorithm
enjoys a polynomial sample complexity guarantee, we can find a near-optimal
policy with sample complexity polynomial in the number of latent states, which
is significantly smaller than the number of observations. Empirically, we
instantiate our framework on a class of hard exploration problems to
demonstrate the practicality of our theory.
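As a concrete illustration of the two-component framework described in the abstract, below is a minimal Python sketch (not the authors' implementation). It uses k-means clustering as a stand-in for the unsupervised learning component and epsilon-greedy tabular Q-learning as a stand-in for the no-regret tabular RL component; the environment interface (reset/step returning observation, reward, done) and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the two-component framework: an unsupervised learner maps
# rich observations to a small number of latent states, and a tabular RL
# algorithm is run on the decoded latent MDP. The decoder (k-means), the
# tabular learner (epsilon-greedy Q-learning), and the environment interface
# are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.cluster import KMeans


class DecodedTabularAgent:
    """Cluster observations into latent states, then run tabular Q-learning."""

    def __init__(self, num_latent_states, num_actions, horizon,
                 lr=0.1, epsilon=0.1):
        self.num_actions = num_actions
        self.horizon = horizon
        self.lr = lr
        self.epsilon = epsilon
        # Unsupervised component: k-means over raw observations (assumption).
        self.decoder = KMeans(n_clusters=num_latent_states, n_init=10)
        # Tabular component: Q-table indexed by (latent state, action).
        self.q = np.zeros((num_latent_states, num_actions))

    def fit_decoder(self, observations):
        """Learn the observation -> latent-state map from exploratory data."""
        self.decoder.fit(np.asarray(observations))

    def decode(self, obs):
        return int(self.decoder.predict(np.asarray(obs).reshape(1, -1))[0])

    def act(self, obs, rng):
        s = self.decode(obs)
        if rng.random() < self.epsilon:
            return int(rng.integers(self.num_actions))
        return int(np.argmax(self.q[s]))

    def update(self, obs, action, reward, next_obs, done):
        s, s_next = self.decode(obs), self.decode(next_obs)
        target = reward + (0.0 if done else np.max(self.q[s_next]))
        self.q[s, action] += self.lr * (target - self.q[s, action])


def train(env, agent, num_warmup_episodes=50, num_episodes=500, seed=0):
    """Phase 1: collect observations and fit the decoder.
    Phase 2: run tabular RL on the decoded latent states."""
    rng = np.random.default_rng(seed)

    # Phase 1: gather observations with random actions (assumed env API).
    warmup_obs = []
    for _ in range(num_warmup_episodes):
        obs = env.reset()
        for _ in range(agent.horizon):
            warmup_obs.append(obs)
            obs, _, done = env.step(int(rng.integers(agent.num_actions)))
            if done:
                break
    agent.fit_decoder(warmup_obs)

    # Phase 2: tabular RL on decoded latent states.
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(agent.horizon):
            action = agent.act(obs, rng)
            next_obs, reward, done = env.step(action)
            agent.update(obs, action, reward, next_obs, done)
            obs = next_obs
            if done:
                break
    return agent
```

The point of the composition is that the Q-table is indexed by the small decoded latent state rather than by the raw observation, which is why the resulting sample complexity can scale with the number of latent states instead of the (much larger) number of observations.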
Related papers
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via
Online Experiment Design [12.056495277232118]
This work seeks to understand the "instance-dependent" complexity of learning near-optimal policies.
We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of complexity.
We show that Pedel yields provable gains over low-regret, minimax-optimal algorithms.
arXiv Detail & Related papers (2022-07-06T10:42:57Z) - Sample-Efficient Reinforcement Learning in the Presence of Exogenous
Information [77.19830787312743]
In real-world reinforcement learning applications, the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand.
We introduce a new problem setting for reinforcement learning, the Exogenous Decision Process (ExoMDP), in which the state space admits an (unknown) factorization into a small controllable component and a large irrelevant component.
We provide a new algorithm, ExoRL, which learns a near-optimal policy with sample complexity polynomial in the size of the endogenous component.
arXiv Detail & Related papers (2022-06-09T05:19:32Z) - Efficient Reinforcement Learning in Block MDPs: A Model-free
Representation Learning Approach [73.62265030773652]
We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics.
BRIEE interleaves latent states discovery, exploration, and exploitation together, and can provably learn a near-optimal policy.
We show that BRIEE is more sample efficient than the state-of-the-art Block MDP algorithm HOMER and other empirical baselines on challenging rich-observation combination lock problems.
arXiv Detail & Related papers (2022-01-31T19:47:55Z) - Efficient Performance Bounds for Primal-Dual Reinforcement Learning from
Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z) - Adaptive Discretization in Online Reinforcement Learning [9.560980936110234]
Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it.
We provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning.
Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets.
arXiv Detail & Related papers (2021-10-29T15:06:15Z) - Instance-Dependent Complexity of Contextual Bandits and Reinforcement
Learning: A Disagreement-Based Perspective [104.67295710363679]
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds.
We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case.
arXiv Detail & Related papers (2020-10-07T01:33:06Z) - Sample-Efficient Reinforcement Learning of Undercomplete POMDPs [91.40308354344505]
This work shows that these hardness barriers do not preclude efficient reinforcement learning for rich and interesting subclasses of Partially Observable Markov Decision Processes (POMDPs).
We present a sample-efficient algorithm, OOM-UCB, for episodic finite undercomplete POMDPs, where the number of observations is larger than the number of latent states and where exploration is essential for learning, thus distinguishing our results from prior works.
arXiv Detail & Related papers (2020-06-22T17:58:54Z)