Reward-Free Exploration for Reinforcement Learning
- URL: http://arxiv.org/abs/2002.02794v1
- Date: Fri, 7 Feb 2020 14:03:38 GMT
- Title: Reward-Free Exploration for Reinforcement Learning
- Authors: Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
- Abstract summary: We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
- Score: 82.3300753751066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration is widely regarded as one of the most challenging aspects of
reinforcement learning (RL), with many naive approaches succumbing to
exponential sample complexity. To isolate the challenges of exploration, we
propose a new "reward-free RL" framework. In the exploration phase, the agent
first collects trajectories from an MDP $\mathcal{M}$ without a pre-specified
reward function. After exploration, it is tasked with computing near-optimal
policies under $\mathcal{M}$ for a collection of given reward functions.
This framework is particularly suitable when there are many reward functions of
interest, or when the reward function is shaped by an external agent to elicit
desired behavior.
We give an efficient algorithm that conducts
$\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration
and returns $\epsilon$-suboptimal policies for an arbitrary number of reward
functions. We achieve this by finding exploratory policies that visit each
"significant" state with probability proportional to its maximum visitation
probability under any possible policy. Moreover, our planning procedure can be
instantiated by any black-box approximate planner, such as value iteration or
natural policy gradient. We also give a nearly-matching
$\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of
our algorithm in this setting.
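To make the two-phase protocol concrete, here is a minimal tabular sketch (not the paper's algorithm): the exploration phase collects reward-free trajectories and estimates the transition kernel, and the planning phase runs value iteration, one admissible black-box planner, on the empirical MDP for any reward function supplied afterwards. The environment interface (`env.reset()`, `env.step(a)` returning only the next state) and the stand-in `explore_policy` are illustrative assumptions; the paper's algorithm instead constructs exploration policies that visit each significant state with probability proportional to its maximum visitation probability.

```python
import numpy as np

def explore(env, explore_policy, num_episodes, S, A, H):
    """Phase 1 (sketch): collect reward-free trajectories and estimate the
    transition kernel. No reward signal is observed or used."""
    counts = np.zeros((H, S, A, S))            # visit counts N_h(s, a, s')
    for _ in range(num_episodes):
        s = env.reset()                        # hypothetical tabular interface
        for h in range(H):
            a = explore_policy(h, s)           # stand-in exploration policy
            s_next = env.step(a)               # returns only the next state
            counts[h, s, a, s_next] += 1
            s = s_next
    n_sa = counts.sum(axis=-1, keepdims=True)
    return counts / np.maximum(n_sa, 1)        # empirical kernel P_hat[h, s, a, s']

def plan(P_hat, reward):
    """Phase 2 (sketch): black-box planning, instantiated here as value
    iteration on the empirical MDP, for a reward function reward[h, s, a]."""
    H, S, A, _ = P_hat.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward[h] + P_hat[h] @ V[h + 1]    # Q[s, a] via Bellman backup
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi                                  # deterministic policy pi[h, s]
```

Because the empirical model is reused, `plan` can be called once per reward function after a single exploration phase, which is exactly the regime the framework targets.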
Related papers
- Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning [17.239062061431646]
This paper studies reward-agnostic exploration in reinforcement learning (RL).
Consider a finite-horizon inhomogeneous Markov decision process with $S$ states, $A$ actions, and a horizon length $H$.
Our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions.
arXiv Detail & Related papers (2023-04-14T17:46:49Z) - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z) - On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises (a toy sketch of this idea appears after this list).
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z) - Task-agnostic Exploration in Reinforcement Learning [35.403304641170386]
We present an efficient task-agnostic reinforcement learning algorithm, \textsc{UCBZero}.
It finds $\epsilon$-optimal policies for $N$ arbitrary tasks after at most $\tilde{O}(\log(N)H^5SA/\epsilon^2)$ exploration episodes.
We also provide an $\Omega(\log(N)H^2SA/\epsilon^2)$ lower bound, showing that the $\log$ dependency on $N$ is unavoidable.
arXiv Detail & Related papers (2020-06-16T20:23:41Z) - Adaptive Reward-Free Exploration [48.98199700043158]
We show that our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994.
We further investigate the relative complexities of reward-free exploration and best-policy identification.
arXiv Detail & Related papers (2020-06-11T09:58:03Z)
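As a companion to the randomized-exploration entry above, here is a toy sketch (not that paper's algorithm) of the perturbation idea: least-squares value iteration with linear features in which each regression target is shifted by i.i.d. Gaussian noise, so exploration comes from randomized value estimates rather than explicit bonuses. The feature map `phi`, the data layout, and the parameters `sigma` and `lam` are illustrative assumptions.

```python
import numpy as np

def perturbed_lsvi(data, phi, d, H, num_actions, sigma=1.0, lam=1.0):
    """Toy perturbed least-squares value iteration: Q_h(s, a) ~ phi(s, a) @ W[h].
    Exploration is driven by i.i.d. Gaussian noise added to each regression
    target, in the spirit of RLSVI-style randomized exploration."""
    W = np.zeros((H + 1, d))                                # W[H] = 0 (terminal values)
    for h in reversed(range(H)):
        states, actions, rewards, next_states = data[h]     # transitions seen at step h
        Phi = np.stack([phi(s, a) for s, a in zip(states, actions)])
        # Greedy one-step-ahead value under the current (randomized) estimate.
        V_next = np.array([max(phi(s2, a2) @ W[h + 1] for a2 in range(num_actions))
                           for s2 in next_states])
        # Perturb each regression target with an i.i.d. scalar Gaussian noise term.
        targets = np.asarray(rewards, float) + V_next + sigma * np.random.randn(len(V_next))
        gram = Phi.T @ Phi + lam * np.eye(d)                 # regularized least squares
        W[h] = np.linalg.solve(gram, Phi.T @ targets)
    return W                                                 # act greedily w.r.t. phi(s, a) @ W[h]
```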