Adaptive Reward-Free Exploration
- URL: http://arxiv.org/abs/2006.06294v2
- Date: Wed, 7 Oct 2020 16:23:09 GMT
- Title: Adaptive Reward-Free Exploration
- Authors: Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
- Abstract summary: We show that our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994.
We further investigate the relative complexities of reward-free exploration and best-policy identification.
- Score: 48.98199700043158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward-free exploration is a reinforcement learning setting studied by Jin et
al. (2020), who address it by running several algorithms with regret guarantees
in parallel. In our work, we instead give a more natural adaptive approach for
reward-free exploration which directly reduces upper bounds on the maximum MDP
estimation error. We show that, interestingly, our reward-free UCRL algorithm
can be seen as a variant of an algorithm of Fiechter from 1994, originally
proposed for a different objective that we call best-policy identification. We
prove that RF-UCRL needs of order $({SAH^4}/{\varepsilon^2})(\log(1/\delta) +
S)$ episodes to output, with probability $1-\delta$, an
$\varepsilon$-approximation of the optimal policy for any reward function. This
bound improves over existing sample-complexity bounds in both the small
$\varepsilon$ and the small $\delta$ regimes. We further investigate the
relative complexities of reward-free exploration and best-policy
identification.
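For intuition, here is a minimal sketch of an RF-UCRL-style exploration loop for a tabular finite-horizon MDP: maintain per-step visit counts, compute an upper bound E_h(s, a) on the estimation error by backward recursion, act greedily with respect to that bound, and stop once the bound at the initial state falls below ε/2. The concrete bonus form, the constants, and the gym-like `env` interface (`reset()` returning a state index, `step(a)` returning a `(next_state, reward, done)` tuple) are illustrative assumptions, not the exact quantities analyzed in the paper.

```python
import numpy as np

def rf_ucrl_sketch(env, S, A, H, eps, delta, max_episodes=100_000):
    """Illustrative RF-UCRL-style loop (assumed simplified bonus, not the paper's exact one)."""
    counts = np.zeros((H, S, A))        # visit counts n_h(s, a)
    trans = np.zeros((H, S, A, S))      # empirical transition counts
    for _ in range(max_episodes):
        # Backward recursion for error upper bounds E_h(s, a): a confidence
        # width on the transition estimate plus the propagated worst-case
        # error at the next step, clipped at H.
        E = np.zeros((H + 1, S, A))
        for h in range(H - 1, -1, -1):
            n = np.maximum(counts[h], 1.0)
            p_hat = trans[h] / n[..., None]
            width = H * np.sqrt(np.log(S * A * H / delta) / n)   # assumed bonus form
            future = (p_hat * E[h + 1].max(axis=-1)).sum(axis=-1)
            E[h] = np.minimum(H, width + future)
        s = env.reset()
        if E[0][s].max() <= eps / 2:     # stopping rule: error bound small enough
            return counts, trans
        for h in range(H):               # act greedily w.r.t. the error bounds
            a = int(np.argmax(E[h][s]))
            s_next, _reward, _done = env.step(a)   # reward is ignored: reward-free setting
            counts[h, s, a] += 1
            trans[h, s, a, s_next] += 1
            s = s_next
    return counts, trans
```

Once the stopping rule triggers, the empirical model is accurate enough that planning in it with any reward function yields an ε-approximation of the corresponding optimal policy, which is the guarantee stated in the abstract.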
Related papers
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm [6.481009996429766]
Inverse reinforcement learning (IRL) aims to recover a reward function for which the expert is optimal.
This work proposes a model-free algorithm to solve the entropy-regularized IRL problem.
arXiv Detail & Related papers (2024-03-25T14:54:42Z)
- Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning [17.239062061431646]
This paper studies reward-agnostic exploration in reinforcement learning (RL).
Consider a finite-horizon inhomogeneous Markov decision process with $S$ states, $A$ actions, and horizon length $H$.
Our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions.
arXiv Detail & Related papers (2023-04-14T17:46:49Z)
- Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs [43.53286390357673]
This paper focuses on reward-free reinforcement learning under low-rank MDP models.
We first provide the first known sample complexity lower bound for any algorithm under low-rank MDPs.
We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an $\epsilon$-optimal policy and achieve an $\epsilon$-accurate system identification.
arXiv Detail & Related papers (2023-03-20T04:39:39Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that the separation between reward-free and reward-aware RL does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)
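The papers above share the two-phase reward-free protocol: first explore without observing rewards to estimate the transition model, then, once an arbitrary reward function is revealed, plan in the estimated model. Below is a minimal sketch of the second (planning) phase by backward induction in a tabular finite-horizon MDP; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def plan_with_estimated_model(p_hat, reward, H):
    """Backward induction in the empirical MDP for an arbitrary reward.

    p_hat:  (H, S, A, S) estimated transition probabilities from exploration
    reward: (H, S, A) reward function, supplied only after exploration ends
    Returns a greedy policy of shape (H, S) and value estimates of shape (H + 1, S).
    """
    _, S, A, _ = p_hat.shape
    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = reward[h] + (p_hat[h] * V[h + 1]).sum(axis=-1)  # (S, A) action values
        policy[h] = Q.argmax(axis=-1)
        V[h] = Q.max(axis=-1)
    return policy, V
```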
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.