Adaptive Reward-Free Exploration
- URL: http://arxiv.org/abs/2006.06294v2
- Date: Wed, 7 Oct 2020 16:23:09 GMT
- Title: Adaptive Reward-Free Exploration
- Authors: Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
- Abstract summary: We show that our reward-free UCRL algorithm can be seen as a variant of an algorithm of Fiechter from 1994.
We further investigate the relative complexities of reward-free exploration and best-policy identification.
- Score: 48.98199700043158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward-free exploration is a reinforcement learning setting studied by Jin et
al. (2020), who address it by running several algorithms with regret guarantees
in parallel. In our work, we instead give a more natural adaptive approach for
reward-free exploration which directly reduces upper bounds on the maximum MDP
estimation error. We show that, interestingly, our reward-free UCRL algorithm
can be seen as a variant of an algorithm of Fiechter from 1994, originally
proposed for a different objective that we call best-policy identification. We
prove that RF-UCRL needs of order $({SAH^4}/{\varepsilon^2})(\log(1/\delta) +
S)$ episodes to output, with probability $1-\delta$, an
$\varepsilon$-approximation of the optimal policy for any reward function. This
bound improves over existing sample-complexity bounds in both the small
$\varepsilon$ and the small $\delta$ regimes. We further investigate the
relative complexities of reward-free exploration and best-policy
identification.
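For intuition, here is a minimal sketch of an RF-UCRL-style exploration loop for a tabular finite-horizon MDP: maintain per-step visit counts, compute an upper bound E_h(s, a) on the estimation error by backward recursion, act greedily with respect to that bound, and stop once the bound at the initial state falls below ε/2. The concrete bonus form, the constants, and the gym-like `env` interface (`reset()` returning a state index, `step(a)` returning a `(next_state, reward, done)` tuple) are illustrative assumptions, not the exact quantities analyzed in the paper.

```python
import numpy as np

def rf_ucrl_sketch(env, S, A, H, eps, delta, max_episodes=100_000):
    """Illustrative RF-UCRL-style loop (assumed simplified bonus, not the paper's exact one)."""
    counts = np.zeros((H, S, A))        # visit counts n_h(s, a)
    trans = np.zeros((H, S, A, S))      # empirical transition counts
    for _ in range(max_episodes):
        # Backward recursion for error upper bounds E_h(s, a): a confidence
        # width on the transition estimate plus the propagated worst-case
        # error at the next step, clipped at H.
        E = np.zeros((H + 1, S, A))
        for h in range(H - 1, -1, -1):
            n = np.maximum(counts[h], 1.0)
            p_hat = trans[h] / n[..., None]
            width = H * np.sqrt(np.log(S * A * H / delta) / n)   # assumed bonus form
            future = (p_hat * E[h + 1].max(axis=-1)).sum(axis=-1)
            E[h] = np.minimum(H, width + future)
        s = env.reset()
        if E[0][s].max() <= eps / 2:     # stopping rule: error bound small enough
            return counts, trans
        for h in range(H):               # act greedily w.r.t. the error bounds
            a = int(np.argmax(E[h][s]))
            s_next, _reward, _done = env.step(a)   # reward is ignored: reward-free setting
            counts[h, s, a] += 1
            trans[h, s, a, s_next] += 1
            s = s_next
    return counts, trans
```

Once the stopping rule triggers, the empirical model is accurate enough that planning in it with any reward function yields an ε-approximation of the corresponding optimal policy, which is the guarantee stated in the abstract.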
Related papers
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm [6.481009996429766]
Inverse reinforcement learning (IRL) aims to recover a reward function for which the expert is optimal.
This work proposes a model-free algorithm to solve the entropy-regularized IRL problem.
arXiv Detail & Related papers (2024-03-25T14:54:42Z)
- Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning [17.239062061431646]
This paper studies reward-agnostic exploration in reinforcement learning (RL).
Consider a finite-horizon inhomogeneous Markov decision process with $S$ states, $A$ actions, and horizon length $H$.
Our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions.
arXiv Detail & Related papers (2023-04-14T17:46:49Z)
- Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs [43.53286390357673]
This paper focuses on reward-free reinforcement learning under low-rank MDP models.
We first provide the first known sample complexity lower bound for any algorithm under low-rank MDPs.
We then propose a novel model-based algorithm, coined RAFFLE, and show it can both find an $\epsilon$-optimal policy and achieve an $\epsilon$-accurate system identification.
arXiv Detail & Related papers (2023-03-20T04:39:39Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that the separation between reward-free and reward-aware RL does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)
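The papers above share the two-phase reward-free protocol: first explore without observing rewards to estimate the transition model, then, once an arbitrary reward function is revealed, plan in the estimated model. Below is a minimal sketch of the second (planning) phase by backward induction in a tabular finite-horizon MDP; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def plan_with_estimated_model(p_hat, reward, H):
    """Backward induction in the empirical MDP for an arbitrary reward.

    p_hat:  (H, S, A, S) estimated transition probabilities from exploration
    reward: (H, S, A) reward function, supplied only after exploration ends
    Returns a greedy policy of shape (H, S) and value estimates of shape (H + 1, S).
    """
    _, S, A, _ = p_hat.shape
    V = np.zeros((H + 1, S))
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = reward[h] + (p_hat[h] * V[h + 1]).sum(axis=-1)  # (S, A) action values
        policy[h] = Q.argmax(axis=-1)
        V[h] = Q.max(axis=-1)
    return policy, V
```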
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.