Adaptive Multi-Goal Exploration
- URL: http://arxiv.org/abs/2111.12045v1
- Date: Tue, 23 Nov 2021 17:59:50 GMT
- Title: Adaptive Multi-Goal Exploration
- Authors: Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo
Pirotta, Michal Valko, Alessandro Lazaric
- Abstract summary: We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
- Score: 118.40427257364729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a generic strategy for provably efficient multi-goal
exploration. It relies on AdaGoal, a novel goal selection scheme that is based
on a simple constrained optimization problem, which adaptively targets goal
states that are neither too difficult nor too easy to reach according to the
agent's current knowledge. We show how AdaGoal can be used to tackle the
objective of learning an $\epsilon$-optimal goal-conditioned policy for all the
goal states that are reachable within $L$ steps in expectation from a reference
state $s_0$ in a reward-free Markov decision process. In the tabular case with
$S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A
\epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also
readily instantiate AdaGoal in linear mixture Markov decision processes, which
yields the first goal-oriented PAC guarantee with linear function
approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in
the high-level algorithmic structure of existing methods for goal-conditioned
deep reinforcement learning.
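As a rough illustration of the goal-selection principle described above, the Python sketch below picks, among goals the agent currently estimates to be reachable from $s_0$ within about $L$ steps, the goal whose value estimate is most uncertain. This is a hedged, simplified reading of the abstract, not the paper's actual constrained optimization problem; the names candidate_goals, estimated_distance, uncertainty and the fallback rule are all illustrative assumptions.

def select_goal(candidate_goals, estimated_distance, uncertainty, L):
    # Keep only goals the agent currently believes it can reach from s_0
    # within roughly L expected steps ("not too difficult").
    feasible = [g for g in candidate_goals if estimated_distance[g] <= L]
    if not feasible:
        # If nothing looks reachable yet, target the goal that seems closest.
        return min(candidate_goals, key=lambda g: estimated_distance[g])
    # Among feasible goals, prefer the one with the largest remaining
    # uncertainty, so goals that are already mastered ("too easy") are skipped.
    return max(feasible, key=lambda g: uncertainty[g])

# Example usage with toy estimates (hypothetical numbers):
goals = ["g1", "g2", "g3"]
dist = {"g1": 2.0, "g2": 4.5, "g3": 9.0}   # estimated expected steps from s_0
unc = {"g1": 0.1, "g2": 0.8, "g3": 0.9}    # current value-estimation uncertainty
print(select_goal(goals, dist, unc, L=5))  # -> "g2"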
Related papers
- Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning [14.260168974085376]
This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning optimal policies in the presence of multiple reward functions.
Despite MORL's success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms.
arXiv Detail & Related papers (2024-07-24T17:58:49Z)
- Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv Detail & Related papers (2024-03-11T10:14:06Z)
- A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for reinforcement learning.
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, together with near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
arXiv Detail & Related papers (2023-11-26T08:31:57Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2 \varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path [26.27529098205787]
We revisit the incremental autonomous exploration problem proposed by Lim & Auer (2012).
In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states.
We introduce a new algorithm with stronger sample bounds than existing ones.
arXiv Detail & Related papers (2022-05-22T03:54:15Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local data.
Our algorithm is based on a new concentration technique, and its sample complexity is analyzed.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Refined approachability algorithms and application to regret minimization with global costs [0.38073142980732994]
Blackwell's approachability is a framework where two players, the Decision Maker and the Environment, play a repeated game with vector-valued payoffs.
We construct and analyze a class of Follow the Regularized Leader algorithms (FTRL) for Blackwell's approachability.
This flexibility enables us to apply these algorithms to closely minimize the quantity of interest in various online learning problems.
arXiv Detail & Related papers (2020-09-08T15:54:08Z)