Adaptive Multi-Goal Exploration
        - URL: http://arxiv.org/abs/2111.12045v1
 - Date: Tue, 23 Nov 2021 17:59:50 GMT
 - Title: Adaptive Multi-Goal Exploration
 - Authors: Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo
  Pirotta, Michal Valko, Alessandro Lazaric
 - Abstract summary: We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
 - Score: 118.40427257364729
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   We introduce a generic strategy for provably efficient multi-goal
exploration. It relies on AdaGoal, a novel goal selection scheme that is based
on a simple constrained optimization problem, which adaptively targets goal
states that are neither too difficult nor too easy to reach according to the
agent's current knowledge. We show how AdaGoal can be used to tackle the
objective of learning an $\epsilon$-optimal goal-conditioned policy for all the
goal states that are reachable within $L$ steps in expectation from a reference
state $s_0$ in a reward-free Markov decision process. In the tabular case with
$S$ states and $A$ actions, our algorithm requires $\tilde{O}(L^3 S A
\epsilon^{-2})$ exploration steps, which is nearly minimax optimal. We also
readily instantiate AdaGoal in linear mixture Markov decision processes, which
yields the first goal-oriented PAC guarantee with linear function
approximation. Beyond its strong theoretical guarantees, AdaGoal is anchored in
the high-level algorithmic structure of existing methods for goal-conditioned
deep reinforcement learning.
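
As a loose illustration of the goal-selection idea described above, here is a minimal, hypothetical Python sketch. The names (`select_goal`, `est_cost`, `uncertainty`) and the exact scoring rule are assumptions for illustration only, not the paper's actual procedure: the intent is to restrict attention to goals whose estimated cost from $s_0$ is within the $L$-step budget and, among those, target the goal the agent is currently most uncertain about, so the chosen goal is neither too easy nor believed unreachable.

```python
def select_goal(goals, est_cost, uncertainty, L):
    """Hypothetical AdaGoal-style selection sketch.

    goals:       iterable of candidate goal states
    est_cost:    dict mapping goal -> estimated expected steps from s_0
    uncertainty: dict mapping goal -> current uncertainty about reaching it
    L:           budget on expected steps from the reference state s_0
    """
    # Constraint: only consider goals believed reachable within L steps.
    feasible = [g for g in goals if est_cost[g] <= L]
    if not feasible:
        # Fallback (assumption): target the goal that currently looks easiest.
        return min(goals, key=lambda g: est_cost[g])
    # Objective: among feasible goals, pick the least well-understood one,
    # so the selection is neither already mastered nor out of reach.
    return max(feasible, key=lambda g: uncertainty[g])
```

In the paper, the feasibility check and the uncertainty measure would come from the learner's own confidence estimates (tabular counts or linear-mixture features); the dictionaries above simply stand in for those quantities.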
 
       
      
        Related papers
        - Span-Agnostic Optimal Sample Complexity and Oracle Inequalities for   Average-Reward RL [6.996002801232415]
We study the sample complexity of finding an $\varepsilon$-optimal policy in Markov Decision Processes (MDPs) with a generative model.
We develop the first algorithms matching the optimal span-based complexity without knowledge of $H$.
arXiv  Detail & Related papers  (2025-02-16T19:10:55Z) - Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective   Reinforcement Learning [14.260168974085376]
This paper investigates multi-objective reinforcement learning (MORL).
It focuses on learning optimal policies in the presence of multiple reward functions.
Despite MORL's success, there is still a lack of satisfactory understanding of various MORL optimization targets and efficient learning algorithms.
arXiv  Detail & Related papers  (2024-07-24T17:58:49Z) - Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv  Detail & Related papers  (2024-03-11T10:14:06Z) - A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning
  with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for reinforcement learning (a generic UCB-style Q-update is sketched after this list).
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, together with near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
arXiv  Detail & Related papers  (2023-11-26T08:31:57Z) - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs)
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv  Detail & Related papers  (2023-03-17T17:53:28Z) - Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal
  Stochastic Shortest Path [26.27529098205787]
We revisit the incremental autonomous exploration problem proposed by Lim & Auer (2012).
In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states.
We introduce a new algorithm with stronger sample bounds than existing ones.
arXiv  Detail & Related papers  (2022-05-22T03:54:15Z) - On Reward-Free RL with Kernel and Neural Function Approximations:
  Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv  Detail & Related papers  (2021-10-19T07:26:33Z) - Improved Sample Complexity for Incremental Autonomous Exploration in
  MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv  Detail & Related papers  (2020-12-29T14:06:09Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize a shared objective.
Our algorithm is based on a new concentration technique, and we analyze its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv  Detail & Related papers  (2020-12-28T17:19:32Z) - Refined approachability algorithms and application to regret
  minimization with global costs [0.38073142980732994]
Blackwell's approachability is a framework where two players, the Decision Maker and the Environment, play a repeated game with vector-valued payoffs.
We construct and analyze a class of Follow the Regularized Leader algorithms (FTRL) for Blackwell's approachability.
This flexibility enables us to apply these algorithms to closely minimize the quantity of interest in various online learning problems.
arXiv  Detail & Related papers  (2020-09-08T15:54:08Z) 
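
As referenced in the MQL-UCB entry above, the following is a generic tabular Q-learning update with an upper-confidence-bound bonus, shown only to illustrate the optimism principle behind UCB-style methods. It is a textbook-style sketch with assumed names (`ucb_q_update`, `bonus_scale`), not the MQL-UCB algorithm itself, which is designed for nonlinear function approximation with near-optimal policy switching cost.

```python
import math
from collections import defaultdict

def ucb_q_update(q, counts, s, a, reward, s_next, actions,
                 gamma=0.99, bonus_scale=1.0):
    """One optimistic Q-learning step: a standard TD target plus a count-based
    UCB bonus that shrinks as the state-action pair is visited more often."""
    counts[(s, a)] += 1
    n = counts[(s, a)]
    lr = 1.0 / n                        # simple 1/n step size
    bonus = bonus_scale / math.sqrt(n)  # optimism in the face of uncertainty
    target = reward + bonus + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += lr * (target - q[(s, a)])
    return q[(s, a)]

# Minimal usage: Q-values and visit counts default to zero.
q, counts = defaultdict(float), defaultdict(int)
ucb_q_update(q, counts, s=0, a=1, reward=0.0, s_next=2, actions=[0, 1])
```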
        This list is automatically generated from the titles and abstracts of the papers on this site.
       
     