Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal
Stochastic Shortest Path
- URL: http://arxiv.org/abs/2205.10729v1
- Date: Sun, 22 May 2022 03:54:15 GMT
- Title: Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal
Stochastic Shortest Path
- Authors: Haoyuan Cai, Tengyu Ma, Simon Du
- Abstract summary: We revisit the incremental autonomous exploration problem proposed by Lim & Auer (2012).
In this setting, the agent aims to learn a set of near-optimal goal-conditioned policies to reach the $L$-controllable states.
We introduce a new algorithm with stronger sample complexity bounds than existing ones.
- Score: 26.27529098205787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit the incremental autonomous exploration problem proposed by Lim &
Auer (2012). In this setting, the agent aims to learn a set of near-optimal
goal-conditioned policies to reach the $L$-controllable states: states that are
incrementally reachable from an initial state $s_0$ within $L$ steps in
expectation. We introduce a new algorithm with stronger sample complexity
bounds than existing ones. Furthermore, we also prove the first lower bound for
the autonomous exploration problem. In particular, the lower bound implies that
our proposed algorithm, Value-Aware Autonomous Exploration, is nearly
minimax-optimal when the number of $L$-controllable states grows polynomially
with respect to $L$. Key in our algorithm design is a connection between
autonomous exploration and multi-goal stochastic shortest path, a new problem
that naturally generalizes the classical stochastic shortest path problem. This
new problem and its connection to autonomous exploration can be of independent
interest.
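
For context, here is a minimal LaTeX sketch of the incremental $L$-controllability definition following Lim & Auer (2012); the symbols $\mathcal{S}_L^{\rightarrow}$ and $\tau_\pi$ are illustrative and may differ from the paper's exact notation.

```latex
% Sketch of the incremental L-controllability definition (after Lim & Auer,
% 2012). Notation is illustrative and may not match the paper's symbols.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $\tau_\pi(s_0 \to s)$ denote the expected number of steps a policy $\pi$
needs to first reach $s$ from $s_0$. The incrementally $L$-controllable
states $\mathcal{S}_L^{\rightarrow}$ are defined inductively: $s$ belongs to
$\mathcal{S}_L^{\rightarrow}$ if some policy $\pi$ restricted to states
already in $\mathcal{S}_L^{\rightarrow}$ satisfies
\begin{equation*}
  \tau_\pi(s_0 \to s) \le L .
\end{equation*}
Autonomous exploration then asks for a goal-conditioned policy $\pi_s$ for
every $s \in \mathcal{S}_L^{\rightarrow}$ with
$\tau_{\pi_s}(s_0 \to s) \le (1+\epsilon)L$, i.e.\ near-optimal up to a
multiplicative slack $\epsilon$.
\end{document}
```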
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- Layered State Discovery for Incremental Autonomous Exploration [106.37656068276901]
Layered Autonomous Exploration (LAE) is a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(L S^{\rightarrow}_{L(1+\epsilon)} \Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ counts the states incrementally reachable from $s_0$ within $L(1+\epsilon)$ expected steps.
arXiv Detail & Related papers (2023-02-07T22:58:12Z)
- Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $\epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies reaching all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)
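
To make the reward-free template above concrete, here is a minimal, hypothetical Python sketch of its two phases on a toy tabular MDP. The uniform-random exploration policy and all names are illustrative stand-ins; the paper's algorithm uses a far more careful exploration scheme to achieve its $\tilde{\mathcal{O}}(S^2A\,\mathrm{poly}(H)/\epsilon^2)$ bound.

```python
import numpy as np

# Toy illustration of the reward-free RL template: an exploration phase that
# collects transitions without any reward signal, followed by a planning phase
# that, given an arbitrary reward function, computes a policy from the learned
# model. The uniform-random exploration policy is only a stand-in for the
# paper's exploration algorithm; the toy MDP and all names are illustrative.

rng = np.random.default_rng(0)
S, A, H = 5, 2, 10  # states, actions, horizon

# Ground-truth transition kernel of a random toy MDP (unknown to the agent).
P_true = rng.dirichlet(np.ones(S), size=(S, A))

def explore(num_episodes):
    """Exploration phase: gather transition counts with no reward signal."""
    counts = np.zeros((S, A, S))
    for _ in range(num_episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)  # stand-in exploration policy
            s_next = rng.choice(S, p=P_true[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def plan(counts, reward):
    """Planning phase: value iteration on the empirical model, any reward."""
    n = counts.sum(axis=2, keepdims=True)
    P_hat = np.where(n > 0, counts / np.maximum(n, 1), 1.0 / S)
    V = np.zeros(S)
    for _ in range(H):          # finite-horizon backward induction
        Q = reward + P_hat @ V  # shape (S, A)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V

counts = explore(num_episodes=2000)
reward = rng.random((S, A))  # reward revealed only after exploration
policy, V = plan(counts, reward)
print("greedy policy:", policy, "value at s0: %.3f" % V[0])
```

The point of the template is that `explore` never sees a reward, yet `plan` can be re-run for any reward function afterwards, which is exactly the separation the reward-free framework is designed to isolate.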