Layered State Discovery for Incremental Autonomous Exploration
- URL: http://arxiv.org/abs/2302.03789v1
- Date: Tue, 7 Feb 2023 22:58:12 GMT
- Title: Layered State Discovery for Incremental Autonomous Exploration
- Authors: Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta
- Abstract summary: Layered Autonomous Exploration (LAE) is a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$.
- Score: 106.37656068276901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the autonomous exploration (AX) problem proposed by Lim & Auer
(2012). In this setting, the objective is to discover a set of
$\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of
incrementally $L$-controllable states. We introduce a novel layered
decomposition of the set of incrementally $L$-controllable states that is based
on the iterative application of a state-expansion operator. We leverage these
results to design Layered Autonomous Exploration (LAE), a novel algorithm for
AX that attains a sample complexity of
$\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A
\ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where
$S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are
incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and
$\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such
states. LAE improves over the algorithm of Tarbouriech et al. (2020a) by a
factor of $L^2$ and it is the first algorithm for AX that works in a
countably-infinite state space. Moreover, we show that, under a certain
identifiability assumption, LAE achieves minimax-optimal sample complexity of
$\tilde{\mathcal{O}}(LS^{\rightarrow}_{L}A\ln^{12}(S^{\rightarrow}_{L})/\epsilon^2)$,
outperforming existing algorithms and matching for the first time the lower
bound proved by Cai et al. (2022) up to logarithmic factors.
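To make the layered decomposition more concrete, below is a minimal, hypothetical sketch of iterating a state-expansion operator to discover incrementally $L$-controllable states layer by layer. It assumes a known, deterministic MDP encoded as a dict of dicts (`mdp[state][action]` listing successor states), and the names `shortest_path_len`, `expansion_operator`, and `layered_discovery` are illustrative only; the paper's LAE algorithm instead learns from samples, uses expected hitting times of stochastic policies, and handles countably-infinite state spaces.

```python
from collections import deque

# Hypothetical sketch (NOT the paper's LAE algorithm): a deterministic,
# known-model stand-in for the expected-hitting-time criterion used in
# autonomous exploration. mdp: dict state -> {action: [successor states]}.

def shortest_path_len(mdp, start, goal, allowed):
    """Fewest steps from `start` to `goal` using only intermediate states
    in `allowed` (the already-controlled set)."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        s, d = frontier.popleft()
        if s == goal:
            return d
        if s != start and s not in allowed:
            continue  # may not pass through states we cannot yet control
        for a in mdp.get(s, {}):
            for s2 in mdp[s][a]:
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, d + 1))
    return float("inf")

def expansion_operator(mdp, controlled, s0, L):
    """One application of the state-expansion operator: add every state
    reachable from s0 within L steps while restricted to `controlled`."""
    candidates = {s2 for s in controlled | {s0}
                  for a in mdp.get(s, {}) for s2 in mdp[s][a]}
    reachable = {g for g in candidates
                 if shortest_path_len(mdp, s0, g, controlled) <= L}
    return controlled | reachable

def layered_discovery(mdp, s0, L):
    """Iterate the operator to a fixed point; each iteration yields a layer."""
    layers, controlled = [], {s0}  # the reset state is trivially controllable
    while True:
        expanded = expansion_operator(mdp, controlled, s0, L)
        if expanded == controlled:
            return layers  # fixed point: the incrementally L-controllable set
        layers.append(expanded - controlled)
        controlled = expanded

# Toy chain 0 -> 1 -> 2: with L = 2 the layers come out as [{1}, {2}].
print(layered_discovery({0: {"a": [1]}, 1: {"a": [2]}, 2: {}}, s0=0, L=2))
```

Each pass of `layered_discovery` mirrors one application of the expansion operator: the new layer contains exactly the states that become reachable within $L$ steps once all previously discovered states may serve as intermediate waypoints.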
Related papers
- Gap-Dependent Unsupervised Exploration for Reinforcement Learning [40.990467706237396]
We present an efficient algorithm for task-agnostic reinforcement learning.
The algorithm takes only $\widetilde{\mathcal{O}}(1/\epsilon \cdot (H^3 S A / \rho + H^4 S^2 A))$ episodes of exploration.
We show that, information-theoretically, this bound is nearly tight for $\rho \ge \Theta(1/(HS))$ and $H>1$.
arXiv Detail & Related papers (2021-08-11T20:42:46Z)
- Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)
- Improved Sample Complexity for Incremental Autonomous Exploration in MDPs [132.88757893161699]
We learn the set of $\epsilon$-optimal goal-conditioned policies attaining all states that are incrementally reachable within $L$ steps.
DisCo is the first algorithm that can return an $\epsilon/c_{\min}$-optimal policy for any cost-sensitive shortest-path problem.
arXiv Detail & Related papers (2020-12-29T14:06:09Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
- $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys an $\mathcal{O}\left(\frac{SA \cdot \mathrm{poly}(H)}{\Delta_{\min}} \log(SAT)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.