Improved No-Regret Algorithms for Stochastic Shortest Path with Linear
MDP
- URL: http://arxiv.org/abs/2112.09859v1
- Date: Sat, 18 Dec 2021 06:47:31 GMT
- Title: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear
MDP
- Authors: Liyu Chen, Rahul Jain, Haipeng Luo
- Abstract summary: We introduce two new no-regret algorithms for the shortest path (SSP) problem with a linear MDP.
Our first algorithm is computationally efficient and achieves a regret bound $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$.
Our second algorithm is computationally inefficient but achieves the first "horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$ with no polynomial dependency on $T_{\star}$ or $1/c_{\min}$.
- Score: 31.62899359543925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce two new no-regret algorithms for the stochastic shortest path
(SSP) problem with a linear MDP that significantly improve over the only
existing results of (Vial et al., 2021). Our first algorithm is computationally
efficient and achieves a regret bound
$\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$, where $d$ is the
dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds of
the expected costs and hitting time of the optimal policy respectively, and $K$
is the number of episodes. The same algorithm with a slight modification also
achieves logarithmic regret of order
$O\left(\frac{d^3B_{\star}^4}{c_{\min}^2\text{gap}_{\min}}\ln^5\frac{dB_{\star}
K}{c_{\min}} \right)$, where $\text{gap}_{\min}$ is the minimum sub-optimality
gap and $c_{\min}$ is the minimum cost over all state-action pairs. Our result
is obtained by developing a simpler and improved analysis for the
finite-horizon approximation of (Cohen et al., 2021) with a smaller
approximation error, which might be of independent interest. On the other hand,
using variance-aware confidence sets in a global optimization problem, our
second algorithm is computationally inefficient but achieves the first
"horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$ with no
polynomial dependency on $T_{\star}$ or $1/c_{\min}$, almost matching the
$\Omega(dB_{\star}\sqrt{K})$ lower bound from (Min et al., 2021).
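To make the scaling of these bounds concrete, the following is a minimal sketch that plugs hypothetical values of $d$, $B_{\star}$, $T_{\star}$, and $K$ into the two upper bounds and the lower bound above, ignoring all constants and logarithmic factors; the parameter values are illustrative assumptions, not numbers from the paper.

```python
import math

def regret_alg1(d, B_star, T_star, K):
    """First (computationally efficient) algorithm: ~ sqrt(d^3 * B_star^2 * T_star * K)."""
    return math.sqrt(d**3 * B_star**2 * T_star * K)

def regret_alg2(d, B_star, K):
    """Second (horizon-free) algorithm: ~ d^3.5 * B_star * sqrt(K)."""
    return d**3.5 * B_star * math.sqrt(K)

def lower_bound(d, B_star, K):
    """Omega(d * B_star * sqrt(K)) lower bound from (Min et al., 2021)."""
    return d * B_star * math.sqrt(K)

if __name__ == "__main__":
    # Hypothetical problem parameters (not taken from the paper).
    d, B_star, T_star = 10, 5.0, 50.0
    for K in (10**3, 10**5, 10**7):
        print(f"K={K:>8}: alg1 ~ {regret_alg1(d, B_star, T_star, K):.3e}, "
              f"alg2 ~ {regret_alg2(d, B_star, K):.3e}, "
              f"lower ~ {lower_bound(d, B_star, K):.3e}")
```

In particular, the horizon-free bound exceeds the $\Omega(dB_{\star}\sqrt{K})$ lower bound only by a factor of $d^{2.5}$, up to logarithmic terms.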
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z) - Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time [8.808780937701522]
We take a major step towards a more efficient and error-robust alternating minimization framework.
Our algorithm runs in time $\widetilde{O}(|\Omega| k)$, which is nearly linear in the time to verify the solution.
arXiv Detail & Related papers (2023-02-21T23:49:36Z) - Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary
Environments [40.027926921772355]
We study dynamic regret for goal-oriented reinforcement learning.
The different roles of $\Delta_c$ and $\Delta_P$ in this lower bound inspire us to design algorithms that estimate costs and transitions separately.
arXiv Detail & Related papers (2022-05-25T20:29:01Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$ complexity, matching this lower bound.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Regret Bounds for Stochastic Shortest Path Problems with Linear Function
Approximation [29.549633678730054]
We propose two algorithms for episodic shortest path problems with linear function approximation.
The first is computationally expensive but provably obtains $\tilde{O}(\sqrt{B_{\star}^3 d^3 K/c_{\min}})$ regret, where $B_{\star}$ is an upper bound on the optimal cost-to-go function.
The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. (A brief sketch comparing this bound with the improved bound of the main paper appears after this list.)
arXiv Detail & Related papers (2021-05-04T16:05:08Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Logarithmic Regret for Reinforcement Learning with Linear Function
Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
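As referenced in the entry on regret bounds for SSP with linear function approximation above, here is a minimal sketch comparing that prior $\tilde{O}(\sqrt{B_{\star}^3 d^3 K/c_{\min}})$ bound with the $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$ bound of the main paper. It assumes only that every step incurs cost at least $c_{\min}$, so that $T_{\star} \le B_{\star}/c_{\min}$; the concrete parameter values are hypothetical.

```python
import math

def prior_bound(d, B_star, c_min, K):
    """Prior bound for linear SSP: ~ sqrt(B_star^3 * d^3 * K / c_min)."""
    return math.sqrt(B_star**3 * d**3 * K / c_min)

def improved_bound(d, B_star, T_star, K):
    """Bound of the main paper's first algorithm: ~ sqrt(d^3 * B_star^2 * T_star * K)."""
    return math.sqrt(d**3 * B_star**2 * T_star * K)

# Hypothetical values; any T_star <= B_star / c_min (= 500 here) keeps the
# improved bound no larger than the prior one.
d, B_star, c_min, K = 10, 5.0, 0.01, 10**5
T_star = 100.0
print(f"prior    ~ {prior_bound(d, B_star, c_min, K):.3e}")
print(f"improved ~ {improved_bound(d, B_star, T_star, K):.3e}")
print(f"ratio prior/improved = {math.sqrt(B_star / (c_min * T_star)):.2f}")
```

Under this assumption the improved bound is never larger than the prior one, and the ratio between them, $\sqrt{B_{\star}/(c_{\min}T_{\star})}$, grows as $T_{\star}$ becomes small relative to $B_{\star}/c_{\min}$.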