Improved No-Regret Algorithms for Stochastic Shortest Path with Linear
MDP
- URL: http://arxiv.org/abs/2112.09859v1
- Date: Sat, 18 Dec 2021 06:47:31 GMT
- Title: Improved No-Regret Algorithms for Stochastic Shortest Path with Linear
MDP
- Authors: Liyu Chen, Rahul Jain, Haipeng Luo
- Abstract summary: We introduce two new no-regret algorithms for the shortest path (SSP) problem with a linear MDP.
Our first algorithm is computationally efficient and achieves a regret bound $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$.
Our second algorithm is computationally inefficient but achieves the first "horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$ with no polynomial dependency on $T_{\star}$ or $1/c_{\min}$.
- Score: 31.62899359543925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce two new no-regret algorithms for the stochastic shortest path
(SSP) problem with a linear MDP that significantly improve over the only
existing results of (Vial et al., 2021). Our first algorithm is computationally
efficient and achieves a regret bound
$\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$, where $d$ is the
dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds of
the expected costs and hitting time of the optimal policy respectively, and $K$
is the number of episodes. The same algorithm with a slight modification also
achieves logarithmic regret of order
$O\left(\frac{d^3B_{\star}^4}{c_{\min}^2\text{gap}_{\min}}\ln^5\frac{dB_{\star}
K}{c_{\min}} \right)$, where $\text{gap}_{\min}$ is the minimum sub-optimality
gap and $c_{\min}$ is the minimum cost over all state-action pairs. Our result
is obtained by developing a simpler and improved analysis for the
finite-horizon approximation of (Cohen et al., 2021) with a smaller
approximation error, which might be of independent interest. On the other hand,
using variance-aware confidence sets in a global optimization problem, our
second algorithm is computationally inefficient but achieves the first
"horizon-free" regret bound $\widetilde{O}(d^{3.5}B_{\star}\sqrt{K})$ with no
polynomial dependency on $T_{\star}$ or $1/c_{\min}$, almost matching the
$\Omega(dB_{\star}\sqrt{K})$ lower bound from (Min et al., 2021).
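To make the scaling of these bounds concrete, the following is a minimal sketch that plugs hypothetical values of $d$, $B_{\star}$, $T_{\star}$, and $K$ into the two upper bounds and the lower bound above, ignoring all constants and logarithmic factors; the parameter values are illustrative assumptions, not numbers from the paper.

```python
import math

def regret_alg1(d, B_star, T_star, K):
    """First (computationally efficient) algorithm: ~ sqrt(d^3 * B_star^2 * T_star * K)."""
    return math.sqrt(d**3 * B_star**2 * T_star * K)

def regret_alg2(d, B_star, K):
    """Second (horizon-free) algorithm: ~ d^3.5 * B_star * sqrt(K)."""
    return d**3.5 * B_star * math.sqrt(K)

def lower_bound(d, B_star, K):
    """Omega(d * B_star * sqrt(K)) lower bound from (Min et al., 2021)."""
    return d * B_star * math.sqrt(K)

if __name__ == "__main__":
    # Hypothetical problem parameters (not taken from the paper).
    d, B_star, T_star = 10, 5.0, 50.0
    for K in (10**3, 10**5, 10**7):
        print(f"K={K:>8}: alg1 ~ {regret_alg1(d, B_star, T_star, K):.3e}, "
              f"alg2 ~ {regret_alg2(d, B_star, K):.3e}, "
              f"lower ~ {lower_bound(d, B_star, K):.3e}")
```

In particular, the horizon-free bound exceeds the $\Omega(dB_{\star}\sqrt{K})$ lower bound only by a factor of $d^{2.5}$, up to logarithmic terms.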
Related papers
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback [30.337826346496385]
We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback.
We introduce two algorithms that achieve improved regret performance compared to existing approaches.
arXiv Detail & Related papers (2023-10-17T19:43:37Z) - Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time [8.808780937701522]
We take a major step towards a more efficient and error-robust alternating minimization framework.
Our algorithm runs in time $\widetilde{O}(|\Omega| k)$, which is nearly linear in the time to verify the solution.
arXiv Detail & Related papers (2023-02-21T23:49:36Z) - Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary
Environments [40.027926921772355]
We study dynamic regret for goal-oriented reinforcement learning.
The different roles of $\Delta_c$ and $\Delta_P$ in this lower bound inspire us to design algorithms that estimate costs and transitions separately.
arXiv Detail & Related papers (2022-05-25T20:29:01Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log^3(\kappa_x\kappa_y)\log\frac{1}{\epsilon}\right)$ complexity, matching this lower bound.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Regret Bounds for Stochastic Shortest Path Problems with Linear Function
Approximation [29.549633678730054]
We propose two algorithms for episodic shortest path problems with linear function approximation.
The first is computationally expensive but provably obtains $\tilde{O}(\sqrt{B_{\star}^3 d^3 K/c_{\min}})$ regret, where $B_{\star}$ is an upper bound on the optimal cost-to-go function.
The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. (A brief sketch comparing this bound with the improved bound of the main paper appears after this list.)
arXiv Detail & Related papers (2021-05-04T16:05:08Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Logarithmic Regret for Reinforcement Learning with Linear Function
Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z) - $Q$-learning with Logarithmic Regret [60.24952657636464]
We prove that an optimistic $Q$-learning algorithm enjoys a $\mathcal{O}\left(\frac{SA\cdot \mathrm{poly}\left(H\right)}{\Delta_{\min}}\log\left(SAT\right)\right)$ cumulative regret bound, where $S$ is the number of states, $A$ is the number of actions, $H$ is the planning horizon, $T$ is the total number of steps, and $\Delta_{\min}$ is the minimum sub-optimality gap.
arXiv Detail & Related papers (2020-06-16T13:01:33Z)
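As referenced in the entry on regret bounds for SSP with linear function approximation above, here is a minimal sketch comparing that prior $\tilde{O}(\sqrt{B_{\star}^3 d^3 K/c_{\min}})$ bound with the $\widetilde{O}\left(\sqrt{d^3B_{\star}^2T_{\star} K}\right)$ bound of the main paper. It assumes only that every step incurs cost at least $c_{\min}$, so that $T_{\star} \le B_{\star}/c_{\min}$; the concrete parameter values are hypothetical.

```python
import math

def prior_bound(d, B_star, c_min, K):
    """Prior bound for linear SSP: ~ sqrt(B_star^3 * d^3 * K / c_min)."""
    return math.sqrt(B_star**3 * d**3 * K / c_min)

def improved_bound(d, B_star, T_star, K):
    """Bound of the main paper's first algorithm: ~ sqrt(d^3 * B_star^2 * T_star * K)."""
    return math.sqrt(d**3 * B_star**2 * T_star * K)

# Hypothetical values; any T_star <= B_star / c_min (= 500 here) keeps the
# improved bound no larger than the prior one.
d, B_star, c_min, K = 10, 5.0, 0.01, 10**5
T_star = 100.0
print(f"prior    ~ {prior_bound(d, B_star, c_min, K):.3e}")
print(f"improved ~ {improved_bound(d, B_star, T_star, K):.3e}")
print(f"ratio prior/improved = {math.sqrt(B_star / (c_min * T_star)):.2f}")
```

Under this assumption the improved bound is never larger than the prior one, and the ratio between them, $\sqrt{B_{\star}/(c_{\min}T_{\star})}$, grows as $T_{\star}$ becomes small relative to $B_{\star}/c_{\min}$.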