Regret Bounds for Stochastic Shortest Path Problems with Linear Function
Approximation
- URL: http://arxiv.org/abs/2105.01593v1
- Date: Tue, 4 May 2021 16:05:08 GMT
- Title: Regret Bounds for Stochastic Shortest Path Problems with Linear Function
Approximation
- Authors: Daniel Vial, Advait Parulekar, Sanjay Shakkottai, R. Srikant
- Abstract summary: We propose two algorithms for episodic shortest path problems with linear function approximation.
The first is computationally expensive but provably obtains $\tilde{O}(\sqrt{B_\star^3 d^3 K/c_{min}})$ regret, where $B_\star$ is an upper bound on the optimal cost-to-go function.
The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound.
- Score: 29.549633678730054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose two algorithms for episodic stochastic shortest path problems with
linear function approximation. The first is computationally expensive but
provably obtains $\tilde{O} (\sqrt{B_\star^3 d^3 K/c_{min}} )$ regret, where
$B_\star$ is a (known) upper bound on the optimal cost-to-go function, $d$ is
the feature dimension, $K$ is the number of episodes, and $c_{min}$ is the
minimal cost of non-goal state-action pairs (assumed to be positive). The
second is computationally efficient in practice, and we conjecture that it
obtains the same regret bound. Both algorithms are based on an optimistic
least-squares version of value iteration analogous to the finite-horizon
backward induction approach from Jin et al. 2020. To the best of our knowledge,
these are the first regret bounds for stochastic shortest path that are
independent of the size of the state and action spaces.
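The abstract describes an optimistic least-squares variant of value iteration in the spirit of LSVI-UCB (Jin et al. 2020). As a rough illustration only, a minimal sketch of one such backup might look like the following; the function names, the bonus scale `beta`, and the clipping to `[0, B_star]` are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def lsvi_backup(phi, costs, next_values, lam=1.0):
    """One least-squares value-iteration backup (illustrative sketch).

    phi:         (n, d) feature matrix of visited state-action pairs
    costs:       (n,) observed one-step costs
    next_values: (n,) current value estimates at the observed next states
    Returns (w, Lambda): ridge-regression weights and the Gram matrix.
    """
    n, d = phi.shape
    Lambda = lam * np.eye(d) + phi.T @ phi        # regularized Gram matrix
    targets = costs + next_values                 # empirical Bellman targets
    w = np.linalg.solve(Lambda, phi.T @ targets)  # ridge regression solution
    return w, Lambda

def optimistic_q(w, Lambda, phi_sa, beta=1.0, B_star=1.0):
    """Optimistic Q-value for a cost-minimization SSP (illustrative).

    Since costs are minimized, optimism subtracts the elliptical
    confidence bonus; the estimate is clipped to [0, B_star].
    """
    bonus = beta * np.sqrt(phi_sa @ np.linalg.solve(Lambda, phi_sa))
    return float(np.clip(phi_sa @ w - bonus, 0.0, B_star))
```

Repeating the backup over value estimates, and acting greedily with respect to the optimistic Q-values, gives the general shape of optimistic least-squares value iteration; the paper's algorithms additionally handle the goal-reaching (indefinite-horizon) structure of SSP.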
Related papers
- Achieving Tractable Minimax Optimal Regret in Average Reward MDPs [19.663336027878408]
We present the first tractable algorithm with minimax optimal regret of $\widetilde{\mathrm{O}}(\sqrt{\mathrm{sp}(h^*) S A T})$.
Remarkably, our algorithm does not require prior information on $\mathrm{sp}(h^*)$.
arXiv Detail & Related papers (2024-06-03T11:53:44Z) - Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation [67.8414514524356]
We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space.
Introducing the non-linear function raises significant challenges in both computational and statistical efficiency.
We propose an algorithm that achieves the same regret with only $\mathcal{O}(1)$ cost.
arXiv Detail & Related papers (2024-05-27T11:31:54Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Improved No-Regret Algorithms for Stochastic Shortest Path with Linear
MDP [31.62899359543925]
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP.
Our first algorithm is computationally efficient and achieves a regret bound $\widetilde{O}\left(\sqrt{d^3 B_\star^2 T_\star K}\right)$.
Our second algorithm is computationally inefficient but achieves the first "horizon-free" regret bound $\widetilde{O}(d^{3.5} B_\star \sqrt{K})$ with no dependency on $T_\star$.
arXiv Detail & Related papers (2021-12-18T06:47:31Z) - Learning Stochastic Shortest Path with Linear Function Approximation [74.08819218747341]
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models.
We propose a novel algorithm for learning the linear mixture SSP, which can attain a $\tilde{O}(d B_\star^{1.5} \sqrt{K/c_{min}})$ regret.
arXiv Detail & Related papers (2021-10-25T08:34:00Z) - Minimax Regret for Stochastic Shortest Path [63.45407095296692]
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost.
We show that the minimax regret for this setting is $\widetilde{O}(B_\star \sqrt{|S| |A| K})$, where $B_\star$ is a bound on the expected cost of the optimal policy from any state.
Our algorithm runs in polynomial time per episode, and is based on a novel reduction to reinforcement learning in finite-horizon MDPs.
arXiv Detail & Related papers (2021-03-24T10:11:49Z) - Near-optimal Regret Bounds for Stochastic Shortest Path [63.029132134792555]
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost.
We show that any learning algorithm must have at least $\Omega(B_\star \sqrt{|S| |A| K})$ regret in the worst case.
arXiv Detail & Related papers (2020-02-23T09:10:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.