Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
- URL: http://arxiv.org/abs/1911.00567v7
- Date: Fri, 8 Sep 2023 16:34:55 GMT
- Title: Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
- Authors: Andrea Zanette, David Brandfonbrener, Emma Brunskill, Matteo Pirotta,
Alessandro Lazaric
- Abstract summary: We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning (RL)
In this paper, we introduce an optimistically-initialized variant of the popular randomized least-squares value iteration (RLSVI).
Under the assumption that the Markov decision process has low-rank transition dynamics, we prove that the frequentist regret of RLSVI is upper-bounded by $\widetilde O(d^2 H^2 \sqrt{T})$, where $d$ is the feature dimension, $H$ is the horizon, and $T$ is the total number of steps.
- Score: 94.47472987987805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the exploration-exploitation dilemma in finite-horizon
reinforcement learning (RL). When the state space is large or continuous,
traditional tabular approaches are unfeasible and some form of function
approximation is mandatory. In this paper, we introduce an
optimistically-initialized variant of the popular randomized least-squares
value iteration (RLSVI), a model-free algorithm where exploration is induced by
perturbing the least-squares approximation of the action-value function. Under
the assumption that the Markov decision process has low-rank transition
dynamics, we prove that the frequentist regret of RLSVI is upper-bounded by
$\widetilde O(d^2 H^2 \sqrt{T})$, where $d$ is the feature dimension, $H$
is the horizon, and $T$ is the total number of steps. To the best of our
knowledge, this is the first frequentist regret analysis for randomized
exploration with function approximation.
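As a rough illustration of the mechanism the abstract describes (exploration induced by perturbing the least-squares approximation of the action-value function), below is a minimal NumPy sketch of one episode of RLSVI-style randomized value iteration. The feature map `phi`, the environment interface `env` (with `env.actions`, `env.reset()`, and `env.step()`), and the noise scale `sigma` are illustrative assumptions rather than the paper's notation, and the sketch omits the optimistic initialization that the paper adds on top of plain RLSVI.

```python
import numpy as np

def rlsvi_episode(env, phi, d, H, data, lam=1.0, sigma=1.0, rng=None):
    """One episode of RLSVI-style exploration with linear features.

    phi(s, a) -> R^d feature map; data[h] is a list of (s, a, r, s_next)
    transitions collected at step h in earlier episodes.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = [np.zeros(d) for _ in range(H + 1)]  # theta[H] stays 0 (terminal value)

    # Backward pass: regularized least squares on bootstrapped targets, then
    # perturb each solution with Gaussian noise shaped by the precision matrix.
    for h in reversed(range(H)):
        A = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in data[h]:
            x = phi(s, a)
            # Target: reward plus greedy value under the next-step estimate.
            v_next = max(phi(s_next, a2) @ theta[h + 1] for a2 in env.actions)
            A += np.outer(x, x)
            b += x * (r + v_next)
        mean = np.linalg.solve(A, b)
        cov = sigma**2 * np.linalg.inv(A)
        theta[h] = rng.multivariate_normal(mean, cov)  # randomized exploration

    # Forward pass: act greedily with respect to the perturbed estimates.
    s = env.reset()
    for h in range(H):
        a = max(env.actions, key=lambda a2: phi(s, a2) @ theta[h])
        s_next, r = env.step(a)
        data[h].append((s, a, r, s_next))
        s = s_next
    return data
```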
Related papers
- Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation [8.274693573069442]
We study reinforcement learning with multinomial logistic (MNL) function approximation.
We propose provably efficient algorithms with randomized exploration having frequentist regret guarantees.
Numerical experiments demonstrate the superior performance of the proposed algorithms.
arXiv Detail & Related papers (2024-05-30T15:39:19Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs).
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde O(d^2 \varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z) - Bilinear Exponential Family of MDPs: Frequentist Regret Bound with
Tractable Exploration and Planning [0.0]
We study the problem of episodic reinforcement learning in continuous state-action spaces with unknown rewards and transitions.
We propose an algorithm, BEF-RLSVI, that uses penalized maximum likelihood estimators to learn the unknown parameters.
arXiv Detail & Related papers (2022-10-05T08:26:49Z) - Provably Efficient Model-Free Constrained RL with Linear Function
Approximation [4.060731229044571]
We develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems.
Our results are achieved via novel adaptations of the standard LSVI-UCB algorithms.
arXiv Detail & Related papers (2022-06-23T17:54:31Z) - First-Order Regret in Reinforcement Learning with Linear Function
Approximation: A Robust Estimation Approach [57.570201404222935]
We show that it is possible to obtain regret scaling as $\mathcal{O}(\sqrt{V_1^\star K})$ in reinforcement learning with large state spaces.
We demonstrate that existing techniques based on least-squares estimation are insufficient to obtain this result.
arXiv Detail & Related papers (2021-12-07T00:29:57Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
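The last related paper above drives exploration by perturbing the training data with i.i.d. scalar noise before each least-squares fit. A minimal sketch of that idea, assuming a flat `(features, targets)` data layout and an illustrative noise scale `sigma`:

```python
import numpy as np

def perturbed_lsvi_fit(features, targets, lam=1.0, sigma=1.0, rng=None):
    """Ridge regression on noise-perturbed targets.

    features: (n, d) array of state-action features; targets: (n,) array of
    bootstrapped Q-value targets. Returns a perturbed weight vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = targets + sigma * rng.standard_normal(targets.shape)  # i.i.d. scalar noise
    A = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(A, features.T @ noisy)
```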