Horizon-Free Regret for Linear Markov Decision Processes
- URL: http://arxiv.org/abs/2403.10738v1
- Date: Fri, 15 Mar 2024 23:50:58 GMT
- Title: Horizon-Free Regret for Linear Markov Decision Processes
- Authors: Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du
- Abstract summary: A recent line of work showed that regret bounds in reinforcement learning can be (nearly) independent of the planning horizon.
We give the first horizon-free bound for the popular linear Markov Decision Process (MDP) setting.
In contrast to prior works which explicitly estimate the transition model and compute the inhomogeneous value functions, we directly estimate the value functions and confidence sets.
- Score: 92.02082223856479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of work showed that regret bounds in reinforcement learning (RL) can be (nearly) independent of the planning horizon, a.k.a. horizon-free bounds. However, these regret bounds only apply to settings where a polynomial dependency on the size of the transition model is allowed, such as the tabular Markov Decision Process (MDP) and the linear mixture MDP. We give the first horizon-free bound for the popular linear MDP setting, where the size of the transition model can be exponentially large or even uncountable. In contrast to prior works, which explicitly estimate the transition model and compute the inhomogeneous value functions at different time steps, we directly estimate the value functions and confidence sets. We obtain the horizon-free bound by: (1) maintaining multiple weighted least-squares estimators for the value functions; and (2) a structural lemma which shows that the maximal total variation of the inhomogeneous value functions is bounded by a polynomial factor of the feature dimension.
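To make step (1) concrete, below is a minimal sketch of a variance-weighted least-squares estimate with an elliptical confidence bonus, the standard primitive behind such value-function estimators. The function names, the choice of weights, and the target construction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def weighted_ridge(features, targets, weights, lam=1.0):
    """Variance-weighted ridge regression over observed transitions.

    features: (n, d) array of feature vectors phi(s, a)
    targets:  (n,) regression targets, e.g. r + V(s') (illustrative)
    weights:  (n,) per-sample weights, e.g. inverse variance estimates

    Returns the weight vector theta and the Gram matrix Lambda; Lambda
    also defines the confidence ellipsoid around theta.
    """
    d = features.shape[1]
    Lambda = lam * np.eye(d) + features.T @ (weights[:, None] * features)
    b = features.T @ (weights * targets)
    theta = np.linalg.solve(Lambda, b)
    return theta, Lambda

def optimistic_value(phi, theta, Lambda, beta):
    """Optimistic estimate theta^T phi + beta * ||phi||_{Lambda^{-1}}."""
    bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return float(phi @ theta + bonus)
```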
Related papers
- Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds [59.875550175217874]
We show that a simple Model-based Reinforcement Learning scheme achieves strong regret and sample bounds in online and offline RL settings.
We highlight that our algorithms are simple, fairly standard, and indeed have been extensively studied in the RL literature.
arXiv Detail & Related papers (2024-08-16T19:52:53Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP poses several challenges, including handling the large state space.
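For reference, a generic constrained convex MDP can be written as follows; this is a sketch consistent with the summary above, where $f$ and $g$ are assumed convex in the visitation measure and $\mathcal{M}$ denotes the set of visitation measures realizable by some policy (notation assumed, not the paper's):

```latex
% Generic constrained convex MDP (assumed notation): f, g convex in the
% state-action visitation measure mu; M is the set of visitation
% measures induced by valid policies.
\begin{aligned}
\min_{\mu \in \mathcal{M}} \quad & f(\mu) \\
\text{s.t.} \quad & g(\mu) \le 0.
\end{aligned}
```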
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE).
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
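For context, the classical value-biased MLE criterion, which this line of work adapts to discounted linear MDPs, augments the log-likelihood with a bias toward models of high optimal value. The notation below is an assumed sketch of that general form, not the paper's exact objective:

```latex
% Value-biased MLE (sketch; alpha_t is an increasing bias weight and
% V^*_theta is the optimal value under the model parameterized by theta):
\hat{\theta}_t \in \arg\max_{\theta}
  \left\{ \sum_{k=1}^{t-1} \log p_{\theta}\!\left(s_{k+1} \mid s_k, a_k\right)
  + \alpha_t \, V^{*}_{\theta} \right\}
```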
arXiv Detail & Related papers (2023-10-17T18:27:27Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Provably Efficient Model-Free Constrained RL with Linear Function Approximation [4.060731229044571]
We develop the first model-free, simulator-free algorithm that achieves sublinear regret and sublinear constraint violation even in large-scale systems.
Our results are achieved via novel adaptations of the standard LSVI-UCB algorithms.
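Since the algorithm builds on LSVI-UCB, here is a minimal sketch of the standard (unconstrained) LSVI-UCB optimistic Q-estimate that such adaptations start from; the constrained variant in the paper modifies the targets and adds constraint handling, which is not shown here:

```python
import numpy as np

def lsvi_ucb_q(phi, theta, Lambda, beta):
    """Standard LSVI-UCB optimistic Q-value (generic sketch):
        Q(s, a) = phi(s, a)^T theta + beta * ||phi(s, a)||_{Lambda^{-1}}

    phi:    (d,) feature vector of the state-action pair
    theta:  (d,) ridge-regression weights fit on r + max_a' Q_{h+1}(s', a')
    Lambda: (d, d) regularized Gram matrix of past features
    beta:   confidence-width multiplier from the analysis
    """
    bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))
    return float(phi @ theta + bonus)

# The full algorithm refits theta and Lambda for each step h, backing up
# from h = H down to h = 1 after every episode.
```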
arXiv Detail & Related papers (2022-06-23T17:54:31Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\, u_i$, with linear dependence in the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
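A minimal numerical sketch of the flow of such a system is given below; the paper's approximation results are analytical, and the forward-Euler integrator and names here are illustrative assumptions:

```python
import numpy as np

def flow(x0, fields, controls, T=1.0, n_steps=1000):
    """Forward-Euler approximation of the flow of
        x'(t) = sum_i F_i(x(t)) * u_i(t),
    the linear-in-controls system from the summary above.

    x0:       (d,) initial point
    fields:   list of callables F_i mapping a (d,) point to a (d,) vector
    controls: list of callables u_i mapping time t to a float
    """
    x = np.asarray(x0, dtype=float).copy()
    dt = T / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * sum(u(t) * F(x) for F, u in zip(fields, controls))
    return x
```

Applying the same flow to every point of a compact ensemble gives the approximation of the diffeomorphism's action described in the summary.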
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
- Lifting the Convex Conjugate in Lagrangian Relaxations: A Tractable Approach for Continuous Markov Random Fields [53.31927549039624]
We show that a piecewise discretization preserves contrast better than existing discretizations.
We apply this theory to the problem of matching two images.
arXiv Detail & Related papers (2021-07-13T12:31:06Z)
- Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions [25.621280373733605]
We present the first algorithm for reinforcement learning with Factored Markov Decision Processes (FMDPs).
In contrast to prior work, we do not assume that the transitions on various factors are independent.
arXiv Detail & Related papers (2021-07-12T04:13:18Z)
- Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit [29.344264789740894]
Sorted $\ell_1$ regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems.
We show how this technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I error and power.
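Concretely, the sorted $\ell_1$ (SLOPE) penalty pairs the $i$-th largest coefficient magnitude with the $i$-th largest regularization weight; a minimal sketch of the regularizer (names illustrative):

```python
import numpy as np

def slope_penalty(beta, lambdas):
    """Sorted-l1 (SLOPE) penalty: sum_i lambda_i * |beta|_(i), where
    |beta|_(1) >= |beta|_(2) >= ... are the coefficient magnitudes in
    decreasing order and lambda_1 >= lambda_2 >= ... >= 0 is fixed."""
    sorted_abs = np.sort(np.abs(beta))[::-1]  # descending magnitudes
    return float(np.dot(np.asarray(lambdas), sorted_abs))

# With all lambdas equal, this reduces to the ordinary l1 (lasso) penalty.
print(slope_penalty([0.5, -2.0, 1.0], [3.0, 2.0, 1.0]))  # 3*2 + 2*1 + 1*0.5 = 8.5
```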
arXiv Detail & Related papers (2021-05-27T16:56:42Z)