Regularized Q-Learning with Linear Function Approximation
- URL: http://arxiv.org/abs/2401.15196v2
- Date: Sun, 28 Jul 2024 16:35:05 GMT
- Title: Regularized Q-Learning with Linear Function Approximation
- Authors: Jiachen Xi, Alfredo Garcia, Petar Momcilovic,
- Abstract summary: We consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
- Score: 2.765106384328772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regularized Markov Decision Processes serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With functional approximation, the convergence properties of learning algorithms for regularized MDPs (e.g. soft Q-learning) are not well understood because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation. The {\em lower} level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition and the {\em upper} level aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite time convergence guarantees. The algorithm operates on two time-scales: updates to the projection of state-action values are `slow' in that they are implemented with a step size that is smaller than the one used for `faster' updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
Related papers
- Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Adaptive Stochastic Optimisation of Nonconvex Composite Objectives [2.1700203922407493]
We propose and analyse a family of generalised composite mirror descent algorithms.
With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem.
We exploit the low-dimensional structure of the decision sets for high-dimensional problems.
arXiv Detail & Related papers (2022-11-21T18:31:43Z) - Structural Estimation of Markov Decision Processes in High-Dimensional
State Space with Finite-Time Guarantees [39.287388288477096]
We consider the task of estimating a structural model of dynamic decisions by a human agent based upon the observable history of implemented actions and visited states.
This problem has an inherent nested structure: in the inner problem, an optimal policy for a given reward function is identified while in the outer problem, a measure of fit is maximized.
We propose a single-loop estimation algorithm with finite time guarantees that is equipped to deal with high-dimensional state spaces.
arXiv Detail & Related papers (2022-10-04T00:11:38Z) - Zeroth and First Order Stochastic Frank-Wolfe Algorithms for Constrained
Optimization [13.170519806372075]
Problems of convex optimization with two sets of constraints arise frequently in the context of semidefinite programming.
Since projection onto the first set of constraints is difficult, it becomes necessary to explore projection-free algorithms.
The efficacy of the proposed algorithms is tested on relevant applications of sparse matrix estimation, clustering via semidefinite relaxation, and uniform sparsest cut problem.
arXiv Detail & Related papers (2021-07-14T08:01:30Z) - A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear
Equality Constrained Optimization with Rank-Deficient Jacobians [11.03311584463036]
A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems.
Results of numerical experiments demonstrate that the algorithm offers superior performance when compared to popular alternatives.
arXiv Detail & Related papers (2021-06-24T13:46:52Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Sequential Quadratic Optimization for Nonlinear Equality Constrained
Stochastic Optimization [10.017195276758454]
It is assumed in this setting that it is intractable to compute objective function and derivative values explicitly.
An algorithm is proposed for the deterministic setting that is modeled after a state-of-the-art line-search SQP algorithm.
The results of numerical experiments demonstrate the practical performance of our proposed techniques.
arXiv Detail & Related papers (2020-07-20T23:04:26Z) - Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient.
arXiv Detail & Related papers (2020-07-01T18:43:32Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.