Convex Q Learning in a Stochastic Environment: Extended Version
- URL: http://arxiv.org/abs/2309.05105v1
- Date: Sun, 10 Sep 2023 18:24:43 GMT
- Title: Convex Q Learning in a Stochastic Environment: Extended Version
- Authors: Fan Lu and Sean Meyn
- Abstract summary: The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
- Score: 1.680268810119084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper introduces the first formulation of convex Q-learning for Markov
decision processes with function approximation. The algorithms and theory rest
on a relaxation of a dual of Manne's celebrated linear programming
characterization of optimal control. The main contributions firstly concern
properties of the relaxation, described as a deterministic convex program: we
identify conditions for a bounded solution, and a significant relationship
between the solution to the new convex program, and the solution to standard
Q-learning. The second set of contributions concern algorithm design and
analysis: (i) A direct model-free method for approximating the convex program
for Q-learning shares properties with its ideal. In particular, a bounded
solution is ensured subject to a simple property of the basis functions; (ii)
The proposed algorithms are convergent and new techniques are introduced to
obtain the rate of convergence in a mean-square sense; (iii) The approach can
be generalized to a range of performance criteria, and it is found that
variance can be reduced by considering "relative" dynamic programming
equations; (iv) The theory is illustrated with an application to a classical
inventory control problem.
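For orientation, the following is a minimal sketch of the type of convex program involved, written for a discounted-cost MDP with a Q-function that is linear in its parameters; the notation (cost $c$, discount factor $\gamma$, basis $\psi$, weighting $\mu$) is assumed here and is not necessarily the paper's exact formulation. The linear programming characterization of optimal control determines the value function as the maximal sub-solution of the Bellman inequality,
$$\max_{h}\ \langle \mu, h\rangle \quad \text{s.t.}\quad h(x) \le c(x,u) + \gamma\,\mathbb{E}\big[h(X_{t+1}) \mid X_t = x,\ U_t = u\big] \ \text{ for all } (x,u).$$
With $Q_\theta = \theta^{\top}\psi$ and $\underline{Q}_\theta(x) := \min_{u} Q_\theta(x,u)$, the analogous Q-function program is
$$\max_{\theta}\ \langle \mu, Q_\theta\rangle \quad \text{s.t.}\quad Q_\theta(x,u) \le c(x,u) + \gamma\,\mathbb{E}\big[\underline{Q}_\theta(X_{t+1}) \mid X_t = x,\ U_t = u\big].$$
Each constraint is convex in $\theta$ because $Q_\theta$ is linear and $\underline{Q}_\theta$ is concave in $\theta$. In a deterministic environment the conditional expectation can be replaced by the observed next state to obtain a model-free convex program; obtaining a comparable sample-based formulation when the environment is stochastic is the challenge addressed in the paper.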
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to the constraints.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Comparison of Single- and Multi- Objective Optimization Quality for
Evolutionary Equation Discovery [77.34726150561087]
Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions.
The proposed comparison approach is shown on classical model examples -- Burgers equation, wave equation, and Korteweg-de Vries equation.
arXiv Detail & Related papers (2023-06-29T15:37:19Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and
Convex Q-Learning in Continuous Time [1.4050836886292872]
This paper explores algorithm design in the continuous time domain, with a finite-horizon optimal control objective.
Main contributions are: (i) algorithm design based on a new Q-ODE, which defines a model-free characterization of the Hamilton-Jacobi-Bellman equation.
A characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete time setting.
arXiv Detail & Related papers (2022-10-14T21:55:57Z) - Parabolic Relaxation for Quadratically-constrained Quadratic Programming
-- Part II: Theoretical & Computational Results [6.355764634492975]
We introduce a convex relaxation for quadratically-constrained quadratic programs, along with a penalized parabolic relaxation for recovering near-feasible solutions.
We show that, starting from a point satisfying certain regularity conditions, the penalized parabolic relaxation converges to a point satisfying the Karush-Kuhn-Tucker optimality conditions.
arXiv Detail & Related papers (2022-08-07T02:58:04Z) - A conditional gradient homotopy method with applications to Semidefinite
Programming [1.6369790794838281]
We propose a homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints.
Our theoretical complexity is competitive with state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free iterations.
arXiv Detail & Related papers (2022-07-07T05:48:27Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints (a generic sketch of a CMDP and its Lagrangian is given after this list).
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
arXiv Detail & Related papers (2020-08-08T17:17:42Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
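As referenced in the constrained MDP entry above, a generic discounted CMDP and its Lagrangian take the following standard form (a sketch with assumed notation: reward $r$, constraint utilities $g_i$, thresholds $b_i$, multipliers $\lambda_i$; not the exact formulation of the cited paper):
$$\max_{\pi}\ \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} r(X_t,U_t)\Big] \quad \text{s.t.}\quad \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} g_i(X_t,U_t)\Big] \ge b_i, \quad i = 1,\dots,m,$$
with Lagrangian
$$\mathcal{L}(\pi,\lambda) = \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} r(X_t,U_t)\Big] + \sum_{i=1}^{m}\lambda_i\Big(\mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} g_i(X_t,U_t)\Big] - b_i\Big), \qquad \lambda_i \ge 0.$$
Primal-dual methods of the kind summarized above alternate between (approximately) maximizing $\mathcal{L}$ over the policy $\pi$ and updating the multipliers $\lambda$ by gradient steps on the dual.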