Convex Q Learning in a Stochastic Environment: Extended Version
- URL: http://arxiv.org/abs/2309.05105v1
- Date: Sun, 10 Sep 2023 18:24:43 GMT
- Title: Convex Q Learning in a Stochastic Environment: Extended Version
- Authors: Fan Lu and Sean Meyn
- Abstract summary: The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
- Score: 1.680268810119084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper introduces the first formulation of convex Q-learning for Markov
decision processes with function approximation. The algorithms and theory rest
on a relaxation of a dual of Manne's celebrated linear programming
characterization of optimal control. The main contributions firstly concern
properties of the relaxation, described as a deterministic convex program: we
identify conditions for a bounded solution, and a significant relationship
between the solution to the new convex program, and the solution to standard
Q-learning. The second set of contributions concern algorithm design and
analysis: (i) A direct model-free method for approximating the convex program
for Q-learning shares properties with its ideal. In particular, a bounded
solution is ensured subject to a simple property of the basis functions; (ii)
The proposed algorithms are convergent and new techniques are introduced to
obtain the rate of convergence in a mean-square sense; (iii) The approach can
be generalized to a range of performance criteria, and it is found that
variance can be reduced by considering "relative" dynamic programming
equations; (iv) The theory is illustrated with an application to a classical
inventory control problem.
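For orientation, the following is a minimal sketch of the type of convex program involved, written for a discounted-cost MDP with a Q-function that is linear in its parameters; the notation (cost $c$, discount factor $\gamma$, basis $\psi$, weighting $\mu$) is assumed here and is not necessarily the paper's exact formulation. The linear programming characterization of optimal control determines the value function as the maximal sub-solution of the Bellman inequality,
$$\max_{h}\ \langle \mu, h\rangle \quad \text{s.t.}\quad h(x) \le c(x,u) + \gamma\,\mathbb{E}\big[h(X_{t+1}) \mid X_t = x,\ U_t = u\big] \ \text{ for all } (x,u).$$
With $Q_\theta = \theta^{\top}\psi$ and $\underline{Q}_\theta(x) := \min_{u} Q_\theta(x,u)$, the analogous Q-function program is
$$\max_{\theta}\ \langle \mu, Q_\theta\rangle \quad \text{s.t.}\quad Q_\theta(x,u) \le c(x,u) + \gamma\,\mathbb{E}\big[\underline{Q}_\theta(X_{t+1}) \mid X_t = x,\ U_t = u\big].$$
Each constraint is convex in $\theta$ because $Q_\theta$ is linear and $\underline{Q}_\theta$ is concave in $\theta$. In a deterministic environment the conditional expectation can be replaced by the observed next state to obtain a model-free convex program; obtaining a comparable sample-based formulation when the environment is stochastic is the challenge addressed in the paper.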
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to the constraints.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Comparison of Single- and Multi- Objective Optimization Quality for
Evolutionary Equation Discovery [77.34726150561087]
Evolutionary differential equation discovery has proved to be a tool for obtaining equations with fewer a priori assumptions.
The proposed comparison approach is shown on classical model examples -- Burgers equation, wave equation, and Korteweg-de Vries equation.
arXiv Detail & Related papers (2023-06-29T15:37:19Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and
Convex Q-Learning in Continuous Time [1.4050836886292872]
This paper explores algorithm design in the continuous time domain, with a finite-horizon optimal control objective.
Main contributions are: (i) algorithm design based on a new Q-ODE, which defines a model-free characterization of the Hamilton-Jacobi-Bellman equation.
A characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete time setting.
arXiv Detail & Related papers (2022-10-14T21:55:57Z) - Parabolic Relaxation for Quadratically-constrained Quadratic Programming
-- Part II: Theoretical & Computational Results [6.355764634492975]
We introduce a convex relaxation for quadratically-constrained quadratic programs, along with a penalized parabolic relaxation for recovering near-feasible solutions.
We show that, starting from a point satisfying certain regularity conditions, the penalized parabolic relaxation converges to a point satisfying the Karush-Kuhn-Tucker optimality conditions.
arXiv Detail & Related papers (2022-08-07T02:58:04Z) - A conditional gradient homotopy method with applications to Semidefinite
Programming [1.6369790794838281]
We propose a homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints.
Our theoretical complexity is competitive with state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free iterations.
arXiv Detail & Related papers (2022-07-07T05:48:27Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints (a generic sketch of a CMDP and its Lagrangian is given after this list).
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
arXiv Detail & Related papers (2020-08-08T17:17:42Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
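As referenced in the constrained MDP entry above, a generic discounted CMDP and its Lagrangian take the following standard form (a sketch with assumed notation: reward $r$, constraint utilities $g_i$, thresholds $b_i$, multipliers $\lambda_i$; not the exact formulation of the cited paper):
$$\max_{\pi}\ \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} r(X_t,U_t)\Big] \quad \text{s.t.}\quad \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} g_i(X_t,U_t)\Big] \ge b_i, \quad i = 1,\dots,m,$$
with Lagrangian
$$\mathcal{L}(\pi,\lambda) = \mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} r(X_t,U_t)\Big] + \sum_{i=1}^{m}\lambda_i\Big(\mathbb{E}^{\pi}\Big[\sum_{t\ge 0}\gamma^{t} g_i(X_t,U_t)\Big] - b_i\Big), \qquad \lambda_i \ge 0.$$
Primal-dual methods of the kind summarized above alternate between (approximately) maximizing $\mathcal{L}$ over the policy $\pi$ and updating the multipliers $\lambda$ by gradient steps on the dual.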