Convex Q-Learning, Part 1: Deterministic Optimal Control
- URL: http://arxiv.org/abs/2008.03559v1
- Date: Sat, 8 Aug 2020 17:17:42 GMT
- Title: Convex Q-Learning, Part 1: Deterministic Optimal Control
- Authors: Prashant G. Mehta and Sean P. Meyn
- Abstract summary: It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
- Score: 5.685589351789462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that the extension of Watkins' algorithm to general function
approximation settings is challenging: does the projected Bellman equation have
a solution? If so, is the solution useful in the sense of generating a good
policy? And, if the preceding questions are answered in the affirmative, is the
algorithm consistent? These questions are unanswered even in the special case
of Q-function approximations that are linear in the parameter. The challenge
seems paradoxical, given the long history of convex analytic approaches to
dynamic programming. The paper begins with a brief survey of linear programming
approaches to optimal control, leading to a particular over-parameterization
that lends itself to applications in reinforcement learning. The main
conclusions are summarized as follows:
(i) A new class of convex Q-learning algorithms is introduced, based on a
convex relaxation of the Bellman equation. Convergence is established under
general conditions, including a linear function approximation for the
Q-function.
(ii) A batch implementation appears similar to the famed DQN algorithm (one
engine behind AlphaZero). It is shown that in fact the algorithms are very
different: while convex Q-learning solves a convex program that approximates
the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm
with function approximation: (a) it is shown that both seek solutions to the
same fixed point equation, and (b) the ODE approximations for the two
algorithms coincide, and little is known about the stability of this ODE.
These results are obtained for deterministic nonlinear systems with a
total-cost criterion. Many extensions are proposed, including a kernel
implementation and an extension to MDP models.
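To make point (i) concrete: for a deterministic model x' = f(x, u) with per-stage cost c(x, u) and a total-cost criterion, the Q-function Bellman equation and the linear-programming relaxation surveyed in the paper take roughly the following form. This is a sketch of the idea, not a verbatim transcription of the paper's formulation; mu denotes a positive weighting over state-action pairs.

```latex
% Bellman equation for the total-cost Q-function of a deterministic model:
%   Q^*(x,u) = c(x,u) + \min_{u'} Q^*(f(x,u), u')
% Relaxing the equality to an inequality, and the min over u' to one
% constraint per action, gives a linear program whose solution recovers
% Q^* under suitable conditions:
\[
  \max_{Q}\ \langle \mu, Q \rangle
  \quad \text{subject to} \quad
  Q(x,u) \le c(x,u) + Q\bigl(f(x,u), u'\bigr)
  \quad \text{for all } x, u, u'.
\]
```

With a linear parameterization Q_theta(x, u) = theta^T psi(x, u) and a finite batch of observed transitions, the constraints are linear in theta and the program can be handed to an off-the-shelf solver. The minimal sketch below uses cvxpy with synthetic stand-in features; the batch size, feature map, uniform objective weights (standing in for mu), and norm bound are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes (assumptions, not from the paper).
rng = np.random.default_rng(0)
d, n, num_actions = 8, 200, 3

# Synthetic stand-ins for features and costs along a batch of transitions
# (x_i, u_i, c_i, x'_i) from a deterministic system:
#   Psi[i]         = psi(x_i, u_i)
#   Psi_next[i, a] = psi(x'_i, a)  for each action a in a finite action set
#   cost[i]        = c(x_i, u_i)
Psi = rng.normal(size=(n, d))
Psi_next = rng.normal(size=(n, num_actions, d))
cost = rng.uniform(0.0, 1.0, size=n)

theta = cp.Variable(d)
q = Psi @ theta  # Q_theta(x_i, u_i) for each sample i

# Relaxed Bellman constraints: Q(x,u) <= c(x,u) + Q(x',u') for every u'.
constraints = [q <= cost + Psi_next[:, a, :] @ theta
               for a in range(num_actions)]
constraints.append(cp.norm(theta, "inf") <= 10.0)  # keep the LP bounded

# Push Q_theta up against the constraints, weighting all samples equally.
prob = cp.Problem(cp.Maximize(cp.sum(q)), constraints)
prob.solve()
print("theta* =", theta.value)
# A greedy policy would then choose argmin_a theta* @ psi(x, a) at each x.
```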
Related papers
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum; a tabular sketch of the double Q-learning update appears after this list.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Deep Learning for Constrained Utility Maximisation [0.0]
This paper proposes two algorithms for solving control problems with deep learning.
The first algorithm solves Markovian problems via the Hamilton-Jacobi-Bellman equation.
The second uses the full power of the duality method to solve non-Markovian problems.
arXiv Detail & Related papers (2020-08-26T18:40:57Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics [78.46689176407936]
We show that families of algorithms fail to produce nearly optimal solutions with high probability.
For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory.
arXiv Detail & Related papers (2020-04-25T05:45:59Z)
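For reference, the double Q-learning entry above concerns van Hasselt's two-estimator scheme. A minimal tabular sketch of one update follows; the function signature, step size, and discount factor are illustrative assumptions, not the cited paper's analysis setting.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    """One tabular double Q-learning step.

    A fair coin picks which table to update; the greedy action at s_next
    is selected by the updated table and evaluated by the other, which
    removes the overestimation bias of single-estimator Q-learning.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))      # select with QA ...
        target = r + gamma * QB[s_next, a_star]  # ... evaluate with QB
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))      # select with QB ...
        target = r + gamma * QA[s_next, b_star]  # ... evaluate with QA
        QB[s, a] += alpha * (target - QB[s, a])
```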