Convex Q-Learning, Part 1: Deterministic Optimal Control
- URL: http://arxiv.org/abs/2008.03559v1
- Date: Sat, 8 Aug 2020 17:17:42 GMT
- Title: Convex Q-Learning, Part 1: Deterministic Optimal Control
- Authors: Prashant G. Mehta and Sean P. Meyn
- Abstract summary: It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
- Score: 5.685589351789462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that the extension of Watkins' algorithm to general function
approximation settings is challenging: does the projected Bellman equation have
a solution? If so, is the solution useful in the sense of generating a good
policy? And, if the preceding questions are answered in the affirmative, is the
algorithm consistent? These questions are unanswered even in the special case
of Q-function approximations that are linear in the parameter. The challenge
seems paradoxical, given the long history of convex analytic approaches to
dynamic programming. The paper begins with a brief survey of linear programming
approaches to optimal control, leading to a particular over-parameterization
that lends itself to applications in reinforcement learning. The main
conclusions are summarized as follows:
(i) A new class of convex Q-learning algorithms is introduced, based on a
convex relaxation of the Bellman equation. Convergence is established under
general conditions, including a linear function approximation for the
Q-function.
(ii) A batch implementation appears similar to the famed DQN algorithm (one
engine behind AlphaZero). It is shown that in fact the algorithms are very
different: while convex Q-learning solves a convex program that approximates
the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm
with function approximation: (a) it is shown that both seek solutions to the
same fixed point equation, and (b) the ODE approximations for the two
algorithms coincide, and little is known about the stability of this ODE.
These results are obtained for deterministic nonlinear systems with a
total-cost criterion. Many extensions are proposed, including a kernel
implementation and an extension to MDP models.
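To make point (i) concrete: for a deterministic model x' = f(x, u) with per-stage cost c(x, u) and a total-cost criterion, the Q-function Bellman equation and the linear-programming relaxation surveyed in the paper take roughly the following form. This is a sketch of the idea, not a verbatim transcription of the paper's formulation; mu denotes a positive weighting over state-action pairs.

```latex
% Bellman equation for the total-cost Q-function of a deterministic model:
%   Q^*(x,u) = c(x,u) + \min_{u'} Q^*(f(x,u), u')
% Relaxing the equality to an inequality, and the min over u' to one
% constraint per action, gives a linear program whose solution recovers
% Q^* under suitable conditions:
\[
  \max_{Q}\ \langle \mu, Q \rangle
  \quad \text{subject to} \quad
  Q(x,u) \le c(x,u) + Q\bigl(f(x,u), u'\bigr)
  \quad \text{for all } x, u, u'.
\]
```

With a linear parameterization Q_theta(x, u) = theta^T psi(x, u) and a finite batch of observed transitions, the constraints are linear in theta and the program can be handed to an off-the-shelf solver. The minimal sketch below uses cvxpy with synthetic stand-in features; the batch size, feature map, uniform objective weights (standing in for mu), and norm bound are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
import cvxpy as cp

# Illustrative sizes (assumptions, not from the paper).
rng = np.random.default_rng(0)
d, n, num_actions = 8, 200, 3

# Synthetic stand-ins for features and costs along a batch of transitions
# (x_i, u_i, c_i, x'_i) from a deterministic system:
#   Psi[i]         = psi(x_i, u_i)
#   Psi_next[i, a] = psi(x'_i, a)  for each action a in a finite action set
#   cost[i]        = c(x_i, u_i)
Psi = rng.normal(size=(n, d))
Psi_next = rng.normal(size=(n, num_actions, d))
cost = rng.uniform(0.0, 1.0, size=n)

theta = cp.Variable(d)
q = Psi @ theta  # Q_theta(x_i, u_i) for each sample i

# Relaxed Bellman constraints: Q(x,u) <= c(x,u) + Q(x',u') for every u'.
constraints = [q <= cost + Psi_next[:, a, :] @ theta
               for a in range(num_actions)]
constraints.append(cp.norm(theta, "inf") <= 10.0)  # keep the LP bounded

# Push Q_theta up against the constraints, weighting all samples equally.
prob = cp.Problem(cp.Maximize(cp.sum(q)), constraints)
prob.solve()
print("theta* =", theta.value)
# A greedy policy would then choose argmin_a theta* @ psi(x, a) at each x.
```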
Related papers
- Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z)
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum; a tabular sketch of the double Q-learning update appears after this list.
arXiv Detail & Related papers (2020-09-29T18:48:21Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Deep Learning for Constrained Utility Maximisation [0.0]
This paper proposes two algorithms for solving control problems with deep learning.
The first algorithm solves Markovian problems via the Hamilton-Jacobi-Bellman equation.
The second uses the full power of the duality method to solve non-Markovian problems.
arXiv Detail & Related papers (2020-08-26T18:40:57Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
- Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics [78.46689176407936]
We show that families of algorithms fail to produce nearly optimal solutions with high probability.
For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory.
arXiv Detail & Related papers (2020-04-25T05:45:59Z)
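For reference, the double Q-learning entry above concerns van Hasselt's two-estimator scheme. A minimal tabular sketch of one update follows; the function signature, step size, and discount factor are illustrative assumptions, not the cited paper's analysis setting.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=None):
    """One tabular double Q-learning step.

    A fair coin picks which table to update; the greedy action at s_next
    is selected by the updated table and evaluated by the other, which
    removes the overestimation bias of single-estimator Q-learning.
    """
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))      # select with QA ...
        target = r + gamma * QB[s_next, a_star]  # ... evaluate with QB
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))      # select with QB ...
        target = r + gamma * QA[s_next, b_star]  # ... evaluate with QA
        QB[s, a] += alpha * (target - QB[s, a])
```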