Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time
Systems with Lipschitz Continuous Controls
- URL: http://arxiv.org/abs/2010.14087v1
- Date: Tue, 27 Oct 2020 06:11:04 GMT
- Title: Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time
Systems with Lipschitz Continuous Controls
- Authors: Jeongho Kim, Jaeuk Shin, Insoon Yang
- Abstract summary: We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
- Score: 2.922007656878633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose Q-learning algorithms for continuous-time
deterministic optimal control problems with Lipschitz continuous controls. Our
method is based on a new class of Hamilton-Jacobi-Bellman (HJB) equations
derived from applying the dynamic programming principle to continuous-time
Q-functions. A novel semi-discrete version of the HJB equation is proposed to
design a Q-learning algorithm that uses data collected in discrete time without
discretizing or approximating the system dynamics. We identify the condition
under which the Q-function estimated by this algorithm converges to the optimal
Q-function. For practical implementation, we propose the Hamilton-Jacobi DQN,
which extends the idea of deep Q-networks (DQN) to our continuous control
setting. This approach does not require actor networks or numerical solutions
to optimization problems for greedy actions since the HJB equation provides a
simple characterization of optimal controls via ordinary differential
equations. We empirically demonstrate the performance of our method through
benchmark tasks and high-dimensional linear-quadratic problems.
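A rough illustration of the actor-free greedy step described above: the sketch below obtains a greedy action from a single Q-network by integrating a short ODE in action space, instead of training an actor network or solving an inner optimization. The specific ODE used here (normalized gradient ascent on a ↦ Q(x, a), with a rate bound `lip` loosely motivated by the Lipschitz-continuous-controls setting), the network sizes, and all names are illustrative assumptions, not the paper's derivation from the semi-discrete HJB equation.
```python
# Hypothetical sketch, not the authors' implementation: produce a greedy action by
# integrating an assumed ODE in action space driven by the gradient of a learned
# Q-network, in place of an actor network or an inner optimization problem.
import torch

class QNet(torch.nn.Module):
    """Q(x, a) -> scalar; architecture and sizes are placeholders."""
    def __init__(self, x_dim, a_dim, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(x_dim + a_dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1),
        )
    def forward(self, x, a):
        return self.net(torch.cat([x, a], dim=-1)).squeeze(-1)

def greedy_action(q_net, x, a_init, lip=1.0, dt=0.05, n_steps=20):
    """Explicit-Euler integration of an assumed action ODE
    da/dt = lip * grad_a Q(x, a) / ||grad_a Q(x, a)||."""
    a = a_init.clone()
    for _ in range(n_steps):
        a = a.detach().requires_grad_(True)
        (grad_a,) = torch.autograd.grad(q_net(x, a).sum(), a)
        direction = grad_a / (grad_a.norm(dim=-1, keepdim=True) + 1e-8)
        a = a + lip * dt * direction  # one Euler step of the action ODE
    return a.detach()

# Toy usage: 3-dimensional state, 2-dimensional action.
q_net = QNet(x_dim=3, a_dim=2)
a_star = greedy_action(q_net, torch.zeros(1, 3), torch.zeros(1, 2))
```
In a DQN-style training loop, such an action would stand in for the discrete argmax when forming targets; the paper's own targets come from the semi-discrete HJB equation, which this sketch does not reproduce.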
Related papers
- Hamilton-Jacobi Based Policy-Iteration via Deep Operator Learning [9.950128864603599]
We incorporate DeepONet with a recently developed policy-iteration scheme to numerically solve optimal control problems.
A notable feature of our approach is that once the neural network is trained, the solution to the optimal control problem and HJB equations can be inferred quickly.
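For readers unfamiliar with the operator-learning component, here is a minimal sketch of a generic (unstacked) DeepONet; the layer sizes and the way it would be coupled to a policy-iteration scheme are assumptions for illustration, not this paper's configuration.
```python
# Minimal generic DeepONet: a branch net encodes the input function sampled at
# fixed sensor points, a trunk net encodes the query coordinate, and the output
# is their inner product. Sizes are illustrative assumptions.
import torch

class DeepONet(torch.nn.Module):
    def __init__(self, n_sensors, coord_dim, p=32, hidden=64):
        super().__init__()
        self.branch = torch.nn.Sequential(
            torch.nn.Linear(n_sensors, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, p),
        )
        self.trunk = torch.nn.Sequential(
            torch.nn.Linear(coord_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, p),
        )
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, u_sensors, y):
        # (G u)(y) ~ <branch(u), trunk(y)> + bias
        return (self.branch(u_sensors) * self.trunk(y)).sum(-1) + self.bias

# Example: input function sampled at 100 sensors, evaluated at 2-D query points.
model = DeepONet(n_sensors=100, coord_dim=2)
u = torch.randn(8, 100)   # batch of sampled input functions
y = torch.randn(8, 2)     # one query point per function
out = model(u, y)         # shape (8,)
```
Once trained, evaluating such a network is a single forward pass, which is what makes fast inference of the control solution possible.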
arXiv Detail & Related papers (2024-06-16T12:53:17Z)
- Finite-Time Error Analysis of Soft Q-Learning: Switching System Approach [4.36117236405564]
Soft Q-learning is a variation of Q-learning designed to solve entropy regularized Markov decision problems.
This paper aims to offer a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms.
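As background on the operator this analysis concerns, below is a minimal tabular sketch of the entropy-regularized ("soft") Bellman backup, where a log-sum-exp with an assumed temperature replaces the hard max of standard Q-learning; the switching-system analysis itself is not reproduced.
```python
# Tabular sketch (toy MDP, assumed temperature tau) of the soft Bellman backup
# iterated by soft Q-learning; log-sum-exp replaces the hard max.
import numpy as np

def soft_backup(Q, r, P, gamma=0.99, tau=0.1):
    """One synchronous soft Bellman update.
    Q: (S, A) table, r: (S, A) rewards, P: (S, A, S) transition probabilities."""
    m = Q.max(axis=1)                                               # for numerical stability
    V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))  # soft value
    return r + gamma * P @ V                                        # (S, A, S) @ (S,) -> (S, A)

# Toy usage: 3 states, 2 actions, random MDP.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))
r = rng.standard_normal((3, 2))
Q = np.zeros((3, 2))
for _ in range(500):
    Q = soft_backup(Q, r, P)
```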
arXiv Detail & Related papers (2024-03-11T01:36:37Z)
- Neural Time-Reversed Generalized Riccati Equation [60.92253836775246]
Hamiltonian equations offer an interpretation of optimality through auxiliary variables known as costates.
This paper introduces a novel neural-based approach to optimal control, with the aim of working forward-in-time.
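For context on the title: in standard finite-horizon LQR the Riccati differential equation is solved backward from the terminal condition, and reversing time turns it into a forward-in-time ODE. The generic textbook form is sketched below; the paper's neural, "generalized" variant is not reproduced.
```latex
% Standard finite-horizon LQR background (generic form, not the paper's version).
% Dynamics \dot{x} = Ax + Bu, cost \int_0^T (x^\top Q x + u^\top R u)\,dt + x(T)^\top Q_T x(T).
% Optimal feedback u^*(t) = -R^{-1} B^\top P(t)\, x(t), with P solved backward in time:
\[
  -\dot{P}(t) = A^\top P(t) + P(t) A - P(t) B R^{-1} B^\top P(t) + Q,
  \qquad P(T) = Q_T .
\]
% Substituting s = T - t and \tilde{P}(s) = P(T - s) gives a forward-in-time ODE:
\[
  \frac{d\tilde{P}}{ds}(s) = A^\top \tilde{P}(s) + \tilde{P}(s) A
  - \tilde{P}(s) B R^{-1} B^\top \tilde{P}(s) + Q,
  \qquad \tilde{P}(0) = Q_T .
\]
```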
arXiv Detail & Related papers (2023-12-14T19:29:37Z)
- Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets).
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
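For context, the pointer-attention scoring of Ptr-Nets produces one logit per input position; reading those logits as Q-values over candidate next selections is an illustrative assumption about how such a hybrid could be wired, not the paper's exact architecture.
```python
# Sketch of Ptr-Net-style pointer attention; interpreting the per-position scores
# as Q-values is an assumption for illustration, not the PQN architecture itself.
import torch

class PointerScorer(torch.nn.Module):
    def __init__(self, enc_dim, dec_dim, att_dim=64):
        super().__init__()
        self.W_enc = torch.nn.Linear(enc_dim, att_dim, bias=False)
        self.W_dec = torch.nn.Linear(dec_dim, att_dim, bias=False)
        self.v = torch.nn.Linear(att_dim, 1, bias=False)

    def forward(self, enc, dec, mask=None):
        # enc: (B, N, enc_dim) encoder states, dec: (B, dec_dim) decoder state.
        scores = self.v(torch.tanh(self.W_enc(enc) + self.W_dec(dec).unsqueeze(1))).squeeze(-1)
        if mask is not None:                      # forbid already-visited positions
            scores = scores.masked_fill(mask, float("-inf"))
        return scores                             # read as Q(s, point-to-position-j)

# Greedy decoding step over 10 candidate positions.
scorer = PointerScorer(enc_dim=16, dec_dim=16)
q = scorer(torch.randn(2, 10, 16), torch.randn(2, 16))
action = q.argmax(dim=-1)
```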
arXiv Detail & Related papers (2023-11-05T12:03:58Z)
- Differentially Private Deep Q-Learning for Pattern Privacy Preservation in MEC Offloading [76.0572817182483]
Attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy that jointly minimizes the latency, the ES's energy consumption, and the task dropping rate, while preserving pattern privacy (PP).
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
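A schematic illustration of the noise-injection idea (not the DP-DQO algorithm): perturb the Q-values of the candidate offloading decisions with Laplace noise before taking the argmax, in the spirit of the report-noisy-max mechanism, so the published decision leaks less about queue state and usage patterns. The sensitivity and privacy budget below are assumed values.
```python
# Schematic "report noisy max" over offloading decisions; epsilon and sensitivity
# are illustrative assumptions, and this is not the paper's DP-DQO algorithm.
import numpy as np

def noisy_offloading_decision(q_values, epsilon=1.0, sensitivity=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.laplace(scale=sensitivity / epsilon, size=q_values.shape)
    return int(np.argmax(q_values + noise))

# Example: Q-values for "local", "edge server 1", "edge server 2".
decision = noisy_offloading_decision(np.array([0.8, 1.2, 1.1]), epsilon=0.5)
```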
arXiv Detail & Related papers (2023-02-09T12:50:18Z)
- Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and Convex Q-Learning in Continuous Time [1.4050836886292872]
This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective.
Main contributions are: (i) the algorithm design is based on a new Q-ODE, which defines a model-free characterization of the Hamilton-Jacobi-Bellman equation; (ii) a characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete-time setting.
arXiv Detail & Related papers (2022-10-14T21:55:57Z)
- q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a "q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving the underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z)
- Q-Match: Iterative Shape Matching via Quantum Annealing [64.74942589569596]
Finding shape correspondences can be formulated as an NP-hard quadratic assignment problem (QAP).
This paper proposes Q-Match, a new iterative quantum method for QAPs inspired by the alpha-expansion algorithm.
Q-Match can be applied for shape matching problems iteratively, on a subset of well-chosen correspondences, allowing us to scale to real-world problems.
arXiv Detail & Related papers (2021-05-06T17:59:38Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Convex Q-Learning, Part 1: Deterministic Optimal Control [5.685589351789462]
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging.
The paper begins with a brief survey of linear programming approaches to optimal control, leading to a particular over-parameterization that lends itself to applications in reinforcement learning.
It is shown that in fact the algorithms are very different: while convex Q-learning solves a convex program that approximates the Bellman equation, theory for DQN is no stronger than for Watkins' algorithm with function approximation.
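For background on the linear-programming viewpoint the survey refers to, the optimal value function of a discounted-cost deterministic control problem is the maximal feasible point of a family of linear inequalities, which yields an LP. A generic statement is sketched below; the notation is assumed, not taken from the paper.
```latex
% Generic LP characterization of the optimal value function (notation assumed):
% dynamics x^+ = F(x, u), stage cost c(x, u), discount \gamma \in (0, 1).
\[
  V^\star \in \arg\max_{V} \; \sum_{x} \mu(x)\, V(x)
  \quad \text{s.t.} \quad
  V(x) \le c(x, u) + \gamma\, V\bigl(F(x, u)\bigr) \;\; \forall (x, u),
\]
% where \mu is any strictly positive weighting; V^\star is the pointwise-largest
% feasible V. Restricting V (or a Q-function analogue) to a parameterized family
% is the step that connects this viewpoint to reinforcement learning.
```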
arXiv Detail & Related papers (2020-08-08T17:17:42Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
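As a rough schematic of what momentum-based Q-learning means (a heavy-ball-style average of TD updates), here is a generic tabular sketch; it is not necessarily the exact MomentumQ recursion analyzed in the paper.
```python
# Generic heavy-ball-style momentum on tabular Q-learning updates; an illustrative
# schematic of momentum-based Q-learning, not the paper's exact MomentumQ update.
import numpy as np

def momentum_q_step(Q, M, s, a, r, s_next, alpha=0.1, beta=0.9, gamma=0.99):
    """Q, M: (S, A) tables (values and momentum). Returns updated copies."""
    Q, M = Q.copy(), M.copy()
    td_error = r + gamma * Q[s_next].max() - Q[s, a]
    M[s, a] = beta * M[s, a] + (1.0 - beta) * td_error   # momentum-averaged TD update
    Q[s, a] += alpha * M[s, a]
    return Q, M

# Toy usage on a single transition.
Q = np.zeros((4, 2)); M = np.zeros((4, 2))
Q, M = momentum_q_step(Q, M, s=0, a=1, r=1.0, s_next=2)
```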
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.