Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and
Convex Q-Learning in Continuous Time
- URL: http://arxiv.org/abs/2210.08131v1
- Date: Fri, 14 Oct 2022 21:55:57 GMT
- Title: Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and
Convex Q-Learning in Continuous Time
- Authors: Fan Lu, Joel Mathias, Sean Meyn and Karanjit Kalsi
- Abstract summary: This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective.
Main contributions: (i) algorithm design is based on a new Q-ODE, which provides a model-free characterization of the Hamilton-Jacobi-Bellman equation;
(ii) a characterization of boundedness of the constraint region is obtained through a non-trivial extension of recent results from the discrete-time setting.
- Score: 1.4050836886292872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convex Q-learning is a recent approach to reinforcement learning, motivated
by the possibility of a firmer theory for convergence, and the possibility of
making use of greater a priori knowledge regarding policy or value function
structure. This paper explores algorithm design in the continuous time domain,
with a finite-horizon optimal control objective. The main contributions are (i)
Algorithm design is based on a new Q-ODE, which defines the model-free
characterization of the Hamilton-Jacobi-Bellman equation. (ii) The Q-ODE
motivates a new formulation of Convex Q-learning that avoids the approximations
appearing in prior work. The Bellman error used in the algorithm is defined by
filtered measurements, which is beneficial in the presence of measurement
noise. (iii) A characterization of boundedness of the constraint region is
obtained through a non-trivial extension of recent results from the discrete
time setting. (iv) The theory is illustrated in application to resource
allocation for distributed energy resources, for which the theory is ideally
suited.
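As background for the Q-ODE idea, a standard form of the finite-horizon HJB equation (in generic notation that may differ from the paper's) is

\[
\min_{u}\Big\{ c(x,u) + \partial_t J^\star(x,t) + \nabla_x J^\star(x,t)\cdot f(x,u) \Big\} = 0,
\qquad J^\star(x,T) = c_T(x),
\]

for dynamics \( \dot{x} = f(x,u) \), running cost \( c \), and terminal cost \( c_T \). The model-free angle is that, along any observed state-input trajectory, the chain rule gives \( \tfrac{d}{dt}J^\star(x_t,t) = \partial_t J^\star(x_t,t) + \nabla_x J^\star(x_t,t)\cdot f(x_t,u_t) \), so the bracketed expression can be evaluated from trajectory data without explicit knowledge of \( f \); the paper's Q-ODE and filtered Bellman error build on this kind of identity (see the paper for the precise formulation).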
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; and (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z)
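As a minimal sketch of the "learning as a differential equation" viewpoint in the entry above (a generic gradient-flow picture, not the Hamiltonian Learning equations of that paper; the quadratic-loss usage example is an assumption):

```python
import numpy as np

def gradient_flow(grad_loss, theta0, dt=0.01, steps=1000):
    """Integrate the gradient-flow ODE  d(theta)/dt = -grad_loss(theta)
    with forward Euler steps -- a minimal picture of gradient-based
    learning written as a differential equation, not the cited paper's
    Hamiltonian Learning framework.
    """
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= dt * grad_loss(theta)  # one Euler step of the ODE
    return theta

# usage sketch: quadratic loss L(theta) = 0.5 * ||theta||^2, gradient = theta
theta_star = gradient_flow(lambda th: th, theta0=[1.0, -2.0])
```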
- Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear function approximation.
We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z)
- Convex Q Learning in a Stochastic Environment: Extended Version [1.680268810119084]
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation.
The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense.
The theory is illustrated with an application to a classical inventory control problem.
arXiv Detail & Related papers (2023-09-10T18:24:43Z)
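For context, the discrete-time convex Q-learning formulations referenced in the entry above are rooted in the linear-programming approach to dynamic programming; in generic notation (which may differ from that paper's exact program), one maximizes a linear functional of a parameterized Q-function subject to Bellman-inequality constraints,

\[
\max_{\theta}\ \langle \mu, Q^\theta \rangle
\quad \text{s.t.} \quad
Q^\theta(x,u) \le c(x,u) + \gamma\, \mathbb{E}\big[\underline{Q}^\theta(X_{k+1}) \mid X_k = x,\ U_k = u\big] \ \ \text{for all } (x,u),
\]

with \( \underline{Q}^\theta(x) := \min_u Q^\theta(x,u) \). For a linear parameterization the constraints are convex in \( \theta \), since \( \underline{Q}^\theta \) is a minimum of linear functions of \( \theta \); boundedness of such constraint regions is the kind of property analyzed in the main paper above.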
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
- Validation Diagnostics for SBI algorithms based on Normalizing Flows [55.41644538483948]
This work proposes easy-to-interpret validation diagnostics for multi-dimensional conditional (posterior) density estimators based on normalizing flows (NF).
It also offers theoretical guarantees based on results of local consistency.
This work should help the design of better specified models or drive the development of novel SBI-algorithms.
arXiv Detail & Related papers (2022-11-17T15:48:06Z)
- q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a "q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving the underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z)
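One way to read the q-function in the entry above (generic notation; see that paper for the precise, entropy-regularized definition) is as the first-order rate of the conventional Q-function as the discretization step vanishes,

\[
q(t,x,a) \;=\; \lim_{\Delta t \downarrow 0} \frac{Q_{\Delta t}(t,x,a) - J(t,x)}{\Delta t},
\]

where \( Q_{\Delta t} \) denotes the Q-function of the problem discretized with step \( \Delta t \) and \( J \) is the value function; this is the sense in which the resulting theory is independent of time discretization.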
- Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls [2.922007656878633]
We propose Q-learning algorithms for continuous-time deterministic optimal control problems with Lipschitz continuous controls.
A novel semi-discrete version of the HJB equation is proposed to design a Q-learning algorithm that uses data collected in discrete time without discretizing or approximating the system dynamics.
arXiv Detail & Related papers (2020-10-27T06:11:04Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
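As a rough illustration of the idea in the MomentumQ entry above (not that paper's exact algorithm; the heavy-ball form, step sizes, and reward convention are assumptions), the following sketch adds a momentum buffer to the tabular Q-learning update:

```python
import numpy as np

def momentum_q_update(Q, m, s, a, r, s_next, done,
                      gamma=0.99, alpha=0.1, beta=0.9):
    """One tabular Q-learning step with a heavy-ball momentum term.

    Generic illustration of 'momentum + Q-learning', not the exact
    MomentumQ scheme analyzed in the cited paper.
    Q: (n_states, n_actions) array of Q-value estimates (updated in place).
    m: momentum buffer of the same shape, carrying past TD directions.
    """
    target = r + (0.0 if done else gamma * Q[s_next].max())
    td_error = target - Q[s, a]
    # heavy-ball style update: blend the new TD direction with the previous one
    m[s, a] = beta * m[s, a] + td_error
    Q[s, a] += alpha * m[s, a]
    return td_error
```

Setting beta = 0 recovers the standard Q-learning update; the cited paper's analysis concerns more refined momentum schemes with linear function approximation and Markovian sampling.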
- Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory [110.99247009159726]
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
In particular, temporal-difference learning converges when the function approximator is linear in a feature representation, which is fixed throughout learning, and possibly diverges otherwise.
arXiv Detail & Related papers (2020-06-08T17:25:22Z)
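To make the linear-function-approximation setting in the last entry concrete, here is a minimal semi-gradient TD(0) sketch with a fixed feature map (the data format, feature map, and step size are illustrative assumptions, not that paper's setup):

```python
import numpy as np

def td0_linear(transitions, phi, n_features, gamma=0.99, alpha=0.05):
    """TD(0) policy evaluation with a fixed linear feature representation.

    transitions: iterable of (s, r, s_next, done) tuples generated by the
                 policy being evaluated (an assumed data format).
    phi:         feature map, phi(s) -> np.ndarray of shape (n_features,).
    Returns the weight vector theta with V(s) approximated by theta @ phi(s).
    """
    theta = np.zeros(n_features)
    for s, r, s_next, done in transitions:
        v = theta @ phi(s)
        v_next = 0.0 if done else theta @ phi(s_next)
        td_error = r + gamma * v_next - v
        # semi-gradient TD(0) step: move theta along the feature direction
        theta += alpha * td_error * phi(s)
    return theta
```

The feature representation phi is fixed throughout learning, which is exactly the regime in which the entry notes that convergence can be established.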
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.