Related papers: Is Bellman Equation Enough for Learning Control?

URL: http://arxiv.org/abs/2503.02171v2
Date: Thu, 06 Mar 2025 03:57:43 GMT
Title: Is Bellman Equation Enough for Learning Control?
Authors: Haoxiang You, Lekan Molu, Ian Abraham
Abstract summary: We show that uniqueness of the solution to the Bellman equation fails to hold in continuous state spaces. We then demonstrate a common failure mode in value-based methods: convergence to unstable solutions due to the exponential imbalance between admissible and inadmissible solutions. Finally, to address this issue, we introduce a positive-definite neural architecture that guarantees convergence to the stable solution by construction.
Score: 3.70729078195191
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Bellman equation and its continuous-time counterpart, the Hamilton-Jacobi-Bellman (HJB) equation, serve as necessary conditions for optimality in reinforcement learning and optimal control. While the value function is known to be the unique solution to the Bellman equation in tabular settings, we demonstrate that this uniqueness fails to hold in continuous state spaces. Specifically, for linear dynamical systems, we prove the Bellman equation admits at least $\binom{2n}{n}$ solutions, where $n$ is the state dimension. Crucially, only one of these solutions yields both an optimal policy and a stable closed-loop system. We then demonstrate a common failure mode in value-based methods: convergence to unstable solutions due to the exponential imbalance between admissible and inadmissible solutions. Finally, we introduce a positive-definite neural architecture that guarantees convergence to the stable solution by construction to address this issue.
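
To make the non-uniqueness claim concrete, the following is a minimal sketch for the scalar case ($n = 1$, where $\binom{2n}{n} = 2$). It is an illustration under stated assumptions, not the paper's own construction: it assumes a discrete-time system $x_{k+1} = a x_k + b u_k$ with stage cost $q x^2 + r u^2$, a quadratic candidate $V(x) = p x^2$, and $a \neq 0$, $b \neq 0$, $q, r > 0$. The function names `scalar_bellman_solutions` and `bellman_residual` are hypothetical.

```python
import math


def scalar_bellman_solutions(a, b, q, r):
    """Return [(p, a_cl), ...] for each V(x) = p * x**2 solving the Bellman equation.

    Assumed (illustrative) setup: x_next = a*x + b*u, stage cost q*x**2 + r*u**2,
    with a != 0, b != 0, q > 0, r > 0.
    """
    # Substituting V(x) = p*x^2 and minimizing over u reduces the Bellman
    # equation to the quadratic  b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0.
    c2 = b * b
    c1 = r - q * b * b - a * a * r
    c0 = -q * r
    disc = math.sqrt(c1 * c1 - 4.0 * c2 * c0)  # real, since c2 * c0 < 0
    solutions = []
    for p in ((-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2)):
        k = a * b * p / (r + b * b * p)  # greedy feedback gain, u = -k * x
        solutions.append((p, a - b * k))  # closed-loop pole a_cl = a - b*k
    return solutions


def bellman_residual(p, a, b, q, r):
    """Bellman residual of V(x) = p*x^2 at x = 1 under its own greedy policy."""
    k = a * b * p / (r + b * b * p)
    return q + r * k * k + p * (a - b * k) ** 2 - p


if __name__ == "__main__":
    for p, a_cl in scalar_bellman_solutions(1.0, 1.0, 1.0, 1.0):
        label = "stable" if abs(a_cl) < 1.0 else "unstable"
        print(f"p = {p:+.3f}, closed-loop pole = {a_cl:.3f} ({label})")
```

With $a = b = q = r = 1$ both roots have zero Bellman residual, but only the positive root ($p \approx 1.618$) gives a closed-loop pole inside the unit circle; the negative root ($p \approx -0.618$) gives a pole near 2.618. A method that only drives the residual to zero cannot tell the two apart, which is the failure mode the abstract describes.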

Related papers

Solving nonconvex Hamilton--Jacobi--Isaacs equations with PINN-based policy iteration [1.3654846342364308]
We present a framework that combines classical dynamic programming with physics-informed neural networks (PINNs) to solve nonconvex Hamilton-Jacobi-Isaacs (HJI) equations. Our results suggest that integrating PINNs with policy iteration is a practical and theoretically grounded method for solving high-dimensional, nonconvex HJI equations.
arXiv Detail & Related papers (2025-07-21T10:06:53Z)
Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality [3.6185342807265415]
Discrete-time optimal control problems and Markov decision processes (MDPs) are fundamental models for sequential decision-making under uncertainty. In this article, we construct deep neural network (DNN) approximations for $Q$-functions associated with MDPs with infinite time horizon and finite control set $A$.
arXiv Detail & Related papers (2025-06-28T11:25:44Z)
A Physics-Informed Learning Framework to Solve the Infinite-Horizon Optimal Control Problem [4.2402873718254535]
We propose a physics-informed neural networks (PINNs) framework to solve the infinite-horizon optimal control problem of nonlinear systems. We tackle this by instead applying PINNs to a finite-horizon variant of the steady-state HJB that has a unique solution. Unlike many existing methods, the proposed technique works well with non-polynomial basis functions.
arXiv Detail & Related papers (2025-05-28T00:21:49Z)
Exponential Improvement on Asian Option Pricing Through Quantum Preconditioning Methods [0.0]
We present a quantum algorithm designed to solve the differential equation used in the pricing of Asian options. Our approach modifies an existing quantum pre-conditioning method for the problem of Asian option pricing.
arXiv Detail & Related papers (2025-01-26T17:44:30Z)
Double Momentum Method for Lower-Level Constrained Bilevel Optimization [31.28781889710351]
We propose a new hypergradient of LCBO leveraging the nonsmooth implicit function theorem instead of using restrictive assumptions. In addition, we propose a single-loop single-timescale iteration algorithm based on the double-momentum method and adaptive step size method.
arXiv Detail & Related papers (2024-06-25T09:05:22Z)
On the Uniqueness of Solution for the Bellman Equation of LTL Objectives [12.918524838804016]
We show that the uniqueness of the solution to the Bellman equation with two discount factors has not been explicitly discussed. We then propose a condition for the Bellman equation to have the expected return as the unique solution.
arXiv Detail & Related papers (2024-04-07T21:06:52Z)
Unbalanced penalization: A new approach to encode inequality constraints of combinatorial problems for quantum optimization algorithms [42.29248343585333]
We present an alternative method that does not require extra slack variables. We evaluate our approach on the traveling salesman problem, the bin packing problem, and the knapsack problem. This new approach can be used to solve problems with inequality constraints with a reduced number of resources.
arXiv Detail & Related papers (2022-11-25T06:05:18Z)
Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making. Two core challenges impede the development of efficient methods with a finite-time convergence guarantee. Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z)
Canonically consistent quantum master equation [68.8204255655161]
We put forth a new class of quantum master equations that correctly reproduce the state of an open quantum system beyond the infinitesimally weak system-bath coupling limit. Our method is based on incorporating the knowledge of the reduced steady state into its dynamics.
arXiv Detail & Related papers (2022-05-25T15:22:52Z)
The Franke-Gorini-Kossakowski-Lindblad-Sudarshan (FGKLS) Equation for Two-Dimensional Systems [62.997667081978825]
Open quantum systems can obey the Franke-Gorini-Kossakowski-Lindblad-Sudarshan (FGKLS) equation. We exhaustively study the case of a Hilbert space dimension of $2$.
arXiv Detail & Related papers (2022-04-16T07:03:54Z)
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error [83.10489974736404]
We study the use of the Bellman equation as a surrogate objective for value prediction accuracy. We find that the Bellman error is a poor proxy for the accuracy of the value function.
arXiv Detail & Related papers (2022-01-28T21:03:59Z)
Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
Robust Online Control with Model Misspecification [96.23493624553998]
We study online control of an unknown nonlinear dynamical system with model misspecification. Our study focuses on robustness, which measures how much deviation from the assumed linear approximation can be tolerated.
arXiv Detail & Related papers (2021-07-16T07:04:35Z)
Efficient Hamiltonian Simulation for Solving Option Price Dynamics [0.0]
We present a digital quantum algorithm to solve the Black-Scholes equation on a quantum computer by mapping it to the Schrödinger equation. The algorithm shows a feasible approach for using efficient Hamiltonian simulation techniques such as Quantum Signal Processing.
arXiv Detail & Related papers (2021-01-11T16:54:53Z)
Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics [78.46689176407936]
We show that families of algorithms fail to produce nearly optimal solutions with high probability. For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory.
arXiv Detail & Related papers (2020-04-25T05:45:59Z)
Universal Lindblad equation for open quantum systems [0.0]
We develop a Markovian master equation in the Lindblad form for studying quantum many-body systems. The validity of the master equation is based entirely on properties of the bath and the system-bath coupling. We show how our method can be applied to static or driven quantum many-body systems.
arXiv Detail & Related papers (2020-04-03T11:07:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.