Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence
- URL: http://arxiv.org/abs/2506.08121v1
- Date: Mon, 09 Jun 2025 18:20:21 GMT
- Title: Continuous Policy and Value Iteration for Stochastic Control Problems and Its Convergence
- Authors: Qi Feng, Gu Wang
- Abstract summary: We introduce a continuous policy iteration algorithm where the approximations of the value function of a control problem and the optimal control are simultaneously updated through Langevin-type dynamics.
- Score: 8.65436459753278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a continuous policy-value iteration algorithm where the approximations of the value function of a stochastic control problem and the optimal control are simultaneously updated through Langevin-type dynamics. This framework applies to both the entropy-regularized relaxed control problems and the classical control problems, with infinite horizon. We establish policy improvement and demonstrate convergence to the optimal control under the monotonicity condition of the Hamiltonian. By utilizing Langevin-type stochastic differential equations for continuous updates along the policy iteration direction, our approach enables the use of distribution sampling and non-convex learning techniques in machine learning to optimize the value function and identify the optimal control simultaneously.
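To make this concrete, here is a minimal, hypothetical sketch of such coupled Langevin-type updates on a one-dimensional linear-quadratic test problem; the parameterization, model coefficients, step size, and inverse temperature are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical sketch: simultaneous Langevin-type updates of a value
# function V(x) = theta * x^2 and a linear policy u(x) = phi * x for
# dx = (a*x + b*u) dt + sigma dW, running cost x^2 + u^2, discount rho.
# (Illustrative only; the constant term of V is dropped for brevity.)
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma, rho = -1.0, 1.0, 0.5, 0.1   # assumed model parameters
eta, beta, n_iters = 1e-2, 1e4, 5000     # step size, inverse temperature

theta, phi = 1.0, 0.0                    # initial value/policy parameters

def hjb_residual(th, ph, x):
    """Residual of the discounted HJB equation at state x."""
    Vx, Vxx = 2.0 * th * x, 2.0 * th
    u = ph * x
    return (rho * th * x**2
            - (x**2 + u**2 + Vx * (a * x + b * u) + 0.5 * sigma**2 * Vxx))

eps = 1e-4
for _ in range(n_iters):
    xs = rng.normal(size=32)             # sampled states for this update

    # Value step: Langevin descent on the squared HJB residual in theta.
    loss = lambda th: np.mean(hjb_residual(th, phi, xs) ** 2)
    g_theta = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    theta += -eta * g_theta + np.sqrt(2 * eta / beta) * rng.normal()

    # Policy step: Langevin descent on the Hamiltonian in phi
    # (the policy-improvement direction: minimize u^2 + V'(x) * b * u).
    ham = lambda ph: np.mean((ph * xs) ** 2 + 2 * theta * xs * b * (ph * xs))
    g_phi = (ham(phi + eps) - ham(phi - eps)) / (2 * eps)
    phi += -eta * g_phi + np.sqrt(2 * eta / beta) * rng.normal()

print(f"theta = {theta:.3f}, phi = {phi:.3f}")
```

The injected Gaussian noise of scale $\sqrt{2\eta/\beta}$ is what makes these Langevin-type rather than plain gradient updates; as $\beta \to \infty$ the sketch degenerates to deterministic gradient descent.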
Related papers
- Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints [21.38692458445459]
This paper studies the problem of online performance optimization of constrained closed-loop control systems.
A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution.
arXiv Detail & Related papers (2023-04-12T18:37:52Z)
- A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee [17.813898660635783]
We consider policy gradient methods for optimal control problems in continuous time. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
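Here the gradient flow is the continuous-time limit of gradient descent on the control objective; under assumed notation, with $J$ the cost and $\phi_t$ the policy parameters,
$$\dot{\phi}_t = -\nabla_\phi J(\phi_t),$$
and global convergence means $J(\phi_t) \to \min_\phi J$ as $t \to \infty$.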
arXiv Detail & Related papers (2023-02-11T23:30:50Z)
- Introduction to Online Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary. The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z)
- Optimal control for state preparation in two-qubit open quantum systems driven by coherent and incoherent controls via GRAPE approach [77.34726150561087]
We consider a model of two qubits driven by coherent and incoherent time-dependent controls.
The dynamics of the system is governed by a Gorini-Kossakowski-Sudarshan-Lindblad master equation.
We study the evolution of the von Neumann entropy, purity, and one-qubit reduced density matrices under optimized controls.
arXiv Detail & Related papers (2022-11-04T15:20:18Z)
- Optimal scheduling of entropy regulariser for continuous-time linear-quadratic reinforcement learning [9.779769486156631]
Here, the agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy.
This exploration-exploitation trade-off is determined by the strength of entropy regularisation.
We prove that the regret, for both learning algorithms, is of the order $\mathcal{O}(\sqrt{N})$ (up to a logarithmic factor) over $N$ episodes, matching the best known result from the literature.
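The regret over $N$ episodes is understood in the usual sense; under assumed notation, with $J^\star$ the optimal objective value and $\pi_n$ the policy used in episode $n$,
$$\mathrm{Reg}(N) = \sum_{n=1}^{N} \big(J^\star - J(\pi_n)\big),$$
so the claim is $\mathrm{Reg}(N) = \mathcal{O}(\sqrt{N})$ up to a logarithmic factor.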
arXiv Detail & Related papers (2022-08-08T23:36:40Z)
- Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems [3.7971225066055765]
This paper proposes a provably convergent gradient method for general continuous space-time control problems.
We show that the algorithm converges linearly to the optimal policy and is stable with respect to policy updates.
arXiv Detail & Related papers (2022-03-22T14:17:53Z)
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
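The practical payoff of these structural assumptions is that the maximization in the HJB equation can be solved in closed form; a sketch under assumed notation (control-affine dynamics $\dot{x} = f(x) + g(x)u$, reward $r(x,u) = q(x) - \frac{1}{2}u^\top R u$, value function $V$):
$$u^*(x) = \arg\max_{u} \Big[ q(x) - \tfrac{1}{2} u^\top R u + \nabla_x V(x)^\top \big(f(x) + g(x)u\big) \Big] = R^{-1} g(x)^\top \nabla_x V(x).$$
The quadratic action penalty here is an illustrative special case; the papers treat more general separable rewards.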
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Policy Analysis using Synthetic Controls in Continuous-Time [101.35070661471124]
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference.
We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations.
arXiv Detail & Related papers (2021-02-02T16:07:39Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
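As a rough illustration of what a min-norm controller posed as a second-order cone program looks like (a generic sketch, not the paper's GP-CLF-SOCP; the Lie-derivative values, confidence radius, and weights below are made-up placeholders):

```python
# Generic min-norm CLF controller as an SOCP (illustrative only).
# Requires: pip install cvxpy numpy
import cvxpy as cp
import numpy as np

u = cp.Variable(2)                 # control input
delta = cp.Variable(nonneg=True)   # slack on the CLF decrease condition

# Placeholder values standing in for GP posterior estimates at the
# current state: Lie derivatives of the CLF V along f and g.
LfV = -1.0
LgV = np.array([0.5, -0.3])
gamma_V = 2.0                      # required CLF decrease rate times V(x)
k = 0.1                            # confidence radius from GP uncertainty

# Robustified CLF condition: the k*||u|| term accounts for model
# uncertainty and makes this a second-order cone constraint.
clf_con = LfV + LgV @ u + gamma_V + k * cp.norm(u, 2) <= delta

prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + 100.0 * delta), [clf_con])
prob.solve()
print("u* =", u.value, " slack =", delta.value)
```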
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly. We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks (see the sketch below).
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
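To illustrate the distinction, here is a small, hypothetical sketch of iterative amortization: rather than emitting the action distribution in a single feedforward pass, the policy's parameters are refined over several gradient-based steps against a critic. The critic, step sizes, and update rule below are all made-up placeholders:

```python
# Hypothetical sketch of iterative amortization for a Gaussian policy
# (illustrative only; not the paper's architecture). Direct amortization
# would output (mu, std) in one pass; here mu is refined iteratively.
import numpy as np

rng = np.random.default_rng(0)

def q_value(state, action):
    """Stand-in critic: a concave quadratic with a state-dependent peak."""
    return -np.sum((action - np.tanh(state)) ** 2)

def iterative_policy(state, n_steps=5, lr=0.3, eps=1e-4):
    """Refine the Gaussian mean by ascending a sampled estimate of the
    variational objective E_q[Q] (the entropy term is constant here,
    since the log-std is kept fixed for brevity)."""
    mu = np.zeros_like(state)
    log_std = np.zeros_like(state)  # fixed in this sketch
    for _ in range(n_steps):
        noise = rng.normal(size=state.shape)  # reparameterized sample

        def objective(m):
            action = m + np.exp(log_std) * noise
            return q_value(state, action)

        # Finite-difference gradient of the objective w.r.t. mu.
        grad = np.array([(objective(mu + eps * e) - objective(mu - eps * e))
                         / (2 * eps) for e in np.eye(len(state))])
        mu = mu + lr * grad  # one refinement step
    return mu, np.exp(log_std)

mu, std = iterative_policy(np.array([0.5, -1.0]))
print("refined action mean:", mu)  # approaches tanh(state) over the steps
```

In a learned version, the refinement step would itself be produced by an update network rather than a hand-coded gradient ascent rule, which is what makes the amortization iterative.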