Gradient Flows for Regularized Stochastic Control Problems
- URL: http://arxiv.org/abs/2006.05956v5
- Date: Thu, 25 Jan 2024 09:42:25 GMT
- Title: Gradient Flows for Regularized Stochastic Control Problems
- Authors: David Šiška and Łukasz Szpruch
- Abstract summary: We study control problems with the action space taken to be probability measures with the objective penalised by the relative entropy.
We identify a suitable metric space on which we construct a gradient flow for the measure-valued control process.
- Score: 7.801972633035922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies stochastic control problems with the action space taken to
be probability measures, with the objective penalised by the relative entropy.
We identify a suitable metric space on which we construct a gradient flow for the
measure-valued control process, in the set of admissible controls, along which
the cost functional is guaranteed to decrease. It is shown that any invariant
measure of this gradient flow satisfies the Pontryagin optimality principle. If
the problem we work with is sufficiently convex, the gradient flow converges
exponentially fast. Furthermore, the optimal measure-valued control process
admits a Bayesian interpretation which means that one can incorporate prior
knowledge when solving such stochastic control problems. This work is motivated
by a desire to extend the theoretical underpinning for the convergence of
stochastic gradient type algorithms widely employed in the reinforcement
learning community to solve control problems.
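The abstract's Bayesian interpretation can be illustrated with a toy single-step analogue (not the paper's continuous-time construction): minimizing an expected cost plus a relative-entropy penalty against a prior over a finite action set. The minimizer is the Gibbs/Bayesian posterior, and an entropic mirror-descent iteration, a discrete-time stand-in for the gradient flow, converges to it. The cost values, prior, and temperature below are made up for the demo:

```python
import numpy as np

# Toy analogue: minimize E_pi[c(a)] + tau * KL(pi || prior) over probability
# vectors pi on a finite action set. The optimum is the Gibbs posterior
# pi*(a) ∝ prior(a) * exp(-c(a)/tau), mirroring the Bayesian interpretation
# of the optimal measure-valued control described in the abstract.

def gibbs_optimum(cost, prior, tau):
    w = prior * np.exp(-cost / tau)
    return w / w.sum()

def gradient_flow_step(pi, cost, prior, tau, dt):
    # Entropic mirror-descent step approximating the gradient flow; the
    # regularized cost is non-increasing along these iterates.
    grad = cost + tau * (np.log(pi) - np.log(prior))  # first variation
    w = pi * np.exp(-dt * grad)
    return w / w.sum()

cost = np.array([1.0, 0.3, 2.0, 0.8])   # hypothetical per-action costs
prior = np.full(4, 0.25)                # uniform prior over actions
tau = 0.5                                # entropy-regularization strength

pi = np.full(4, 0.25)
for _ in range(500):
    pi = gradient_flow_step(pi, cost, prior, tau, dt=0.1)

print(np.allclose(pi, gibbs_optimum(cost, prior, tau), atol=1e-6))  # prints True
```

Because the entropy term makes the objective strongly convex along the iteration, convergence to the Gibbs optimum is geometric, which loosely echoes the exponential convergence the paper establishes in the sufficiently convex case.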
Related papers
- Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero. Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
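The two-point construction this summary refers to can be sketched as follows. This is the generic spherical-perturbation estimator, not the variance-minimizing, gradient-aligned scheme the paper derives; the test function, point, and sample count are made up for the demo:

```python
import numpy as np

# Generic two-point zeroth-order gradient estimator:
#   g ≈ d * (f(x + eps*u) - f(x - eps*u)) / (2*eps) * u,
# averaged over directions u drawn uniformly from the unit sphere. As
# eps -> 0 this is unbiased for grad f(x) up to the smoothing error.

def two_point_gradient(f, x, eps=1e-4, n_samples=1000, rng=None):
    rng = np.random.default_rng(rng)
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        fd = (f(x + eps * u) - f(x - eps * u)) / (2 * eps)
        g += d * fd * u
    return g / n_samples

f = lambda x: 0.5 * np.sum(x ** 2)      # grad f(x) = x, for checking
x = np.array([1.0, -2.0, 0.5])
g = two_point_gradient(f, x, rng=0)
print(g)                                 # ≈ [1.0, -2.0, 0.5] up to MC noise
```

The residual Monte Carlo noise in this fixed-length scheme is exactly the variance the paper's directionally aligned perturbations are designed to reduce.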
arXiv Detail & Related papers (2025-10-22T19:06:39Z)
- Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference [49.11857020431547]
We show that a trust region based strategy can be understood as a geometric annealing from the prior to the target measure. We demonstrate in multiple optimal control applications that our novel method can improve performance significantly.
arXiv Detail & Related papers (2025-08-17T22:10:35Z)
- Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution [51.83951489847344]
In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency.
In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution.
Our work indicates that an adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that yield surprisingly strong performance on continuous control tasks.
arXiv Detail & Related papers (2024-04-05T17:58:37Z)
- FlowPG: Action-constrained Policy Gradient with Normalizing Flows [14.98383953401637]
Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical, resource allocation-related decision-making problems.
A major challenge in ACRL is to ensure that the agent takes a valid action satisfying the constraints at each step.
arXiv Detail & Related papers (2024-02-07T11:11:46Z)
- A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee [12.884132885360907]
We consider policy gradient methods for optimal control problems in continuous time.
We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions.
arXiv Detail & Related papers (2023-02-11T23:30:50Z)
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast the inner optimization in the Lagrangian as a learning problem for a surrogate approximation, which bypasses the apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
- Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems [3.7971225066055765]
This paper proposes a provably convergent gradient method for general continuous space-time control problems.
We show that the algorithm converges linearly and is stable with respect to the policy update steps.
arXiv Detail & Related papers (2022-03-22T14:17:53Z)
- Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\,u_i$, with linear dependence on the controls.
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
arXiv Detail & Related papers (2021-10-24T08:57:46Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Stochastic Control through Approximate Bayesian Input Inference [23.65155934960922]
Optimal control under uncertainty is a prevailing challenge in control, due to the difficulty in producing tractable solutions for the optimization problem.
By framing the control problem as one of input estimation, advanced approximate inference techniques can be used to handle the statistical approximations in a principled and practical manner.
arXiv Detail & Related papers (2021-05-17T09:27:12Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.