Kullback-Leibler control for discrete-time nonlinear systems on
continuous spaces
- URL: http://arxiv.org/abs/2203.12864v1
- Date: Thu, 24 Mar 2022 06:03:42 GMT
- Title: Kullback-Leibler control for discrete-time nonlinear systems on
continuous spaces
- Authors: Kaito Ito, Kenji Kashima
- Abstract summary: Kullback-Leibler (KL) control enables efficient numerical methods for nonlinear optimal control problems.
We show that the reformulated KL control admits efficient numerical algorithms like the original one without unreasonable assumptions.
- Score: 0.24366811507669117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kullback-Leibler (KL) control enables efficient numerical methods for
nonlinear optimal control problems. The crucial assumption of KL control is the
full controllability of the transition distribution. However, this assumption
is often violated when the dynamics evolve on a continuous space.
Consequently, applying KL control to problems on continuous spaces requires
some approximation, which leads to a loss of optimality. To avoid such
approximation, in this paper we reformulate the KL control problem for
continuous spaces so that it does not require unrealistic assumptions. The key
difference between the original and reformulated KL control is that the former
measures the control effort by KL divergence between controlled and
uncontrolled transition distributions while the latter replaces the
uncontrolled transition by a noise-driven transition. We show that the
reformulated KL control admits efficient numerical algorithms like the original
one without unreasonable assumptions. Specifically, the associated value
function can be computed by using a Monte Carlo method based on its path
integral representation.
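The path integral representation mentioned in the abstract lends itself to a simple Monte Carlo estimator: the value function can be written as $V(x_0) = -\log \mathbb{E}[\exp(-\sum_t q(x_t))]$, where the expectation is over noise-driven rollouts. The sketch below is illustrative only (not the authors' code); the one-dimensional dynamics `f`, noise scale `sigma`, and stage cost `state_cost` are hypothetical placeholders.

```python
import numpy as np

def mc_value_estimate(x0, f, sigma, state_cost, horizon,
                      n_samples=10000, rng=None):
    """Monte Carlo estimate of V(x0) via its path integral representation:
    V(x0) = -log E[exp(-sum_t q(x_t))], with the expectation taken over
    noise-driven rollouts x_{t+1} = f(x_t) + sigma * w_t, w_t ~ N(0, 1).
    All of f, sigma, and state_cost are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    total_cost = np.zeros(n_samples)
    x = np.full(n_samples, x0, dtype=float)
    for _ in range(horizon):
        # Propagate all samples through the noise-driven (uncontrolled) dynamics.
        x = f(x) + sigma * rng.standard_normal(n_samples)
        total_cost += state_cost(x)
    # Desirability z(x0) = E[exp(-accumulated cost)]; the value is V = -log z.
    z = np.mean(np.exp(-total_cost))
    return -np.log(z)

# Example usage with toy stable dynamics and a quadratic state cost:
v = mc_value_estimate(1.0, lambda x: 0.9 * x, 0.1, lambda x: x**2, horizon=10)
```

Because the estimator only requires sampling the noise-driven transitions, it avoids the full-controllability assumption on the transition distribution that the original KL formulation needs.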
Related papers
- Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences [0.0]
Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. We introduce (Kalman-)Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometry. We demonstrate the utility of these divergences in KL-regularized optimal control.
arXiv Detail & Related papers (2026-02-02T15:57:32Z) - Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions [4.934817254755008]
This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights. We show that these soft-policy formulations majorize the original SOC and RSOC problems, which means the regularized solution can be iterated to retrieve the original solution.
arXiv Detail & Related papers (2025-12-05T19:31:39Z) - Verifying Closed-Loop Contractivity of Learning-Based Controllers via Partitioning [52.23804865017831]
We address the problem of verifying closed-loop contraction in nonlinear control systems whose controller and contraction metric are both parameterized by neural networks. We derive a tractable and scalable sufficient condition for closed-loop contractivity that reduces to checking that the dominant eigenvalue of a symmetric Metzler matrix is nonpositive.
arXiv Detail & Related papers (2025-12-01T23:06:56Z) - Neural Port-Hamiltonian Models for Nonlinear Distributed Control: An Unconstrained Parametrization Approach [0.0]
Neural Networks (NNs) can be leveraged to parametrize control policies that yield good performance.
NNs' sensitivity to small input changes poses a risk of destabilizing the closed-loop system.
To address these problems, we leverage the framework of port-Hamiltonian systems to design continuous-time distributed control policies.
The effectiveness of the proposed distributed controllers is demonstrated through consensus control of non-holonomic mobile robots.
arXiv Detail & Related papers (2024-11-15T10:44:29Z) - Thompson Sampling Achieves $\tilde O(\sqrt{T})$ Regret in Linear
Quadratic Control [85.22735611954694]
We study the problem of adaptive control of stabilizable linear-quadratic regulators (LQRs) using Thompson Sampling (TS).
We propose an efficient TS algorithm for the adaptive control of LQRs, TSAC, that attains $\tilde{O}(\sqrt{T})$ regret, even for multidimensional systems.
arXiv Detail & Related papers (2022-06-17T02:47:53Z) - On optimization of coherent and incoherent controls for two-level
quantum systems [77.34726150561087]
This article considers some control problems for closed and open two-level quantum systems.
The closed system's dynamics is governed by the Schrödinger equation with coherent control.
The open system's dynamics is governed by the Gorini-Kossakowski-Sudarshan-Lindblad master equation.
arXiv Detail & Related papers (2022-05-05T09:08:03Z) - Correct-by-construction reach-avoid control of partially observable
linear stochastic systems [7.912008109232803]
We formalize a robust feedback controller for reach-avoid control of discrete-time, linear time-invariant systems.
The problem is to compute a controller that satisfies the given reach-avoid specification.
arXiv Detail & Related papers (2021-03-03T13:46:52Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Policy Analysis using Synthetic Controls in Continuous-Time [101.35070661471124]
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference.
We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations.
arXiv Detail & Related papers (2021-02-02T16:07:39Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Gradient Flows for Regularized Stochastic Control Problems [7.801972633035922]
We study control problems with the action space taken to be probability measures with the objective penalised by the relative entropy.
We identify suitable metric space on which we construct a gradient flow for the measure-valued control process.
arXiv Detail & Related papers (2020-06-10T17:07:36Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - A homotopy approach to coherent quantum LQG control synthesis using
discounted performance criteria [2.0508733018954843]
This paper is concerned with linear-quadratic-Gaussian (LQG) control for a field-mediated feedback connection of a plant and a coherent (measurement-free) controller.
The control objective is to make the closed-loop system internally stable and to minimize the infinite-horizon cost involving the plant variables.
arXiv Detail & Related papers (2020-02-06T18:52:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.