Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences
- URL: http://arxiv.org/abs/2602.02250v1
- Date: Mon, 02 Feb 2026 15:57:32 GMT
- Title: Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences
- Authors: Viktor Stein, Adwait Datar, Nihat Ay
- Abstract summary: Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. We introduce (Kalman-)Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometries. We demonstrate the utility of these divergences in KL-regularized optimal control.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Utilizing a unified information-geometric framework, we introduce (Kalman-)Wasserstein-based KL analogues by replacing the Fisher-Rao geometry in the dynamical formulation of the KL with transport-based geometries, and we derive closed-form values for common distribution families. These divergences remain finite under support mismatch and yield a geometric interpretation of regularization heuristics used in Kalman ensemble methods. We demonstrate the utility of these divergences in KL-regularized optimal control. In the fully tractable setting of linear time-invariant systems with Gaussian process noise, the classical KL reduces to a quadratic control penalty that becomes singular as process noise vanishes. Our variants remove this singularity, yielding well-posed problems. On a double integrator and a cart-pole example, the resulting controls outperform KL-based regularization.
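The low-noise pathology in the abstract is already visible for one-dimensional Gaussians, where both the KL and the 2-Wasserstein distance have closed forms. A minimal NumPy sketch (our own illustration, not the paper's Kalman-Wasserstein construction): as the shared noise level shrinks, the KL penalty diverges like $1/(2\sigma^2)$ while the transport distance stays fixed.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) in closed form."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def w2_gauss(m1, s1, m2, s2):
    """2-Wasserstein distance between the same two Gaussians."""
    return np.sqrt((m1 - m2)**2 + (s1 - s2)**2)

# Shrink the noise of both distributions toward zero: the KL penalty
# blows up while the transport-based quantity stays finite.
for s in [1.0, 0.1, 0.01, 1e-4]:
    print(f"s={s:8.4f}  KL={kl_gauss(0.0, s, 1.0, s):14.2f}  "
          f"W2={w2_gauss(0.0, s, 1.0, s):6.3f}")
```

In the zero-noise limit the two Gaussians become Dirac measures with disjoint support, so the KL is infinite while the Wasserstein distance remains the plain distance between the means.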
Related papers
- Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences. We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices. We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
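One concrete reading of "low-rank, skew-symmetric" (the feature map, sigmoid link, and factorization below are our assumptions, not details from the abstract): a matrix A = BCᵀ − CBᵀ is skew-symmetric by construction, which forces P(a ≻ b) + P(b ≻ a) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # feature dimension, rank of the preference matrix

# Hypothetical low-rank, skew-symmetric parametrization: A = B C^T - C B^T.
B, C = rng.normal(size=(d, r)), rng.normal(size=(d, r))
A = B @ C.T - C @ B.T   # skew-symmetric by construction

def pref_prob(phi_a, phi_b):
    """P(a preferred over b) under a toy generalized bilinear model."""
    return 1.0 / (1.0 + np.exp(-phi_a @ A @ phi_b))

phi_a, phi_b = rng.normal(size=d), rng.normal(size=d)
p_ab, p_ba = pref_prob(phi_a, phi_b), pref_prob(phi_b, phi_a)
print(p_ab, p_ba, p_ab + p_ba)  # skew-symmetry forces p_ab + p_ba == 1
```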
arXiv Detail & Related papers (2026-02-26T15:27:53Z)
- Tail-Sensitive KL and Rényi Convergence of Unadjusted Hamiltonian Monte Carlo via One-Shot Couplings [9.926709161663053]
We develop a framework for upgrading Wasserstein convergence guarantees for unadjusted HMC algorithms to guarantees in tail-sensitive KL and Rényi divergences. Our results provide quantitative control of relative density mismatch, clarify the role of discretization bias in strong divergences, and yield principled guarantees relevant both for unadjusted sampling and for generating warm starts for Metropolis-adjusted Markov chains.
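For intuition on what "tail-sensitive" buys: the Rényi-α family interpolates around the KL (recovered as α → 1), with larger α penalizing density-ratio spikes in the tails more heavily. A small sketch using the known closed form for one-dimensional Gaussians (our own illustration, unrelated to the HMC analysis itself):

```python
import numpy as np

def renyi_gauss(alpha, m1, s1, m2, s2):
    """Renyi-alpha divergence D_alpha(N(m1,s1^2) || N(m2,s2^2)).

    Closed form for 1-D Gaussians; finite when (1-alpha)*s1^2 + alpha*s2^2 > 0.
    """
    var_a = (1 - alpha) * s1**2 + alpha * s2**2
    assert var_a > 0, "divergence is infinite for this alpha"
    return (alpha * (m1 - m2)**2 / (2 * var_a)
            - np.log(var_a / (s1**(2 * (1 - alpha)) * s2**(2 * alpha)))
            / (2 * (alpha - 1)))

# The divergence is nondecreasing in alpha; alpha near 1 approximates the KL.
for a in [0.5, 0.9, 0.999, 1.5, 2.0]:
    print(a, renyi_gauss(a, 0.0, 1.0, 1.0, 2.0))
```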
arXiv Detail & Related papers (2026-01-13T22:39:23Z)
- Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions [4.934817254755008]
This paper develops a unified perspective on several optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights. We show that these soft-policy formulations majorize the original SOC and RSOC problems, which means the regularized solution can be iterated to retrieve the original solution.
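The independently weighted policy penalty is easiest to see in the one-step case: penalizing the action distribution by λ·KL to a reference policy yields a Gibbs policy in closed form, and sending the weight to zero recovers the un-regularized, classical objective. A minimal sketch of that single closed-form step (illustrative only; the paper's central problem also penalizes transitions with a second weight):

```python
import numpy as np

def soft_policy(q, prior, lam):
    """Minimizer of <pi, q> + lam * KL(pi || prior) over the simplex.

    Closed form: pi proportional to prior * exp(-q / lam) (a Gibbs policy).
    """
    w = prior * np.exp(-q / lam)
    return w / w.sum()

q = np.array([1.0, 0.0, 2.0])          # per-action costs
prior = np.ones(3) / 3                 # reference policy

for lam in [10.0, 1.0, 0.1]:           # independent weight on the policy KL
    print(lam, soft_policy(q, prior, lam).round(3))
# lam -> 0 recovers the greedy (classical) policy; lam -> inf stays at the prior.
```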
arXiv Detail & Related papers (2025-12-05T19:31:39Z)
- On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning [59.11784194183928]
Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). The Regularized Policy Gradient (RPG) view shows that the widely-used $k_3$ penalty is exactly the unnormalized KL. RPG-REINFORCE with RPG-Style Clip improves accuracy by up to $+6$ absolute percentage points over DAPO.
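The $k_3$ penalty referenced here is the well-known sampled KL estimator $k_3 = (r - 1) - \log r$ with $r = p(x)/q(x)$ for $x \sim q$, which is pointwise nonnegative and unbiased for KL(q‖p). A quick sanity-check sketch (our own illustration of the estimator, not the RPG algorithm):

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from q = N(0, 1); reference p = N(0.5, 1).
x = rng.normal(0.0, 1.0, size=200_000)
logr = (x**2 - (x - 0.5)**2) / 2.0      # log p(x) - log q(x)

k1 = -logr                              # unbiased, but can go negative
k2 = 0.5 * logr**2                      # nonnegative, biased
k3 = (np.exp(logr) - 1.0) - logr        # nonnegative and unbiased

true_kl = 0.5**2 / 2                    # KL(q || p) = (mu_q - mu_p)^2 / 2
print(true_kl, k1.mean(), k2.mean(), k3.mean())
print(k1.var(), k3.var())               # k3 typically has lower variance
```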
arXiv Detail & Related papers (2025-05-23T06:01:21Z)
- Wasserstein KL-divergence for Gaussian distributions [0.0]
We show that this version is consistent with the geometry of the sample space $\mathbb{R}^n$. In particular, we can evaluate the WKL-divergence of Dirac measures concentrated at two points, which turns out to be proportional to the squared distance between these points.
arXiv Detail & Related papers (2025-03-31T12:49:01Z)
- Logarithmic Regret for Online KL-Regularized Reinforcement Learning [51.113248212150964]
KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models. Despite its empirical advantage, the theoretical difference between KL-regularized RL and standard RL remains largely under-explored. We propose an optimism-based KL-regularized online contextual bandit algorithm and provide a novel analysis of its regret.
arXiv Detail & Related papers (2025-02-11T11:11:05Z)
- Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence [7.273481485032721]
We study the statistical and geometrical properties of the kernel Kullback-Leibler (KKL) divergence with kernel covariance operators introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence, which involves density ratios, the KKL compares probability distributions through covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS). This novel divergence hence shares parallel but distinct aspects with both the standard Kullback-Leibler divergence between probability distributions and kernel-embedding metrics such as the maximum mean discrepancy.
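A rough finite-dimensional proxy for the covariance-operator comparison (our assumptions: random Fourier features standing in for the RKHS embedding, and the von Neumann relative entropy tr[C_p(log C_p − log C_q)] of trace-normalized covariance matrices as the divergence, with the ε-mixing playing the role of the regularization that keeps it finite):

```python
import numpy as np
from scipy.linalg import logm

def cov_embedding(x, omega, b):
    """Trace-normalized covariance matrix of random Fourier features."""
    phi = np.sqrt(2.0 / len(b)) * np.cos(x[:, None] * omega + b)  # (n, D)
    c = phi.T @ phi / len(x)
    return c / np.trace(c)

def kkl(cp, cq, eps=1e-3):
    """Regularized von Neumann relative entropy tr[Cp (log Cp - log Cq)]."""
    d = cp.shape[0]
    cp = (1 - eps) * cp + eps * np.eye(d) / d   # regularization keeps it finite
    cq = (1 - eps) * cq + eps * np.eye(d) / d
    return np.trace(cp @ (logm(cp) - logm(cq))).real

rng = np.random.default_rng(0)
D = 64
omega, b = rng.normal(size=D), rng.uniform(0, 2 * np.pi, size=D)

x = rng.normal(0.0, 1.0, size=2000)   # samples from p
y = rng.normal(2.0, 1.0, size=2000)   # samples from q
print(kkl(cov_embedding(x, omega, b), cov_embedding(y, omega, b)))   # large
print(kkl(cov_embedding(x, omega, b), cov_embedding(x[:1000], omega, b)))  # ~0
```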
arXiv Detail & Related papers (2024-08-29T14:01:30Z)
- Learning Globally Smooth Functions on Manifolds [94.22412028413102]
Learning smooth functions is generally challenging, except in simple cases such as learning linear or kernel models.
This work proposes to overcome these obstacles by combining techniques from semi-infinite constrained learning and manifold regularization.
We prove that, under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct.
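A toy of the constrained-learning mechanics (our own simplification: a linear model on a Euclidean domain, so the semi-infinite gradient-norm constraint collapses to a single norm bound, handled with the usual primal-descent / dual-ascent pattern):

```python
import numpy as np

# Fit f(x) = w @ x subject to ||grad f|| = ||w|| <= L via a Lagrangian:
# primal gradient descent on w, dual ascent on the multiplier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

w, mult, L, lr = np.zeros(3), 0.0, 1.0, 1e-2
for _ in range(20_000):
    grad_fit = 2 * X.T @ (X @ w - y) / len(y)        # data-fit gradient
    violation = np.linalg.norm(w) - L                # constraint g(w) <= 0
    grad_con = mult * w / (np.linalg.norm(w) + 1e-12)
    w -= lr * (grad_fit + grad_con)                  # primal descent
    mult = max(0.0, mult + lr * violation)           # dual ascent
print(np.linalg.norm(w))  # settles near L: the fit is approximately L-Lipschitz
```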
arXiv Detail & Related papers (2022-10-01T15:45:35Z)
- Kullback-Leibler control for discrete-time nonlinear systems on continuous spaces [0.24366811507669117]
Kullback-Leibler (KL) control enables efficient numerical methods for nonlinear optimal control problems.
We show that the reformulated KL control admits efficient numerical algorithms, like the original formulation, without unreasonable assumptions.
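The efficiency of KL control traces back to its linearly solvable structure: writing the value function through the desirability z = exp(−V) turns the Bellman backup into a linear map. A minimal discrete-state, finite-horizon sketch of that classical construction (the paper's contribution is the extension to continuous spaces, which this toy does not capture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 6, 20
q = rng.uniform(0.0, 1.0, size=n)                 # per-step state costs
q[-1] = 0.0                                       # cheap "goal" state
P = rng.uniform(size=(n, n))
P /= P.sum(axis=1, keepdims=True)                 # passive dynamics

# Desirability z(x) = exp(-V(x)); the KL-control backup is *linear* in z:
#   z_t = exp(-q) * (P @ z_{t+1}),   z_T = exp(-q)
z = np.exp(-q)
for _ in range(T):
    z = np.exp(-q) * (P @ z)

V = -np.log(z)                                    # optimal value function
u = P * z[None, :]
u /= u.sum(axis=1, keepdims=True)                 # u*(x'|x) prop. P(x'|x) z(x')
print(V.round(3))
```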
arXiv Detail & Related papers (2022-03-24T06:03:42Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
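Random Reshuffling draws each component gradient exactly once per epoch, in a fresh random order. A quick toy comparison on least squares (illustrative only; it does not reproduce the paper's minibatch/local variants or PL-condition rates):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 4
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
w_star = np.linalg.lstsq(A, b, rcond=None)[0]     # exact minimizer

def grad(i, w):
    """Gradient of the i-th component loss 0.5 * (A_i w - b_i)^2."""
    return (A[i] @ w - b[i]) * A[i]

def run(shuffled, epochs=200, lr=1e-2):
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n) if shuffled else rng.integers(n, size=n)
        for i in idx:                              # without vs with replacement
            w -= lr * grad(i, w)
    return np.linalg.norm(w - w_star)

print("with replacement :", run(False))
print("random reshuffle :", run(True))             # typically ends up closer
```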
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Optimization Issues in KL-Constrained Approximate Policy Iteration [48.24321346619156]
Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API).
While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy.
Popular practical algorithms such as TRPO, MPO, and VMPO replace the regularization term with a constraint on the KL-divergence between consecutive policies.
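In the exact, single-state case the KL-regularized update has a closed form: maximizing expected value minus η times the KL to the previous policy gives a multiplicative-weights step. A minimal sketch (TRPO/MPO enforce this as a constraint with estimated values, which this toy ignores):

```python
import numpy as np

def kl_regularized_update(pi_old, q, eta):
    """argmax_pi <pi, q> - eta * KL(pi || pi_old): pi prop. pi_old * exp(q/eta)."""
    w = pi_old * np.exp((q - q.max()) / eta)   # subtract max for stability
    return w / w.sum()

pi = np.ones(4) / 4                            # uniform starting policy
q = np.array([1.0, 2.0, 0.0, 1.5])             # action values in one state

for eta in [10.0, 1.0, 0.1]:                   # trust-region "temperature"
    print(eta, kl_regularized_update(pi, q, eta).round(3))
# Small eta -> near-greedy jump; large eta -> tiny, stabilized policy step.
```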
arXiv Detail & Related papers (2021-02-11T19:35:33Z)