On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems
- URL: http://arxiv.org/abs/2507.21543v1
- Date: Tue, 29 Jul 2025 07:18:28 GMT
- Title: On Policy Stochasticity in Mutual Information Optimal Control of Linear Systems
- Authors: Shoju Enami, Kenji Kashima
- Abstract summary: We study the relationship between the temperature parameter and the stochasticity of the policy. Unlike in maximum entropy optimal control, this relationship remains unexplored in mutual information optimal control. We derive the respective conditions on the temperature parameter under which the policy becomes stochastic and deterministic.
- Score: 0.07366405857677226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, mutual information optimal control has been proposed as an extension of maximum entropy optimal control. Both approaches introduce regularization terms to render the policy stochastic, and it is important to theoretically clarify the relationship between the temperature parameter (i.e., the coefficient of the regularization term) and the stochasticity of the policy. Unlike in maximum entropy optimal control, this relationship remains unexplored in mutual information optimal control. In this paper, we investigate this relationship for a mutual information optimal control problem (MIOCP) of discrete-time linear systems. After extending the result of a previous study of the MIOCP, we establish the existence of an optimal policy of the MIOCP, and then derive the respective conditions on the temperature parameter under which the optimal policy becomes stochastic and deterministic. Furthermore, we also derive the respective conditions on the temperature parameter under which the policy obtained by an alternating optimization algorithm becomes stochastic and deterministic. The validity of the theoretical results is demonstrated through numerical experiments.
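For orientation, a minimal sketch of the two regularized objectives in a finite-horizon, discrete-time setting. The sign conventions and exact formulation in the paper may differ; the display below is an illustrative assumption, not the paper's definition:

```latex
% Hedged sketch: c is a stage cost, \varepsilon > 0 the temperature,
% \mathcal{H} the (differential) entropy, and I the mutual information.
% MEOCP: an entropy bonus renders the policy stochastic.
\min_{\pi}\; \mathbb{E}\!\left[\sum_{t=0}^{T-1} c(x_t, u_t)\right]
  \;-\; \varepsilon \sum_{t=0}^{T-1} \mathbb{E}\big[\mathcal{H}\big(\pi_t(\cdot \mid x_t)\big)\big]
% MIOCP: the entropy bonus is replaced by a penalty on the mutual
% information between the state and the input at each stage.
\min_{\pi}\; \mathbb{E}\!\left[\sum_{t=0}^{T-1} c(x_t, u_t)\right]
  \;+\; \varepsilon \sum_{t=0}^{T-1} I(x_t; u_t)
```

In both cases the temperature $\varepsilon$ trades control cost against policy stochasticity, which is exactly the relationship the paper quantifies.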
Related papers
- Mutual Information Optimal Control of Discrete-Time Linear Systems [0.07366405857677226]
We formulate a mutual information optimal control problem (MIOCP) for discrete-time linear systems. This problem can be regarded as an extension of a maximum entropy optimal control problem (MEOCP).
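As a toy illustration of the alternating optimization mentioned in the abstract above, here is a Blahut–Arimoto-style sketch for a one-step discrete analogue of the MI-regularized problem. The paper treats discrete-time linear systems with its own algorithm, so the cost matrix, update rules, and stopping criterion below are illustrative assumptions only:

```python
import numpy as np

def mi_alternating_optimization(cost, p_x, eps, iters=500, tol=1e-10):
    """Alternately update the policy pi(u|x) and the input marginal m(u)
    for the one-step objective  E[c(x,u)] + eps * I(x; u)."""
    n_x, n_u = cost.shape
    m = np.full(n_u, 1.0 / n_u)                   # initial input marginal
    pi = np.full((n_x, n_u), 1.0 / n_u)
    for _ in range(iters):
        # Policy step: pi(u|x) proportional to m(u) * exp(-c(x,u) / eps).
        logits = np.log(m)[None, :] - cost / eps
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # Marginal step: m(u) = sum_x p(x) * pi(u|x).
        m_new = p_x @ pi
        if np.abs(m_new - m).max() < tol:
            m = m_new
            break
        m = m_new
    return pi, m

# Tiny example: 3 states, 2 inputs, uniform state distribution.
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0],
                 [0.2, 0.8]])
p_x = np.full(3, 1.0 / 3.0)
pi_hot, _ = mi_alternating_optimization(cost, p_x, eps=5.0)    # large temperature
pi_cold, _ = mi_alternating_optimization(cost, p_x, eps=0.05)  # small temperature
print(np.round(pi_hot, 3))   # nearly state-independent, clearly stochastic
print(np.round(pi_cold, 3))  # close to the deterministic argmin policy
```

The qualitative behavior mirrors the question studied in the main paper: a large temperature decouples the policy from the state, while a small temperature drives it toward the deterministic cost-minimizing policy.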
arXiv Detail & Related papers (2025-07-07T07:04:27Z)
- Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes [59.27926064817273]
We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under domination assumptions. We empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks.
arXiv Detail & Related papers (2025-06-06T10:29:05Z)
- Convergence of Policy Mirror Descent Beyond Compatible Function Approximation [66.4260157478436]
We develop PMD theory for general policy classes, assuming only a weaker variational dominance, and obtain convergence to the best-in-class policy. Our analysis leverages a novel notion based on a local norm induced by the occupancy measure.
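For context, the generic PMD update with step size $\eta$ and Bregman divergence $D$ (standard in the literature, not specific to this paper):

```latex
\pi_{k+1}(\cdot \mid s) \in \arg\max_{\pi \in \Pi}\;
  \eta \left\langle Q^{\pi_k}(s,\cdot),\, \pi(\cdot \mid s) \right\rangle
  \;-\; D\!\left(\pi(\cdot \mid s),\, \pi_k(\cdot \mid s)\right)
```

With the negative-entropy mirror map, $D$ is the KL divergence and the update reduces to a softmax step.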
arXiv Detail & Related papers (2025-02-16T08:05:46Z)
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair. This appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
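A generic primal-dual template for a constrained MDP, with reward value $V_r$, cost value $V_c$, and budget $b$; this is the standard construction sketched for orientation, and D-PGPD's regularized deterministic-policy updates differ in detail:

```latex
% Lagrangian for  max_\pi V_r(\pi)  subject to  V_c(\pi) \le b:
L(\pi, \lambda) = V_r(\pi) - \lambda \big( V_c(\pi) - b \big), \qquad \lambda \ge 0
% Gradient ascent in the policy, projected gradient step in the multiplier:
\pi_{k+1} = \pi_k + \eta_\pi \nabla_\pi L(\pi_k, \lambda_k), \qquad
\lambda_{k+1} = \big[\lambda_k + \eta_\lambda \big( V_c(\pi_k) - b \big)\big]_{+}
```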
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this is the first non-asymptotic last-iterate convergence result for policy iterates of single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality [0.5261718469769449]
A novel Policy Gradient (PG) algorithm, called $\textit{Matryoshka Policy Gradient}$ (MPG), is introduced and studied.
We prove uniqueness and characterize the optimal policy of the entropy-regularized objective, together with global convergence of MPG.
As a proof of concept, we evaluate MPG numerically on standard test benchmarks.
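For orientation, entropy regularization with temperature $\tau$ admits the standard soft-optimality identities (well known in the literature, not specific to MPG):

```latex
\pi^{*}(a \mid s) = \exp\!\left(\frac{Q^{*}_{\tau}(s,a) - V^{*}_{\tau}(s)}{\tau}\right),
\qquad
V^{*}_{\tau}(s) = \tau \log \sum_{a} \exp\!\left(\frac{Q^{*}_{\tau}(s,a)}{\tau}\right)
```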
arXiv Detail & Related papers (2023-03-22T17:56:18Z)
- Inference on Optimal Dynamic Policies via Softmax Approximation [27.396891119011215]
We show that a simple softmax approximation to the optimal treatment regime can achieve valid inference on the truly optimal regime.
Our work combines techniques from semi-parametric inference and $g$-estimation, together with an appropriate array central limit theorem.
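The softmax device replaces the hard argmax policy with a smooth one governed by an inverse temperature $\beta$ (a standard relaxation; the paper's inference machinery is built on top of it):

```latex
\pi_{\beta}(a \mid x) = \frac{\exp\big(\beta\, Q(x,a)\big)}{\sum_{a'} \exp\big(\beta\, Q(x,a')\big)}
\;\xrightarrow{\;\beta \to \infty\;}\; \mathbf{1}\!\left\{ a = \arg\max_{a'} Q(x,a') \right\}
```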
arXiv Detail & Related papers (2023-03-08T07:42:47Z)
- Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
First, we show that the preferability of optimization methods depends critically on whether exact gradients are used.
Second, to explain these findings, we introduce the concept of committal rate for policy optimization.
Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z)
- Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability [0.0]
For POMDPs, we establish the convergence of a Q-learning algorithm for control policies using a finite history of past observations and control actions.
We present explicit error bounds relating the approximation error to the length of the finite history window.
We show that the limiting fixed-point equation gives an optimal solution for an approximate belief-MDP.
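A minimal sketch of Q-learning over a finite window of past observations and actions. The toy environment, window length, and tabular representation below are illustrative assumptions, not the paper's exact construction:

```python
import random
from collections import defaultdict, deque

class NoisyChainEnv:
    """Toy POMDP: a 5-state chain observed through a noisy sensor.
    Purely illustrative; not from the paper."""
    def __init__(self, n=5, noise=0.1):
        self.n, self.noise, self.s = n, noise, 0
    def reset(self):
        self.s = 0
        return self._obs()
    def _obs(self):
        # With probability `noise`, report a uniformly random state.
        return random.randrange(self.n) if random.random() < self.noise else self.s
    def step(self, a):
        self.s = min(self.s + 1, self.n - 1) if a == 1 else max(self.s - 1, 0)
        done = self.s == self.n - 1
        return self._obs(), (1.0 if done else 0.0), done

def finite_history_q_learning(env, n_actions, window=3, episodes=500,
                              alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning whose 'state' is the last `window`
    (observation, action) pairs, a finite-memory proxy for the belief."""
    Q = defaultdict(float)                       # Q[(history, action)] -> value
    for _ in range(episodes):
        hist = deque([(env.reset(), None)], maxlen=window)
        for _ in range(200):                     # step cap keeps episodes short
            h = tuple(hist)
            if random.random() < epsilon:        # epsilon-greedy exploration
                a = random.randrange(n_actions)
            else:                                # greedy with random tie-breaking
                qs = [Q[(h, u)] for u in range(n_actions)]
                a = random.choice([u for u, q in enumerate(qs) if q == max(qs)])
            obs, reward, done = env.step(a)
            hist.append((obs, a))
            h_next = tuple(hist)
            target = reward if done else reward + gamma * max(
                Q[(h_next, u)] for u in range(n_actions))
            Q[(h, a)] += alpha * (target - Q[(h, a)])
            if done:
                break
    return Q

Q = finite_history_q_learning(NoisyChainEnv(), n_actions=2)
print(len(Q), "history-action values learned")
```

Under filter stability, longer windows shrink the gap to the belief-MDP optimum, matching the error bounds the entry describes.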
arXiv Detail & Related papers (2021-03-22T20:14:26Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of $\textit{amortized}$ optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
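A minimal sketch contrasting direct amortization (one feedforward pass producing policy parameters) with iterative, per-state refinement of the policy distribution; the paper's method additionally learns the iterative optimizer itself, which this sketch omits. The toy Q-function and all names are assumptions:

```python
import torch

def toy_q_value(state, action):
    # Stand-in critic: a smooth function playing the role of a learned Q-network.
    return -((action - state.sin()) ** 2).sum(-1)

def iterative_policy(state, steps=50, lr=0.1, temperature=0.1):
    """Refine the parameters of a Gaussian policy for a single state by
    gradient ascent on E_q[Q] + temperature * entropy, instead of producing
    them in one amortized forward pass."""
    mean = torch.zeros(state.shape[-1], requires_grad=True)
    log_std = torch.zeros(state.shape[-1], requires_grad=True)
    opt = torch.optim.Adam([mean, log_std], lr=lr)
    for _ in range(steps):
        dist = torch.distributions.Normal(mean, log_std.exp())
        action = dist.rsample()                   # reparameterized sample
        loss = -(toy_q_value(state, action) + temperature * dist.entropy().sum())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mean.detach(), log_std.exp().detach()

state = torch.tensor([0.3, -1.2])
mean, std = iterative_policy(state)
print("refined mean:", mean)   # moves toward sin(state), the toy optimum
print("refined std:", std)
```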
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.