Policy Learning of MDPs with Mixed Continuous/Discrete Variables: A Case
Study on Model-Free Control of Markovian Jump Systems
- URL: http://arxiv.org/abs/2006.03116v2
- Date: Wed, 15 Jul 2020 01:31:03 GMT
- Title: Policy Learning of MDPs with Mixed Continuous/Discrete Variables: A Case
Study on Model-Free Control of Markovian Jump Systems
- Authors: Joao Paulo Jansch-Porto, Bin Hu, Geir Dullerud
- Abstract summary: We introduce the problem of controlling unknown (discrete-time) MJLS as a new benchmark for policy-based reinforcement learning.
Compared with the traditional linear quadratic regulator (LQR), our proposed problem leads to a special hybrid MDP.
- Score: 3.3343656101775365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Markovian jump linear systems (MJLS) are an important class of dynamical
systems that arise in many control applications. In this paper, we introduce
the problem of controlling unknown (discrete-time) MJLS as a new benchmark for
policy-based reinforcement learning of Markov decision processes (MDPs) with
mixed continuous/discrete state variables. Compared with the traditional linear
quadratic regulator (LQR), our proposed problem leads to a special hybrid MDP
(with mixed continuous and discrete variables) and poses significant new
challenges due to the appearance of an underlying Markov jump parameter
governing the mode of the system dynamics. Specifically, the state of an MJLS
does not form a Markov chain on its own, and hence one cannot study the MJLS
control problem as an MDP with a purely continuous state variable. However, one
can augment the state with the jump parameter to obtain an MDP with a mixed
continuous/discrete state space. We discuss how control theory sheds light on
the policy parameterization of such hybrid MDPs. Then we modify the widely used
natural policy gradient method to directly learn the optimal state feedback
control policy for MJLS without identifying either the system dynamics or the
transition probability of the switching parameter. We implement the
(data-driven) natural policy gradient method on different MJLS examples. Our
simulation results suggest that the natural gradient method can efficiently
learn the optimal controller for MJLS with unknown dynamics.
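As a concrete illustration of the hybrid-MDP viewpoint described above, the sketch below (not the authors' code) augments the continuous state with the discrete jump parameter, parameterizes the policy as mode-dependent linear state feedback u_t = -K(w_t) x_t, and updates the gains with a simple sample-based (zeroth-order) gradient step as a stand-in for the data-driven natural policy gradient; the system matrices, mode transition probabilities, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: MJLS as a hybrid MDP with mode-dependent linear state feedback.
# All matrices and hyperparameters below are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Two-mode MJLS: x_{t+1} = A[w] x_t + B[w] u_t, mode w evolves as a Markov chain.
A = [np.array([[1.0, 0.5], [0.0, 1.0]]),      # mode 0 dynamics (assumed)
     np.array([[1.0, -0.3], [0.2, 0.8]])]     # mode 1 dynamics (assumed)
B = [np.array([[0.0], [1.0]]), np.array([[0.5], [1.0]])]
P = np.array([[0.9, 0.1], [0.2, 0.8]])        # mode transition probabilities (assumed)
Q, R = np.eye(2), np.eye(1)                   # quadratic cost weights
T, n_modes, nx, nu = 50, 2, 2, 1

def rollout_cost(K):
    """Average quadratic cost of the mode-dependent gains K over one rollout."""
    x, w, cost = rng.normal(size=nx), 0, 0.0
    for _ in range(T):
        u = -K[w] @ x                          # policy reads the hybrid state (x, w)
        cost += x @ Q @ x + u @ R @ u
        x = A[w] @ x + B[w] @ u
        w = rng.choice(n_modes, p=P[w])        # jump parameter follows the Markov chain
    return cost / T

# Zeroth-order gradient estimate from random perturbations of the gains
# (a simple stand-in for the data-driven natural policy gradient in the paper;
# scaling constants are folded into the step size).
K = np.zeros((n_modes, nu, nx))
radius, step, n_samples = 0.1, 1e-3, 200
for _ in range(100):
    grad = np.zeros_like(K)
    for _ in range(n_samples):
        d = rng.normal(size=K.shape)
        d *= radius / np.linalg.norm(d)        # perturbation on a sphere of radius `radius`
        grad += rollout_cost(K + d) * d / (radius ** 2)
    K -= step * grad / n_samples               # gradient step on the policy gains
```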
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using deep recurrent neural networks, named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z) - Learning Over Contracting and Lipschitz Closed-Loops for
Partially-Observed Nonlinear Systems (Extended Version) [1.2430809884830318]
This paper presents a policy parameterization for learning-based control on nonlinear, partially-observed dynamical systems.
We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions.
We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.
arXiv Detail & Related papers (2023-04-12T23:55:56Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal
Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Formal Controller Synthesis for Markov Jump Linear Systems with
Uncertain Dynamics [64.72260320446158]
We propose a method for synthesising controllers for Markov jump linear systems.
Our method is based on a finite-state abstraction that captures both the discrete (mode-jumping) and continuous (stochastic linear) behaviour of the MJLS.
We apply our method to multiple realistic benchmark problems, in particular, a temperature control and an aerial vehicle delivery problem.
arXiv Detail & Related papers (2022-12-01T17:36:30Z) - Deep Learning Explicit Differentiable Predictive Control Laws for
Buildings [1.4121977037543585]
We present a differentiable predictive control (DPC) methodology for learning constrained control laws for unknown nonlinear systems.
DPC provides an approximate solution to multiparametric programming problems emerging from explicit nonlinear model predictive control (MPC).
arXiv Detail & Related papers (2021-07-25T16:47:57Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Learning Constrained Adaptive Differentiable Predictive Control Policies
With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraint penalties through a differentiable closed-loop system dynamics model; a minimal sketch of this rollout-and-backpropagate idea is included after this list.
arXiv Detail & Related papers (2020-04-23T14:24:44Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Convergence Guarantees of Policy Optimization Methods for Markovian Jump
Linear Systems [3.3343656101775365]
We show that the Gauss-Newton method converges to the optimal state feedback controller for MJLS at a linear rate if initialized at a controller that stabilizes the closed-loop dynamics in the mean square sense.
We present an example to support our theory.
arXiv Detail & Related papers (2020-02-10T21:13:42Z)
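Minimal sketch of the differentiable predictive control idea referenced in the entry above: an MPC-style loss and constraint penalty are backpropagated through a differentiable closed-loop rollout to train a neural control policy. The linear model, penalty weights, horizon, and network architecture are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch: train a neural policy by backpropagating an MPC-style loss
# through a differentiable closed-loop rollout. All numbers are assumed.
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])     # assumed linear model for illustration
B = torch.tensor([[0.0], [0.1]])
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
u_max, horizon = 1.0, 20

for step in range(500):
    x = torch.randn(64, 2)                      # batch of sampled initial states
    loss = 0.0
    for _ in range(horizon):
        u = policy(x)
        # MPC-style stage cost plus a penalty for violating the input constraint
        loss = loss + (x.pow(2).sum(dim=1) + 0.1 * u.pow(2).sum(dim=1)).mean()
        loss = loss + 10.0 * torch.relu(u.abs() - u_max).mean()
        x = x @ A.T + u @ B.T                   # differentiable closed-loop rollout
    opt.zero_grad()
    loss.backward()                             # direct policy gradients via autodiff
    opt.step()
```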
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.