Model-free optimal control of discrete-time systems with additive and
multiplicative noises
- URL: http://arxiv.org/abs/2008.08734v1
- Date: Thu, 20 Aug 2020 02:18:00 GMT
- Title: Model-free optimal control of discrete-time systems with additive and
multiplicative noises
- Authors: Jing Lai, Junlin Xiong, Zhan Shu
- Abstract summary: This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using the data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
- Score: 1.656520517245166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the optimal control problem for a class of
discrete-time stochastic systems subject to additive and multiplicative noises.
A stochastic Lyapunov equation and a stochastic algebraic Riccati equation are
established to characterize the existence of the optimal admissible control policy. A
model-free reinforcement learning algorithm is proposed to learn the optimal
admissible control policy using the data of the system states and inputs
without requiring any knowledge of the system matrices. It is proven that the
learning algorithm converges to the optimal admissible control policy. The
implementation of the model-free algorithm is based on batch least squares and
numerical averaging. The proposed algorithm is illustrated through a numerical
example, which shows that it outperforms other policy iteration
algorithms.
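The abstract states only that the implementation relies on batch least squares and numerical averaging, so the following Python sketch is an illustrative reconstruction rather than the authors' algorithm: a model-free policy iteration for a small discrete-time linear-quadratic problem with additive and multiplicative noises, where policy evaluation is a batch least-squares fit of a quadratic Q-function and sample averages stand in for the noise expectations. The system matrices A, B, C, D, the discount factor gamma, and all numerical values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical system (illustrative, not from the paper):
#   x_{k+1} = (A + xi_k C) x_k + (B + eta_k D) u_k + w_k
n, m = 2, 1
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.5]])
C = 0.1 * np.eye(n)                  # multiplicative state-noise channel
D = np.array([[0.0],
              [0.1]])                # multiplicative input-noise channel
Qc, Rc = np.eye(n), np.eye(m)        # stage cost x'Qx + u'Ru
gamma = 0.95                         # discount used only to keep this sketch well-posed

def step(x, u):
    xi, eta = rng.standard_normal(), rng.standard_normal()
    w = 0.01 * rng.standard_normal(n)
    return (A + xi * C) @ x + (B + eta * D) @ u + w

def features(x, u):
    # Quadratic features of z = [x; u; 1] so that features(x, u) @ theta = z' H z
    z = np.concatenate([x, u, [1.0]])
    zz = np.outer(z, z)
    scaled = 2.0 * zz - np.diag(np.diag(zz))   # z_i^2 on the diagonal, 2 z_i z_j off it
    return scaled[np.triu_indices(len(z))]

def collect(K, steps=4000, explore=0.3):
    # Roll out the current policy with exploration noise (persistency of excitation)
    data, x = [], rng.standard_normal(n)
    for _ in range(steps):
        u = -K @ x + explore * rng.standard_normal(m)
        cost = x @ Qc @ x + u @ Rc @ u
        xn = step(x, u)
        data.append((x, u, cost, xn))
        x = xn if np.linalg.norm(xn) < 1e3 else rng.standard_normal(n)  # safeguard reset
    return data

def evaluate(K, data):
    # Batch least squares (LSTD-Q style): sample averages over the data replace
    # the expectations over the additive and multiplicative noises
    dim = features(np.zeros(n), np.zeros(m)).size
    M, b = np.zeros((dim, dim)), np.zeros(dim)
    for x, u, cost, xn in data:
        phi, phin = features(x, u), features(xn, -K @ xn)
        M += np.outer(phi, phi - gamma * phin)
        b += cost * phi
    theta = np.linalg.solve(M, b)
    d = n + m + 1
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + H.T - np.diag(np.diag(H))       # symmetric Q-function kernel

K = np.zeros((m, n))                 # initial admissible gain (A itself is stable here)
for it in range(6):
    H = evaluate(K, collect(K))
    Huu, Hux = H[n:n + m, n:n + m], H[n:n + m, :n]
    K = np.linalg.solve(Huu, Hux)    # policy improvement; the affine term from the
                                     # constant feature is dropped (zero-mean noise)
    print(f"iteration {it}: K = {K.ravel()}")
```

In the paper's formulation the admissibility (mean-square stability) of each policy iterate is what matters; the discount factor above merely keeps this sketch well-posed without reproducing that analysis.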
Related papers
- Data-Driven H-infinity Control with a Real-Time and Efficient
Reinforcement Learning Algorithm: An Application to Autonomous
Mobility-on-Demand Systems [3.5897534810405403]
This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the H$_\infty$ control problem of linear discrete-time systems.
An adaptive optimal controller is designed, and the parameters of the actor and critic networks are learned online without knowledge of the system dynamics.
arXiv Detail & Related papers (2023-09-16T05:02:41Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z) - Imitation Learning of Stabilizing Policies for Nonlinear Systems [1.52292571922932]
It is shown that the methods developed for linear systems and controllers can be readily extended to controllers using sum of squares.
A projected gradient descent algorithm and an alternating direction method of multipliers algorithm are proposed as solvers for the stabilizing imitation learning problem.
arXiv Detail & Related papers (2021-09-22T17:27:19Z) - Reinforcement Learning for Adaptive Optimal Stationary Control of Linear
Stochastic Systems [15.410124023805249]
This paper studies the adaptive optimal stationary control of continuous-time linear systems with both additive and multiplicative noises.
A novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed.
arXiv Detail & Related papers (2021-07-16T09:27:02Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and than the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Recurrent Model Predictive Control [19.047059454849897]
We propose an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.
Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs.
arXiv Detail & Related papers (2021-02-23T15:01:36Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Average Cost Optimal Control of Stochastic Systems Using Reinforcement
Learning [0.19036571490366497]
We propose an online learning scheme to estimate the kernel matrix of the Q-function.
The obtained control gain and kernel matrix are proved to converge to the optimal ones.
arXiv Detail & Related papers (2020-10-13T08:51:06Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.