Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning
- URL: http://arxiv.org/abs/2010.06236v1
- Date: Tue, 13 Oct 2020 08:51:06 GMT
- Title: Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning
- Authors: Jing Lai and Junlin Xiong
- Abstract summary: We propose an online learning scheme to estimate the kernel matrix of Q-function.
The obtained control gain and kernel matrix are proved to converge to the optimal ones.
- Score: 0.19036571490366497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the average cost minimization problem for discrete-time
systems with multiplicative and additive noises via reinforcement learning. By
using Q-function, we propose an online learning scheme to estimate the kernel
matrix of Q-function and to update the control gain using the data along the
system trajectories. The obtained control gain and kernel matrix are proved to
converge to the optimal ones. To implement the proposed learning scheme, an
online model-free reinforcement learning algorithm is given, where recursive
least squares method is used to estimate the kernel matrix of Q-function. A
numerical example is presented to illustrate the proposed approach.
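The learning scheme in the abstract can be sketched in code. The following is a minimal illustrative stand-in, not the paper's algorithm: it uses a hypothetical scalar system, a discounted cost in place of the paper's average-cost criterion, and recursive least squares to estimate the Q-function kernel matrix while updating the control gain from its blocks. All parameters are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar system with multiplicative and additive noise
# (parameters are illustrative, not taken from the paper):
#   x_{k+1} = (a + d_k) x_k + b u_k + w_k
a, b = 0.9, 0.5
q_cost, r_cost = 1.0, 1.0
gamma = 0.95  # discounted stand-in; the paper treats the average-cost case

def features(x, u):
    # Quadratic basis so that Q(x, u) = theta . [x^2, 2xu, u^2]
    return np.array([x * x, 2.0 * x * u, u * u])

theta = np.zeros(3)      # parameters of the Q-function kernel matrix H
P = 100.0 * np.eye(3)    # RLS covariance (large => weak prior)
K = 0.0                  # state-feedback gain, u = -K x
x = 1.0

for _ in range(2000):
    u = -K * x + 0.1 * rng.standard_normal()          # exploratory input
    cost = q_cost * x * x + r_cost * u * u
    x_next = (a + 0.05 * rng.standard_normal()) * x + b * u \
             + 0.01 * rng.standard_normal()
    # Temporal-difference target under the current gain
    y = cost + gamma * features(x_next, -K * x_next) @ theta
    phi = features(x, u)
    # Recursive least squares update of the kernel parameters
    g = P @ phi / (1.0 + phi @ P @ phi)
    theta = theta + g * (y - phi @ theta)
    P = P - np.outer(g, phi @ P)
    # Read the kernel matrix H off theta and update the gain
    H = np.array([[theta[0], theta[1]],
                  [theta[1], theta[2]]])
    if H[1, 1] > 1e-6:
        K = float(np.clip(H[1, 0] / H[1, 1], -10.0, 10.0))
    x = float(np.clip(x_next, -10.0, 10.0))

print("kernel matrix H =\n", H)
print("control gain K =", K)
```

The gain extraction `K = H_uu^{-1} H_ux` mirrors the standard Q-learning recipe for linear-quadratic problems; the clipping is only a safeguard for the demo.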
Related papers
- Integrating Reinforcement Learning and Model Predictive Control with Applications to Microgrids [14.389086937116582]
This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to solve optimal control problems in mixed-logical dynamical systems.
The proposed method significantly reduces the online computation time of the MPC approach and generates policies with small optimality gaps and high feasibility rates.
arXiv Detail & Related papers (2024-09-17T15:17:16Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for example characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
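As a hedged illustration of the spectral idea above (synthetic data and all parameters are my own assumptions, not from that paper), a truncated SVD of a noisy observation recovers a low-rank matrix with small error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical rank-2 ground-truth matrix observed under Gaussian noise
n, r = 50, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
noisy = M + 0.1 * rng.standard_normal((n, n))

# Spectral estimate: keep only the top-r singular directions
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
M_hat = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print("max entry-wise error:", np.abs(M_hat - M).max())
```

Entry-wise guarantees for such estimators require incoherence-type conditions; this snippet only demonstrates the mechanics.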
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
- Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z)
- Data-Driven H-infinity Control with a Real-Time and Efficient Reinforcement Learning Algorithm: An Application to Autonomous Mobility-on-Demand Systems [3.5897534810405403]
This paper presents a model-free, real-time, data-efficient Q-learning-based algorithm to solve the $H_\infty$ control of linear discrete-time systems.
An adaptive optimal controller is designed and the parameters of the action and critic networks are learned online without the knowledge of the system dynamics.
arXiv Detail & Related papers (2023-09-16T05:02:41Z)
- Computationally Efficient Data-Driven Discovery and Linear Representation of Nonlinear Systems For Control [0.0]
This work focuses on developing a data-driven framework using Koopman operator theory for system identification and linearization of nonlinear systems for control.
We show that our proposed method is trained more efficiently and is more accurate than an autoencoder baseline.
arXiv Detail & Related papers (2023-09-08T02:19:14Z)
- Deep Unrolling for Nonconvex Robust Principal Component Analysis [75.32013242448151]
We design algorithms for robust principal component analysis (RPCA).
It consists in decomposing a matrix into the sum of a low-rank matrix and a sparse matrix.
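A minimal sketch of this low-rank-plus-sparse decomposition on synthetic data (a plain alternating scheme, not the deep unrolling method of that paper; all sizes and thresholds are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic observation: exact rank-2 matrix plus sparse outliers
n, r = 30, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
S_true = np.where(rng.random((n, n)) < 0.05,
                  5.0 * np.sign(rng.standard_normal((n, n))), 0.0)
M = L_true + S_true

def rank_r(X, r):
    # Best rank-r approximation via truncated SVD
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

L_hat = np.zeros_like(M)
S_hat = np.zeros_like(M)
for _ in range(20):
    L_hat = rank_r(M - S_hat, r)                       # low-rank step
    resid = M - L_hat
    S_hat = np.where(np.abs(resid) > 1.0, resid, 0.0)  # sparse step (hard threshold)

print("relative low-rank error:",
      np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```

Nonconvex RPCA methods of this alternating flavor come with recovery guarantees under incoherence and sparsity assumptions; here the loop only illustrates the two projection steps.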
arXiv Detail & Related papers (2023-07-12T03:48:26Z)
- Imitation Learning of Stabilizing Policies for Nonlinear Systems [1.52292571922932]
It is shown that the methods developed for linear systems and controllers can be readily extended to controllers using sum of squares.
A projected gradient descent algorithm and an alternating direction method of multipliers algorithm are proposed for the stabilizing imitation learning problem.
arXiv Detail & Related papers (2021-09-22T17:27:19Z)
- Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Model-free optimal control of discrete-time systems with additive and multiplicative noises [1.656520517245166]
This paper investigates the optimal control problem for a class of discrete-time systems subject to additive and multiplicative noises.
A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy using the data of the system states and inputs.
It is proven that the learning algorithm converges to the optimal admissible control policy.
arXiv Detail & Related papers (2020-08-20T02:18:00Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.