Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System
- URL: http://arxiv.org/abs/2103.13278v2
- Date: Fri, 26 Mar 2021 06:39:26 GMT
- Title: Non-Episodic Learning for Online LQR of Unknown Linear Gaussian System
- Authors: Yiwen Lu and Yilin Mo
- Abstract summary: We propose an online non-episodic algorithm that gains knowledge about the system from a single trajectory.
We characterize the almost sure convergence rates of identification and control, and reveal an optimal trade-off between exploration and exploitation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers the data-driven linear-quadratic regulation (LQR)
problem where the system parameters are unknown and need to be identified in
real time. Contrary to existing system identification and data-driven control
methods, which typically require either offline data collection or multiple
resets, we propose an online non-episodic algorithm that gains knowledge about
the system from a single trajectory. The algorithm guarantees that both the
identification error and the suboptimality gap of control performance in this
trajectory converge to zero almost surely. Furthermore, we characterize the
almost sure convergence rates of identification and control, and reveal an
optimal trade-off between exploration and exploitation. We provide a numerical
example to illustrate the effectiveness of our proposed strategy.
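The loop described in the abstract (identify the system from one trajectory while controlling it) can be illustrated with a minimal sketch. This is not the authors' algorithm. It is a generic certainty-equivalence scheme written under stated assumptions: the dynamics are x_{k+1} = A x_k + B u_k + w_k with unit-variance Gaussian noise, A is open-loop stable (so a zero initial gain is safe), the exploration noise has standard deviation k^(-beta), and the gain is re-computed every 100 steps. The names `dare_gain` and `online_lqr`, the `beta` value, and the update period are hypothetical choices made for illustration only.

```python
import numpy as np


def dare_gain(A, B, Q, R, iters=200):
    """LQR gain K (for u = -K x) via fixed-point iteration of the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P = Q + A.T @ P @ (A - B @ K)
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)


def online_lqr(A, B, Q, R, steps=5000, beta=0.25, period=100, seed=0):
    """Single-trajectory identification and control (illustrative sketch only).

    A and B are the true matrices; they are used only to simulate the plant.
    Assumes A is open-loop stable; no safeguard against a bad early estimate.
    """
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    K = np.zeros((m, n))            # start with no feedback
    A_hat = np.zeros((n, n))
    B_hat = np.zeros((n, m))
    G = 1e-3 * np.eye(n + m)        # regularised Gram matrix: sum of z z^T
    H = np.zeros((n, n + m))        # cross term: sum of x_next z^T
    for k in range(1, steps + 1):
        # Certainty-equivalent feedback plus decaying exploration noise.
        u = -K @ x + k ** (-beta) * rng.standard_normal(m)
        x_next = A @ x + B @ u + rng.standard_normal(n)
        z = np.concatenate([x, u])
        G += np.outer(z, z)
        H += np.outer(x_next, z)
        x = x_next
        if k % period == 0:
            # Least-squares estimate [A_hat, B_hat] = H G^{-1}.
            theta = np.linalg.solve(G, H.T).T
            A_hat, B_hat = theta[:, :n], theta[:, n:]
            K = dare_gain(A_hat, B_hat, Q, R)
    return A_hat, B_hat, K
```

Calling `online_lqr(A, B, Q, R)` on a small stable system returns the final estimates and gain. Comparing `A_hat` and `B_hat` with the true matrices, and checking the spectral radius of `A - B @ K`, gives a rough sense of how identification and control improve together along one trajectory. A larger `beta` reduces the cost of exploration but slows the estimation of `B`, which is the kind of exploration-exploitation trade-off the abstract refers to.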
Related papers
- A least-square method for non-asymptotic identification in linear switching control [17.938732931331064]
It is known that the underlying partially-observed linear dynamical system lies within a finite collection of known candidate models.
We characterize the finite-time sample complexity of this problem by leveraging recent advances in the non-asymptotic analysis of linear least-square methods.
We propose a data-driven switching strategy that identifies the unknown parameters of the underlying system.
arXiv Detail & Related papers (2024-04-11T20:55:38Z)
- Computationally Efficient Data-Driven Discovery and Linear Representation of Nonlinear Systems For Control [0.0]
This work focuses on developing a data-driven framework using Koopman operator theory for system identification and linearization of nonlinear systems for control.
We show that our proposed method is trained more efficiently and is more accurate than an autoencoder baseline.
arXiv Detail & Related papers (2023-09-08T02:19:14Z)
- Data-Driven Adversarial Online Control for Unknown Linear Systems [17.595231077524467]
We present a novel data-driven online adaptive control algorithm to address this online control problem.
Our algorithm guarantees an $\tilde{\mathcal{O}}(T^{2/3})$ regret bound with high probability, which matches the best-known regret bound for this problem.
arXiv Detail & Related papers (2023-08-16T04:05:22Z)
- Interactive System-wise Anomaly Detection [66.3766756452743]
Anomaly detection plays a fundamental role in various applications.
It is challenging for existing methods to handle the scenarios where the instances are systems whose characteristics are not readily observed as data.
We develop an end-to-end approach which includes an encoder-decoder module that learns system embeddings.
arXiv Detail & Related papers (2023-04-21T02:20:24Z)
- Efficient Reinforcement Learning Through Trajectory Generation [5.766441610380447]
A key barrier to using reinforcement learning in real-world applications is the requirement of a large number of system interactions to learn a good control policy.
Off-policy and Offline RL methods have been proposed to reduce the number of interactions with the physical environment by learning control policies from historical data.
We propose a trajectory generation algorithm, which adaptively generates new trajectories as if the system is being operated and explored under the updated control policies.
arXiv Detail & Related papers (2022-11-30T18:49:43Z)
- A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics [56.86150790999639]
We present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer.
The Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data.
A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy.
arXiv Detail & Related papers (2022-09-23T06:09:35Z)
- Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with a sample complexity guarantee.
We give a theoretical guarantee for the proposed algorithm, showing that the suboptimality of the learned policies is comparable to the rate achieved when the data are not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z)
- Networked Online Learning for Control of Safety-Critical Resource-Constrained Systems based on Gaussian Processes [9.544146562919792]
We propose a novel networked online learning approach based on Gaussian process regression.
We propose an effective data transmission scheme between the local system and the cloud taking bandwidth limitations and time delay of the transmission channel into account.
arXiv Detail & Related papers (2022-02-23T13:12:12Z)
- Active Learning for Nonlinear System Identification with Guarantees [102.43355665393067]
We study a class of nonlinear dynamical systems whose state transitions depend linearly on a known feature embedding of state-action pairs.
We propose an active learning approach that achieves this by repeating three steps: trajectory planning, trajectory tracking, and re-estimation of the system from all available data.
We show that our method estimates nonlinear dynamical systems at a parametric rate, similar to the statistical rate of standard linear regression.
arXiv Detail & Related papers (2020-06-18T04:54:11Z)
- Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)