Regret Analysis of Distributed Online LQR Control for Unknown LTI
Systems
- URL: http://arxiv.org/abs/2105.07310v1
- Date: Sat, 15 May 2021 23:02:58 GMT
- Title: Regret Analysis of Distributed Online LQR Control for Unknown LTI
Systems
- Authors: Ting-Jui Chang and Shahin Shahrampour
- Abstract summary: We study the distributed online linear quadratic regulator (LQR) problem for linear time-invariant (LTI) systems with unknown dynamics.
We propose a distributed variant of the online LQR algorithm where each agent computes its system estimate during an exploration stage.
We prove that the regret of our proposed algorithm scales as $\tilde{O}(T^{2/3})$, implying the consensus of the network over time.
- Score: 8.832969171530056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online learning has recently opened avenues for rethinking classical optimal
control beyond time-invariant cost metrics, and online controllers are designed
when the performance criterion changes adversarially over time. Inspired by this
line of research, we study the distributed online linear quadratic regulator
(LQR) problem for linear time-invariant (LTI) systems with unknown dynamics.
Consider a multi-agent network where each agent is modeled as an LTI system. The
LTI systems are associated with time-varying quadratic costs that are revealed
sequentially. The goal of the network is to collectively (i) estimate the
unknown dynamics and (ii) compute local control sequences competitive with that
of the best centralized policy in hindsight that minimizes the sum of costs for
all agents. This problem is formulated as a {\it regret} minimization. We
propose a distributed variant of the online LQR algorithm where each agent
computes its system estimate during an exploration stage. The agent then
applies distributed online gradient descent on a semi-definite program (SDP)
whose feasible set is based on the agent's system estimate. We prove that
the regret bound of our proposed algorithm scales $\tilde{O}(T^{2/3})$,
implying the consensus of the network over time. We also provide simulation
results verifying our theoretical guarantee.
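The exploration stage described in the abstract amounts to exciting the system and fitting the dynamics by least squares. The following is a minimal numerical sketch of that idea; the system matrices, noise level, horizon, and variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 1-input stable LTI system (not the paper's example).
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Exploration stage: excite the system with random inputs, record transitions.
T_explore = 500
x = np.zeros(2)
X, U, X_next = [], [], []
for _ in range(T_explore):
    u = rng.normal(size=1)
    x_next = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); X_next.append(x_next)
    x = x_next

# Least-squares fit of [A B] from the regression x_{t+1} ~ [A B] [x_t; u_t].
Z = np.hstack([np.array(X), np.array(U)])            # shape (T, n + m)
Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

print(np.linalg.norm(A_hat - A_true))  # small estimation error
```

Each agent would run such an estimation locally before switching to the online control phase; the estimate's accuracy governs the size of the SDP feasible set used afterwards.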
Related papers
- Regret Analysis of Distributed Online Control for LTI Systems with
Adversarial Disturbances [12.201535821920624]
This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems with possibly unknown dynamics.
For known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$.
For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics.
arXiv Detail & Related papers (2023-10-04T23:24:39Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Finite-time System Identification and Adaptive Control in Autoregressive
Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
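DLUCB's decentralized machinery is not reproduced here; as a rough illustration of the linear-bandit UCB principle it builds on, below is a minimal single-agent LinUCB loop. The arm set, reward parameter, and constants are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

d, T, beta, lam = 3, 2000, 1.0, 1.0
theta_true = np.array([0.6, -0.3, 0.5])            # hypothetical reward parameter
arms = rng.normal(size=(10, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)  # unit-norm arm features

V = lam * np.eye(d)                                 # regularized Gram matrix
b = np.zeros(d)
for t in range(T):
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                           # ridge estimate of theta
    # Optimistic index: estimated reward plus an exploration bonus
    # proportional to the uncertainty ||a||_{V^{-1}} of each arm.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    a = arms[np.argmax(arms @ theta_hat + beta * bonus)]
    r = a @ theta_true + 0.1 * rng.normal()         # noisy reward
    V += np.outer(a, a)
    b += r * a
```

In the decentralized setting of the cited paper, agents additionally exchange such sufficient statistics ($V$, $b$) over the network; the sketch above shows only the per-agent optimistic update.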
arXiv Detail & Related papers (2020-12-01T07:33:00Z) - Distributed Online Linear Quadratic Control for Linear Time-invariant
Systems [14.924672048447334]
We study the distributed online linear quadratic (LQ) problem for identical linear time-invariant (LTI) systems.
Consider a multi-agent network where each agent is modeled as an LTI system.
We develop a distributed variant of the online LQ algorithm, which runs distributed online gradient descent with a projection to a semi-definite programming (SDP) to generate controllers.
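The distributed online gradient descent pattern these papers share (local gradient step, consensus averaging with neighbors, projection onto a feasible set) can be sketched abstractly. Here the SDP feasible set is replaced by the PSD cone for simplicity, and the network, costs, and step sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def project_psd(M):
    # Stand-in for the SDP projection: project a symmetric matrix onto
    # the positive semidefinite cone via its eigendecomposition.
    S = (M + M.T) / 2
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.clip(w, 0, None)) @ V.T

n_agents, d = 4, 3
# Doubly stochastic mixing matrix for a 4-agent ring (illustrative).
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

Sigma = [project_psd(rng.normal(size=(d, d))) for _ in range(n_agents)]
eta = 0.1
for t in range(200):
    # Each agent sees a local linear-in-Sigma cost <Q_i, Sigma_i>; its gradient is Q_i.
    Q = [project_psd(rng.normal(size=(d, d))) for _ in range(n_agents)]
    # Consensus step: average with neighbors, then gradient step, then project.
    mixed = [sum(W[i, j] * Sigma[j] for j in range(n_agents)) for i in range(n_agents)]
    Sigma = [project_psd(mixed[i] - eta / (t + 1) ** 0.5 * Q[i]) for i in range(n_agents)]

# Disagreement with the network average shrinks over time (consensus).
avg = sum(Sigma) / n_agents
print(max(np.linalg.norm(S - avg) for S in Sigma))
```

The decaying step size and the mixing matrix together drive the iterates toward consensus, which is the mechanism behind the "consensus of the network over time" claims above.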
arXiv Detail & Related papers (2020-09-29T03:30:49Z) - Logarithmic Regret Bound in Partially Observable Linear Dynamical
Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems.
We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification.
We show that AdaptOn is the first algorithm that achieves $\text{polylog}(T)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Regret Bounds for Decentralized Learning in Cooperative Multi-Agent
Dynamical Systems [3.9599054392856488]
Linear quadratic (LQ) analysis is challenging in Multi-Agent Reinforcement Learning (MARL).
We propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem.
We show that our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound.
arXiv Detail & Related papers (2020-01-27T23:37:41Z) - Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.