Scalable regret for learning to control network-coupled subsystems with
unknown dynamics
- URL: http://arxiv.org/abs/2108.07970v1
- Date: Wed, 18 Aug 2021 04:45:34 GMT
- Title: Scalable regret for learning to control network-coupled subsystems with
unknown dynamics
- Authors: Sagar Sudhakara and Aditya Mahajan and Ashutosh Nayyar and Yi Ouyang
- Abstract summary: Viewing the interconnected subsystems globally results in a regret that increases super-linearly with the number of subsystems.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network.
We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$.
- Score: 5.670584589057048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of controlling an unknown linear quadratic Gaussian
(LQG) system consisting of multiple subsystems connected over a network. Our
goal is to minimize and quantify the regret (i.e. loss in performance) of our
strategy with respect to an oracle who knows the system model. Viewing the
interconnected subsystems globally and directly using existing LQG learning
algorithms for the global system results in a regret that increases
super-linearly with the number of subsystems. Instead, we propose a new
Thompson sampling based learning algorithm which exploits the structure of the
underlying network. We show that the expected regret of the proposed algorithm
is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the
number of subsystems, $T$ is the time horizon and the
$\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$.
Thus, the regret scales linearly with the number of subsystems. We present
numerical experiments to illustrate the salient features of the proposed
algorithm.
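To make the algorithmic idea above concrete, below is a minimal, hedged sketch of Thompson sampling for learning to control linear quadratic subsystems. It is not the paper's algorithm: the network coupling is dropped, each subsystem is scalar, the sign of the input gain is assumed known, and all names and constants (e.g. `thompson_lq_subsystem`, the doubling resampling schedule, the noise level) are illustrative assumptions. It only illustrates why learning each subsystem separately yields a regret that adds up linearly across subsystems.
```python
import numpy as np

rng = np.random.default_rng(0)


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time Riccati value iteration; returns the gain k in u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def thompson_lq_subsystem(a_true, b_true, T=2000, sigma=0.1):
    """Thompson sampling for one scalar subsystem x_{t+1} = a*x_t + b*u_t + w_t.

    Maintains a Bayesian linear-regression posterior over theta = [a, b]
    (known noise variance sigma^2, Gaussian prior with covariance sigma^2 * I),
    resamples a model on a doubling schedule, and plays the sampled model's LQR gain.
    """
    prec = np.eye(2)           # scaled precision: posterior cov = sigma^2 * inv(prec)
    bvec = np.zeros(2)         # accumulates sum_t z_t * x_{t+1}
    x, cost, k, next_resample = 0.0, 0.0, 0.0, 1
    for t in range(T):
        if t + 1 >= next_resample:                 # start of a new episode
            mean = np.linalg.solve(prec, bvec)
            cov = sigma ** 2 * np.linalg.inv(prec)
            a_s, b_s = rng.multivariate_normal(mean, cov)
            a_s = float(np.clip(a_s, -0.99, 0.99)) # keep the Riccati iteration bounded
            b_s = max(b_s, 0.1)                    # floor b away from 0; sign of b assumed known
            k = lqr_gain(a_s, b_s)
            next_resample *= 2                     # doubling episode lengths
        u = -k * x
        cost += x * x + u * u
        x_next = a_true * x + b_true * u + sigma * rng.standard_normal()
        z = np.array([x, u])                       # regressor for the least-squares update
        prec += np.outer(z, z)
        bvec += z * x_next
        x = x_next
    return cost


# Running n subsystems independently means their costs (and regrets) simply add,
# which is the linear-in-n scaling behind a bound of the O~(n sqrt(T)) form.
total_cost = sum(thompson_lq_subsystem(0.8, 1.0) for _ in range(5))
print("total cost across 5 subsystems:", total_cost)
```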
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Dynamic Regret Minimization for Control of Non-stationary Linear
Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z) - Finite-time System Identification and Adaptive Control in Autoregressive
Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z) - A relaxed technical assumption for posterior sampling-based
reinforcement learning for control of unknown linear systems [5.715788333977634]
We revisit the Thompson sampling algorithm, recently proposed by Ouyang et al., for controlling an unknown linear quadratic (LQ) system.
We show that by making a minor modification to the algorithm, the technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system (a brief numerical illustration of this distinction is given after the list below).
arXiv Detail & Related papers (2021-08-19T05:25:28Z) - Thompson sampling for linear quadratic mean-field teams [3.957353452014781]
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}} \big( |M|^{1.5} \sqrt{T} \big)$ irrespective of the total number of agents.
arXiv Detail & Related papers (2020-11-09T19:07:32Z) - Episodic Linear Quadratic Regulators with Low-rank Transitions [31.8243883890202]
We propose an algorithm that utilizes the intrinsic system low-rank structure for efficient learning.
Our algorithm achieves a $K$-episode regret bound of order $\widetilde{O}(m^{3/2} K^{1/2})$.
arXiv Detail & Related papers (2020-11-03T08:48:31Z) - Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been previously conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z) - Backward Feature Correction: How Deep Learning Performs Deep
(Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
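As noted in the entry above on the relaxed technical assumption for posterior-sampling-based control, a spectral-radius condition on the closed-loop matrix is milder than an induced-norm condition. The following small numerical illustration (a hedged sketch, not taken from any of the papers) shows a matrix whose spectral radius is below one, so the closed-loop dynamics are asymptotically stable, while its induced 2-norm exceeds one.
```python
import numpy as np

# A closed-loop matrix with spectral radius 0.9 (< 1, so x_{t+1} = A x_t is
# asymptotically stable) whose induced 2-norm is about 1.5 (> 1); the
# spectral-radius condition therefore admits systems an induced-norm condition excludes.
A = np.array([[0.9, 1.0],
              [0.0, 0.9]])
rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius: 0.9
norm2 = np.linalg.norm(A, 2)                 # induced 2-norm: ~1.53
print(f"spectral radius = {rho:.2f}, induced 2-norm = {norm2:.2f}")
```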