Scalable regret for learning to control network-coupled subsystems with
unknown dynamics
- URL: http://arxiv.org/abs/2108.07970v1
- Date: Wed, 18 Aug 2021 04:45:34 GMT
- Title: Scalable regret for learning to control network-coupled subsystems with
unknown dynamics
- Authors: Sagar Sudhakara and Aditya Mahajan and Ashutosh Nayyar and Yi Ouyang
- Abstract summary: Viewing the interconnected subsystems globally results in a regret that increases super-linearly with the number of subsystems.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network.
We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$.
- Score: 5.670584589057048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of controlling an unknown linear quadratic Gaussian
(LQG) system consisting of multiple subsystems connected over a network. Our
goal is to minimize and quantify the regret (i.e. loss in performance) of our
strategy with respect to an oracle who knows the system model. Viewing the
interconnected subsystems globally and directly using existing LQG learning
algorithms for the global system results in a regret that increases
super-linearly with the number of subsystems. Instead, we propose a new
Thompson sampling based learning algorithm which exploits the structure of the
underlying network. We show that the expected regret of the proposed algorithm
is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the
number of subsystems, $T$ is the time horizon and the
$\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$.
Thus, the regret scales linearly with the number of subsystems. We present
numerical experiments to illustrate the salient features of the proposed
algorithm.
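To make the algorithmic idea above concrete, below is a minimal, hedged sketch of Thompson sampling for learning to control linear quadratic subsystems. It is not the paper's algorithm: the network coupling is dropped, each subsystem is scalar, the sign of the input gain is assumed known, and all names and constants (e.g. `thompson_lq_subsystem`, the doubling resampling schedule, the noise level) are illustrative assumptions. It only illustrates why learning each subsystem separately yields a regret that adds up linearly across subsystems.
```python
import numpy as np

rng = np.random.default_rng(0)


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time Riccati value iteration; returns the gain k in u = -k*x."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def thompson_lq_subsystem(a_true, b_true, T=2000, sigma=0.1):
    """Thompson sampling for one scalar subsystem x_{t+1} = a*x_t + b*u_t + w_t.

    Maintains a Bayesian linear-regression posterior over theta = [a, b]
    (known noise variance sigma^2, Gaussian prior with covariance sigma^2 * I),
    resamples a model on a doubling schedule, and plays the sampled model's LQR gain.
    """
    prec = np.eye(2)           # scaled precision: posterior cov = sigma^2 * inv(prec)
    bvec = np.zeros(2)         # accumulates sum_t z_t * x_{t+1}
    x, cost, k, next_resample = 0.0, 0.0, 0.0, 1
    for t in range(T):
        if t + 1 >= next_resample:                 # start of a new episode
            mean = np.linalg.solve(prec, bvec)
            cov = sigma ** 2 * np.linalg.inv(prec)
            a_s, b_s = rng.multivariate_normal(mean, cov)
            a_s = float(np.clip(a_s, -0.99, 0.99)) # keep the Riccati iteration bounded
            b_s = max(b_s, 0.1)                    # floor b away from 0; sign of b assumed known
            k = lqr_gain(a_s, b_s)
            next_resample *= 2                     # doubling episode lengths
        u = -k * x
        cost += x * x + u * u
        x_next = a_true * x + b_true * u + sigma * rng.standard_normal()
        z = np.array([x, u])                       # regressor for the least-squares update
        prec += np.outer(z, z)
        bvec += z * x_next
        x = x_next
    return cost


# Running n subsystems independently means their costs (and regrets) simply add,
# which is the linear-in-n scaling behind a bound of the O~(n sqrt(T)) form.
total_cost = sum(thompson_lq_subsystem(0.8, 1.0) for _ in range(5))
print("total cost across 5 subsystems:", total_cost)
```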
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Dynamic Regret Minimization for Control of Non-stationary Linear
Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z) - Finite-time System Identification and Adaptive Control in Autoregressive
Exogenous Systems [79.67879934935661]
We study the problem of system identification and adaptive control of unknown ARX systems.
We provide finite-time learning guarantees for the ARX systems under both open-loop and closed-loop data collection.
arXiv Detail & Related papers (2021-08-26T18:00:00Z) - A relaxed technical assumption for posterior sampling-based
reinforcement learning for control of unknown linear systems [5.715788333977634]
We revisit the Thompson sampling algorithm, recently proposed by Ouyang et al., for controlling an unknown linear quadratic (LQ) system.
We show that by making a minor modification to the algorithm, the technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed-loop system (a brief numerical illustration of this distinction is given after the list below).
arXiv Detail & Related papers (2021-08-19T05:25:28Z) - Thompson sampling for linear quadratic mean-field teams [3.957353452014781]
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}} \big( |M|^{1.5} \sqrt{T} \big)$ irrespective of the total number of agents.
arXiv Detail & Related papers (2020-11-09T19:07:32Z) - Episodic Linear Quadratic Regulators with Low-rank Transitions [31.8243883890202]
We propose an algorithm that utilizes the intrinsic system low-rank structure for efficient learning.
Our algorithm achieves a $K$-episode regret bound of order $\widetilde{O}(m^{3/2} K^{1/2})$.
arXiv Detail & Related papers (2020-11-03T08:48:31Z) - Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been previously conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z) - Backward Feature Correction: How Deep Learning Performs Deep
(Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)
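As noted in the entry above on the relaxed technical assumption for posterior-sampling-based control, a spectral-radius condition on the closed-loop matrix is milder than an induced-norm condition. The following small numerical illustration (a hedged sketch, not taken from any of the papers) shows a matrix whose spectral radius is below one, so the closed-loop dynamics are asymptotically stable, while its induced 2-norm exceeds one.
```python
import numpy as np

# A closed-loop matrix with spectral radius 0.9 (< 1, so x_{t+1} = A x_t is
# asymptotically stable) whose induced 2-norm is about 1.5 (> 1); the
# spectral-radius condition therefore admits systems an induced-norm condition excludes.
A = np.array([[0.9, 1.0],
              [0.0, 0.9]])
rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius: 0.9
norm2 = np.linalg.norm(A, 2)                 # induced 2-norm: ~1.53
print(f"spectral radius = {rho:.2f}, induced 2-norm = {norm2:.2f}")
```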