Regret Minimization in Partially Observable Linear Quadratic Control
- URL: http://arxiv.org/abs/2002.00082v2
- Date: Sun, 8 Mar 2020 02:35:18 GMT
- Title: Regret Minimization in Partially Observable Linear Quadratic Control
- Authors: Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
- Abstract summary: We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori.
We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control.
- Score: 91.43582419264763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of regret minimization in partially observable linear
quadratic control systems when the model dynamics are unknown a priori. We
propose ExpCommit, an explore-then-commit algorithm that learns the model
Markov parameters and then follows the principle of optimism in the face of
uncertainty to design a controller. We propose a novel way to decompose the
regret and provide an end-to-end sublinear regret upper bound for partially
observable linear quadratic control. Finally, we provide stability guarantees
and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for
ExpCommit, where $T$ is the time horizon of the problem.
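To make the explore-then-commit structure concrete, the sketch below simulates a small partially observable linear system, excites it with Gaussian inputs for a $T^{2/3}$-length exploration phase, and recovers the Markov parameters $C A^{k-1} B$ by least squares. This is a minimal illustration, not the paper's ExpCommit implementation: the system matrices, noise levels, horizon split, and truncation length are all hypothetical choices, and the optimistic controller design is only indicated by a comment.

```python
# Illustrative explore-then-commit loop for an unknown partially observable
# linear system. NOT the authors' ExpCommit: the matrices, noise scales, and
# horizon split below are hypothetical choices made purely for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# A small stable system, hidden from the learner.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

T = 3000
T_explore = int(T ** (2 / 3))  # exploration length chosen to mirror the T^{2/3} rate
p = 5                          # number of Markov parameters to estimate

def step(x, u):
    """One step of the true (hidden) dynamics with process and measurement noise."""
    x_next = A @ x + B @ u + 0.01 * rng.standard_normal((2, 1))
    y = C @ x + 0.01 * rng.standard_normal((1, 1))
    return x_next, y

# Explore: inject i.i.d. Gaussian inputs and log input/output data.
x = np.zeros((2, 1))
us, ys = [], []
for t in range(T_explore):
    u = rng.standard_normal((1, 1))
    x, y = step(x, u)
    us.append(u.item())
    ys.append(y.item())

# Identify: regress each output on the previous p inputs; the coefficients
# estimate the Markov parameters G_k = C A^{k-1} B.
Phi = np.array([[us[t - k] for k in range(1, p + 1)] for t in range(p, T_explore)])
G_hat, *_ = np.linalg.lstsq(Phi, np.array(ys[p:]), rcond=None)

G_true = np.array([(C @ np.linalg.matrix_power(A, k - 1) @ B).item()
                   for k in range(1, p + 1)])
print("estimated Markov parameters:", np.round(G_hat, 3))
print("true Markov parameters:     ", np.round(G_true, 3))

# Commit: ExpCommit would now build confidence sets around G_hat and deploy a
# controller that is optimistic within them; that design step is omitted here.
```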
Related papers
- Regret Bounds for Episodic Risk-Sensitive Linear Quadratic Regulator [5.445357652101423]
The risk-sensitive linear quadratic regulator is one of the most fundamental problems in risk-sensitive optimal control.
We propose a simple least-squares greedy algorithm and show that it achieves $\widetilde{\mathcal{O}}(\log N)$ regret.
This is the first set of regret bounds for episodic risk-sensitive linear quadratic regulator.
arXiv Detail & Related papers (2024-06-08T06:06:20Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\widetilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Regret Bounds for Adaptive Nonlinear Control [14.489004143703825]
We prove the first finite-time regret bounds for adaptive nonlinear control with matched uncertainty in the stochastic setting.
We show that the regret suffered by certainty equivalence adaptive control, compared to an oracle controller with perfect knowledge of the unmodeled disturbances, is upper bounded by $\widetilde{O}(\sqrt{T})$ in expectation (a toy sketch of this learner-versus-oracle regret accounting appears after this list).
arXiv Detail & Related papers (2020-11-26T03:01:09Z) - Nonstationary Reinforcement Learning with Linear Function Approximation [19.521419943509784]
We consider reinforcement learning in episodic Markov decision processes (MDPs) with linear function approximation under drifting environment.
We first develop an optimistic modification of least-squares value iteration with periodic restart, and bound its dynamic regret when variation budgets are known.
We derive the first minimax dynamic regret lower bound for nonstationary linear MDPs and, as a byproduct, establish a minimax regret lower bound for linear MDPs that was left open by Jin et al.
arXiv Detail & Related papers (2020-10-08T20:07:44Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been previously conjectured.
arXiv Detail & Related papers (2020-01-27T03:44:54Z) - Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies.
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
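Every paper in this list scores a learner by its cumulative regret: the excess cost it accumulates relative to an oracle that knows the system. The toy sketch below (a hypothetical scalar LQR instance with a deliberately crude learner; it implements none of the algorithms above) shows how that learner-versus-oracle accounting works.

```python
# Toy regret accounting for a scalar LQR instance: run a crude "learner" and
# the optimal oracle controller, then report the cost gap. Everything here is
# a hypothetical stand-in; no algorithm from the list above is implemented.
import numpy as np

rng = np.random.default_rng(1)
T = 10_000
a, b, q, r = 0.9, 1.0, 1.0, 1.0  # dynamics x' = a*x + b*u + w, cost q*x^2 + r*u^2

# Oracle gain from the scalar discrete-time Riccati equation, by fixed-point iteration.
P = q
for _ in range(500):
    P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
k_star = a * b * P / (r + b * b * P)

noise = 0.1 * rng.standard_normal(T)

def total_cost(policy):
    """Cumulative quadratic cost of a policy u = policy(x, t) on the shared noise path."""
    x, cost = 0.0, 0.0
    for t in range(T):
        u = policy(x, t)
        cost += q * x * x + r * u * u
        x = a * x + b * u + noise[t]
    return cost

oracle_cost = total_cost(lambda x, t: -k_star * x)
# Crude learner: random exploration early, then a fixed, slightly wrong gain.
# Real algorithms instead refine the gain from data as T grows.
learner_cost = total_cost(lambda x, t: 0.5 * rng.standard_normal()
                          if t < 500 else -0.9 * k_star * x)
print(f"cumulative regret after T={T}: {learner_cost - oracle_cost:.1f}")
```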
This list is automatically generated from the titles and abstracts of the papers on this site.