Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
- URL: http://arxiv.org/abs/2002.08095v2
- Date: Wed, 1 Jul 2020 20:37:42 GMT
- Title: Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
- Authors: Asaf Cassel (1), Alon Cohen (2), Tomer Koren (1) ((1) School of
Computer Science, Tel Aviv University, (2) Google Research, Tel Aviv)
- Abstract summary: Recent results have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps.
We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly)logarithmically with the number of steps.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning in Linear Quadratic Control systems whose
transition parameters are initially unknown. Recent results in this setting
have demonstrated efficient learning algorithms with regret growing with the
square root of the number of decision steps. We present new efficient
algorithms that achieve, perhaps surprisingly, regret that scales only
(poly)logarithmically with the number of steps in two scenarios: when only the
state transition matrix $A$ is unknown, and when only the state-action
transition matrix $B$ is unknown and the optimal policy satisfies a certain
non-degeneracy condition. On the other hand, we give a lower bound that shows
that when the latter condition is violated, square root regret is unavoidable.
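The certainty-equivalence idea underlying this line of work can be illustrated with a minimal sketch (this is an illustration, not the authors' algorithm, and the dynamics, horizon, and noise levels below are arbitrary assumptions): estimate the unknown state transition matrix $A$ by least squares from an excited trajectory, then solve the Riccati equation for the estimate to obtain a controller.

```python
import numpy as np

rng = np.random.default_rng(1)

def dare(A, B, Q, R, iters=500):
    # Fixed-point iteration on the discrete-time algebraic Riccati equation.
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P, K

# True (unknown) dynamics: x_{t+1} = A x_t + B u_t + w_t.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.eye(2)
Q, R = np.eye(2), np.eye(2)

# Collect a trajectory driven by exploratory Gaussian inputs.
T = 2000
xs, us = [np.zeros(2)], []
for t in range(T):
    u = rng.standard_normal(2)
    us.append(u)
    xs.append(A_true @ xs[-1] + B @ u + 0.1 * rng.standard_normal(2))

X = np.array(xs[:-1]); U = np.array(us); Y = np.array(xs[1:])
# Least-squares estimate of A: regress x_{t+1} - B u_t on x_t.
A_hat = np.linalg.lstsq(X, Y - U @ B.T, rcond=None)[0].T
# Certainty-equivalent controller u_t = -K x_t from the estimated model.
_, K = dare(A_hat, B, Q, R)
```

The closed-loop matrix $A - BK$ computed from the estimate stabilizes the true system once the estimation error is small, which is the mechanism that the low-regret analyses in this setting quantify.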
Related papers
- Truly No-Regret Learning in Constrained MDPs [61.78619476991494]
We propose a model-based primal-dual algorithm to learn in an unknown CMDP.
We prove that our algorithm achieves sublinear regret without error cancellations.
arXiv Detail & Related papers (2024-02-24T09:47:46Z) - Learning Adversarial Low-rank Markov Decision Processes with Unknown
Transition and Full-information Feedback [30.23951525723659]
We study the low-rank MDPs with adversarially changed losses in the full-information feedback setting.
We propose a policy optimization-based algorithm POLO, and we prove that it attains the $\widetilde{O}(K^{5/6} A^{1/2} d \ln(1+M) / (1-\gamma)^2)$ regret guarantee.

arXiv Detail & Related papers (2023-11-14T03:12:43Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
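The sketch-then-solve pipeline described above can be sketched in a few lines with a classical (non-learned) CountSketch, where each column of the sketch matrix has a single random $\pm 1$ entry; the learned variant would additionally optimize where those nonzeros go. The problem sizes and seed below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch_matrix(m, n, rng):
    # Each column has exactly one nonzero entry (+1 or -1) at a random row.
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    S[rows, np.arange(n)] = signs
    return S

# Sketch a tall least-squares problem min_x ||Ax - b|| from n rows down to m.
n, d, m = 1000, 10, 100
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
S = countsketch_matrix(m, n, rng)
# Solve the much smaller sketched problem min_x ||SAx - Sb||.
x_hat, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
```

Because `S` has one nonzero per column, `S @ A` can be formed in time proportional to the number of nonzeros of `A`, which is what makes the compression step cheap.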
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Safe Adaptive Learning-based Control for Constrained Linear Quadratic
Regulators with Regret Guarantees [11.627320138064684]
We study the adaptive control of an unknown linear system with a quadratic cost function subject to safety constraints on both the states and actions.
Our algorithm is implemented on a single trajectory and does not require system restarts.
arXiv Detail & Related papers (2021-10-31T05:52:42Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Episodic Linear Quadratic Regulators with Low-rank Transitions [31.8243883890202]
We propose an algorithm that utilizes the intrinsic system low-rank structure for efficient learning.
Our algorithm achieves a $K$-episode regret bound of order $\widetilde{O}(m^{3/2} K^{1/2})$.
arXiv Detail & Related papers (2020-11-03T08:48:31Z) - Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization for the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.