Episodic Linear Quadratic Regulators with Low-rank Transitions
- URL: http://arxiv.org/abs/2011.01568v2
- Date: Fri, 12 Feb 2021 04:40:14 GMT
- Title: Episodic Linear Quadratic Regulators with Low-rank Transitions
- Authors: Tianyu Wang, Lin F. Yang
- Abstract summary: We propose an algorithm that utilizes the intrinsic system low-rank structure for efficient learning.
Our algorithm achieves a $K$-episode regret bound of order $\widetilde{O}(m^{3/2} K^{1/2})$.
- Score: 31.8243883890202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear Quadratic Regulators (LQR) have achieved enormous success in
real-world applications. Very recently, attention has turned to efficient
learning algorithms for LQRs whose dynamics are unknown. Existing results
effectively learn to control the unknown system using a number of episodes
depending polynomially on the system parameters, including the ambient
dimension of the states. These traditional approaches, however, become
inefficient in common scenarios, e.g., when the states are high-resolution
images. In this paper, we propose an algorithm that utilizes the intrinsic
system low-rank structure for efficient learning. For problems of rank-$m$, our
algorithm achieves a $K$-episode regret bound of order $\widetilde{O}(m^{3/2}
K^{1/2})$. Consequently, the sample complexity of our algorithm only depends on
the rank, $m$, rather than the ambient dimension, $d$, which can be
orders-of-magnitude larger.
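As background for the setting above, the following is a minimal sketch of how an episodic (finite-horizon) LQR policy is computed when the dynamics $(A, B)$ are known, via the standard backward Riccati recursion. This is textbook LQR, not the paper's learning algorithm, and the system matrices are illustrative placeholders.

```python
import numpy as np

def lqr_gains(A, B, Q, R, H):
    """Backward Riccati recursion for a finite-horizon LQR.

    Returns feedback gains K_t (t = 0..H-1) such that the optimal
    control is u_t = -K_t x_t under cost sum_t x_t^T Q x_t + u_t^T R u_t.
    """
    P = Q.copy()  # terminal cost-to-go matrix
    gains = []
    for _ in range(H):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P <- Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[t] applies at step t

# Toy 2-state, 1-input system (illustrative, not from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
Ks = lqr_gains(A, B, Q, R, H=50)
```

In the paper's setting the learner does not know $(A, B)$ and must trade exploration against control cost across $K$ episodes; the low-rank assumption lets the relevant part of the dynamics be estimated with sample complexity depending on the rank $m$ rather than the ambient dimension $d$.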
Related papers
- Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances [42.16343861129583]
We show that a bandit online learning algorithm can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases.
Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing.
arXiv Detail & Related papers (2023-10-03T17:51:42Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\widetilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Computationally Efficient Horizon-Free Reinforcement Learning for Linear
Mixture MDPs [111.75736569611159]
We propose the first computationally efficient horizon-free algorithm for linear mixture MDPs.
Our algorithm adapts a weighted least squares estimator for the unknown transition dynamics.
This also improves upon the best-known algorithms in this setting when the $\sigma_k^2$'s are known.
arXiv Detail & Related papers (2022-05-23T17:59:18Z) - Dynamic Regret Minimization for Control of Non-stationary Linear
Dynamical Systems [18.783925692307054]
We present an algorithm that achieves the optimal dynamic regret of $\widetilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches.
The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems.
arXiv Detail & Related papers (2021-11-06T01:30:51Z) - Scalable regret for learning to control network-coupled subsystems with
unknown dynamics [5.670584589057048]
Viewing the interconnected subsystems globally results in a regret that increases super-linearly with the number of subsystems.
We propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network.
We show that the expected regret of the proposed algorithm is bounded by $\widetilde{\mathcal{O}}\big(n\sqrt{T}\big)$, where $n$ is the number of subsystems, $T$ is the time horizon, and the $\widetilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$.
arXiv Detail & Related papers (2021-08-18T04:45:34Z) - Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and
Costs [45.87981728307819]
The ability to compare and align related datasets living in heterogeneous spaces plays an increasingly important role in machine learning.
The Gromov-Wasserstein (GW) formalism can help tackle this problem.
arXiv Detail & Related papers (2021-06-02T12:50:56Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently [0.0]
Recent results have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps.
We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly)logarithmically with the number of steps.
arXiv Detail & Related papers (2020-02-19T10:09:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.