Learning the Linear Quadratic Regulator from Nonlinear Observations
- URL: http://arxiv.org/abs/2010.03799v1
- Date: Thu, 8 Oct 2020 07:02:47 GMT
- Title: Learning the Linear Quadratic Regulator from Nonlinear Observations
- Authors: Zakaria Mhammedi and Dylan J. Foster and Max Simchowitz and Dipendra
Misra and Wen Sun and Akshay Krishnamurthy and Alexander Rakhlin and John
Langford
- Abstract summary: We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR.
In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs.
Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
- Score: 135.66883119468707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new problem setting for continuous control called the LQR with
Rich Observations, or RichLQR. In our setting, the environment is summarized by
a low-dimensional continuous latent state with linear dynamics and quadratic
costs, but the agent operates on high-dimensional, nonlinear observations such
as images from a camera. To enable sample-efficient learning, we assume that
the learner has access to a class of decoder functions (e.g., neural networks)
that is flexible enough to capture the mapping from observations to latent
states. We introduce a new algorithm, RichID, which learns a near-optimal
policy for the RichLQR with sample complexity scaling only with the dimension
of the latent state space and the capacity of the decoder function class.
RichID is oracle-efficient and accesses the decoder class only through calls to
a least-squares regression oracle. Our results constitute the first provable
sample complexity guarantee for continuous control with an unknown nonlinearity
in the system model and general function approximation.
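To make the setting concrete, here is a minimal sketch of the RichLQR data-generating process described above: a low-dimensional latent state with linear dynamics and quadratic cost, observed only through a fixed nonlinear map. The specific matrices, the tanh random-feature observation map, and the linear decoder class behind the regression oracle are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_obs, d_act, T = 2, 64, 1, 200

# Latent linear dynamics x' = A x + B u + w with quadratic cost x'Qx + u'Ru.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(d_latent), 0.1 * np.eye(d_act)

# Fixed nonlinear observation map (hypothetical: random features + tanh),
# standing in for e.g. camera images of the latent state.
W_obs = rng.normal(size=(d_obs, d_latent))
def obs_map(x):
    return np.tanh(W_obs @ x)

# Roll out under random excitation; the learner sees only observations o_t
# and costs c_t, never the latent state x_t itself.
x = np.zeros(d_latent)
trajectory = []
for t in range(T):
    u = rng.normal(size=d_act)
    o, c = obs_map(x), x @ Q @ x + u @ R @ u
    trajectory.append((o, u, c))
    x = A @ x + B @ u + 0.01 * rng.normal(size=d_latent)

def lsq_oracle(feats, targets):
    """Least-squares regression oracle over a linear decoder class
    (illustrative): returns argmin_M ||feats @ M - targets||_F^2."""
    M, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    return M
```

Per the abstract, RichID accesses the decoder class only through calls to such a least-squares oracle; the class is linear here purely to keep the sketch self-contained, whereas the paper allows general function classes such as neural networks.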
Related papers
- How Feature Learning Can Improve Neural Scaling Laws [86.9540615081759]
We develop a solvable model of neural scaling laws beyond the kernel limit.
We show how performance scales with model size, training time, and the total amount of available data.
arXiv Detail & Related papers (2024-09-26T14:05:32Z)
- Rich-Observation Reinforcement Learning with Continuous Latent Dynamics [43.84391209459658]
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations.
Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient.
arXiv Detail & Related papers (2024-05-29T17:02:49Z)
- Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs).
We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies.
We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
arXiv Detail & Related papers (2024-05-22T15:39:05Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
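As a point of reference for the protocol this entry builds on, a bare-bones sketch of $Q$-learning with linear function approximation follows; the toy random MDP and one-hot features are illustrative assumptions, and this is not the paper's stabilized exploration variant.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.2

# Toy MDP with random transitions and rewards (illustrative).
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = dist. over next states
Rw = rng.uniform(size=(n_s, n_a))

def phi(s, a):
    """One-hot feature map; with these features the linear update
    reduces to tabular Q-learning."""
    f = np.zeros(n_s * n_a)
    f[s * n_a + a] = 1.0
    return f

w = np.zeros(n_s * n_a)  # Q(s, a) = w . phi(s, a)
s = 0
for _ in range(20000):
    qs = np.array([w @ phi(s, a) for a in range(n_a)])
    a = rng.integers(n_a) if rng.random() < eps else int(qs.argmax())
    s2, r = rng.choice(n_s, p=P[s, a]), Rw[s, a]
    target = r + gamma * max(w @ phi(s2, b) for b in range(n_a))
    w += alpha * (target - w @ phi(s, a)) * phi(s, a)  # semi-gradient TD step
    s = s2
```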
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks [19.899987851661354]
We study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension.
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting.
A key tool is a new "dimension-free" dynamics approximation that applies to functions defined on a latent low-dimensional subspace.
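For intuition, functions such as $h(x) = x_1 + x_1 x_2 + x_1 x_2 x_3$ on the hypercube are the canonical staircase examples in this setting; below is a hedged sketch of fitting one with a two-layer ReLU network trained by plain online SGD. The width, step size, and sample counts are arbitrary choices, not the paper's mean-field scaling.

```python
import numpy as np

rng = np.random.default_rng(2)
d, width, lr, steps = 30, 256, 0.01, 20000

def staircase(X):
    """h(x) = x1 + x1*x2 + x1*x2*x3 on {-1,+1}^d: sparse and hierarchical."""
    return X[..., 0] + X[..., 0] * X[..., 1] + X[..., 0] * X[..., 1] * X[..., 2]

# Two-layer ReLU net f(x) = a . relu(W x); train both layers on squared loss.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

for _ in range(steps):
    x = rng.choice([-1.0, 1.0], size=d)
    h = W @ x
    act = np.maximum(h, 0.0)
    err = a @ act - staircase(x)
    grad_a = err * act
    grad_W = err * np.outer(a * (h > 0.0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# Evaluate on fresh samples.
Xte = rng.choice([-1.0, 1.0], size=(1000, d))
pred = np.maximum(Xte @ W.T, 0.0) @ a
print(f"test MSE: {np.mean((pred - staircase(Xte)) ** 2):.3f}")
```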
arXiv Detail & Related papers (2022-02-17T13:43:06Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Neural Active Learning with Performance Guarantees [37.16062387461106]
We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are generated from a class of functions on which we make no assumptions whatsoever.
We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop.
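In its simplest form, an NTK-style embedding is the gradient feature map $\phi(x) = \nabla_\theta f(x; \theta_0)$ of the network at initialization, whose inner products give the empirical NTK; the sketch below computes it for a tiny two-layer ReLU network. The architecture and its use here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(3)
d, width = 8, 16

# Tiny two-layer net f(x) = a . relu(W x) at its random initialization.
W0 = rng.normal(size=(width, d)) / np.sqrt(d)
a0 = rng.normal(size=width) / np.sqrt(width)

def ntk_embedding(x):
    """Gradient feature map phi(x) = df(x; theta)/dtheta at initialization,
    flattened over all parameters (W first, then a)."""
    h = W0 @ x
    grad_a = np.maximum(h, 0.0)              # df/da = relu(W0 x)
    grad_W = np.outer(a0 * (h > 0.0), x)     # df/dW via the chain rule
    return np.concatenate([grad_W.ravel(), grad_a])

# Inner products of embeddings are values of the empirical NTK.
x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(f"empirical NTK value: {ntk_embedding(x1) @ ntk_embedding(x2):.4f}")
```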
arXiv Detail & Related papers (2021-06-06T20:44:23Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
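Derivative-free policy search of this kind can be illustrated on a toy LQR instance: estimate the gradient of the closed-loop cost with respect to a static feedback gain from paired random perturbations (a two-point zeroth-order estimate). The system matrices, horizon, smoothing radius, and plain quadratic objective below are illustrative assumptions; the paper's risk-sensitive/robust objectives are not modeled.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.2]])
Q, R, T = np.eye(2), 0.1 * np.eye(1), 50

def cost(K, x0=np.array([1.0, 1.0])):
    """Finite-horizon quadratic cost under the static feedback u = -K x."""
    x, c = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

K, lr, radius = np.zeros((1, 2)), 1e-3, 0.05
for _ in range(3000):
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)
    # Two-point zeroth-order estimate of the policy gradient at K.
    g = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    K -= lr * np.clip(g, -50.0, 50.0)  # clip guards against unstable rollouts

print("learned gain:", K, "final cost:", cost(K))
```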
arXiv Detail & Related papers (2021-01-04T16:00:46Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)