Learning the Linear Quadratic Regulator from Nonlinear Observations
- URL: http://arxiv.org/abs/2010.03799v1
- Date: Thu, 8 Oct 2020 07:02:47 GMT
- Title: Learning the Linear Quadratic Regulator from Nonlinear Observations
- Authors: Zakaria Mhammedi and Dylan J. Foster and Max Simchowitz and Dipendra
Misra and Wen Sun and Akshay Krishnamurthy and Alexander Rakhlin and John
Langford
- Abstract summary: We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR.
In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs.
Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
- Score: 135.66883119468707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new problem setting for continuous control called the LQR with
Rich Observations, or RichLQR. In our setting, the environment is summarized by
a low-dimensional continuous latent state with linear dynamics and quadratic
costs, but the agent operates on high-dimensional, nonlinear observations such
as images from a camera. To enable sample-efficient learning, we assume that
the learner has access to a class of decoder functions (e.g., neural networks)
that is flexible enough to capture the mapping from observations to latent
states. We introduce a new algorithm, RichID, which learns a near-optimal
policy for the RichLQR with sample complexity scaling only with the dimension
of the latent state space and the capacity of the decoder function class.
RichID is oracle-efficient and accesses the decoder class only through calls to
a least-squares regression oracle. Our results constitute the first provable
sample complexity guarantee for continuous control with an unknown nonlinearity
in the system model and general function approximation.
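To make the setting concrete, here is a minimal sketch of the RichLQR data-generating process described above: a low-dimensional latent state with linear dynamics and quadratic cost, observed only through a fixed nonlinear map. The specific matrices, the tanh random-feature observation map, and the linear decoder class behind the regression oracle are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d_latent, d_obs, d_act, T = 2, 64, 1, 200

# Latent linear dynamics x' = A x + B u + w with quadratic cost x'Qx + u'Ru.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(d_latent), 0.1 * np.eye(d_act)

# Fixed nonlinear observation map (hypothetical: random features + tanh),
# standing in for e.g. camera images of the latent state.
W_obs = rng.normal(size=(d_obs, d_latent))
def obs_map(x):
    return np.tanh(W_obs @ x)

# Roll out under random excitation; the learner sees only observations o_t
# and costs c_t, never the latent state x_t itself.
x = np.zeros(d_latent)
trajectory = []
for t in range(T):
    u = rng.normal(size=d_act)
    o, c = obs_map(x), x @ Q @ x + u @ R @ u
    trajectory.append((o, u, c))
    x = A @ x + B @ u + 0.01 * rng.normal(size=d_latent)

def lsq_oracle(feats, targets):
    """Least-squares regression oracle over a linear decoder class
    (illustrative): returns argmin_M ||feats @ M - targets||_F^2."""
    M, *_ = np.linalg.lstsq(feats, targets, rcond=None)
    return M
```

Per the abstract, RichID accesses the decoder class only through calls to such a least-squares oracle; the class is linear here purely to keep the sketch self-contained, whereas the paper allows general function classes such as neural networks.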
Related papers
- How Feature Learning Can Improve Neural Scaling Laws [86.9540615081759]
We develop a solvable model of neural scaling laws beyond the kernel limit.
We show how performance scales with model size, training time, and the total amount of available data.
arXiv Detail & Related papers (2024-09-26T14:05:32Z)
- Rich-Observation Reinforcement Learning with Continuous Latent Dynamics [43.84391209459658]
We introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations.
Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient.
arXiv Detail & Related papers (2024-05-29T17:02:49Z)
- Offline RL via Feature-Occupancy Gradient Ascent [9.983014605039658]
We study offline Reinforcement Learning in large infinite-horizon discounted Markov Decision Processes (MDPs).
We develop a new algorithm that performs a form of gradient ascent in the space of feature occupancies.
We show that the resulting simple algorithm satisfies strong computational and sample complexity guarantees.
arXiv Detail & Related papers (2024-05-22T15:39:05Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
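As a point of reference for the protocol this entry builds on, a bare-bones sketch of $Q$-learning with linear function approximation follows; the toy random MDP and one-hot features are illustrative assumptions, and this is not the paper's stabilized exploration variant.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma, alpha, eps = 5, 2, 0.9, 0.1, 0.2

# Toy MDP with random transitions and rewards (illustrative).
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = dist. over next states
Rw = rng.uniform(size=(n_s, n_a))

def phi(s, a):
    """One-hot feature map; with these features the linear update
    reduces to tabular Q-learning."""
    f = np.zeros(n_s * n_a)
    f[s * n_a + a] = 1.0
    return f

w = np.zeros(n_s * n_a)  # Q(s, a) = w . phi(s, a)
s = 0
for _ in range(20000):
    qs = np.array([w @ phi(s, a) for a in range(n_a)])
    a = rng.integers(n_a) if rng.random() < eps else int(qs.argmax())
    s2, r = rng.choice(n_s, p=P[s, a]), Rw[s, a]
    target = r + gamma * max(w @ phi(s2, b) for b in range(n_a))
    w += alpha * (target - w @ phi(s, a)) * phi(s, a)  # semi-gradient TD step
    s = s2
```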
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks [19.899987851661354]
We study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension.
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting.
A key tool is a new "dimension-free" dynamics approximation that applies to functions defined on a latent low-dimensional subspace.
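For intuition, functions such as $h(x) = x_1 + x_1 x_2 + x_1 x_2 x_3$ on the hypercube are the canonical staircase examples in this setting; below is a hedged sketch of fitting one with a two-layer ReLU network trained by plain online SGD. The width, step size, and sample counts are arbitrary choices, not the paper's mean-field scaling.

```python
import numpy as np

rng = np.random.default_rng(2)
d, width, lr, steps = 30, 256, 0.01, 20000

def staircase(X):
    """h(x) = x1 + x1*x2 + x1*x2*x3 on {-1,+1}^d: sparse and hierarchical."""
    return X[..., 0] + X[..., 0] * X[..., 1] + X[..., 0] * X[..., 1] * X[..., 2]

# Two-layer ReLU net f(x) = a . relu(W x); train both layers on squared loss.
W = rng.normal(size=(width, d)) / np.sqrt(d)
a = rng.normal(size=width) / np.sqrt(width)

for _ in range(steps):
    x = rng.choice([-1.0, 1.0], size=d)
    h = W @ x
    act = np.maximum(h, 0.0)
    err = a @ act - staircase(x)
    grad_a = err * act
    grad_W = err * np.outer(a * (h > 0.0), x)
    a -= lr * grad_a
    W -= lr * grad_W

# Evaluate on fresh samples.
Xte = rng.choice([-1.0, 1.0], size=(1000, d))
pred = np.maximum(Xte @ W.T, 0.0) @ a
print(f"test MSE: {np.mean((pred - staircase(Xte)) ** 2):.3f}")
```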
arXiv Detail & Related papers (2022-02-17T13:43:06Z)
- Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
- Neural Active Learning with Performance Guarantees [37.16062387461106]
We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are generated from a class of functions on which we make no assumptions whatsoever.
We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop.
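In its simplest form, an NTK-style embedding is the gradient feature map $\phi(x) = \nabla_\theta f(x; \theta_0)$ of the network at initialization, whose inner products give the empirical NTK; the sketch below computes it for a tiny two-layer ReLU network. The architecture and its use here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(3)
d, width = 8, 16

# Tiny two-layer net f(x) = a . relu(W x) at its random initialization.
W0 = rng.normal(size=(width, d)) / np.sqrt(d)
a0 = rng.normal(size=width) / np.sqrt(width)

def ntk_embedding(x):
    """Gradient feature map phi(x) = df(x; theta)/dtheta at initialization,
    flattened over all parameters (W first, then a)."""
    h = W0 @ x
    grad_a = np.maximum(h, 0.0)              # df/da = relu(W0 x)
    grad_W = np.outer(a0 * (h > 0.0), x)     # df/dW via the chain rule
    return np.concatenate([grad_W.ravel(), grad_a])

# Inner products of embeddings are values of the empirical NTK.
x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(f"empirical NTK value: {ntk_embedding(x1) @ ntk_embedding(x2):.4f}")
```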
arXiv Detail & Related papers (2021-06-06T20:44:23Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
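Derivative-free policy search of this kind can be illustrated on a toy LQR instance: estimate the gradient of the closed-loop cost with respect to a static feedback gain from paired random perturbations (a two-point zeroth-order estimate). The system matrices, horizon, smoothing radius, and plain quadratic objective below are illustrative assumptions; the paper's risk-sensitive/robust objectives are not modeled.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.2]])
Q, R, T = np.eye(2), 0.1 * np.eye(1), 50

def cost(K, x0=np.array([1.0, 1.0])):
    """Finite-horizon quadratic cost under the static feedback u = -K x."""
    x, c = x0.copy(), 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

K, lr, radius = np.zeros((1, 2)), 1e-3, 0.05
for _ in range(3000):
    U = rng.normal(size=K.shape)
    U /= np.linalg.norm(U)
    # Two-point zeroth-order estimate of the policy gradient at K.
    g = (cost(K + radius * U) - cost(K - radius * U)) / (2 * radius) * U
    K -= lr * np.clip(g, -50.0, 50.0)  # clip guards against unstable rollouts

print("learned gain:", K, "final cost:", cost(K))
```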
arXiv Detail & Related papers (2021-01-04T16:00:46Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)