Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
- URL: http://arxiv.org/abs/2402.07875v2
- Date: Sat, 1 Jun 2024 18:17:12 GMT
- Title: Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
- Authors: Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen
- Abstract summary: Gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
- Score: 52.56827348431552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (reinforcement learning). There, learning a controller applied to a system via gradient descent is known as policy gradient, and a question of prime importance is the extent to which a learned controller extrapolates to unseen initial states. This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states. Focusing on the fundamental Linear Quadratic Regulator (LQR) problem, we establish that the extent of extrapolation depends on the degree of exploration induced by the system when commencing from initial states included in training. Experiments corroborate our theory, and demonstrate its conclusions on problems beyond LQR, where systems are non-linear and controllers are neural networks. We hypothesize that real-world optimal control may be greatly improved by developing methods for informed selection of initial states to train on.
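To make the setting concrete, below is a minimal sketch (not the authors' code) of policy gradient on a finite-horizon LQR instance: a linear controller u_t = -K x_t is trained by gradient descent on the quadratic cost averaged over a small set of training initial states, then evaluated from unseen initial states. All dimensions, matrices, and hyperparameters are illustrative assumptions.
```python
import jax
import jax.numpy as jnp

d, H, lr, steps = 4, 20, 1e-2, 2000
A = 0.9 * jnp.eye(d)   # assumed stable open-loop dynamics x_{t+1} = A x_t + B u_t
B = jnp.eye(d)         # assumed full actuation
Q = jnp.eye(d)         # state cost weight
R = 0.1 * jnp.eye(d)   # control cost weight

def cost(K, x0):
    # Finite-horizon quadratic cost of the linear policy u_t = -K x_t from x0.
    def step(x, _):
        u = -K @ x
        c = x @ Q @ x + u @ R @ u
        return A @ x + B @ u, c
    _, costs = jax.lax.scan(step, x0, None, length=H)
    return costs.sum()

def mean_cost(K, X0):
    # Average cost over a batch of initial states (rows of X0).
    return jnp.mean(jax.vmap(lambda x0: cost(K, x0))(X0))

# Assumption: training initial states span only the first two coordinates;
# extrapolation is measured from initial states outside that span.
X0_train = jnp.eye(d)[:2]
X0_unseen = jnp.eye(d)[2:]

K = jnp.zeros((d, d))
grad_fn = jax.jit(jax.grad(mean_cost))
for _ in range(steps):
    K = K - lr * grad_fn(K, X0_train)  # vanilla policy gradient step

print("cost on training initial states:", mean_cost(K, X0_train))
print("cost on unseen initial states:  ", mean_cost(K, X0_unseen))
```
In this toy instance the training initial states excite only the first two state coordinates, so those columns of K receive essentially all of the gradient signal; how well the resulting controller handles the remaining coordinates is precisely the extrapolation question the paper analyzes.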
Related papers
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Deep active learning for nonlinear system identification [0.4485566425014746]
We develop a novel deep active learning acquisition scheme for nonlinear system identification.
Global exploration acquires a batch of initial states corresponding to the most informative state-action trajectories.
Local exploration solves an optimal control problem, finding the control trajectory that maximizes some measure of information.
arXiv Detail & Related papers (2023-02-24T14:46:36Z)
- Physics-Informed Kernel Embeddings: Integrating Prior System Knowledge with Data-Driven Control [22.549914935697366]
We present a method to incorporate a priori knowledge into data-driven control algorithms using kernel embeddings.
Our proposed approach incorporates prior knowledge of the system dynamics as a bias term in the kernel learning problem.
We demonstrate the improved sample efficiency and out-of-sample generalization of our approach over a purely data-driven baseline.
arXiv Detail & Related papers (2023-01-09T18:35:32Z)
- Testing Stationarity and Change Point Detection in Reinforcement Learning [10.343546104340962]
We develop a consistent procedure to test the nonstationarity of the optimal Q-function based on pre-collected historical data.
We further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimization in nonstationary environments.
arXiv Detail & Related papers (2022-03-03T13:30:28Z)
- Sparsity in Partially Controllable Linear Systems [56.142264865866636]
We study partially controllable linear dynamical systems specified by an underlying sparsity pattern.
Our results characterize those state variables which are irrelevant for optimal control.
arXiv Detail & Related papers (2021-10-12T16:41:47Z)
- Deep learning: a statistical viewpoint [120.94133818355645]
Deep learning has revealed some major surprises from a theoretical perspective.
In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems.
We conjecture that specific principles underlie these phenomena.
arXiv Detail & Related papers (2021-03-16T16:26:36Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias with reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Uncovering the Underlying Physics of Degrading System Behavior Through a Deep Neural Network Framework: The Case of Remaining Useful Life Prognosis [0.0]
We propose an open-box approach using a deep neural network framework to explore the physics of degradation.
The framework has three stages, and it aims to discover a latent variable and corresponding PDE to represent the health state of the system.
arXiv Detail & Related papers (2020-06-10T21:05:59Z)
- How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as the $\rho$-gap.
We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.