The Power of Learned Locally Linear Models for Nonlinear Policy
Optimization
- URL: http://arxiv.org/abs/2305.09619v1
- Date: Tue, 16 May 2023 17:13:00 GMT
- Title: The Power of Learned Locally Linear Models for Nonlinear Policy
Optimization
- Authors: Daniel Pfrommer, Max Simchowitz, Tyler Westenbroek, Nikolai Matni,
Stephen Tu
- Abstract summary: This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.
We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates.
- Score: 26.45568696453259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A common pipeline in learning-based control is to iteratively estimate a
model of system dynamics, and apply a trajectory optimization algorithm -
e.g.~$\mathtt{iLQR}$ - on the learned model to minimize a target cost. This
paper conducts a rigorous analysis of a simplified variant of this strategy for
general nonlinear systems. We analyze an algorithm which iterates between
estimating local linear models of nonlinear system dynamics and performing
$\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains
sample complexity polynomial in relevant problem parameters, and, by
synthesizing locally stabilizing gains, overcomes exponential dependence in
problem horizon. Experimental results validate the performance of our
algorithm and compare it to natural deep-learning baselines.
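The following is a minimal, self-contained Python sketch (not the authors' released code) of the pipeline the abstract describes: roll out the current policy with exploration noise, fit time-varying local linear models of the unknown dynamics by least squares, run an iLQR-like backward pass on those learned models, and update the policy with the resulting feedforward terms and locally stabilizing gains. The pendulum dynamics, cost weights, damping step, and helper names (`rollout`, `fit_local_linear`, `ilqr_update`) are illustrative assumptions.
```python
import numpy as np

dt, T, n, m = 0.05, 50, 2, 1                      # step size, horizon, state/input dims
Q, R, Qf = np.diag([1.0, 0.1]), 0.01 * np.eye(m), np.diag([100.0, 10.0])
x_goal = np.array([np.pi, 0.0])                   # swing a pendulum to the upright position

def f(x, u):
    """True nonlinear dynamics; the algorithm only queries them through rollouts."""
    th, om = x
    return x + dt * np.array([om, -9.81 * np.sin(th) + u[0]])

def rollout(x0, u_nom, x_ref, K, noise=0.0):
    """Run the time-varying affine policy u_t = u_nom_t + K_t (x_t - x_ref_t) (+ noise)."""
    xs, us = [x0], []
    for t in range(T):
        u = u_nom[t] + K[t] @ (xs[-1] - x_ref[t]) + noise * np.random.randn(m)
        us.append(u)
        xs.append(f(xs[-1], u))
    return np.array(xs), np.array(us)

def fit_local_linear(batch, x_nom, u_nom):
    """Least-squares fit of the deviation dynamics dx_{t+1} ~ A_t dx_t + B_t du_t."""
    A, B = np.zeros((T, n, n)), np.zeros((T, n, m))
    for t in range(T):
        Z = np.hstack([np.vstack([xs[t] - x_nom[t] for xs, _ in batch]),
                       np.vstack([us[t] - u_nom[t] for _, us in batch])])
        Y = np.vstack([xs[t + 1] - x_nom[t + 1] for xs, _ in batch])
        W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        A[t], B[t] = W[:n].T, W[n:].T
    return A, B

def ilqr_update(x_nom, u_nom, A, B):
    """iLQR-like backward pass on the learned models: feedforward k_t and gains K_t."""
    K, k = np.zeros((T, m, n)), np.zeros((T, m))
    Vx, Vxx = Qf @ (x_nom[T] - x_goal), Qf
    for t in reversed(range(T)):
        Qx = Q @ (x_nom[t] - x_goal) + A[t].T @ Vx
        Qu = R @ u_nom[t] + B[t].T @ Vx
        Qxx = Q + A[t].T @ Vxx @ A[t]
        Quu = R + B[t].T @ Vxx @ B[t]
        Qux = B[t].T @ Vxx @ A[t]
        K[t] = -np.linalg.solve(Quu, Qux)
        k[t] = -np.linalg.solve(Quu, Qu)
        Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
    return K, k

x0 = np.zeros(n)
u_nom, K = np.zeros((T, m)), np.zeros((T, m, n))
x_nom, _ = rollout(x0, u_nom, np.zeros((T, n)), K)
for it in range(20):                               # outer loop: estimate models, then update
    batch = [rollout(x0, u_nom, x_nom, K, noise=0.05) for _ in range(10)]
    A, B = fit_local_linear(batch, x_nom, u_nom)
    K, k = ilqr_update(x_nom, u_nom, A, B)
    u_nom = u_nom + 0.5 * k                        # damped feedforward step
    x_nom, u_nom = rollout(x0, u_nom, x_nom, K)    # new nominal trajectory (no noise)
```
The synthesized gains K_t are reused both to stabilize the exploratory rollouts and as part of the returned policy, which is the mechanism the abstract credits for avoiding exponential dependence on the horizon.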
Related papers
- Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators [13.343937277604892]
We study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators.
We propose an algorithm that omits the estimation of the policy Hessian, which applies to tasks of learning a set of heterogeneous but similar linear dynamical systems.
We provide a convergence result for the exact gradient descent process by analyzing the boundedness and smoothness of the gradient for the meta-objective.
arXiv Detail & Related papers (2024-05-27T17:26:36Z)
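The entry above centers on estimating policy gradients for LQR from cost evaluations alone, without ever forming a policy Hessian. A toy single-task sketch of that zeroth-order ingredient follows; the meta-learning across heterogeneous tasks and the ergodic, infinite-horizon cost are omitted, and the system matrices, horizon, smoothing radius, and step size are assumptions.
```python
import numpy as np

n, m, T, r = 3, 1, 50, 0.05                           # dims, rollout horizon, smoothing radius
rng = np.random.default_rng(0)
A = np.eye(n) + 0.02 * rng.standard_normal((n, n))    # one sampled task (assumption)
B = 0.1 * rng.standard_normal((n, m))
Q, R = np.eye(n), 0.1 * np.eye(m)

def cost(K, x0):
    """Finite-horizon LQR cost of the policy u = -K x, evaluated from a rollout only."""
    x, c = x0, 0.0
    for _ in range(T):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

def zo_grad(K, x0, n_dirs=20):
    """Two-point zeroth-order gradient estimate; no policy Hessian is ever formed."""
    g = np.zeros_like(K)
    for _ in range(n_dirs):
        U = rng.standard_normal(K.shape)              # random perturbation direction
        g += (cost(K + r * U, x0) - cost(K - r * U, x0)) / (2.0 * r) * U
    return g / n_dirs

K = np.zeros((m, n))
for step in range(500):                               # plain, Hessian-free gradient descent
    x0 = rng.standard_normal(n)
    K -= 1e-4 * zo_grad(K, x0)
```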
- Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use neural ordinary differential equations (Neural ODEs) to model continuous-time dynamics as differential equations parametrized with neural networks.
We propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems.
arXiv Detail & Related papers (2022-10-20T13:19:26Z)
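The entry above poses the feedback policy as part of a Neural ODE and optimizes it by differentiating through the integration. A minimal sketch of that idea, under assumptions (a double-integrator plant, explicit Euler in place of a proper ODE solver, an arbitrary quadratic cost, none of it taken from the paper):
```python
import torch
import torch.nn as nn

dt, steps = 0.05, 100
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))   # u = pi_phi(x)

def f(x, u):
    """Continuous-time dynamics of a double integrator: dx1/dt = x2, dx2/dt = u."""
    return torch.stack([x[1], u[0]])

def rollout_cost(x0):
    """Euler-discretized closed-loop rollout of dx/dt = f(x, pi_phi(x)) with a quadratic cost."""
    x, cost = x0, torch.zeros(())
    for _ in range(steps):
        u = policy(x)
        cost = cost + dt * (x @ x + 0.1 * (u @ u))
        x = x + dt * f(x, u)                       # one explicit Euler step of the ODE rollout
    return cost + 10.0 * (x @ x)                   # terminal penalty

opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for it in range(300):                              # train by backprop through the integrator
    opt.zero_grad()
    loss = rollout_cost(torch.tensor([1.0, 0.0]))
    loss.backward()
    opt.step()
```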
- A Priori Denoising Strategies for Sparse Identification of Nonlinear Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z)
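To make the local-versus-global distinction in the entry above concrete, here is a small illustration: a local moving-average smoother (each estimate uses only a short window of neighboring samples) against a global low-pass smoother (every estimate depends on the full record), applied to a noisy synthetic signal before any model identification. The signal, noise level, window length, and frequency cutoff are arbitrary stand-ins, not the methods compared in the paper.
```python
import numpy as np

t = np.linspace(0, 10, 1000)
clean = np.sin(t) * np.exp(-0.1 * t)                   # stand-in "state measurement"
noisy = clean + 0.05 * np.random.randn(t.size)

# Local method: each smoothed value uses only a short neighborhood of samples.
window = 21
local = np.convolve(noisy, np.ones(window) / window, mode="same")

# Global method: the entire record determines every smoothed value
# (here, keep only the lowest-frequency Fourier modes).
coeffs = np.fft.rfft(noisy)
coeffs[30:] = 0.0
global_ = np.fft.irfft(coeffs, n=t.size)

for name, est in [("local moving average", local), ("global low-pass", global_)]:
    print(f"{name:>22s}: RMSE = {np.sqrt(np.mean((est - clean) ** 2)):.4f}")
```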
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Learning Fast Approximations of Sparse Nonlinear Regression [50.00693981886832]
In this work, we bridge the gap by introducing the Nonlinear Learned Iterative Shrinkage Thresholding Algorithm (NLISTA).
Experiments on synthetic data corroborate our theoretical results and show our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-26T11:31:08Z)
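The learned iterative shrinkage-thresholding idea behind NLISTA can be illustrated with the classic linear LISTA template: ISTA iterations are unrolled into a fixed number of layers whose step sizes and soft-thresholds are trained end to end. The sketch below is that linear template on synthetic data, an assumption-level illustration only; the paper's NLISTA extends the construction to nonlinear measurement models.
```python
import torch
import torch.nn as nn

n, d, layers = 50, 100, 8                  # measurements, signal dimension, unrolled iterations
A = torch.randn(n, d) / n ** 0.5           # measurement matrix, assumed known

class LISTA(nn.Module):
    """ISTA unrolled into `layers` steps with learned step sizes and soft-thresholds."""
    def __init__(self):
        super().__init__()
        self.step = nn.Parameter(torch.full((layers,), 0.5))
        self.theta = nn.Parameter(torch.full((layers,), 0.05))

    def forward(self, y):
        x = torch.zeros(y.shape[0], d)
        for k in range(layers):
            grad = (x @ A.T - y) @ A                                 # gradient of 0.5*||Ax - y||^2
            z = x - self.step[k] * grad
            x = torch.sign(z) * torch.relu(z.abs() - self.theta[k])  # soft-thresholding
        return x

model = LISTA()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(500):                      # train on synthetic sparse regression problems
    x_true = torch.randn(32, d) * (torch.rand(32, d) < 0.1).float()
    y = x_true @ A.T
    loss = ((model(y) - x_true) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```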
- A polynomial-time algorithm for learning nonparametric causal graphs [18.739085486953698]
The analysis is model-free and does not assume linearity, additivity, independent noise, or faithfulness.
We impose a condition on the residual variances that is closely related to previous work on linear models with equal variances.
arXiv Detail & Related papers (2020-06-22T02:21:53Z)
- The role of optimization geometry in single neuron learning [12.891722496444036]
Recent experiments have demonstrated that the choice of optimization geometry can impact generalization performance when learning expressive neural network models.
We show how the interplay between the optimization geometry and the feature geometry shapes out-of-sample performance.
arXiv Detail & Related papers (2020-06-15T17:39:44Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
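As a toy version of the setting in the entry above (everything below is an assumption, not the paper's construction): TD(0) policy evaluation on a small discounted Markov reward process with samples drawn from a generative model, reporting the sup-norm error against the exact value function.
```python
import numpy as np

S, gamma, rng = 6, 0.9, np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=S)              # transition matrix under a fixed policy
r = rng.uniform(0, 1, size=S)                      # expected rewards
v_star = np.linalg.solve(np.eye(S) - gamma * P, r) # exact value function

v = np.zeros(S)
for k in range(1, 100001):
    s = rng.integers(S)                            # generative model: query any state
    s_next = rng.choice(S, p=P[s])
    td_target = r[s] + gamma * v[s_next]
    v[s] += (1.0 / (1 + k / 1000)) * (td_target - v[s])   # decaying step size

print("ell_inf error:", np.max(np.abs(v - v_star)))
```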
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Local Policy Optimization for Trajectory-Centric Reinforcement Learning [31.495672846638346]
Many robotic manipulation tasks are trajectory-centric and thus do not require a global model or policy.
We present a method for simultaneous trajectory and local stabilizing policy optimization to generate local policies for trajectory-centric model-based reinforcement learning.
arXiv Detail & Related papers (2020-01-22T15:56:00Z)