Convergence and sample complexity of gradient methods for the model-free
linear quadratic regulator problem
- URL: http://arxiv.org/abs/1912.11899v3
- Date: Mon, 15 Mar 2021 18:45:23 GMT
- Title: Convergence and sample complexity of gradient methods for the model-free
linear quadratic regulator problem
- Authors: Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, Mihailo R.
Jovanovi\'c
- Abstract summary: We show that ODE searches for optimal control for an unknown computation system by directly searching over the corresponding space of controllers.
We take a step towards demystifying the performance and efficiency of such methods by focusing on the gradient-flow dynamics set of stabilizing feedback gains and a similar result holds for the forward disctization of the ODE.
- Score: 27.09339991866556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free reinforcement learning attempts to find an optimal control action
for an unknown dynamical system by directly searching over the parameter space
of controllers. The convergence behavior and statistical properties of these
approaches are often poorly understood because of the nonconvex nature of the
underlying optimization problems and the lack of exact gradient computation. In
this paper, we take a step towards demystifying the performance and efficiency
of such methods by focusing on the standard infinite-horizon linear quadratic
regulator problem for continuous-time systems with unknown state-space
parameters. We establish exponential stability for the ordinary differential
equation (ODE) that governs the gradient-flow dynamics over the set of
stabilizing feedback gains and show that a similar result holds for the
gradient descent method that arises from the forward Euler discretization of
the corresponding ODE. We also provide theoretical bounds on the convergence
rate and sample complexity of the random search method with two-point gradient
estimates. We prove that the required simulation time for achieving
$\epsilon$-accuracy in the model-free setup and the total number of function
evaluations both scale as $\log \, (1/\epsilon)$.
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust Sequential Quadratic Programming method to find both first and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a approximation of the objective subject.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Harmonic Path Integral Diffusion [0.4527270266697462]
We present a novel approach for sampling from a continuous multivariate probability distribution, which may either be explicitly known (up to a normalization factor) or represented via empirical samples.
Our method constructs a time-dependent bridge from a delta function centered at the origin of the state space at $t=0$, transforming it into the target distribution at $t=1$.
We contrast these algorithms with other sampling methods, particularly simulated and path integral sampling, highlighting their advantages in terms of analytical control, accuracy, and computational efficiency.
arXiv Detail & Related papers (2024-09-23T16:20:21Z) - Efficient Sampling for Data-Driven Frequency Stability Constraint via Forward-Mode Automatic Differentiation [5.603382086370097]
We propose a gradient-based data generation method via forward-mode automatic differentiation.
In this method, the original dynamic system is augmented with new states that represent the dynamic of sensitivities of the original states.
We demonstrate the superior performance of the proposed sampling algorithm, compared with the unrolling differentiation and finite difference.
arXiv Detail & Related papers (2024-07-21T03:50:11Z) - Hybrid algorithm simulating non-equilibrium steady states of an open
quantum system [10.752869788647802]
Non-equilibrium steady states are a focal point of research in the study of open quantum systems.
Previous variational algorithms for searching these steady states have suffered from resource-intensive implementations.
We present a novel variational quantum algorithm that efficiently searches for non-equilibrium steady states by simulating the operator-sum form of the Lindblad equation.
arXiv Detail & Related papers (2023-09-13T01:57:27Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear
Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - Breaking the Convergence Barrier: Optimization via Fixed-Time Convergent
Flows [4.817429789586127]
We introduce a Poly-based optimization framework for achieving acceleration, based on the notion of fixed-time stability dynamical systems.
We validate the accelerated convergence properties of the proposed schemes on a range of numerical examples against the state-of-the-art optimization algorithms.
arXiv Detail & Related papers (2021-12-02T16:04:40Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.