Model-Free Output Feedback Stabilization via Policy Gradient Methods
- URL: http://arxiv.org/abs/2601.19284v2
- Date: Thu, 29 Jan 2026 08:15:47 GMT
- Title: Model-Free Output Feedback Stabilization via Policy Gradient Methods
- Authors: Ankang Zhang, Ming Chi, Xiaoling Wang, Lintao Ye,
- Abstract summary: We take a step towards model-free learning for partially observable linear dynamical systems with output feedback.<n>We propose an algorithmic framework that stretches the boundary of PG methods to the problem without global convergence guarantees.<n>We show that by leveraging zeroth-order PG update based on system trajectories and its convergence to stationary points, the proposed algorithms return a stabilizing output feedback policy.
- Score: 20.783658838849426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stabilizing a dynamical system is a fundamental problem that serves as a cornerstone for many complex tasks in the field of control systems. The problem becomes challenging when the system model is unknown. Among the Reinforcement Learning (RL) algorithms that have been successfully applied to solve problems pertaining to unknown linear dynamical systems, the policy gradient (PG) method stands out due to its ease of implementation and can solve the problem in a model-free manner. However, most of the existing works on PG methods for unknown linear dynamical systems assume full-state feedback. In this paper, we take a step towards model-free learning for partially observable linear dynamical systems with output feedback and focus on the fundamental stabilization problem of the system. We propose an algorithmic framework that stretches the boundary of PG methods to the problem without global convergence guarantees. We show that by leveraging zeroth-order PG update based on system trajectories and its convergence to stationary points, the proposed algorithms return a stabilizing output feedback policy for discrete-time linear dynamical systems. We also explicitly characterize the sample complexity of our algorithm and verify the effectiveness of the algorithm using numerical examples.
Related papers
- Stochastic Reinforcement Learning with Stability Guarantees for Control of Unknown Nonlinear Systems [6.571209126567701]
We propose a reinforcement learning algorithm that stabilizes the system by learning a local linear representation ofthe dynamics.
We demonstrate the effectiveness of our algorithm on several challenging high-dimensional dynamical systems.
arXiv Detail & Related papers (2024-09-12T20:07:54Z) - The Power of Learned Locally Linear Models for Nonlinear Policy
Optimization [26.45568696453259]
This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.
We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $mathttiLQR$-like policy updates.
arXiv Detail & Related papers (2023-05-16T17:13:00Z) - A Priori Denoising Strategies for Sparse Identification of Nonlinear
Dynamical Systems: A Comparative Study [68.8204255655161]
We investigate and compare the performance of several local and global smoothing techniques to a priori denoise the state measurements.
We show that, in general, global methods, which use the entire measurement data set, outperform local methods, which employ a neighboring data subset around a local point.
arXiv Detail & Related papers (2022-01-29T23:31:25Z) - Learning over All Stabilizing Nonlinear Controllers for a
Partially-Observed Linear System [4.3012765978447565]
We propose a parameterization of nonlinear output feedback controllers for linear dynamical systems.
Our approach guarantees the closed-loop stability of partially observable linear dynamical systems without requiring any constraints to be satisfied.
arXiv Detail & Related papers (2021-12-08T10:43:47Z) - Stabilizing Dynamical Systems via Policy Gradient Methods [32.88312419270879]
We provide a model-free algorithm for stabilizing fully observed dynamical systems.
We prove that this method efficiently recovers a stabilizing controller for linear systems.
We empirically evaluate the effectiveness of our approach on common control benchmarks.
arXiv Detail & Related papers (2021-10-13T00:58:57Z) - Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns these basis functions using a supervised learning approach.
arXiv Detail & Related papers (2021-09-06T04:39:06Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Reinforcement Learning with Fast Stabilization in Linear Dynamical
Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $tildemathcalO(sqrtT)$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z) - Active Learning for Nonlinear System Identification with Guarantees [102.43355665393067]
We study a class of nonlinear dynamical systems whose state transitions depend linearly on a known feature embedding of state-action pairs.
We propose an active learning approach that achieves this by repeating three steps: trajectory planning, trajectory tracking, and re-estimation of the system from all available data.
We show that our method estimates nonlinear dynamical systems at a parametric rate, similar to the statistical rate of standard linear regression.
arXiv Detail & Related papers (2020-06-18T04:54:11Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.