Optimal Control of Nonlinear Systems with Unknown Dynamics
- URL: http://arxiv.org/abs/2305.15188v2
- Date: Wed, 21 May 2025 21:30:51 GMT
- Title: Optimal Control of Nonlinear Systems with Unknown Dynamics
- Authors: Wenjian Hao, Paulo C. Heredia, Shaoshuai Mou,
- Abstract summary: This paper presents a data-driven method for finding a closed-loop optimal controller.<n>It minimizes a specified infinite-horizon cost function for systems with unknown dynamics given any arbitrary initial state.
- Score: 4.551160285910024
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a data-driven method for finding a closed-loop optimal controller, which minimizes a specified infinite-horizon cost function for systems with unknown dynamics given any arbitrary initial state. Suppose the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework, which approximates the gradient of the cost function with respect to the policy parameters via integrating the Koopman operator with the classical concept of actor-critic. This enables the policy parameters to be tuned iteratively using gradient descent to achieve an optimal controller, leveraging the linearity of the Koopman operator. The convergence analysis of the proposed framework is provided. The effectiveness of the method is demonstrated through comparisons with a model-free reinforcement learning approach, and its control performance is further evaluated through simulations against model-based optimal control methods that solve the same optimal control problem utilizing the exact system dynamics.
Related papers
- RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large-language models.
AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model.
We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z) - Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality [52.906438147288256]
We show that our algorithm can identify the globally optimal reward and policy under certain neural network structures.
This is the first IRL algorithm with a non-asymptotic convergence guarantee that provably achieves global optimality.
arXiv Detail & Related papers (2025-03-22T21:16:08Z) - Optimal Policy Adaptation under Covariate Shift [15.703626346971182]
We propose principled approaches for learning the optimal policy in the target domain by leveraging two datasets.
We derive the identifiability assumptions for the reward induced by a given policy.
We then learn the optimal policy by optimizing the estimated reward.
arXiv Detail & Related papers (2025-01-14T12:33:02Z) - Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that the application of this algorithm results in a fixed point of probabilistic policies that converge to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z) - Risk-Sensitive Soft Actor-Critic for Robust Deep Reinforcement Learning
under Distribution Shifts [11.765000124617186]
We study the robustness of deep reinforcement learning algorithms against distribution shifts within contextual multi-stage optimization problems.
We show that our algorithm is superior to risk-neutral Soft Actor-Critic as well as to two benchmark approaches for robust deep reinforcement learning.
arXiv Detail & Related papers (2024-02-15T14:55:38Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z) - Tuning Legged Locomotion Controllers via Safe Bayesian Optimization [47.87675010450171]
This paper presents a data-driven strategy to streamline the deployment of model-based controllers in legged robotic hardware platforms.
We leverage a model-free safe learning algorithm to automate the tuning of control gains, addressing the mismatch between the simplified model used in the control formulation and the real system.
arXiv Detail & Related papers (2023-06-12T13:10:14Z) - Introduction to Online Control [34.77535508151501]
In online nonstochastic control, both the cost functions as well as the perturbations from the assumed dynamical model are chosen by an adversary.<n>The target is to attain low regret against the best policy in hindsight from a benchmark class of policies.
arXiv Detail & Related papers (2022-11-17T16:12:45Z) - Neural ODEs as Feedback Policies for Nonlinear Optimal Control [1.8514606155611764]
We use Neural ordinary differential equations (Neural ODEs) to model continuous time dynamics as differential equations parametrized with neural networks.
We propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems.
arXiv Detail & Related papers (2022-10-20T13:19:26Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Learning Stochastic Parametric Differentiable Predictive Control
Policies [2.042924346801313]
We present a scalable alternative called parametric differentiable predictive control (SP-DPC) for unsupervised learning of neural control policies.
SP-DPC is formulated as a deterministic approximation to the parametric constrained optimal control problem.
We provide theoretical probabilistic guarantees for policies learned via the SP-DPC method on closed-loop constraints and chance satisfaction.
arXiv Detail & Related papers (2022-03-02T22:46:32Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Learning Stochastic Optimal Policies via Gradient Descent [17.9807134122734]
We systematically develop a learning-based treatment of optimal control (SOC)
We propose a derivation of adjoint sensitivity results for differential equations through direct application of variational calculus.
We verify the performance of the proposed approach on a continuous-time, finite horizon portfolio optimization with proportional transaction costs.
arXiv Detail & Related papers (2021-06-07T16:43:07Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Safe and Efficient Model-free Adaptive Control via Bayesian Optimization [39.962395119933596]
We propose a purely data-driven, model-free approach for adaptive control.
tuning low-level controllers based solely on system data raises concerns on the underlying algorithm safety and computational performance.
We numerically demonstrate for several types of disturbances that our approach is sample efficient, outperforms constrained Bayesian optimization in terms of safety, and achieves the performance optima computed by grid evaluation.
arXiv Detail & Related papers (2021-01-19T19:15:00Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - On the implementation of a global optimization method for mixed-variable
problems [0.30458514384586394]
The algorithm is based on the radial basis function of Gutmann and the metric response surface method of Regis and Shoemaker.
We propose several modifications aimed at generalizing and improving these two algorithms.
arXiv Detail & Related papers (2020-09-04T13:36:56Z) - Learning Control Barrier Functions from Expert Demonstrations [69.23675822701357]
We propose a learning based approach to safe controller synthesis based on control barrier functions (CBFs)
We analyze an optimization-based approach to learning a CBF that enjoys provable safety guarantees under suitable Lipschitz assumptions on the underlying dynamical system.
To the best of our knowledge, these are the first results that learn provably safe control barrier functions from data.
arXiv Detail & Related papers (2020-04-07T12:29:06Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z) - Population-Guided Parallel Policy Search for Reinforcement Learning [17.360163137926]
A new population-guided parallel learning scheme is proposed to enhance the performance of off-policy reinforcement learning (RL)
In the proposed scheme, multiple identical learners with their own value-functions and policies share a common experience replay buffer, and search a good policy in collaboration with the guidance of the best policy information.
arXiv Detail & Related papers (2020-01-09T10:13:57Z) - A Nonparametric Off-Policy Policy Gradient [32.35604597324448]
Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes.
We build on the general sample efficiency of off-policy algorithms.
We show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
arXiv Detail & Related papers (2020-01-08T10:13:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.