Deep $\mathcal{L}^1$ Stochastic Optimal Control Policies for Planetary
Soft-landing
- URL: http://arxiv.org/abs/2109.00183v1
- Date: Wed, 1 Sep 2021 04:28:38 GMT
- Title: Deep $\mathcal{L}^1$ Stochastic Optimal Control Policies for Planetary
Soft-landing
- Authors: Marcus A. Pereira, Camilo A. Duarte, Ioannis Exarchos, and Evangelos
A. Theodorou
- Abstract summary: We introduce a novel deep learning based solution to the Powered-Descent Guidance (PDG) problem.
Our SOC formulation can handle practically useful control constraints and frames the problem with a pre-specified $\mathcal{L}^1$ objective for minimum fuel consumption.
We demonstrate that our controller can successfully and safely land all trajectories at the base of an inverted cone while minimizing fuel consumption.
- Score: 9.714390258486569
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we introduce a novel deep learning based solution to the
Powered-Descent Guidance (PDG) problem, grounded in principles of nonlinear
Stochastic Optimal Control (SOC) and Feynman-Kac theory. Our algorithm solves
the PDG problem by framing it as an $\mathcal{L}^1$ SOC problem for minimum
fuel consumption. Additionally, it can handle practically useful control
constraints, nonlinear dynamics and enforces state constraints as
soft-constraints. This is achieved by building off of recent work on deep
Forward-Backward Stochastic Differential Equations (FBSDEs) and differentiable
non-convex optimization neural-network layers based on stochastic search. In
contrast to previous approaches, our algorithm does not require convexification
of the constraints or linearization of the dynamics and is empirically shown to
be robust to stochastic disturbances and the initial position of the
spacecraft. After training offline, our controller can be activated once the
spacecraft is within a pre-specified radius of the landing zone and at a
pre-specified altitude, i.e., the base of an inverted cone with the tip at the
landing zone. We demonstrate empirically that our controller can successfully
and safely land all trajectories initialized at the base of this cone while
minimizing fuel consumption.
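The activation rule described above (the learned controller takes over once the spacecraft lies inside an inverted cone whose tip is at the landing zone) and the $\mathcal{L}^1$ minimum-fuel objective can be illustrated with a small sketch. The cone half-angle, state layout, and thrust profile below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def inside_inverted_cone(position, landing_zone, half_angle_deg=30.0):
    """Return True if `position` lies inside an inverted cone whose tip is at
    `landing_zone`; both are 3D points laid out as [x, y, altitude]."""
    rel = position - landing_zone
    altitude = rel[2]
    if altitude <= 0.0:
        return False
    horizontal_offset = np.linalg.norm(rel[:2])
    # Inside the cone when the horizontal offset is small relative to altitude.
    return horizontal_offset <= altitude * np.tan(np.deg2rad(half_angle_deg))

def l1_fuel_cost(thrust_sequence, dt):
    """L1-in-time fuel surrogate: the discretized integral of the thrust
    magnitude, a standard proxy for fuel use in minimum-fuel formulations."""
    return dt * sum(np.linalg.norm(u) for u in thrust_sequence)

# Example: only activate the controller once the state is inside the cone.
landing_zone = np.array([0.0, 0.0, 0.0])
state = np.array([40.0, -25.0, 120.0])            # x, y, altitude in meters
if inside_inverted_cone(state, landing_zone):
    thrusts = [np.array([0.0, 0.0, 9.81])] * 100  # placeholder thrust profile
    print("fuel surrogate:", l1_fuel_cost(thrusts, dt=0.1))
```

Since the paper enforces state constraints only as soft constraints, a geometric check like this would serve as an activation test for the trained policy rather than a hard safety guarantee.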
Related papers
- Neural Policy Iteration for Stochastic Optimal Control: A Physics-Informed Approach [2.8988658640181826]
We propose a physics-informed neural network policy iteration framework (PINN-PI). At each iteration, a neural network is trained to approximate the value function by minimizing the residual of a linear PDE induced by a fixed policy. We demonstrate the effectiveness of our method on several benchmark problems, including cartpole, pendulum, and high-dimensional linear quadratic regulation (LQR) problems in up to 10D.
arXiv Detail & Related papers (2025-08-03T11:02:25Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online adaptive control algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for the nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Convex Optimization-based Policy Adaptation to Compensate for
Distributional Shifts [0.991395455012393]
We show that we can learn policies that track the optimal trajectory with much better error performance and faster computation times.
We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and collision avoidance using both a linear and nonlinear model for adaptive cruise control.
arXiv Detail & Related papers (2023-04-05T09:26:59Z) - CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z) - Deep Learning Approximation of Diffeomorphisms via Linear-Control
Systems [91.3755431537592]
We consider a control system of the form $\dot{x} = \sum_{i=1}^{l} F_i(x)\,u_i$, with linear dependence in the controls (a minimal numerical sketch of such a flow appears after this list).
We use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points.
arXiv Detail & Related papers (2021-10-24T08:57:46Z) - Regret Analysis of Learning-Based MPC with Partially-Unknown Cost
Function [5.601217969637838]
The exploration/exploitation trade-off is an inherent challenge in data-driven and adaptive control.
We propose the use of a finite-horizon oracle controller with perfect knowledge of all system parameters as a reference for optimal control actions.
We develop learning-based policies that we prove achieve low regret with respect to this oracle finite-horizon controller.
arXiv Detail & Related papers (2021-08-04T22:43:51Z) - Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics.
arXiv Detail & Related papers (2021-06-22T23:14:21Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
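As referenced in the "Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems" entry above, the flow of a control system with linear dependence in the controls can transport a compact ensemble of points. Below is a minimal numerical sketch of such a flow; the two planar vector fields and the control schedule are hypothetical choices for illustration, not taken from that paper.

```python
import numpy as np

# Hypothetical control vector fields F_1, F_2 for a planar linear-control system
#   x_dot = F_1(x) * u_1 + F_2(x) * u_2   (linear dependence in the controls).
def F1(x):
    return np.array([1.0, 0.0])      # constant translation field

def F2(x):
    return np.array([-x[1], x[0]])   # rotation field about the origin

def flow(points, controls, dt=0.01):
    """Euler-integrate an ensemble of points under piecewise-constant controls.
    `controls` is a list of (u1, u2) pairs, one per integration step."""
    pts = points.copy()
    for u1, u2 in controls:
        pts = pts + dt * np.array([F1(p) * u1 + F2(p) * u2 for p in pts])
    return pts

# Transport a small ensemble of points with a fixed control schedule.
ensemble = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
schedule = [(0.5, 1.0)] * 200        # 2 seconds of constant control
print(flow(ensemble, schedule))
```

In the paper's setting, the control schedule itself is what gets optimized so that the resulting flow approximates the action of a target diffeomorphism on the ensemble; here the schedule is simply fixed to show how the flow moves the points.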
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.