Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients
- URL: http://arxiv.org/abs/2410.02898v2
- Date: Mon, 7 Oct 2024 19:00:47 GMT
- Title: Solving Reach-Avoid-Stay Problems Using Deep Deterministic Policy Gradients
- Authors: Gabriel Chenevert, Jingqi Li, Achyuta Kannan, Sangjae Bae, Donggun Lee
- Abstract summary: Reach-Avoid-Stay (RAS) optimal control enables systems such as robots and air taxis to reach their targets, avoid obstacles, and stay near the target.
Current methods for RAS often struggle with handling complex, dynamic environments and scaling to high-dimensional systems.
We propose a two-step deep deterministic policy gradient (DDPG) method that extends RL-based reachability methods to solve RAS problems.
- Score: 3.4849272655643326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reach-Avoid-Stay (RAS) optimal control enables systems such as robots and air taxis to reach their targets, avoid obstacles, and stay near the target. However, current methods for RAS often struggle with handling complex, dynamic environments and scaling to high-dimensional systems. While reinforcement learning (RL)-based reachability analysis addresses these challenges, it has yet to tackle the RAS problem. In this paper, we propose a two-step deep deterministic policy gradient (DDPG) method to extend RL-based reachability method to solve RAS problems. First, we train a function that characterizes the maximal robust control invariant set within the target set, where the system can safely stay, along with its corresponding policy. Second, we train a function that defines the set of states capable of safely reaching the robust control invariant set, along with its corresponding policy. We prove that this method results in the maximal robust RAS set in the absence of training errors and demonstrate that it enables RAS in complex environments, scales to high-dimensional systems, and achieves higher success rates for the RAS task compared to previous methods, validated through one simulation and two high-dimensional experiments.
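The two-step structure described in the abstract can be illustrated with a toy tabular stand-in. The paper trains DDPG value networks; the 1-D grid world, target set, avoid set, and fixed-point iterations below are hypothetical simplifications that show only the two-step logic (first the invariant set inside the target, then the set of states that can safely reach it):

```python
# Hypothetical 1-D grid illustration of the two-step RAS decomposition.
# (The paper uses DDPG on continuous systems; this tabular sketch only
# mirrors the structure: Step 1 then Step 2.)
N = 10
TARGET = set(range(3, 8))   # states the system should reach and stay in
AVOID = {5}                 # obstacle state inside the target
ACTIONS = (-1, +1)          # forced motion: no "stay" action

def step(x, a):
    """Deterministic dynamics: move by a, clipped to the grid."""
    return min(max(x + a, 0), N - 1)

# Step 1: maximal control invariant set inside the target --
# states from which some action keeps the system in the set forever.
S = TARGET - AVOID
while True:
    S_next = {x for x in S if any(step(x, a) in S for a in ACTIONS)}
    if S_next == S:
        break
    S = S_next

# Step 2: states that can safely reach the invariant set
# (grown backwards from S, never passing through the avoid set).
R = set(S)
while True:
    R_next = R | {x for x in range(N)
                  if x not in AVOID and any(step(x, a) in R for a in ACTIONS)}
    if R_next == R:
        break
    R = R_next
```

With these toy sets, Step 1 keeps {3, 4, 6, 7} (state 5 is excluded, but its neighbors can bounce between 3-4 and 6-7), and Step 2 recovers every state except the obstacle, mirroring the paper's claim that the composition yields the full robust RAS set.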
Related papers
- Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control [5.084000938840218]
This paper proposes a reinforcement learning algorithm called Robust Deterministic Policy Gradient (RDPG)
RDPG formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game.
We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation.
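The zero-sum formulation can be sketched on a hypothetical scalar quadratic game: the controller descends and the disturbance ascends the same cost, a crude stand-in for the alternating DPG updates the entry describes (the cost, $\gamma$, and step size below are illustrative, not from the paper):

```python
# Minimal descent-ascent sketch of an H-infinity-style zero-sum game:
# control u minimizes and disturbance w maximizes
#     J(u, w) = (x + u + w)^2 + u^2 - gamma^2 * w^2
# All constants here are hypothetical.
x, gamma, lr = 1.0, 2.0, 0.05
u, w = 0.0, 0.0
for _ in range(2000):
    x_next = x + u + w
    grad_u = 2 * x_next + 2 * u                # dJ/du
    grad_w = 2 * x_next - 2 * gamma**2 * w     # dJ/dw
    u -= lr * grad_u   # minimizer: gradient descent
    w += lr * grad_w   # maximizer: gradient ascent
```

For this convex-concave cost the iteration contracts to the saddle point (u, w) = (-4/7, 1/7), the unique solution of the two first-order conditions.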
arXiv Detail & Related papers (2025-02-28T13:58:22Z) - Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees [3.6787328174619254]
Learning-to-Defer (L2D) facilitates optimal task allocation between AI systems and decision-makers.
This paper conducts the first comprehensive analysis of adversarial robustness in two-stage L2D frameworks.
We propose SARD, a robust, convex deferral algorithm rooted in Bayes and $(\mathcal{R},\mathcal{G})$-consistency.
arXiv Detail & Related papers (2025-02-03T03:44:35Z) - Domain Adaptive Safety Filters via Deep Operator Learning [5.62479170374811]
We propose a self-supervised deep operator learning framework that learns the mapping from environmental parameters to the corresponding CBF.
We demonstrate the effectiveness of the method through numerical experiments on navigation tasks involving dynamic obstacles.
arXiv Detail & Related papers (2024-10-18T15:10:55Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Learning Predictive Safety Filter via Decomposition of Robust Invariant Set [6.94348936509225]
This paper combines the advantages of both RMPC and RL to synthesize safety filters for nonlinear systems.
We propose a policy-based approach for robust reach problems and establish its complexity.
arXiv Detail & Related papers (2023-11-12T08:11:28Z) - In-Distribution Barrier Functions: Self-Supervised Policy Filters that Avoid Out-of-Distribution States [84.24300005271185]
We propose a control filter that wraps any reference policy and effectively encourages the system to stay in-distribution with respect to offline-collected safe demonstrations.
Our method is effective for two different visuomotor control tasks in simulation environments, including both top-down and egocentric view settings.
arXiv Detail & Related papers (2023-01-27T22:28:19Z) - Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z) - Safe-Critical Modular Deep Reinforcement Learning with Temporal Logic through Gaussian Processes and Control Barrier Functions [3.5897534810405403]
Reinforcement learning (RL) is a promising approach but has had limited success in real-world applications.
In this paper, we propose a learning-based control framework consisting of several aspects.
We show that such an ECBF-based modular deep RL algorithm achieves near-perfect success rates and guarantees safety with high probability.
arXiv Detail & Related papers (2021-09-07T00:51:12Z) - Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity [15.940861063732608]
Direct policy search serves as one of the workhorses in modern reinforcement learning (RL).
We investigate the convergence theory of policy gradient (PG) methods for the linear risk-sensitive and robust controller.
One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved.
arXiv Detail & Related papers (2021-01-04T16:00:46Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
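The two ingredients can be sketched with a small NumPy stand-in. The ensemble values, the sigmoid weighting form, and the $\lambda$ coefficient below are illustrative assumptions, not the paper's exact hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical Q-ensemble: 5 members' Q-value estimates for 3 actions.
q_ensemble = rng.normal(size=(5, 3))
q_mean = q_ensemble.mean(axis=0)
q_std = q_ensemble.std(axis=0)   # ensemble disagreement = uncertainty

# (b) UCB exploration: pick the action with the highest
#     mean + lambda * std across the ensemble.
lam = 1.0
action = int(np.argmax(q_mean + lam * q_std))

# (a) Weighted Bellman backup: down-weight uncertain bootstrapped
#     targets with a decreasing function of the std (one common form;
#     the paper's exact weighting may differ).
target_q = q_mean                        # stand-in for r + gamma * Q'
weight = 1.0 / (1.0 + np.exp(q_std))     # higher std -> smaller weight
weighted_target = weight * target_q      # used in the critic loss
```

The key design point carried over from the entry: disagreement within the Q-ensemble serves double duty, as an optimism bonus for exploration and as a penalty on unreliable Bellman targets.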
arXiv Detail & Related papers (2020-07-09T17:08:44Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.