Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
- URL: http://arxiv.org/abs/2602.15817v1
- Date: Tue, 17 Feb 2026 18:53:31 GMT
- Title: Solving Parameter-Robust Avoid Problems with Unknown Feasibility using Reinforcement Learning
- Authors: Oswin So, Eric Yang Yu, Songyuan Zhang, Matthew Cleaveland, Mitchell Black, Chuchu Fan,
- Abstract summary: We show that Feasibility-Guided Exploration (FGE) learns policies with over 50% more coverage than the best existing method for challenging initial conditions.<n>FGE simultaneously identifies a subset of feasible initial conditions under which a safe policy exists, and learns a policy to solve the reachability problem over this set of initial conditions.
- Score: 20.981639605930376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in deep reinforcement learning (RL) have achieved strong results on high-dimensional control tasks, but applying RL to reachability problems raises a fundamental mismatch: reachability seeks to maximize the set of states from which a system remains safe indefinitely, while RL optimizes expected returns over a user-specified distribution. This mismatch can result in policies that perform poorly on low-probability states that are still within the safe set. A natural alternative is to frame the problem as a robust optimization over a set of initial conditions that specify the initial state, dynamics and safe set, but whether this problem has a solution depends on the feasibility of the specified set, which is unknown a priori. We propose Feasibility-Guided Exploration (FGE), a method that simultaneously identifies a subset of feasible initial conditions under which a safe policy exists, and learns a policy to solve the reachability problem over this set of initial conditions. Empirical results demonstrate that FGE learns policies with over 50% more coverage than the best existing method for challenging initial conditions across tasks in the MuJoCo simulator and the Kinetix simulator with pixel observations.
Related papers
- Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO)<n>RRPO operates directly on the primal problem without relying on dual formulations.<n>We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z) - Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning [66.4260157478436]
We study reinforcement learning in the policy learning setting.<n>The goal is to find a policy whose performance is competitive with the best policy in a given class of interest.
arXiv Detail & Related papers (2025-07-06T14:40:05Z) - Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-ite convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z) - Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Reachability Constrained Reinforcement Learning [6.5158195776494]
This paper proposes a reachability CRL (RCRL) method by using reachability analysis to characterize the largest feasible sets.
We also use the multi-time scale approximation theory to prove that the proposed algorithm converges to a local optimum.
Empirical results on different benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in optimal criteria, and constraint satisfaction of RCRL.
arXiv Detail & Related papers (2022-05-16T09:32:45Z) - Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z) - Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring
Statewise Safety [1.9573380763700712]
We introduce the feasible actor-critic (FAC) algorithm, which is the first model-free constrained safe reinforcement learning method.
We claim that some states are inherently unsafe no matter what policy we choose, while for other states there exist policies ensuring safety, where we say such states and policies are feasible.
We provide theoretical guarantees that FAC outperforms previous expectation-based constrained RL methods in terms of both constraint satisfaction and reward optimization.
arXiv Detail & Related papers (2021-05-22T10:40:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.