Integrating LTL Constraints into PPO for Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2603.01292v1
- Date: Sun, 01 Mar 2026 21:55:24 GMT
- Title: Integrating LTL Constraints into PPO for Safe Reinforcement Learning
- Authors: Maifang Zhang, Hang Yu, Qian Zuo, Cheng Wang, Vaishak Belle, Fengxiang He,
- Abstract summary: This paper proposes a framework that integrates safety constraints written in PPO into PPO for safe reinforcement learning.<n>Our PPO-LTL can consistently reduce safety violations, while maintaining competitive performance, against the state-of-the-art methods.
- Score: 27.055056884492984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes Proximal Policy Optimization with Linear Temporal Logic Constraints (PPO-LTL), a framework that integrates safety constraints written in LTL into PPO for safe reinforcement learning. LTL constraints offer rigorous representations of complex safety requirements, such as regulations that broadly exist in robotics, enabling systematic monitoring of safety requirements. Violations against LTL constraints are monitored by limit-deterministic Büchi automata, and then translated by a logic-to-cost mechanism into penalty signals. The signals are further employed for guiding the policy optimization via the Lagrangian scheme. Extensive experiments on the Zones and CARLA environments show that our PPO-LTL can consistently reduce safety violations, while maintaining competitive performance, against the state-of-the-art methods. The code is at https://github.com/EVIEHub/PPO-LTL.
Related papers
- BarrierSteer: LLM Safety via Learning Barrier Steering [83.12893815611052]
BarrierSteer is a novel framework that formalizes safety by embedding learned non-linear safety constraints directly into the model's latent representation space.<n>We show that BarrierSteer substantially reduces adversarial success rates, decreases unsafe generations, and outperforms existing methods.
arXiv Detail & Related papers (2026-02-23T18:19:46Z) - Learning Robust and Correct Controllers Guided by Feasibility-Aware Signal Temporal Logic via BarrierNet [5.174839530270601]
Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers.<n>We propose a feasibility-aware learning framework that embeds CBFs into a differentiable Quadratic Program (dQP)
arXiv Detail & Related papers (2025-12-07T19:52:27Z) - Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning [26.75757359001632]
We propose exchange policy optimization (EPO), an algorithmic framework that achieves optimal policy performance and deterministic bounded safety.<n>EPO works by iteratively solving safe RL subproblems with finite constraint sets and adaptively adjusting the active set through constraint expansion and deletion.<n>Our theoretical analysis demonstrates that, under mild assumptions, strategies trained via EPO achieve performance comparable to optimal solutions with global constraint violations strictly remaining within a prescribed bound.
arXiv Detail & Related papers (2025-11-06T07:51:58Z) - SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety [57.14003339251827]
We introduce a new algorithm called SafeDPO, which is designed to directly optimize the safety alignment objective in a single stage of policy learning.<n>As a result, it eliminates the need to fit separate reward and cost models or to sample from the language model during fine-tuning.<n>We demonstrate that SafeDPO achieves competitive performance compared to state-of-the-art safety alignment algorithms.
arXiv Detail & Related papers (2025-05-26T14:50:01Z) - LTL-Constrained Policy Optimization with Cycle Experience Replay [19.43224037705577]
We introduce Cycle Replay (CyclER), a novel reward shaping technique that exploits the underlying structure of a constraint to guide a policy towards satisfaction.<n>We provide a theoretical guarantee that optimizing CyclER will achieve policies that satisfy the constraint with near-optimal probability.<n>Our experimental results show that optimizing CyclER in tandem with the existing scalar reward outperforms existing reward-shaping methods at finding performant-satisfying policies.
arXiv Detail & Related papers (2024-04-17T17:24:44Z) - Concurrent Learning of Policy and Unknown Safety Constraints in Reinforcement Learning [4.14360329494344]
Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades.
Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety.
Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process.
We propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment.
arXiv Detail & Related papers (2024-02-24T20:01:15Z) - Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement
Learning in Unknown Stochastic Environments [84.3830478851369]
We propose a safe reinforcement learning approach that can jointly learn the environment and optimize the control policy.
Our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in system safe rate measured via simulations.
arXiv Detail & Related papers (2022-09-29T20:49:25Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy violating SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO)
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - Lyapunov Barrier Policy Optimization [15.364174084072872]
We propose a new method, LBPO, that uses a Lyapunov-based barrier function to restrict the policy update to a safe set for each training iteration.
Our method also allows the user to control the conservativeness of the agent with respect to the constraints in the environment.
arXiv Detail & Related papers (2021-03-16T17:58:27Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot
Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO)
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs)
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.