LTL-Constrained Policy Optimization with Cycle Experience Replay
- URL: http://arxiv.org/abs/2404.11578v2
- Date: Fri, 24 May 2024 22:57:06 GMT
- Title: LTL-Constrained Policy Optimization with Cycle Experience Replay
- Authors: Ameesh Shah, Cameron Voloshin, Chenxi Yang, Abhinav Verma, Swarat Chaudhuri, Sanjit A. Seshia
- Abstract summary: Linear Temporal Logic (LTL) offers a precise means for constraining the behavior of reinforcement learning agents.
We present Cycle Experience Replay (CyclER), a reward-shaping approach to LTL-constrained policy optimization that allows continuous state and action spaces.
- Score: 19.43224037705577
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear Temporal Logic (LTL) offers a precise means for constraining the behavior of reinforcement learning agents. However, in many tasks, LTL is insufficient for task specification; LTL-constrained policy optimization, where the goal is to optimize a scalar reward under LTL constraints, is needed. Prior methods for this constrained problem are restricted to finite state spaces. In this work, we present Cycle Experience Replay (CyclER), a reward-shaping approach to this problem that allows continuous state and action spaces and the use of function approximations. CyclER guides a policy towards satisfaction by encouraging partial behaviors compliant with the LTL constraint, using the structure of the constraint. In doing so, it addresses the optimization challenges stemming from the sparse nature of LTL satisfaction. We evaluate CyclER in three continuous control domains. On these tasks, CyclER outperforms existing reward-shaping methods at finding performant and LTL-satisfying policies.
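The core idea, as described in the abstract, is to reward partial progress toward satisfying the LTL constraint rather than only its eventual (sparse) satisfaction. A minimal sketch of automaton-based reward shaping in that spirit follows; the constraint, automaton, labeling, and bonus values are hypothetical, and this is not the authors' CyclER implementation:

```python
# Hypothetical automaton for the LTL constraint "G !hazard & F goal":
# never touch the hazard, and eventually reach the goal.
# Automaton states: 0 = goal not yet reached, 1 = accepting, -1 = violated.
def automaton_step(q, labels):
    """Advance the automaton on the set of propositions true in the current env state."""
    if "hazard" in labels:
        return -1                      # constraint irrecoverably violated
    if q == 0 and "goal" in labels:
        return 1                       # progress: accepting state reached
    return q

def shaped_reward(q, q_next, env_reward, bonus=1.0, penalty=-1.0):
    """Combine the scalar task reward with a shaping term that rewards
    partial progress through the automaton toward satisfaction."""
    if q_next == -1:
        return env_reward + penalty
    if q_next != q:                    # moved closer to acceptance
        return env_reward + bonus
    return env_reward

# Toy rollout: the propositions observed at each step and the env reward.
trace = [({"start"}, 0.1), (set(), 0.0), ({"goal"}, 0.5)]
q, ret = 0, 0.0
for labels, r in trace:
    q_next = automaton_step(q, labels)
    ret += shaped_reward(q, q_next, r)
    q = q_next
print(q, ret)  # q == 1 (accepting run), ret includes the progress bonus
```

CyclER itself derives the shaping signal from the structure of the constraint (as the name suggests, from cycles of the corresponding automaton and replayed experience) rather than from a hand-written bonus like the one above.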
Related papers
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL)
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
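As a toy illustration of conditioning a policy on a sequence of truth assignments (the proposition set, encoding, and linear policy below are hypothetical and not DeepLTL's architecture):

```python
import numpy as np

PROPS = ["goal", "hazard", "door_open"]          # hypothetical atomic propositions

def encode_assignment_sequence(seq, max_len=8):
    """Encode a sequence of truth assignments (sets of true propositions)
    as a fixed-size binary matrix to condition the policy on."""
    enc = np.zeros((max_len, len(PROPS)), dtype=np.float32)
    for t, assignment in enumerate(seq[:max_len]):
        for i, p in enumerate(PROPS):
            enc[t, i] = float(p in assignment)
    return enc.flatten()

def policy_action(obs, assignment_seq, weights):
    """Toy linear policy over both the observation and the encoded
    sequence of truth assignments leading to satisfaction."""
    x = np.concatenate([obs, encode_assignment_sequence(assignment_seq)])
    return int(np.argmax(weights @ x))

obs = np.zeros(4, dtype=np.float32)
seq = [{"door_open"}, {"goal"}]                  # assignments that would satisfy the formula
w = np.random.randn(3, 4 + 8 * len(PROPS)).astype(np.float32)
print(policy_action(obs, seq, w))
```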
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
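One generic way to extract a denser signal from the automaton is a potential over LDBA states based on graph distance to acceptance; the sketch below is illustrative, uses a hypothetical automaton, and differs in detail from the paper's Markov-reward-process construction:

```python
from collections import deque

# Hypothetical LDBA transition graph over automaton states; the labels that
# trigger each edge are ignored here, since only graph distance matters.
edges = {0: [1], 1: [2], 2: [2]}       # state 2 is accepting
accepting = {2}

def distance_to_accepting(edges, accepting):
    """BFS backwards from accepting states to get each state's distance to acceptance."""
    rev = {q: [] for q in edges}
    for q, succs in edges.items():
        for s in succs:
            rev.setdefault(s, []).append(q)
    dist = {q: float("inf") for q in edges}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q = queue.popleft()
        for p in rev.get(q, []):
            if dist[p] > dist[q] + 1:
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist

dist = distance_to_accepting(edges, accepting)

def exploration_bonus(q, q_next, scale=0.5):
    """Potential-based bonus: moving to an automaton state closer to
    acceptance yields positive reward, moving away yields negative."""
    return scale * (dist[q] - dist[q_next])

print(dist, exploration_bonus(0, 1))
```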
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction [38.683780057806516]
Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks.
In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints.
Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines.
arXiv Detail & Related papers (2024-06-08T20:56:14Z) - LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning [12.839846486863308]
In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time.
We propose a data-driven diffusion-based framework, LTLDoG, that modifies the inference steps of the reverse process given an instruction specified using linear temporal logic.
Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
arXiv Detail & Related papers (2024-05-07T11:54:22Z) - Constrained Reinforcement Learning with Smoothed Log Barrier Function [27.216122901635018]
We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function)
It achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic.
We show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty.
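For intuition, a generic linearly smoothed log barrier can look like the following; the relaxation threshold and exact form are assumptions rather than CSAC-LB's precise definition:

```python
import math

def smoothed_log_barrier(x, delta=0.25):
    """Linearly smoothed log barrier for a constraint of the form x <= 0.
    For x <= -delta it is the usual -log(-x) barrier; beyond that it is the
    tangent-line extension, so the penalty (and its gradient) stays finite
    even when the constraint is violated.  Generic relaxed barrier, not
    necessarily the paper's exact formulation."""
    if x <= -delta:
        return -math.log(-x)
    # Tangent at x = -delta: value -log(delta), slope 1/delta.
    return (x + delta) / delta - math.log(delta)

# The further the constraint is violated, the larger the (finite) penalty.
for x in [-2.0, -0.5, -0.25, 0.0, 0.5]:
    print(x, round(smoothed_log_barrier(x), 3))
```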
arXiv Detail & Related papers (2024-03-21T16:02:52Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate cost constraints and replaces the trust-region constraint with the clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
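A hedged sketch of a penalized clipped-surrogate objective in this spirit, in plain numpy; the exact surrogate terms and penalty coefficient in P3O may differ:

```python
import numpy as np

def penalized_clipped_objective(ratio, adv_r, adv_c, cost_violation,
                                kappa=10.0, eps=0.2):
    """Sketch of a P3O-style penalized objective (exact terms may differ):
    maximize the clipped reward surrogate while adding a ReLU penalty on the
    cost-constraint violation, turning the constrained problem into a single
    unconstrained one.
    ratio:          pi_new(a|s) / pi_old(a|s) for sampled actions
    adv_r, adv_c:   reward and cost advantage estimates
    cost_violation: estimated expected cost of the old policy minus the limit
    """
    clipped = np.minimum(ratio * adv_r,
                         np.clip(ratio, 1 - eps, 1 + eps) * adv_r)
    reward_surrogate = clipped.mean()
    cost_surrogate = (ratio * adv_c).mean()
    penalty = kappa * max(0.0, cost_violation + cost_surrogate)
    return reward_surrogate - penalty     # ascend this (or descend its negative)

ratio = np.array([0.9, 1.1, 1.3])
adv_r = np.array([1.0, -0.5, 2.0])
adv_c = np.array([0.2, 0.1, 0.4])
print(penalized_clipped_objective(ratio, adv_r, adv_c, cost_violation=-0.05))
```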
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks [59.419152768018506]
We show that any optimal policy necessarily satisfies the k-SP constraint.
We propose a novel cost function that penalizes the policy for violating the SP constraint, instead of completely excluding it.
Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO)
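As a simplified proxy for such a soft penalty (illustrative only, not the paper's cost function): a shortest path never revisits a state, so revisits within a trajectory can be charged a small cost instead of being ruled out entirely:

```python
def sp_penalized_rewards(trajectory, rewards, penalty=0.1):
    """Subtract a small cost whenever the trajectory revisits a state,
    treating loops as soft violations of a shortest-path-style constraint.
    Simplified proxy, not the paper's exact cost function."""
    visited = set()
    shaped = []
    for state, r in zip(trajectory, rewards):
        if state in visited:
            r -= penalty            # detour/loop: soft constraint violation
        visited.add(state)
        shaped.append(r)
    return shaped

traj = ["s0", "s1", "s2", "s1", "s3"]    # revisits s1: not a shortest path
print(sp_penalized_rewards(traj, [0, 0, 0, 0, 1.0]))
```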
arXiv Detail & Related papers (2021-07-13T21:39:21Z) - Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO)
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs)
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)