Funnel-based Reward Shaping for Signal Temporal Logic Tasks in
Reinforcement Learning
- URL: http://arxiv.org/abs/2212.03181v3
- Date: Sun, 3 Dec 2023 05:50:22 GMT
- Title: Funnel-based Reward Shaping for Signal Temporal Logic Tasks in
Reinforcement Learning
- Authors: Naman Saxena, Gorantla Sandeep, Pushpak Jagtap
- Abstract summary: We propose a tractable reinforcement learning algorithm to learn a controller that enforces Signal Temporal Logic (STL) specifications.
We demonstrate the utility of our approach on several STL tasks using different environments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Signal Temporal Logic (STL) is a powerful framework for describing the
complex temporal and logical behaviour of dynamical systems. Numerous
studies have attempted to employ reinforcement learning to learn a controller
that enforces STL specifications; however, they have been unable to effectively
tackle the challenges of ensuring robust satisfaction in continuous state space
and maintaining tractability. In this paper, leveraging the concept of funnel
functions, we propose a tractable reinforcement learning algorithm to learn a
time-dependent policy for robust satisfaction of STL specifications in
continuous state space. We demonstrate the utility of our approach on several
STL tasks using different environments.
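As a rough illustration of the ideas named in the abstract, the sketch below pairs the standard min/max quantitative semantics of STL with an exponentially shrinking funnel. It is a minimal sketch under stated assumptions, not the paper's algorithm: the funnel form, the parameters gamma0, gamma_inf, decay and rho_max, the scalar state, and the shaped_reward function are all hypothetical choices made for illustration.

```python
import math

def funnel_width(t, gamma0=5.0, gamma_inf=0.1, decay=1.0):
    """Funnel width gamma(t), shrinking exponentially from gamma0 to gamma_inf.
    The exponential form and all parameter values are assumptions."""
    return (gamma0 - gamma_inf) * math.exp(-decay * t) + gamma_inf

def predicate_robustness(x, goal=0.0, radius=0.5):
    """Robustness of the predicate |x - goal| <= radius (positive iff it holds)."""
    return radius - abs(x - goal)

def always_robustness(signal):
    """Robustness of 'always' over a sampled window: the worst-case value."""
    return min(predicate_robustness(x) for x in signal)

def eventually_robustness(signal):
    """Robustness of 'eventually' over a sampled window: the best-case value."""
    return max(predicate_robustness(x) for x in signal)

def shaped_reward(x, t, rho_max=0.5):
    """Hypothetical reward: margin of the robustness above the lower funnel
    bound rho_max - gamma(t). Positive while the state stays inside the funnel."""
    lower_bound = rho_max - funnel_width(t)
    return predicate_robustness(x) - lower_bound

# A made-up scalar trajectory sampled at t = 0, 1, 2, 3, 4.
trajectory = [3.0, 2.0, 1.0, 0.2, 0.1]
rewards = [shaped_reward(x, t) for t, x in enumerate(trajectory)]
```

Because the funnel tightens over time, the same state earns a smaller reward later in the episode, which is one way a time-dependent reward can push a policy toward satisfying the predicate by a deadline.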
Related papers
- LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning [12.839846486863308]
In this work, we focus on generating long-horizon trajectories that adhere to novel static and temporally-extended constraints/instructions at test time.
We propose a data-driven diffusion-based framework, which modifies the inference steps of the reverse process given an instruction specified.
Experiments in robot navigation and manipulation illustrate that the method is able to generate trajectories that satisfy formulae that specify obstacle avoidance and visitation sequences.
arXiv Detail & Related papers (2024-05-07T11:54:22Z)
- Signal Temporal Logic Neural Predictive Control [15.540490027770621]
We propose a method to learn a neural network controller to satisfy the requirements specified in Signal Temporal Logic (STL).
Our controller learns to roll out trajectories to maximize the STL robustness score in training.
A backup policy is designed to ensure safety when our controller fails.
arXiv Detail & Related papers (2023-09-10T20:31:25Z)
- Diagnostic Spatio-temporal Transformer with Faithful Encoding [54.02712048973161]
This paper addresses the task of anomaly diagnosis when the underlying data generation process has a complex spatio-temporal (ST) dependency.
We formalize the problem as supervised dependency discovery, where the ST dependency is learned as a side product of time-series classification.
We show that the temporal positional encoding used in existing ST transformer works has a serious limitation in capturing higher frequencies (short time scales).
We also propose a new ST dependency discovery framework, which can provide readily consumable diagnostic information in both spatial and temporal directions.
arXiv Detail & Related papers (2023-05-26T05:31:23Z)
- STL-Based Synthesis of Feedback Controllers Using Reinforcement Learning [8.680676599607125]
Deep Reinforcement Learning (DRL) has the potential to be used for synthesizing feedback controllers (agents) for various complex systems with unknown dynamics.
In RL, the reward function plays a crucial role in specifying the desired behaviour of these agents.
We provide a systematic way of generating rewards in real-time by using the quantitative semantics of Signal Temporal Logic (STL).
We evaluate our STL-based reinforcement learning mechanism on several complex continuous control benchmarks and compare our STL semantics with those available in the literature in terms of their efficacy in synthesizing the controller agent.
arXiv Detail & Related papers (2022-12-02T08:31:46Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Deep reinforcement learning under signal temporal logic constraints using Lagrangian relaxation [0.0]
In general, a constraint may be imposed on decision making.
We consider optimal decision-making problems with constraints for completing temporal high-level tasks.
We propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method.
arXiv Detail & Related papers (2022-01-21T00:56:25Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Learning from Demonstrations using Signal Temporal Logic [1.2182193687133713]
Learning-from-demonstrations is an emerging paradigm to obtain effective robot control policies.
We use Signal Temporal Logic to evaluate and rank the quality of demonstrations.
We show that our approach outperforms the state-of-the-art Maximum Causal Entropy Inverse Reinforcement Learning.
arXiv Detail & Related papers (2021-02-15T18:28:36Z)
- Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z)
- Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven [80.94390916562179]
Time-driven learning refers to the machine learning method that updates parameters in a prediction model continuously as new data arrives.
It is desirable to prevent the time-driven dHDP from updating due to insignificant system events such as noise.
We show how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
arXiv Detail & Related papers (2020-06-16T05:51:25Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.