Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints
with Time Windows
- URL: http://arxiv.org/abs/2307.15910v1
- Date: Sat, 29 Jul 2023 06:47:14 GMT
- Title: Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints
with Time Windows
- Authors: Xiaoshan Lin, Abbasali Koochakzadeh, Yasin Yazicioglu, Derya Aksaray
- Abstract summary: We propose an automata-theoretic approach for reinforcement learning (RL) under complex spatio-temporal constraints with time windows.
We provide theoretical guarantees on the resulting probability of constraint satisfaction.
We also provide numerical results in a scenario where a robot explores the environment to discover high-reward regions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an automata-theoretic approach for reinforcement learning (RL)
under complex spatio-temporal constraints with time windows. The problem is
formulated using a Markov decision process under a bounded temporal logic
constraint. Different from existing RL methods that can eventually learn
optimal policies satisfying such constraints, our proposed approach enforces a
desired probability of constraint satisfaction throughout learning. This is
achieved by translating the bounded temporal logic constraint into a total
automaton and avoiding "unsafe" actions based on the available prior
information regarding the transition probabilities, i.e., a pair of upper and
lower bounds for each transition probability. We provide theoretical guarantees
on the resulting probability of constraint satisfaction. We also provide
numerical results in a scenario where a robot explores the environment to
discover high-reward regions while fulfilling some periodic pick-up and
delivery tasks that are encoded as temporal logic constraints.
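The core safety mechanism described above (masking "unsafe" actions using prior upper bounds on transition probabilities) can be sketched as follows. This is an illustrative reading, not the paper's exact construction: the one-step criterion, function names, and the fallback rule are all assumptions.

```python
import random

def safe_actions(actions, p_unsafe_upper, budget):
    """Keep actions whose upper-bounded probability of reaching an
    'unsafe' automaton state stays within the allowed violation budget."""
    return [a for a in actions if p_unsafe_upper[a] <= budget]

def epsilon_greedy_safe(q_values, actions, p_unsafe_upper, budget, eps, rng):
    """Epsilon-greedy exploration restricted to the safe action set."""
    allowed = safe_actions(actions, p_unsafe_upper, budget)
    if not allowed:  # fallback: pick the least risky action available
        return min(actions, key=lambda a: p_unsafe_upper[a])
    if rng.random() < eps:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q_values[a])
```

Because exploration only ever draws from the masked set, the desired probability of constraint satisfaction is enforced throughout learning rather than only at convergence.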
Related papers
- DeepLTL: Learning to Efficiently Satisfy Complex LTL Specifications [59.01527054553122]
Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL).
Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments, are restricted to suboptimal solutions, and do not adequately handle safety constraints.
In this work, we propose a novel learning approach to address these concerns.
Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae.
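Conditioning a policy on an automaton is typically done via the standard product construction: the environment and the Büchi automaton are stepped in lockstep, and the policy observes the joint state. A minimal sketch, with all function names illustrative:

```python
def product_step(env_step, automaton_delta, label, s, q, action):
    """One step in the product of environment and Büchi automaton:
    the environment moves to s', and the automaton follows the
    transition triggered by the label of s'. A policy would then
    condition on the pair (s', q')."""
    s_next = env_step(s, action)
    q_next = automaton_delta(q, label(s_next))
    return s_next, q_next
```

Tracking the automaton state makes progress toward the formula part of the observable state, which is what lets a learned policy distinguish otherwise identical environment states.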
arXiv Detail & Related papers (2024-10-06T21:30:38Z) - Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
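One simple way to turn an automaton into a dense exploration signal, in the spirit described above, is to reward transitions that move the automaton closer to an accepting state. This sketch uses a plain BFS distance and a potential-style bonus; it is an assumed simplification, not the paper's exact Markov-reward-process construction.

```python
from collections import deque

def distance_to_accepting(edges, accepting):
    """Shortest number of automaton transitions from each state to an
    accepting state, computed by BFS over the reversed graph
    ('edges' maps state -> list of successor states)."""
    rev = {}
    for u, succs in edges.items():
        for v in succs:
            rev.setdefault(v, set()).add(u)
    dist = {s: 0 for s in accepting}
    queue = deque(accepting)
    while queue:
        v = queue.popleft()
        for u in rev.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def shaping_reward(dist, prev_state, new_state, unreachable=10**6):
    """Dense bonus: positive when the step moves closer to acceptance."""
    return dist.get(prev_state, unreachable) - dist.get(new_state, unreachable)
```

The dense bonus replaces the sparse accept-only reward, which is what makes exploration tractable on long-horizon specifications.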
arXiv Detail & Related papers (2024-08-18T14:25:44Z) - CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning [23.76366118253271]
Current solvers fail to produce efficient policies respecting hard constraints.
We present Constraints as terminations (CaT), a novel constrained RL algorithm.
Videos and code are available at https://constraints-as-terminations.io.
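The "constraints as terminations" idea can be sketched as a wrapper around an environment step: a constraint violation stochastically ends the episode, cutting off all future reward so the policy learns to avoid violating states. The linear violation-to-termination mapping and all names here are assumptions, not the paper's exact algorithm.

```python
import random

def cat_step(env_step, state, action, constraint_violation, rng, scale=1.0):
    """One transition where a constraint violation stochastically
    terminates the episode (violation >= 0, with 0 meaning satisfied)."""
    next_state, reward, done = env_step(state, action)
    v = constraint_violation(next_state)
    p_term = min(1.0, scale * v)  # larger violations terminate more often
    if v > 0 and rng.random() < p_term:
        done = True
    return next_state, reward, done
```

Because termination forfeits future return, the hard constraint is enforced through the value function itself, with no extra Lagrange multipliers or reward penalties to tune.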
arXiv Detail & Related papers (2024-03-27T17:03:31Z) - Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We observe that convergence guarantees and the generalizability of unrolled networks remain open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
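Deep unrolling, as described above, maps each iteration of a truncated algorithm to one network layer with its own learnable parameters. A minimal scalar sketch, where per-layer step sizes stand in for trained layer weights (all names are illustrative):

```python
def unrolled_gd(x0, target, step_sizes):
    """Unroll gradient descent on f(x) = (x - target)^2: each loop
    iteration plays the role of one network layer, and its step size
    is a learnable parameter of that layer."""
    x = x0
    for eta in step_sizes:  # one iteration == one layer of the network
        grad = 2.0 * (x - target)
        x = x - eta * grad
    return x
```

Training would tune the `step_sizes` (and, in practice, richer per-layer parameters) end-to-end, which is why descent behavior at every layer must be imposed as a constraint rather than assumed.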
arXiv Detail & Related papers (2023-12-25T18:51:23Z) - Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes [5.471640959988549]
We first introduce an optimal control theory for partially observable Markov decision processes.
We provide a structured methodology for synthesizing policies that maximize a cumulative reward.
We then build on this approach to design an optimal control framework for logically constrained multi-agent settings.
arXiv Detail & Related papers (2023-05-24T05:15:36Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
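The log-barrier idea above can be sketched as a single gradient step on the barrier-augmented objective, with the step size shrunk near the constraint boundary so iterates stay strictly feasible. The adaptive step-size rule here is a simplified assumption, not LBSGD's exact rule.

```python
def lbsgd_step(x, grad_f, g, grad_g, eta_max=0.1, t=1.0):
    """One step minimizing f subject to g(x) <= 0 via the barrier
    B(x) = f(x) - (1/t) * log(-g(x)); requires a strictly feasible x."""
    gx = g(x)
    assert gx < 0, "iterate must be strictly feasible"
    barrier_grad = grad_f(x) - grad_g(x) / (t * gx)
    # shrink the step near the boundary so the next iterate stays feasible
    eta = min(eta_max, 0.5 * abs(gx) / (abs(grad_g(x)) + 1e-12))
    return x - eta * barrier_grad
```

The barrier term blows up as g(x) approaches 0 from below, so the iterate is pushed away from the boundary before it can violate the constraint.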
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Safe Exploration Incurs Nearly No Additional Sample Complexity for
Reward-free RL [43.672794342894946]
Reward-free reinforcement learning (RF-RL) relies on random action-taking to explore the unknown environment without any reward feedback information.
It remains unclear how such a safe exploration requirement affects the sample complexity needed to achieve the desired optimality of the obtained policy in planning.
We propose a unified Safe reWard-frEe ExploraTion (SWEET) framework, and develop algorithms coined Tabular-SWEET and Low-rank-SWEET, respectively.
arXiv Detail & Related papers (2022-06-28T15:00:45Z) - Deep reinforcement learning under signal temporal logic constraints
using Lagrangian relaxation [0.0]
In general, constraints may be imposed on decision making.
We consider optimal decision-making problems with constraints for completing high-level temporal tasks.
We propose a two-phase constrained DRL algorithm using the Lagrangian relaxation method.
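The Lagrangian-relaxation recipe mentioned above folds the constraint into the reward and updates the multiplier by dual ascent. A minimal sketch of one update, with all names illustrative and the algorithm's two-phase structure omitted:

```python
def lagrangian_update(reward, cost, cost_limit, lam, lr_lambda=0.01):
    """Return the penalized reward used to train the policy, and the
    dual-ascent update of the multiplier: lambda grows while the
    constraint cost exceeds its limit, and is clipped at zero."""
    penalized = reward - lam * cost
    lam_new = max(0.0, lam + lr_lambda * (cost - cost_limit))
    return penalized, lam_new
```

As long as the constraint is violated, the multiplier rises and the penalty steers the policy back toward feasibility; once the cost falls below the limit, the multiplier decays.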
arXiv Detail & Related papers (2022-01-21T00:56:25Z) - Multi-Agent Reinforcement Learning with Temporal Logic Specifications [65.79056365594654]
We study the problem of learning to satisfy temporal logic specifications with a group of agents in an unknown environment.
We develop the first multi-agent reinforcement learning technique for temporal logic specifications.
We provide correctness and convergence guarantees for our main algorithm.
arXiv Detail & Related papers (2021-02-01T01:13:03Z) - Constrained Reinforcement Learning for Dynamic Optimization under
Uncertainty [1.5797349391370117]
Dynamic real-time optimization (DRTO) is a challenging task due to the fact that optimal operating conditions must be computed in real time.
The main bottleneck in the industrial application of DRTO is the presence of uncertainty.
We present a constrained reinforcement learning (RL) based approach to accommodate these difficulties.
arXiv Detail & Related papers (2020-06-04T10:17:35Z) - Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.