Reinforcement Learning for Task Specifications with Action-Constraints
- URL: http://arxiv.org/abs/2201.00286v1
- Date: Sun, 2 Jan 2022 04:22:01 GMT
- Title: Reinforcement Learning for Task Specifications with Action-Constraints
- Authors: Arun Raman, Keerthan Shagrithaya and Shalabh Bhatnagar
- Abstract summary: We propose a method to learn optimal control policies for a finite-state Markov Decision Process.
We assume that the set of action sequences that are deemed unsafe and/or safe are given in terms of a finite-state automaton.
We present a version of the Q-learning algorithm for learning optimal policies in the presence of non-Markovian action-sequence and state constraints.
- Score: 4.046919218061427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we use concepts from supervisory control theory of discrete
event systems to propose a method to learn optimal control policies for a
finite-state Markov Decision Process (MDP) in which (only) certain sequences of
actions are deemed unsafe (respectively safe). We assume that the set of action
sequences that are deemed unsafe and/or safe are given in terms of a
finite-state automaton; and propose a supervisor that disables a subset of
actions at every state of the MDP so that the constraints on action sequence
are satisfied. Then we present a version of the Q-learning algorithm for
learning optimal policies in the presence of non-Markovian action-sequence and
state constraints, where we use the development of reward machines to handle
the state constraints. We illustrate the method using an example that captures
the utility of automata-based methods for non-Markovian state and action
specifications for reinforcement learning and show the results of simulations
in this setting.
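The supervisor-plus-product construction described above lends itself to a compact sketch. The following Python is a minimal illustration, not the authors' implementation: a finite-state supervisor tracks the action history and disables any action that would complete an unsafe sequence, and Q-learning runs on the product of the MDP state and the automaton state. The `env_step` interface, the trap-state encoding, and all hyperparameters are assumptions, and the reward-machine handling of state constraints is omitted for brevity.

```python
import random
from collections import defaultdict

class Supervisor:
    """Finite-state automaton over action sequences; -1 is a trap (unsafe) state."""
    def __init__(self, transitions, initial=0):
        self.transitions = transitions  # dict: (aut_state, action) -> aut_state
        self.initial = initial

    def step(self, q, action):
        # Missing transitions lead to the trap state (assumed encoding).
        return self.transitions.get((q, action), -1)

    def enabled(self, q, actions):
        # Disable any action that would drive the automaton into the trap state.
        return [a for a in actions if self.step(q, a) != -1]

def q_learning(env_step, actions, supervisor, episodes=5000,
               alpha=0.1, gamma=0.95, eps=0.1, horizon=50):
    # Q is indexed by the product state (MDP state, automaton state), which
    # restores the Markov property under action-sequence constraints.
    Q = defaultdict(float)
    for _ in range(episodes):
        s, q = 0, supervisor.initial           # assumed initial MDP state 0
        for _ in range(horizon):
            safe = supervisor.enabled(q, actions)
            if not safe:
                break                           # supervisor blocks every action
            if random.random() < eps:
                a = random.choice(safe)
            else:
                a = max(safe, key=lambda b: Q[(s, q, b)])
            s2, r, done = env_step(s, a)        # hypothetical MDP interface
            q2 = supervisor.step(q, a)
            safe2 = supervisor.enabled(q2, actions)
            boot = 0.0 if done else max((Q[(s2, q2, b)] for b in safe2), default=0.0)
            Q[(s, q, a)] += alpha * (r + gamma * boot - Q[(s, q, a)])
            s, q = s2, q2
            if done:
                break
    return Q
```

Indexing Q by the pair (s, q) is what lets a memoryless update handle the non-Markovian action-sequence constraint: the automaton state q summarizes exactly the relevant portion of the action history.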
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
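As a rough illustration of constrained policy-gradient training (the summary does not give C-PG's exact updates, so this is a generic Lagrangian primal-dual sketch, using a single-state softmax policy and a hypothetical `sample_episode` environment returning a reward and a cost):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def constrained_pg(sample_episode, n_actions, d=0.1, iters=500,
                   lr_theta=0.05, lr_lam=0.02):
    theta = np.zeros(n_actions)  # softmax policy parameters (bandit, for brevity)
    lam = 0.0                    # dual variable for the constraint E[cost] <= d
    for _ in range(iters):
        probs = softmax(theta)
        a = np.random.choice(n_actions, p=probs)
        reward, cost = sample_episode(a)       # hypothetical environment call
        # REINFORCE gradient of the Lagrangian L = reward - lam * (cost - d)
        grad_logp = -probs
        grad_logp[a] += 1.0
        theta += lr_theta * (reward - lam * cost) * grad_logp
        lam = max(0.0, lam + lr_lam * (cost - d))  # projected dual ascent
    return theta, lam
```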
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
- Code Models are Zero-shot Precondition Reasoners [83.8561159080672]
We use code representations to reason about action preconditions for sequential decision making tasks.
We propose a precondition-aware action sampling strategy that ensures actions predicted by a policy are consistent with preconditions.
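A hedged sketch of the precondition-aware sampling strategy; the `precondition_ok` predicate (in the paper, derived from code models) and the policy representation are illustrative placeholders:

```python
import random

def sample_consistent_action(policy_probs, actions, state, precondition_ok,
                             max_tries=20):
    # Resample from the policy until the proposed action's preconditions hold.
    candidates = list(actions)
    weights = [policy_probs[a] for a in candidates]
    for _ in range(max_tries):
        a = random.choices(candidates, weights=weights)[0]
        if precondition_ok(state, a):   # hypothetical precondition check
            return a
    # Fall back to any action whose preconditions hold in this state.
    feasible = [a for a in candidates if precondition_ok(state, a)]
    return random.choice(feasible) if feasible else None
```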
arXiv Detail & Related papers (2023-11-16T06:19:27Z)
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop model-based imitation learning from state-only sequences using a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
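One way to picture the abstraction step, as a sketch under assumptions: probability intervals on reaching each discrete region are estimated from sampled noise realizations, with a Hoeffding-style margin standing in for the paper's bounds; `dynamics` and the region predicates are hypothetical.

```python
import numpy as np

def transition_intervals(x, u, dynamics, noise_samples, regions, conf=0.05):
    # Count how often the noisy successor state lands in each discrete region.
    hits = np.zeros(len(regions), dtype=int)
    n = len(noise_samples)
    for w in noise_samples:
        x_next = dynamics(x, u, w)               # hypothetical model call
        for i, contains in enumerate(regions):   # regions as predicates on states
            if contains(x_next):
                hits[i] += 1
                break
    # Crude confidence margin; a stand-in for the paper's interval bounds.
    margin = np.sqrt(np.log(2.0 / conf) / (2.0 * n))
    p_hat = hits / n
    return [(max(0.0, p - margin), min(1.0, p + margin)) for p in p_hat]
```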
arXiv Detail & Related papers (2023-01-04T10:40:30Z)
- Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction [2.127049691404299]
We propose an automatic exploration process adjustment method for safe reinforcement learning algorithms.
Our proposed method automatically selects whether the exploratory input is used or not at each time depending on the state and its predicted value.
Our method theoretically guarantees the satisfaction of the constraints with the pre-specified probability, that is, the satisfaction of a joint chance constraint at every time.
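The selection rule admits a very small sketch; the `predict_satisfaction_prob` estimator below is a hypothetical stand-in for the paper's predicted value:

```python
def select_input(state, nominal_input, exploratory_input,
                 predict_satisfaction_prob, required_prob=0.95):
    # Explore only when the predicted probability of satisfying the
    # constraints from this state stays above the required level.
    p = predict_satisfaction_prob(state, exploratory_input)
    if p >= required_prob:
        return exploratory_input   # safe to explore this step
    return nominal_input           # exploration suppressed this step
```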
arXiv Detail & Related papers (2021-03-05T13:30:53Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
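A minimal sketch of one product step in the spirit of the EP-MDP; the automaton interface, labeling function, and accepting-transition reward are illustrative stand-ins and omit the paper's tracking of generalized Büchi accepting sets:

```python
def product_step(s, q, a, mdp_step, aut_delta, accepting, label,
                 accept_reward=1.0):
    # Product state pairs the MDP state s with the automaton state q;
    # the shaped reward fires on accepting automaton transitions.
    s2 = mdp_step(s, a)            # hypothetical MDP transition
    q2 = aut_delta(q, label(s2))   # automaton reads the new state's label
    r = accept_reward if (q, label(s2), q2) in accepting else 0.0
    return s2, q2, r
```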
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- An Abstraction-based Method to Verify Multi-Agent Deep Reinforcement-Learning Behaviours [8.95294551927446]
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents.
We present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints.
arXiv Detail & Related papers (2021-02-02T11:12:30Z)
- Learning to Satisfy Unknown Constraints in Iterative MPC [3.306595429364865]
We propose a control design method for linear time-invariant systems that iteratively learns to satisfy unknown polyhedral state constraints.
At each iteration of a repetitive task, the method constructs an estimate of the unknown environment constraints using collected closed-loop trajectory data.
An MPC controller is then designed to robustly satisfy the estimated constraint set.
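A toy version of the estimation step, assuming a fixed polyhedral template A x <= b whose offsets are fitted to the visited safe states and then tightened by a back-off; the paper's construction carries robustness guarantees that this sketch does not:

```python
import numpy as np

def estimate_constraints(A, safe_states, backoff=0.05):
    # A: (k, n) template directions; safe_states: states observed to be safe.
    X = np.asarray(safe_states)
    b_hat = (A @ X.T).max(axis=1)   # tightest offsets consistent with the data
    return b_hat - backoff          # conservative inner approximation
```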
arXiv Detail & Related papers (2020-06-09T05:19:40Z)
- Learning Constrained Adaptive Differentiable Predictive Control Policies With Guarantees [1.1086440815804224]
We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems.
We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraints penalties through a differentiable closed-loop system dynamics model.
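A dependency-free caricature of the DPC training loop: roll out a known linear closed-loop model under a parametrized policy, accumulate the MPC-style loss plus constraint penalties, and descend its gradient. The paper uses automatic differentiation; finite differences stand in for it here, and the system, bounds, and step sizes are illustrative.

```python
import numpy as np

def rollout_loss(K, A, B, x0, horizon=20, u_max=1.0, penalty=10.0):
    # Closed-loop rollout under u = -K x with quadratic loss and a
    # soft penalty on the input-magnitude constraint |u| <= u_max.
    x, loss = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        loss += x @ x + 0.1 * (u @ u)
        loss += penalty * np.sum(np.maximum(np.abs(u) - u_max, 0.0) ** 2)
        x = A @ x + B @ u              # known linear model
    return loss

def train_dpc(A, B, x0, n=2, m=1, iters=200, lr=1e-3, h=1e-5):
    K = np.zeros((m, n))
    for _ in range(iters):
        grad = np.zeros_like(K)
        base = rollout_loss(K, A, B, x0)
        for i in range(m):
            for j in range(n):
                Kp = K.copy()
                Kp[i, j] += h
                grad[i, j] = (rollout_loss(Kp, A, B, x0) - base) / h
        K -= lr * grad
    return K
```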
arXiv Detail & Related papers (2020-04-23T14:24:44Z)
- Deep Constrained Q-learning [15.582910645906145]
In many real world applications, reinforcement learning agents have to optimize multiple objectives while following certain rules or satisfying a set of constraints.
We propose Constrained Q-learning, a novel off-policy reinforcement learning framework restricting the action space directly in the Q-update to learn the optimal Q-function for the induced constrained MDP and the corresponding safe policy.
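The core update restricts both the behaviour policy and the bootstrap target to constraint-satisfying actions; a minimal sketch, with `safe_actions` as a hypothetical constraint oracle:

```python
from collections import defaultdict

def constrained_q_update(Q, s, a, r, s2, safe_actions, alpha=0.1, gamma=0.99):
    # The max ranges only over actions that satisfy the constraints in s2,
    # so Q converges toward the optimum of the induced constrained MDP.
    safe = safe_actions(s2)
    target = r + gamma * (max(Q[(s2, b)] for b in safe) if safe else 0.0)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Usage: Q = defaultdict(float), then call the update on each transition.
```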
arXiv Detail & Related papers (2020-03-20T17:26:03Z)
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051]
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
arXiv Detail & Related papers (2019-02-02T20:09:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.