Verifiable Reinforcement Learning Systems via Compositionality
- URL: http://arxiv.org/abs/2309.06420v1
- Date: Sat, 9 Sep 2023 17:11:44 GMT
- Title: Verifiable Reinforcement Learning Systems via Compositionality
- Authors: Cyrus Neary, Aryaman Singh Samyal, Christos Verginis, Murat Cubuktepe,
Ufuk Topcu
- Abstract summary: We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems are composed to achieve an overall task.
We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then their composition satisfies the overall task specification.
If the learned policies fall short, we present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, that automatically updates the subtask specifications to account for the observed shortcomings.
- Score: 19.316487056356298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework for verifiable and compositional reinforcement
learning (RL) in which a collection of RL subsystems, each of which learns to
accomplish a separate subtask, are composed to achieve an overall task. The
framework consists of a high-level model, represented as a parametric Markov
decision process, which is used to plan and analyze compositions of subsystems,
and of the collection of low-level subsystems themselves. The subsystems are
implemented as deep RL agents operating under partial observability. By
defining interfaces between the subsystems, the framework enables automatic
decompositions of task specifications, e.g., reach a target set of states with
a probability of at least 0.95, into individual subtask specifications, i.e.,
achieve the subsystem's exit conditions with at least some minimum probability,
given that its entry conditions are met. This in turn allows for the
independent training and testing of the subsystems. We present theoretical
results guaranteeing that if each subsystem learns a policy satisfying its
subtask specification, then their composition is guaranteed to satisfy the
overall task specification. Conversely, if the subtask specifications cannot
all be satisfied by the learned policies, we present a method, formulated as
the problem of finding an optimal set of parameters in the high-level model, to
automatically update the subtask specifications to account for the observed
shortcomings. The result is an iterative procedure for defining subtask
specifications, and for training the subsystems to meet them. Experimental
results demonstrate the presented framework's novel capabilities in
environments with both full and partial observability, discrete and continuous
state and action spaces, as well as deterministic and stochastic dynamics.
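The decomposition described above admits a simple illustration for the special case of a sequential chain of subsystems. The sketch below is not the paper's implementation (which plans over a parametric MDP); it only demonstrates the underlying arithmetic, under the assumption that subsystems execute in sequence and their successes are independent given their entry conditions: if subsystem i achieves its exit condition with probability at least p_i whenever its entry condition holds, the chain reaches the final target with probability at least the product of the p_i. The even split used in `decompose_spec` is a hypothetical stand-in for the paper's parameter optimization.

```python
# Minimal sketch (assumed sequential composition, not the paper's method):
# split an overall reachability spec into per-subsystem exit-probability
# requirements, then check the compositional guarantee against the success
# probabilities estimated during independent subsystem testing.

import math

def decompose_spec(overall_target: float, n_subsystems: int) -> list[float]:
    """Split 'reach the target with probability >= overall_target' into
    per-subsystem requirements. An even split is used purely for
    illustration; the paper optimizes these parameters in a pMDP."""
    p = overall_target ** (1.0 / n_subsystems)
    return [p] * n_subsystems

def composition_satisfies(subtask_probs: list[float],
                          overall_target: float) -> bool:
    """Compositional guarantee for a chain: the product of the subsystems'
    guaranteed success probabilities lower-bounds the chain's success."""
    return math.prod(subtask_probs) >= overall_target

subtask_specs = decompose_spec(0.95, 3)      # each subtask needs about 0.983
learned = [0.99, 0.985, 0.99]                # hypothetical test estimates
print(composition_satisfies(learned, 0.95))  # True: composition verified
```

If some learned policy cannot meet its requirement, the even split can be re-optimized so that stronger subsystems absorb a tighter bound, which is the role the paper assigns to the parametric high-level model.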
Related papers
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS).
LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
- Efficient Reactive Synthesis Using Mode Decomposition [0.0]
We propose a novel decomposition algorithm based on modes.
The input to our algorithm is the original specification and the description of the modes.
We show how to generate sub-specifications automatically and we prove that if all sub-problems are realizable the full specification is realizable.
arXiv Detail & Related papers (2023-12-14T08:01:35Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Verified Compositional Neuro-Symbolic Control for Stochastic Systems with Temporal Logic Tasks [11.614036749291216]
Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents.
A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical.
This paper aims to address this challenge by checking if there exists a temporal composition of the trained NN controllers.
arXiv Detail & Related papers (2023-11-17T20:51:24Z)
- Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning [53.73316523766183]
Coreference resolution systems need to tackle two main tasks.
One task is to detect all of the potential mentions, the other is to learn the linking of an antecedent for each possible mention.
We propose a hybrid rule-neural coreference resolution system based on actor-critic learning.
arXiv Detail & Related papers (2022-12-20T08:55:47Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- Verifiable and Compositional Reinforcement Learning Systems [19.614913673879474]
The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP)
By defining interfaces between the sub-systems, the framework enables automatic decompositions of task specifications.
We present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the sub-task specifications.
arXiv Detail & Related papers (2021-06-07T17:05:14Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.