Verifiable Reinforcement Learning Systems via Compositionality
- URL: http://arxiv.org/abs/2309.06420v1
- Date: Sat, 9 Sep 2023 17:11:44 GMT
- Title: Verifiable Reinforcement Learning Systems via Compositionality
- Authors: Cyrus Neary, Aryaman Singh Samyal, Christos Verginis, Murat Cubuktepe,
Ufuk Topcu
- Abstract summary: We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems are composed to achieve an overall task.
We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then their composition satisfies the overall task specification.
If the learned policies fall short, we present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, that automatically updates the subtask specifications to account for the observed shortcomings.
- Score: 19.316487056356298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a framework for verifiable and compositional reinforcement
learning (RL) in which a collection of RL subsystems, each of which learns to
accomplish a separate subtask, are composed to achieve an overall task. The
framework consists of a high-level model, represented as a parametric Markov
decision process, which is used to plan and analyze compositions of subsystems,
and of the collection of low-level subsystems themselves. The subsystems are
implemented as deep RL agents operating under partial observability. By
defining interfaces between the subsystems, the framework enables automatic
decompositions of task specifications, e.g., reach a target set of states with
a probability of at least 0.95, into individual subtask specifications, i.e.,
achieve the subsystem's exit conditions with at least some minimum probability,
given that its entry conditions are met. This in turn allows for the
independent training and testing of the subsystems. We present theoretical
results guaranteeing that if each subsystem learns a policy satisfying its
subtask specification, then their composition is guaranteed to satisfy the
overall task specification. Conversely, if the subtask specifications cannot
all be satisfied by the learned policies, we present a method, formulated as
the problem of finding an optimal set of parameters in the high-level model, to
automatically update the subtask specifications to account for the observed
shortcomings. The result is an iterative procedure for defining subtask
specifications, and for training the subsystems to meet them. Experimental
results demonstrate the presented framework's novel capabilities in
environments with both full and partial observability, discrete and continuous
state and action spaces, as well as deterministic and stochastic dynamics.
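The decomposition described above admits a simple illustration for the special case of a sequential chain of subsystems. The sketch below is not the paper's implementation (which plans over a parametric MDP); it only demonstrates the underlying arithmetic, under the assumption that subsystems execute in sequence and their successes are independent given their entry conditions: if subsystem i achieves its exit condition with probability at least p_i whenever its entry condition holds, the chain reaches the final target with probability at least the product of the p_i. The even split used in `decompose_spec` is a hypothetical stand-in for the paper's parameter optimization.

```python
# Minimal sketch (assumed sequential composition, not the paper's method):
# split an overall reachability spec into per-subsystem exit-probability
# requirements, then check the compositional guarantee against the success
# probabilities estimated during independent subsystem testing.

import math

def decompose_spec(overall_target: float, n_subsystems: int) -> list[float]:
    """Split 'reach the target with probability >= overall_target' into
    per-subsystem requirements. An even split is used purely for
    illustration; the paper optimizes these parameters in a pMDP."""
    p = overall_target ** (1.0 / n_subsystems)
    return [p] * n_subsystems

def composition_satisfies(subtask_probs: list[float],
                          overall_target: float) -> bool:
    """Compositional guarantee for a chain: the product of the subsystems'
    guaranteed success probabilities lower-bounds the chain's success."""
    return math.prod(subtask_probs) >= overall_target

subtask_specs = decompose_spec(0.95, 3)      # each subtask needs about 0.983
learned = [0.99, 0.985, 0.99]                # hypothetical test estimates
print(composition_satisfies(learned, 0.95))  # True: composition verified
```

If some learned policy cannot meet its requirement, the even split can be re-optimized so that stronger subsystems absorb a tighter bound, which is the role the paper assigns to the parametric high-level model.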
Related papers
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286]
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS).
LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
arXiv Detail & Related papers (2024-02-06T04:00:21Z)
- Efficient Reactive Synthesis Using Mode Decomposition [0.0]
We propose a novel decomposition algorithm based on modes.
The input to our algorithm is the original specification and the description of the modes.
We show how to generate sub-specifications automatically and we prove that if all sub-problems are realizable the full specification is realizable.
arXiv Detail & Related papers (2023-12-14T08:01:35Z)
- Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees [0.0]
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks.
We propose a novel method for learning a composition of neural network policies in stochastic environments.
A formal certificate guarantees that a specification over the policy's behavior is satisfied with the desired probability.
arXiv Detail & Related papers (2023-12-03T17:04:18Z)
- Verified Compositional Neuro-Symbolic Control for Stochastic Systems with Temporal Logic Tasks [11.614036749291216]
Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents.
A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical.
This paper aims to address this challenge by checking if there exists a temporal composition of the trained NN controllers.
arXiv Detail & Related papers (2023-11-17T20:51:24Z)
- Hybrid Rule-Neural Coreference Resolution System based on Actor-Critic Learning [53.73316523766183]
Coreference resolution systems need to tackle two main tasks.
One task is to detect all of the potential mentions, the other is to learn the linking of an antecedent for each possible mention.
We propose a hybrid rule-neural coreference resolution system based on actor-critic learning.
arXiv Detail & Related papers (2022-12-20T08:55:47Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curricula paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z)
- Verifiable and Compositional Reinforcement Learning Systems [19.614913673879474]
The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP)
By defining interfaces between the sub-systems, the framework enables automatic decompositions of task specifications.
We present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the sub-task specifications.
arXiv Detail & Related papers (2021-06-07T17:05:14Z)
- Learning Task Decomposition with Ordered Memory Policy Network [73.3813423684999]
We propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration.
OMPN can be applied to partially observable environments and still achieve higher task decomposition performance.
Our visualization confirms that the subtask hierarchy can emerge in our model.
arXiv Detail & Related papers (2021-03-19T18:13:35Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.