Verifiable and Compositional Reinforcement Learning Systems
- URL: http://arxiv.org/abs/2106.05864v1
- Date: Mon, 7 Jun 2021 17:05:14 GMT
- Title: Verifiable and Compositional Reinforcement Learning Systems
- Authors: Cyrus Neary, Christos Verginis, Murat Cubuktepe, Ufuk Topcu
- Abstract summary: The framework consists of a high-level model, represented as a parametric Markov decision process (pMDP), which is used to plan and analyze compositions of sub-systems.
By defining interfaces between the sub-systems, the framework enables automatic decompositions of task specifications.
We present a method, formulated as the problem of finding an optimal set of parameters in the pMDP, to automatically update the sub-task specifications.
- Score: 19.614913673879474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel framework for verifiable and compositional reinforcement
learning (RL) in which a collection of RL sub-systems, each of which learns to
accomplish a separate sub-task, are composed to achieve an overall task. The
framework consists of a high-level model, represented as a parametric Markov
decision process (pMDP) which is used to plan and to analyze compositions of
sub-systems, and of the collection of low-level sub-systems themselves. By
defining interfaces between the sub-systems, the framework enables automatic
decompositions of task specifications, e.g., reach a target set of states with a
probability of at least 0.95, into individual sub-task specifications, i.e.
achieve the sub-system's exit conditions with at least some minimum
probability, given that its entry conditions are met. This in turn allows for
the independent training and testing of the sub-systems; if they each learn a
policy satisfying the appropriate sub-task specification, then their
composition is guaranteed to satisfy the overall task specification.
Conversely, if the sub-task specifications cannot all be satisfied by the
learned policies, we present a method, formulated as the problem of finding an
optimal set of parameters in the pMDP, to automatically update the sub-task
specifications to account for the observed shortcomings. The result is an
iterative procedure for defining sub-task specifications, and for training the
sub-systems to meet them. As an additional benefit, this procedure allows for
particularly challenging or important components of an overall task to be
determined automatically, and focused on, during training. Experimental results
demonstrate the presented framework's novel capabilities.
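The compositional guarantee and the specification-update step described in the abstract can be illustrated with a deliberately simplified sketch. Here the composition is assumed to be a purely sequential chain, so the overall success probability is the product of the sub-systems' success probabilities; the function names and the uniform-update heuristic are illustrative assumptions, and the paper's actual method instead solves an optimization over the pMDP's parameters.

```python
# Simplified illustration (not the paper's implementation): sub-systems
# composed in sequence, each with a sub-task specification given as a
# minimum probability of reaching its exit conditions from its entry
# conditions.

def composition_satisfies(sub_task_probs, overall_threshold):
    """Check the compositional guarantee for a sequential chain:
    the product of sub-task success probabilities must meet the
    overall task specification's probability threshold."""
    overall = 1.0
    for p in sub_task_probs:
        overall *= p
    return overall >= overall_threshold

def update_sub_task_specs(measured_probs, overall_threshold):
    """Toy stand-in for the paper's pMDP parameter optimization:
    when the learned policies fall short, raise each sub-task's
    required probability uniformly so that the chain meets the
    overall specification."""
    n = len(measured_probs)
    # A uniform requirement whose product equals the threshold.
    required = overall_threshold ** (1.0 / n)
    return [required] * n

# Example: three sub-systems, overall spec "reach the target set with
# probability at least 0.95" (the threshold used in the abstract).
measured = [0.99, 0.97, 0.96]
print(composition_satisfies(measured, 0.95))   # product ~0.922, spec not met
print(update_sub_task_specs(measured, 0.95))   # each sub-task needs ~0.983
```

If every sub-system then learns a policy meeting its updated requirement, the chain's composed success probability meets the overall threshold by construction, which mirrors the iterative procedure the abstract describes.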
Related papers
- Task-Aware Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [70.96345405979179]
The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction.
Variations in task content and complexity pose significant challenges in policy formulation.
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
arXiv Detail & Related papers (2024-11-02T05:49:14Z)
- Efficient Reactive Synthesis Using Mode Decomposition [0.0]
We propose a novel decomposition algorithm based on modes.
The input to our algorithm is the original specification and the description of the modes.
We show how to generate sub-specifications automatically and we prove that if all sub-problems are realizable the full specification is realizable.
arXiv Detail & Related papers (2023-12-14T08:01:35Z)
- Verifiable Reinforcement Learning Systems via Compositionality [19.316487056356298]
We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems are composed to achieve an overall task.
We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then its composition is guaranteed to satisfy the overall task specification.
We present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, to automatically update the subtask specifications to account for the observed shortcomings.
arXiv Detail & Related papers (2023-09-09T17:11:44Z)
- Reinforcement Learning with Success Induced Task Prioritization [68.8204255655161]
We introduce Success Induced Task Prioritization (SITP), a framework for automatic curriculum learning.
The algorithm selects the order of tasks that provide the fastest learning for agents.
We demonstrate that SITP matches or surpasses the results of other curriculum design methods.
arXiv Detail & Related papers (2022-12-30T12:32:43Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- Quantifying Adaptability in Pre-trained Language Models with 500 Tasks [60.0364822929442]
We present a large-scale empirical study of the features and limits of LM adaptability using a new benchmark, TaskBench500.
We evaluate three facets of adaptability, finding that adaptation procedures differ dramatically in their ability to memorize small datasets.
Our experiments show that adaptability to new tasks, like generalization to new examples, can be systematically described and understood.
arXiv Detail & Related papers (2021-12-06T18:00:25Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.