Compositional Reinforcement Learning from Logical Specifications
- URL: http://arxiv.org/abs/2106.13906v1
- Date: Fri, 25 Jun 2021 22:54:28 GMT
- Title: Compositional Reinforcement Learning from Logical Specifications
- Authors: Kishor Jothimurugan, Suguman Bansal, Osbert Bastani and Rajeev Alur
- Abstract summary: Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph.
- Score: 21.193231846438895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of learning control policies for complex tasks given by
logical specifications. Recent approaches automatically generate a reward
function from a given specification and use a suitable reinforcement learning
algorithm to learn a policy that maximizes the expected reward. These
approaches, however, scale poorly to complex tasks that require high-level
planning. In this work, we develop a compositional learning approach, called
DiRL, that interleaves high-level planning and reinforcement learning. First,
DiRL encodes the specification as an abstract graph; intuitively, vertices and
edges of the graph correspond to regions of the state space and simpler
sub-tasks, respectively. Our approach then incorporates reinforcement learning
to learn neural network policies for each edge (sub-task) within a
Dijkstra-style planning algorithm to compute a high-level plan in the graph. An
evaluation of the proposed approach on a set of challenging control benchmarks
with continuous state and action spaces demonstrates that it outperforms
state-of-the-art baselines.
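To make the planning/learning interface concrete, here is a minimal Python sketch of the Dijkstra-style high-level planning step, under assumptions of ours rather than the paper's code: the `edge_success_prob` callback is a hypothetical stand-in for however one estimates the success probability of each edge's learned sub-task policy, and minimizing summed -log probabilities makes the shortest path the plan with the highest product of sub-task success probabilities.
```python
import heapq
import math

def dijkstra_plan(graph, source, target, edge_success_prob):
    """Dijkstra-style search over an abstract graph.

    graph: dict mapping vertex -> iterable of successor vertices.
    edge_success_prob(u, v): estimated probability that the learned
    policy for edge (u, v) completes its sub-task (a stand-in here).
    Minimizing the sum of -log probabilities maximizes the product of
    per-edge success probabilities along the plan.
    """
    dist = {source: 0.0}
    parent = {}
    frontier = [(0.0, source)]
    done = set()
    while frontier:
        d, u = heapq.heappop(frontier)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v in graph.get(u, ()):
            p = edge_success_prob(u, v)
            if p <= 0.0:
                continue  # edge policy deemed hopeless; skip
            nd = d - math.log(p)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(frontier, (nd, v))
    if target not in dist:
        return None  # no feasible high-level plan found
    plan, node = [], target
    while node != source:
        plan.append((parent[node], node))
        node = parent[node]
    return plan[::-1]  # sequence of edges (sub-tasks) to execute
```
A full DiRL-style loop would interleave this search with reinforcement learning on the edge policies, re-running the planner as the success-probability estimates improve.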
Related papers
- Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs).
In this paper, we explore graph learning-based methods for task planning, a direction orthogonal to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
arXiv Detail & Related papers (2024-05-29T14:26:24Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Learning High-level Semantic-Relational Concepts for SLAM [10.528810470934781]
We propose an algorithm for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph.
We validate our method in both simulated and real datasets demonstrating improved performance over two baseline approaches.
arXiv Detail & Related papers (2023-09-30T14:54:31Z) - Feudal Graph Reinforcement Learning [18.069747511100132]
Graph-based representations and message-passing modular policies constitute prominent approaches to tackling composable control problems in reinforcement learning (RL).
We propose a novel methodology, named Feudal Graph Reinforcement Learning (FGRL), that addresses such challenges by relying on hierarchical RL and a pyramidal message-passing architecture.
In particular, FGRL defines a hierarchy of policies where high-level commands are propagated from the top of the hierarchy down through a layered graph structure.
arXiv Detail & Related papers (2023-04-11T09:51:13Z) - Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
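A rough PyTorch sketch of what such a distillation step could look like, under our own assumptions (a discrete action space, and a `goal_policy(states, goal)` callable returning logits; none of these names come from the paper):
```python
import torch.nn.functional as F

def distill_step(goal_policy, optimizer, states, actions, final_goal):
    """Self-imitation as behavior cloning (hypothetical sketch).

    states:  (B, state_dim) tensor of states visited while a
             subgoal-conditioned policy followed a planned subgoal path.
    actions: (B,) tensor of the actions that policy took.
    The target-goal-conditioned policy learns to reproduce those actions
    conditioned only on the final goal, distilling the planner away.
    """
    logits = goal_policy(states, final_goal)  # assumed to return (B, num_actions)
    loss = F.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```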
arXiv Detail & Related papers (2023-03-20T14:51:10Z) - Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
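The vector-quantization step at the heart of this idea can be illustrated with a small NumPy stand-in (hypothetical, not the paper's model): each continuous subgoal embedding is snapped to its nearest entry in a learned codebook, yielding a discrete subgoal vocabulary over which subgoal-level planning can search.
```python
import numpy as np

def quantize_subgoal(subgoal, codebook):
    """Nearest-neighbor lookup in a learned codebook (hypothetical sketch).

    subgoal:  (D,) continuous subgoal embedding.
    codebook: (K, D) array of learned code vectors.
    Returns the discrete code index and its vector, so planning can
    operate over K discrete subgoals instead of a continuous space.
    """
    dists = np.linalg.norm(codebook - subgoal, axis=1)
    k = int(np.argmin(dists))
    return k, codebook[k]
```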
arXiv Detail & Related papers (2023-01-30T15:04:39Z) - Goal Agnostic Planning using Maximum Likelihood Paths in Hypergraph
World Models [1.370633147306388]
We present a hypergraph-based machine learning algorithm, a data-structure-driven maintenance method, and a planning algorithm based on a probabilistic application of Dijkstra's algorithm.
We prove that the algorithm determines optimal solutions within the problem space, mathematically bound its learning performance, and supply a mathematical model analyzing system state progression through time.
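A count-based sketch of how such a maintained world model can feed a probabilistic Dijkstra search, assuming discrete states and actions (our illustration, not the paper's data structure): empirical transition frequencies supply -log-probability edge weights, so a shortest path is a maximum-likelihood plan, as in the DiRL sketch above.
```python
from collections import defaultdict
import math

class TransitionModel:
    """Empirical estimate of P(next_state | state, action) from observations."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def neg_log_prob(self, state, action, next_state):
        """Edge weight for maximum-likelihood planning via Dijkstra."""
        bucket = self.counts[(state, action)]
        total = sum(bucket.values())
        if total == 0 or bucket[next_state] == 0:
            return math.inf  # unobserved transition
        return -math.log(bucket[next_state] / total)
```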
arXiv Detail & Related papers (2021-10-18T16:22:33Z) - Abstract Value Iteration for Hierarchical Reinforcement Learning [23.08652058034536]
We propose a novel hierarchical reinforcement learning framework for control with continuous state and action spaces.
A key challenge is that the abstract decision process (ADP) may not be Markov, which we address by proposing two algorithms for planning in the ADP.
Our approach outperforms state-of-the-art hierarchical reinforcement learning algorithms on several challenging benchmarks.
arXiv Detail & Related papers (2020-10-29T14:41:42Z) - Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
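The divide-and-conquer recursion can be pictured with a greedy stand-in (DC-MCTS itself drives the choice of split points with Monte Carlo Tree Search; `propose_midpoints` and `is_feasible` below are hypothetical helpers): instead of extending a plan left to right, pick a midpoint subgoal and recursively solve both halves.
```python
def dc_plan(start, goal, propose_midpoints, is_feasible, depth=0, max_depth=8):
    """Greedy divide-and-conquer planning sketch (DC-MCTS uses tree search).

    propose_midpoints(start, goal): candidate subgoals splitting the task,
        e.g. samples from a learned proposal network (hypothetical).
    is_feasible(start, goal): whether a low-level policy can bridge the
        two states directly (hypothetical).
    Returns a list of waypoints from start to goal, or None on failure.
    """
    if is_feasible(start, goal):
        return [start, goal]
    if depth >= max_depth:
        return None
    for mid in propose_midpoints(start, goal):
        left = dc_plan(start, mid, propose_midpoints, is_feasible, depth + 1, max_depth)
        if left is None:
            continue
        right = dc_plan(mid, goal, propose_midpoints, is_feasible, depth + 1, max_depth)
        if right is None:
            continue
        return left[:-1] + right  # join halves, dropping the duplicated midpoint
    return None
```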
arXiv Detail & Related papers (2020-04-23T18:08:58Z) - Graph Ordering: Towards the Optimal by Learning [69.72656588714155]
Graph representation learning has achieved remarkable success in many graph-based applications, such as node classification, link prediction, and community detection.
However, for some kinds of graph applications, such as graph compression and edge partition, it is very hard to reduce them to graph representation learning tasks.
In this paper, we propose to attack the graph ordering problem behind such applications by a novel learning approach.
arXiv Detail & Related papers (2020-01-18T09:14:16Z)