Induction and Exploitation of Subgoal Automata for Reinforcement
Learning
- URL: http://arxiv.org/abs/2009.03855v2
- Date: Tue, 16 Mar 2021 15:25:10 GMT
- Title: Induction and Exploitation of Subgoal Automata for Reinforcement
Learning
- Authors: Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda and
Alessandra Russo
- Abstract summary: We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.
ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals.
A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding.
- Score: 75.55324974788475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present ISA, an approach for learning and exploiting
subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves
reinforcement learning with the induction of a subgoal automaton, an automaton
whose edges are labeled by the task's subgoals expressed as propositional logic
formulas over a set of high-level events. A subgoal automaton also consists of
two special states: a state indicating the successful completion of the task,
and a state indicating that the task has finished without succeeding. A
state-of-the-art inductive logic programming system is used to learn a subgoal
automaton that covers the traces of high-level events observed by the RL agent.
When the currently exploited automaton does not correctly recognize a trace,
the automaton learner induces a new automaton that covers that trace. The
interleaving process guarantees the induction of automata with the minimum
number of states, and applies a symmetry breaking mechanism to shrink the
search space whilst remaining complete. We evaluate ISA in several gridworld
and continuous state space problems using different RL algorithms that leverage
the automaton structures. We provide an in-depth empirical analysis of the
automaton learning performance in terms of the traces, the symmetry breaking
and specific restrictions imposed on the final learnable automaton. For each
class of RL problem, we show that the learned automata can be successfully
exploited to learn policies that reach the goal, achieving an average reward
comparable to the case where automata are not learned but handcrafted and given
beforehand.
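For illustration, the following sketch shows what a subgoal automaton and the interleaved learn-and-relearn loop described in the abstract might look like in code. The guard functions, event names, and the `collect_episode`/`induce_automaton` callables are hypothetical placeholders; in particular, `induce_automaton` merely stands in for the inductive logic programming system used in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class SubgoalAutomaton:
    """Minimal subgoal automaton: edges are guarded by conditions over
    high-level events (here represented as callables on the event set)."""
    initial: str = "u0"
    accept: str = "u_acc"   # task completed successfully
    reject: str = "u_rej"   # task finished without succeeding
    # transitions[state] -> list of (guard, next_state); guard: set[str] -> bool
    transitions: dict = field(default_factory=dict)

    def step(self, state, events):
        for guard, nxt in self.transitions.get(state, []):
            if guard(events):
                return nxt
        return state  # no edge fires: stay in the current automaton state

    def classify(self, trace):
        """Run a trace (a list of event sets) and return the final state."""
        state = self.initial
        for events in trace:
            state = self.step(state, events)
        return state

# Hypothetical task: "observe 'coffee', then 'office', but never 'plant'".
automaton = SubgoalAutomaton(transitions={
    "u0": [(lambda e: "plant" in e, "u_rej"), (lambda e: "coffee" in e, "u1")],
    "u1": [(lambda e: "plant" in e, "u_rej"), (lambda e: "office" in e, "u_acc")],
})
assert automaton.classify([{"coffee"}, set(), {"office"}]) == "u_acc"

def isa_style_loop(collect_episode, induce_automaton, automaton, episodes=1000):
    """Sketch of the interleaving: whenever the current automaton misclassifies
    an observed trace, the trace becomes a counterexample and a new automaton
    is induced.  `collect_episode` (runs one RL episode and returns the event
    trace plus whether the goal was reached) and `induce_automaton` (stand-in
    for the ILP learner) must be supplied by the caller."""
    counterexamples = []
    for _ in range(episodes):
        trace, goal_reached = collect_episode(automaton)
        predicted_accept = automaton.classify(trace) == automaton.accept
        if predicted_accept != goal_reached:
            counterexamples.append((trace, goal_reached))
            automaton = induce_automaton(counterexamples)
    return automaton
```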
Related papers
- The Foundations of Computational Management: A Systematic Approach to
Task Automation for the Integration of Artificial Intelligence into Existing
Workflows [55.2480439325792] (arXiv 2024-02-07)
This article introduces Computational Management, a systematic approach to task automation.
The article offers three easy step-by-step procedures to begin the process of implementing AI within a workflow.
- Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents [9.529492371336286] (arXiv 2024-02-06)
Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors.
We propose a novel approach called Logical Specifications-guided Dynamic Task Sampling (LSTS).
LSTS learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification.
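As a rough illustration of dynamic task sampling, the sketch below picks the next sub-task to train on from per-task success estimates. The sampling rule, task names, and numbers are invented for the example and are not taken from LSTS.

```python
import random

def sample_next_task(success_rate, eps=0.1):
    """Pick a sub-task to train on next: mostly the one with the lowest
    estimated success rate, with a small chance of a uniform random pick."""
    if random.random() < eps:
        return random.choice(list(success_rate))
    return min(success_rate, key=success_rate.get)

# Hypothetical sub-tasks read off a high-level specification such as
# "reach the key, then the door, then the goal".
success = {"reach_key": 0.9, "open_door": 0.4, "reach_goal": 0.1}
print(sample_next_task(success))  # usually "reach_goal"
```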
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878] (arXiv 2024-01-10)
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585] (arXiv 2023-11-30)
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
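A hedged sketch of how such capabilities could be scored: set-based F1 for tool selection and exact-match accuracy for predicted parameters. The metric definitions here are illustrative only, not TaskBench's actual scoring.

```python
def tool_selection_f1(predicted_tools, gold_tools):
    """Set-based F1 between predicted and gold tool names (illustrative)."""
    pred, gold = set(predicted_tools), set(gold_tools)
    if not pred or not gold:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def parameter_accuracy(predicted_params, gold_params):
    """Fraction of gold (tool, argument, value) triples predicted exactly."""
    gold = set(gold_params)
    return len(set(predicted_params) & gold) / len(gold) if gold else 0.0

print(tool_selection_f1(["search", "summarize"], ["search", "translate"]))        # 0.5
print(parameter_accuracy([("search", "query", "RL")], [("search", "query", "RL")]))  # 1.0
```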
- Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [101.43350024175157] (arXiv 2023-06-06)
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
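The core idea can be sketched as a goal-conditioned contrastive (InfoNCE-style) critic, where a (state, action) embedding is scored against goal embeddings and the other goals in the batch act as negatives. Network sizes, dimensions, and the use of PyTorch are assumptions made for this sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    """Illustrative contrastive critic: embeds (state, action) pairs and goal
    observations, then scores them with an inner product."""
    def __init__(self, obs_dim, act_dim, goal_dim, emb_dim=64):
        super().__init__()
        self.sa_encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, 128),
                                        nn.ReLU(), nn.Linear(128, emb_dim))
        self.g_encoder = nn.Sequential(nn.Linear(goal_dim, 128),
                                       nn.ReLU(), nn.Linear(128, emb_dim))

    def forward(self, obs, act, goal):
        sa = self.sa_encoder(torch.cat([obs, act], dim=-1))  # (B, emb_dim)
        g = self.g_encoder(goal)                              # (B, emb_dim)
        return sa @ g.t()                                     # (B, B) logits

def infonce_loss(logits):
    """Positives lie on the diagonal: (s_i, a_i) is paired with its own
    future goal g_i; all other goals in the batch serve as negatives."""
    labels = torch.arange(logits.shape[0])
    return F.cross_entropy(logits, labels)

critic = ContrastiveCritic(obs_dim=10, act_dim=4, goal_dim=10)
obs, act, goal = torch.randn(32, 10), torch.randn(32, 4), torch.randn(32, 10)
loss = infonce_loss(critic(obs, act, goal))
loss.backward()
```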
- Reward-Machine-Guided, Self-Paced Reinforcement Learning [30.42334205249944] (arXiv 2023-05-25)
We develop a self-paced reinforcement learning algorithm guided by reward machines.
The proposed algorithm achieves optimal behavior reliably even in cases in which existing baselines cannot make any meaningful progress.
It also shortens the curriculum by up to one-fourth and reduces the variance of the curriculum generation process by up to four orders of magnitude.
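A reward machine can be sketched as a finite-state machine whose transitions fire on high-level events and emit rewards. The toy two-subgoal machine below is invented for illustration and is unrelated to the curricula studied in the paper.

```python
# A minimal reward machine: finite states, transitions triggered by
# high-level events, and a reward emitted on each transition.
REWARD_MACHINE = {
    # (state, event) -> (next_state, reward)
    ("u0", "key"):  ("u1", 0.0),
    ("u1", "door"): ("u_acc", 1.0),
}

def rm_step(state, events):
    """Advance the reward machine on the set of events observed this step."""
    for event in events:
        if (state, event) in REWARD_MACHINE:
            return REWARD_MACHINE[(state, event)]
    return state, 0.0  # no transition fires

state, total = "u0", 0.0
for events in [set(), {"key"}, set(), {"door"}]:
    state, reward = rm_step(state, events)
    total += reward
print(state, total)  # u_acc 1.0
```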
- Automaton-Guided Curriculum Generation for Reinforcement Learning Agents [14.20447398253189] (arXiv 2023-04-11)
Automaton-guided Curriculum Learning (AGCL) is a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs).
AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP representation to generate a curriculum as a DAG.
Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance.
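A minimal sketch of deriving a curriculum DAG from a DFA: each DFA transition becomes a source task, and tasks are linked when one ends where the next begins. The DFA, event names, and ordering rule are illustrative; AGCL's actual construction also relies on the Object-Oriented MDP representation, which this sketch omits.

```python
from collections import deque

# Toy DFA for "pick up a key, then open the door, then reach the goal".
DFA = {
    "q0": {"key": "q1"},
    "q1": {"door": "q2"},
    "q2": {"goal": "q_acc"},
}

def curriculum_dag(dfa, start="q0"):
    """Return (tasks, edges): one sub-task per DFA transition, discovered in
    breadth-first order, with an edge from a task to any task that starts
    where the first one ends."""
    tasks, edges, seen, frontier = [], [], {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        for symbol, nxt in dfa.get(state, {}).items():
            tasks.append((state, symbol, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    for (s1, a1, t1) in tasks:
        for (s2, a2, t2) in tasks:
            if t1 == s2:
                edges.append(((s1, a1, t1), (s2, a2, t2)))
    return tasks, edges

tasks, edges = curriculum_dag(DFA)
print(tasks)  # [('q0','key','q1'), ('q1','door','q2'), ('q2','goal','q_acc')]
print(edges)  # a chain linking the three sub-tasks in order
```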
- Learning Task Automata for Reinforcement Learning using Hidden Markov Models [37.69303106863453] (arXiv 2022-08-25)
This paper proposes a novel pipeline for learning non-Markovian task specifications as succinct finite-state "task automata".
We learn a product MDP, a model composed of the specification's automaton and the environment's MDP, by treating the product MDP as a partially observable MDP and using the well-known Baum-Welch algorithm for learning hidden Markov models.
Our learnt task automaton enables the decomposition of a task into its constituent sub-tasks, which improves the rate at which an RL agent can later synthesise an optimal policy.
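A simplified sketch of the HMM-fitting step on symbolic event traces, assuming the hmmlearn package (an external dependency not mentioned in the paper): Baum-Welch estimates hidden-state transition and emission probabilities, which can be read as a rough automaton skeleton. The paper itself fits a product MDP treated as a partially observable MDP; this toy example omits that construction.

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency; CategoricalHMM needs hmmlearn >= 0.2.7

# Symbolic events observed by the agent, encoded as integers:
# 0 = nothing, 1 = coffee, 2 = office.
traces = [[0, 0, 1, 0, 2], [0, 1, 0, 0, 2], [0, 0, 0, 1, 2]]
X = np.concatenate(traces).reshape(-1, 1)
lengths = [len(t) for t in traces]

# Baum-Welch (EM) fits hidden-state transition and emission probabilities;
# three hidden states stand in for automaton states in this toy setup.
model = hmm.CategoricalHMM(n_components=3, n_iter=100, random_state=0)
model.fit(X, lengths)

print(np.round(model.transmat_, 2))      # learned hidden-state transitions
print(np.round(model.emissionprob_, 2))  # which events each hidden state emits
```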
- Continuous Motion Planning with Temporal Logic Specifications using Deep Neural Networks [16.296473750342464] (arXiv 2020-04-02)
We propose a model-free reinforcement learning method to synthesize control policies for motion planning problems.
The robot is modelled as a discrete-time Markov decision process (MDP) with continuous state and action spaces.
We train deep neural networks to approximate the value function and policy using an actor-critic reinforcement learning method.
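A generic one-step actor-critic update for continuous states and actions, sketched in PyTorch; the network sizes, fixed-variance Gaussian policy, and hyperparameters are placeholders rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=3e-4)

def update(state, action, reward, next_state, done, gamma=0.99):
    """One-step TD actor-critic update on a batch of transitions."""
    value = critic(state)                                        # V(s)
    with torch.no_grad():
        target = reward + gamma * (1.0 - done) * critic(next_state)
    critic_loss = (target - value).pow(2).mean()
    advantage = (target - value).detach()
    # Gaussian policy with fixed unit variance centred on the actor's output.
    log_prob = -0.5 * ((action - actor(state)) ** 2).sum(-1, keepdim=True)
    actor_loss = -(advantage * log_prob).mean()
    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()

# Dummy batch of transitions, just to show the call signature.
s, a = torch.randn(8, state_dim), torch.randn(8, action_dim)
r, s2, d = torch.randn(8, 1), torch.randn(8, state_dim), torch.zeros(8, 1)
update(s, a, r, s2, d)
```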
- Certified Reinforcement Learning with Logic Guidance [78.2286146954051] (arXiv 2019-02-02)
We propose a model-free RL algorithm that enables the use of Linear Temporal Logic (LTL) to formulate a goal for unknown continuous-state/action Markov Decision Processes (MDPs).
The algorithm is guaranteed to synthesise a control policy whose traces satisfy the specification with maximal probability.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.