Hierarchies of Reward Machines
- URL: http://arxiv.org/abs/2205.15752v2
- Date: Sun, 4 Jun 2023 09:07:56 GMT
- Title: Hierarchies of Reward Machines
- Authors: Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda,
Alessandra Russo
- Abstract summary: Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine.
We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
- Score: 75.55324974788475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward machines (RMs) are a recent formalism for representing the reward
function of a reinforcement learning task through a finite-state machine whose
edges encode subgoals of the task using high-level events. The structure of RMs
enables the decomposition of a task into simpler and independently solvable
subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a
formalism for further abstracting the subtask structure by endowing an RM with
the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We
exploit HRMs by treating each call to an RM as an independently solvable
subtask using the options framework, and describe a curriculum-based method to
learn HRMs from traces observed by the agent. Our experiments reveal that
exploiting a handcrafted HRM leads to faster convergence than with a flat HRM,
and that learning an HRM is feasible in cases where its equivalent flat
representation is not.
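To make the formalism concrete, below is a minimal, illustrative sketch (not the paper's implementation; all names such as Edge, RewardMachine, and HRM are hypothetical). It models an RM as a finite-state machine whose edges are labelled with high-level events, and an HRM as a collection of RMs in which an edge may instead call another RM; each call is treated as an independently solvable subtask that runs until the callee reaches an accepting state, in the spirit of the options framework.

```python
# A minimal, illustrative sketch (not the paper's implementation) of a reward
# machine (RM) and a hierarchy of RMs (HRM) in which an edge may call another RM.
# All names here (Edge, RewardMachine, HRM, ...) are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    event: Optional[str] = None   # high-level event labelling the edge, or ...
    call: Optional[str] = None    # ... name of another RM whose completion satisfies it
    reward: float = 0.0


@dataclass(frozen=True)
class RewardMachine:
    """Finite-state machine whose edges encode subgoals via high-level events."""
    name: str
    initial: str
    accepting: frozenset
    edges: tuple


class HRM:
    """Each call edge is treated as an independently solvable subtask (an option):
    the callee runs until it reaches an accepting state, then control returns."""

    def __init__(self, machines: dict, root: str):
        self.machines = machines
        self.stack = [(root, machines[root].initial)]  # call stack of (RM name, state)
        self._descend()

    def _descend(self):
        """If the current state has an outgoing call edge, enter the callee RM."""
        name, state = self.stack[-1]
        for e in self.machines[name].edges:
            if e.source == state and e.call is not None:
                self.stack.append((e.call, self.machines[e.call].initial))
                return self._descend()

    def step(self, event: str) -> float:
        """Advance the hierarchy on one observed high-level event; return its reward."""
        name, state = self.stack[-1]
        rm = self.machines[name]
        edge = next((e for e in rm.edges if e.source == state and e.event == event), None)
        if edge is None:
            return 0.0                               # no edge fires: stay in place
        self.stack[-1] = (name, edge.target)
        reward = edge.reward

        # Pop every RM that just reached an accepting state, taking the caller's call edge.
        while len(self.stack) > 1 and self.stack[-1][1] in self.machines[self.stack[-1][0]].accepting:
            finished, _ = self.stack.pop()
            caller, caller_state = self.stack[-1]
            call_edge = next(e for e in self.machines[caller].edges
                             if e.source == caller_state and e.call == finished)
            self.stack[-1] = (caller, call_edge.target)
            reward += call_edge.reward

        self._descend()
        return reward


# Toy task: call the "get_coffee" RM, then observe the event "deliver" for reward 1.
get_coffee = RewardMachine("get_coffee", "c0", frozenset({"c1"}),
                           (Edge("c0", "c1", event="picked_coffee"),))
root = RewardMachine("root", "u0", frozenset({"u2"}),
                     (Edge("u0", "u1", call="get_coffee"),
                      Edge("u1", "u2", event="deliver", reward=1.0)))
hrm = HRM({"root": root, "get_coffee": get_coffee}, "root")
print(hrm.step("picked_coffee"), hrm.step("deliver"))  # 0.0 1.0
```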
Related papers
- Learning Robust Reward Machines from Noisy Labels [46.18428376996514]
PROB-IRM is an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces.
We show that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully.
arXiv Detail & Related papers (2024-08-27T08:41:42Z)
- MetaRM: Shifted Distributions Alignment via Meta-Learning [52.94381279744458]
Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM).
We introduce MetaRM, a method leveraging meta-learning to align the RM with the shifted environment distribution.
Extensive experiments demonstrate that MetaRM significantly improves the RM's distinguishing ability in iterative RLHF optimization.
arXiv Detail & Related papers (2024-05-01T10:43:55Z)
- Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs).
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
arXiv Detail & Related papers (2024-03-08T06:38:22Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
- Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines [5.34590273802424]
We use a reward machine to encode each agent's task and expose the internal structure of the reward function.
We propose a decentralized graph-based reinforcement learning algorithm that equips each agent with a localized policy.
The effectiveness of the proposed DGRM algorithm is evaluated in two case studies: UAV package delivery and COVID-19 pandemic mitigation.
arXiv Detail & Related papers (2021-09-30T21:41:55Z)
- Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task.
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
arXiv Detail & Related papers (2020-07-03T23:08:14Z)
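For reference, here is a tiny sketch of the Mealy-machine reading of an RM used by several of the papers above: the transition function maps a state and a high-level event to a next state, and the output function emits the reward. This is illustrative only; the class name and events below are hypothetical and not taken from any of these papers.

```python
# Illustrative Mealy-machine view of a reward machine (hypothetical names,
# not the code of any paper listed above).
class MealyRM:
    def __init__(self, initial, delta, rho):
        self.state = initial
        self.delta = delta   # transition function: (state, event) -> next state
        self.rho = rho       # output function:     (state, event) -> reward

    def step(self, event):
        reward = self.rho.get((self.state, event), 0.0)
        self.state = self.delta.get((self.state, event), self.state)
        return reward


# Team task: agent A presses a button, then agent B passes through the door.
team_rm = MealyRM("u0",
                  delta={("u0", "A_button"): "u1", ("u1", "B_door"): "u2"},
                  rho={("u1", "B_door"): 1.0})
print(team_rm.step("A_button"), team_rm.step("B_door"))  # 0.0 1.0
```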