Logic-based Reward Shaping for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2206.08881v1
- Date: Fri, 17 Jun 2022 16:30:27 GMT
- Title: Logic-based Reward Shaping for Multi-Agent Reinforcement Learning
- Authors: Ingy ElSayed-Aly and Lu Feng
- Abstract summary: Reinforcement learning relies heavily on exploration to learn from its environment and maximize observed rewards.
Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide an automatic mechanism for synthesizing the reward function from the task.
This project explores how logic-based reward shaping for Multi-Agent Reinforcement Learning can be designed for different scenarios and tasks.
- Score: 1.5483078145498084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) relies heavily on exploration to learn from its
environment and maximize observed rewards. Therefore, it is essential to design
a reward function that guarantees optimal learning from the received
experience. Previous work has combined automata- and logic-based reward shaping with environment assumptions to provide an automatic mechanism for synthesizing the reward function from the task. However, there is limited work on extending logic-based reward shaping to Multi-Agent Reinforcement Learning (MARL). If the task requires cooperation, the environment must track the other agents through the joint state, and therefore suffers from the curse of dimensionality with respect to the number of agents. This project explores how
logic-based reward shaping for MARL can be designed for different scenarios and
tasks. We present a novel method for semi-centralized logic-based MARL reward
shaping that is scalable in the number of agents and evaluate it in multiple
scenarios.
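As a rough illustration of the general idea, and not the paper's specific construction, the sketch below shapes a two-agent team reward with a small task automaton: a labeling function over the joint state is assumed to emit events, and the automaton rewards transitions that make progress toward the team goal. All class, state, and event names are hypothetical.

```python
# Minimal sketch of automaton-based reward shaping for a two-agent task.
# The automaton, events, and reward values are illustrative assumptions,
# not the construction used in the paper.

class RewardAutomaton:
    """Tracks progress of a team task and emits a shaped reward on transitions."""

    def __init__(self, transitions, initial_state, accepting_states, step_reward=1.0):
        self.transitions = transitions      # dict: (state, event) -> next_state
        self.state = initial_state
        self.accepting = accepting_states
        self.step_reward = step_reward

    def step(self, events):
        """Advance on the events observed this step; reward automaton progress."""
        reward = 0.0
        for event in events:
            nxt = self.transitions.get((self.state, event))
            if nxt is not None and nxt != self.state:
                self.state = nxt
                reward += self.step_reward  # shaped reward for completing a sub-goal
        if self.state in self.accepting:
            reward += 10.0                  # bonus for finishing the team task
        return reward


# Example task: agent A presses a button, then agent B passes through a door.
dfa = RewardAutomaton(
    transitions={("q0", "button_A"): "q1", ("q1", "door_B"): "q_goal"},
    initial_state="q0",
    accepting_states={"q_goal"},
)
print(dfa.step({"door_B"}))    # 0.0: door before button makes no progress
print(dfa.step({"button_A"}))  # 1.0: first sub-goal reached
print(dfa.step({"door_B"}))    # 11.0: progress reward plus completion bonus
```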
Related papers
- Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a cooperative multi-agent algorithm with logical reward shaping.
Experiments have been conducted on various types of tasks in the Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z) - OCALM: Object-Centric Assessment with Language Models [33.10137796492542]
We propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for reinforcement learning agents.
OCALM uses the extensive world-knowledge of language models to derive reward functions focused on relational concepts.
arXiv Detail & Related papers (2024-06-24T15:57:48Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
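To make the token-level framing concrete, here is a generic sketch of an entropy-augmented policy-gradient loss computed per generated token; it is not ETPO's exact objective, and the per-token advantage estimates are assumed to come from some external estimator.

```python
import torch
import torch.nn.functional as F

def token_level_pg_loss(logits, actions, advantages, entropy_coef=0.01):
    """Entropy-regularized policy-gradient loss over generated tokens.

    logits:     (batch, seq_len, vocab) raw language-model outputs
    actions:    (batch, seq_len)        sampled token ids
    advantages: (batch, seq_len)        per-token advantage estimates (assumed given)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    action_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # per-token policy entropy
    # Maximize advantage-weighted log-likelihood plus an entropy bonus.
    return -(advantages.detach() * action_log_probs + entropy_coef * entropy).mean()
```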
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
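A hypothetical sketch of a greedy search guided by step-level feedback: at each step, several candidate reasoning steps are sampled and the one the reward model scores highest is kept. `generate_candidates` and `prm_score` are placeholder interfaces, not the paper's implementation.

```python
# Hypothetical greedy decoding loop guided by a process-supervised reward model.
# `generate_candidates` samples candidate next steps from an LLM and `prm_score`
# rates a partial reasoning chain; both are assumed interfaces, not the paper's code.

def greedy_prm_search(problem, generate_candidates, prm_score,
                      max_steps=8, n_candidates=4):
    """Build a reasoning chain step by step, keeping the highest-scored candidate."""
    steps = []
    for _ in range(max_steps):
        candidates = generate_candidates(problem, steps, n=n_candidates)
        if not candidates:
            break
        # Score each partial chain extended by one candidate step; keep the best.
        best = max(candidates, key=lambda s: prm_score(problem, steps + [s]))
        steps.append(best)
        if best.strip().startswith("Answer:"):  # illustrative stopping criterion
            break
    return steps
```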
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design the multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training.
The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines, in terms of both effectiveness and robustness.
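One plausible reading of policy-weighted reward aggregation, sketched below under assumed shapes: each agent keeps a reward estimate per action branch, and the shaped training reward is the expectation of those estimates under the agent's current policy. This is an illustration, not DRE-MARL's exact design.

```python
import numpy as np

def policy_weighted_reward(reward_branches, policy_probs):
    """Aggregate per-action-branch reward estimates under each agent's policy.

    reward_branches: (n_agents, n_actions) estimated reward for each action branch
    policy_probs:    (n_agents, n_actions) each agent's current action distribution
    Returns one aggregated (policy-weighted) reward per agent.
    """
    return (reward_branches * policy_probs).sum(axis=-1)


# Toy example: two agents, three actions each.
r_hat = np.array([[0.2, 1.0, 0.1], [0.5, 0.4, 0.9]])
pi = np.array([[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
print(policy_weighted_reward(r_hat, pi))  # -> [0.83 0.63]
```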
arXiv Detail & Related papers (2022-10-14T08:31:45Z) - Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z) - REMAX: Relational Representation for Multi-Agent Exploration [13.363887960136102]
We propose a learning-based exploration strategy to generate the initial states of a game.
We demonstrate that our method improves the training and performance of the MARL model more than the existing exploration methods.
arXiv Detail & Related papers (2020-08-12T10:23:35Z) - Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task.
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
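For intuition, a Mealy-style reward machine can be written as a transition table that maps a state and an event label to a next state and a reward; the two-agent task and labels below are invented for illustration and are not taken from the paper.

```python
# Illustrative Mealy-machine reward machine: each transition outputs a reward.
# The team task ("switch then gate") and labels are assumed examples.

class MealyRewardMachine:
    def __init__(self, delta, initial_state):
        self.delta = delta      # dict: (state, label) -> (next_state, reward)
        self.state = initial_state

    def step(self, label):
        next_state, reward = self.delta.get((self.state, label), (self.state, 0.0))
        self.state = next_state
        return reward


# Team task: agent 1 flips a switch ("s1"), then agent 2 opens the gate ("g2").
team_rm = MealyRewardMachine(
    delta={("u0", "s1"): ("u1", 0.5), ("u1", "g2"): ("u2", 1.0)},
    initial_state="u0",
)
print(team_rm.step("g2"))  # 0.0: gate before switch makes no progress
print(team_rm.step("s1"))  # 0.5: first sub-task done
print(team_rm.step("g2"))  # 1.0: team task complete
```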
arXiv Detail & Related papers (2020-07-03T23:08:14Z) - Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)