Agent-Temporal Attention for Reward Redistribution in Episodic
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2201.04612v1
- Date: Wed, 12 Jan 2022 18:35:46 GMT
- Title: Agent-Temporal Attention for Reward Redistribution in Episodic
Multi-Agent Reinforcement Learning
- Authors: Baicen Xiao, Bhaskar Ramasubramanian, Radha Poovendran
- Abstract summary: This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal.
We introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address the twin challenges of credit assignment along time and among agents.
AREL results in higher rewards in Particle World, and improved win rates in StarCraft compared to three state-of-the-art reward redistribution methods.
- Score: 9.084006156825632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers multi-agent reinforcement learning (MARL) tasks where
agents receive a shared global reward at the end of an episode. The delayed
nature of this reward affects the ability of the agents to assess the quality
of their actions at intermediate time-steps. This paper focuses on developing
methods to learn a temporal redistribution of the episodic reward to obtain a
dense reward signal. Solving such MARL problems requires addressing two
challenges: identifying (1) relative importance of states along the length of
an episode (along time), and (2) relative importance of individual agents'
states at any single time-step (among agents). In this paper, we introduce
Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent
Reinforcement Learning (AREL) to address these two challenges. AREL uses
attention mechanisms to characterize the influence of actions on state
transitions along trajectories (temporal attention), and how each agent is
affected by other agents at each time-step (agent attention). The redistributed
rewards predicted by AREL are dense, and can be integrated with any given MARL
algorithm. We evaluate AREL on challenging tasks from the Particle World
environment and the StarCraft Multi-Agent Challenge. AREL results in higher
rewards in Particle World, and improved win rates in StarCraft compared to
three state-of-the-art reward redistribution methods. Our code is available at
https://github.com/baicenxiao/AREL.
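To make the two attention stages concrete, here is a minimal sketch of an agent-temporal attention module that predicts a dense per-step reward and is trained so the predicted rewards sum to the episodic return. The layer sizes, the mean-pooling over agents, and the return-matching regression loss are illustrative assumptions, not the authors' exact architecture.
```python
# Minimal sketch of agent-temporal attention for reward redistribution
# (illustrative only: layer sizes, the mean-pooling over agents, and the
# return-matching loss are assumptions, not the authors' exact design).
import torch
import torch.nn as nn

class AgentTemporalAttention(nn.Module):
    def __init__(self, obs_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        # Agent attention: mixes information across agents at each time-step.
        self.agent_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        # Temporal attention: mixes information across time-steps per agent.
        self.time_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, obs):                                # obs: (B, T, N, obs_dim)
        B, T, N, _ = obs.shape
        x = self.embed(obs)                                # (B, T, N, E)
        xa = x.reshape(B * T, N, -1)                       # attend among agents
        xa, _ = self.agent_attn(xa, xa, xa)
        xt = xa.reshape(B, T, N, -1).transpose(1, 2).reshape(B * N, T, -1)
        xt, _ = self.time_attn(xt, xt, xt)                 # attend along time
        x = xt.reshape(B, N, T, -1).transpose(1, 2)        # (B, T, N, E)
        # One dense reward per time-step, pooled over agents.
        return self.reward_head(x).squeeze(-1).mean(-1)    # (B, T)

model = AgentTemporalAttention(obs_dim=8)
obs = torch.randn(2, 25, 3, 8)             # 2 episodes, 25 steps, 3 agents
r_hat = model(obs)                         # dense per-step rewards
episodic_return = torch.tensor([1.0, 0.0])
# Train the redistribution so the dense rewards sum to the episodic return.
loss = ((r_hat.sum(dim=1) - episodic_return) ** 2).mean()
loss.backward()
```
Because the predicted rewards are dense, they can be handed to any MARL learner in place of the single end-of-episode reward, as the abstract notes.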
Related papers
- Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement
Learning [36.93626032028901]
Sparse and delayed rewards pose a challenge to single-agent reinforcement learning, a challenge that is amplified in the multi-agent setting.
We propose Agent-Time Attention (ATA), a neural network model with auxiliary losses for redistributing sparse and delayed rewards.
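The summary mentions auxiliary losses without giving their form; a common pattern (assumed here, not necessarily ATA's exact objective) is a main return-matching term plus a regularizer on the redistributed rewards:
```python
# Hedged sketch of a redistribution objective with an auxiliary loss. The
# exact ATA losses are not given in the summary; the return-matching term
# and the smoothness regularizer below are assumptions.
import torch

def redistribution_loss(r_hat, episodic_return, aux_weight=0.1):
    """r_hat: (T,) predicted dense rewards; episodic_return: scalar tensor."""
    match = (r_hat.sum() - episodic_return) ** 2        # main return-matching term
    smooth = (r_hat[1:] - r_hat[:-1]).pow(2).mean()     # auxiliary regularizer
    return match + aux_weight * smooth

r_hat = torch.randn(25, requires_grad=True)             # one 25-step episode
loss = redistribution_loss(r_hat, torch.tensor(1.0))
loss.backward()
```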
arXiv Detail & Related papers (2022-10-31T17:54:51Z)
- ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose a self-supervised intrinsic reward ELIGN - expectation alignment.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
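A toy version of the expectation-alignment idea: an agent receives intrinsic reward for matching what its neighbors predicted it would do. The negative-L2 form and the source of the predictions below are illustrative assumptions.
```python
# Toy expectation-alignment intrinsic reward: an agent is rewarded when its
# actual next observation matches what nearby agents predicted for it. The
# negative-L2 form and the prediction inputs are illustrative assumptions.
import numpy as np

def elign_intrinsic_reward(next_obs, neighbor_predictions):
    """next_obs: (d,); neighbor_predictions: (k, d) from k nearby agents."""
    if len(neighbor_predictions) == 0:
        return 0.0                      # no neighbors, no alignment signal
    errors = np.linalg.norm(np.asarray(neighbor_predictions) - next_obs, axis=1)
    return -errors.mean()               # closer to expectations => higher reward

next_obs = np.array([0.1, 0.2])
preds = [[0.1, 0.25], [0.0, 0.2]]       # two neighbors' predictions
print(elign_intrinsic_reward(next_obs, preds))
```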
arXiv Detail & Related papers (2022-10-09T22:24:44Z)
- Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions (actions with pre-set execution durations that take effect asynchronously) are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
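The summary does not describe LeGEM's internal structure; the generic episodic memory below, which remembers the best return observed after each state and can later serve as a credit-assignment target, is purely an assumed stand-in for how such a memory might be organized.
```python
# Generic episodic-memory sketch. LeGEM's actual structure is not described
# in this summary; the key -> best-return table below is a common pattern
# and purely an assumed stand-in.
from collections import defaultdict

class EpisodicMemory:
    def __init__(self):
        self.best_return = defaultdict(float)

    def update(self, trajectory, episodic_return):
        # Remember the best return ever achieved after visiting each state.
        for state_key in trajectory:
            self.best_return[state_key] = max(
                self.best_return[state_key], episodic_return)

    def lookup(self, state_key):
        return self.best_return[state_key]

mem = EpisodicMemory()
mem.update(trajectory=["s0", "s1", "s2"], episodic_return=1.0)
print(mem.lookup("s1"))   # 1.0; usable as a credit-assignment target
```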
arXiv Detail & Related papers (2022-05-27T02:21:04Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
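One way to picture ability-based subtask selection (the exact parameterization is an assumption, not LDSA's published architecture): score each agent-subtask pair by the match between an ability vector and a subtask representation, then sample assignments from the resulting softmax.
```python
# Hedged sketch of ability-based subtask selection; the dot-product scoring
# and softmax sampling are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_subtasks, d = 4, 3, 8
agent_ability = rng.normal(size=(n_agents, d))   # assumed ability vectors
subtask_repr = rng.normal(size=(n_subtasks, d))  # assumed subtask representations

logits = agent_ability @ subtask_repr.T          # (n_agents, n_subtasks) match scores
logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assignment = [int(rng.choice(n_subtasks, p=p)) for p in probs]
print(assignment)                                # one subtask id per agent
```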
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before their teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm that uses attention in place of a fully connected layer with absorbing states.
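The core idea can be sketched with a key-padding mask: terminated agents are simply masked out of the attention instead of being represented by absorbing-state placeholders fed through a fully connected layer. The pooling below is illustrative, not the paper's exact model.
```python
# Sketch of attending only over still-active agents instead of feeding
# absorbing-state placeholders through a fully connected layer. The
# mask-based pooling is illustrative, not the paper's exact model.
import torch
import torch.nn as nn

embed_dim, n_agents = 32, 5
attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

agent_feats = torch.randn(1, n_agents, embed_dim)
terminated = torch.tensor([[False, False, True, False, True]])  # agents 2 and 4 done

# key_padding_mask: True entries are simply ignored by the attention,
# so terminated agents need no absorbing-state stand-in.
out, _ = attn(agent_feats, agent_feats, agent_feats, key_padding_mask=terminated)
print(out.shape)   # (1, 5, 32)
```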
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
- AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system.
Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers.
We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
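The age of information itself follows a standard recursion: it grows by one each slot and resets when a fresh CAM is delivered (the slot granularity and reset-to-zero convention are assumptions for illustration).
```python
# Standard age-of-information (AoI) recursion: the age grows each slot and
# resets when a fresh CAM is delivered. The reset-to-zero convention is an
# illustrative assumption.
def aoi_step(aoi, delivered):
    return 0 if delivered else aoi + 1

aoi, trace = 0, []
deliveries = [False, False, True, False, True, False]
for d in deliveries:
    aoi = aoi_step(aoi, d)
    trace.append(aoi)
print(trace)   # [1, 2, 0, 1, 0, 1]
```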
arXiv Detail & Related papers (2021-05-10T08:39:56Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance on the StarCraft Multi-Agent Challenge.
CollaQ is evaluated on various StarCraft maps and shown to outperform existing state-of-the-art techniques.
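The title's reward attribution decomposition suggests a Q-value split into a self term plus a teammate-dependent correction; the two-head sketch below illustrates that shape. The specific heads, and the omission of any regularization, are my assumptions, not CollaQ's published architecture.
```python
# Hedged sketch of a Q-value split into a self term plus a collaboration
# term driven by teammates' observations (assumed, simplified structure).
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    def __init__(self, obs_dim, n_actions, n_agents):
        super().__init__()
        self.q_alone = nn.Linear(obs_dim, n_actions)
        self.q_collab = nn.Linear(obs_dim * n_agents, n_actions)

    def forward(self, own_obs, all_obs):
        # Total Q = what the agent would do alone + a teammate correction.
        return self.q_alone(own_obs) + self.q_collab(all_obs.flatten(-2))

q = DecomposedQ(obs_dim=6, n_actions=4, n_agents=3)
own, team = torch.randn(1, 6), torch.randn(1, 3, 6)
print(q(own, team).shape)   # (1, 4)
```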
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
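The linear decomposition of universal successor features means action-values for any task weight vector reduce to a dot product, Q(s, a; w) = psi(s, a) . w. A minimal numeric illustration (the feature dimension and weights are invented):
```python
# Successor-features illustration: with rewards decomposing linearly as
# r = phi(s, a) . w, the action-value for any task weight vector w is
# Q(s, a; w) = psi(s, a) . w, where psi accumulates discounted phi.
import numpy as np

n_actions, d = 3, 4
psi = np.random.default_rng(1).normal(size=(n_actions, d))  # psi(s, a) per action
w_task = np.array([1.0, 0.0, -0.5, 0.2])                    # task weights

q_values = psi @ w_task              # one dot product per action
best_action = int(q_values.argmax())
print(q_values, best_action)
```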
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
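A minimal sketch of agents giving rewards to one another through a learned incentive function: each agent's effective reward is its environment reward plus the incentives it receives. The linear incentive function here is a stand-in assumption, not the paper's learned model.
```python
# Sketch of reward-giving between agents via an (assumed) linear incentive
# function; a recipient's effective reward is its environment reward plus
# what it receives from the others.
import numpy as np

rng = np.random.default_rng(2)
n_agents, obs_dim = 3, 5
W = rng.normal(scale=0.1, size=(n_agents, n_agents, obs_dim))  # giver x recipient

def incentives(obs):
    """obs: (obs_dim,) shared observation -> (giver, recipient) reward matrix."""
    inc = W @ obs                      # (n_agents, n_agents)
    np.fill_diagonal(inc, 0.0)         # agents do not pay themselves
    return inc

env_rewards = np.array([1.0, 0.0, 0.0])
inc = incentives(rng.normal(size=obs_dim))
effective = env_rewards + inc.sum(axis=0)   # add what each recipient receives
print(effective)
```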
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing [8.021402935358488]
We study the reward design problem in cooperative multi-agent reinforcement learning (MARL), using packet routing environments.
We show that the commonly used global and local reward signals are both prone to producing suboptimal policies.
We design mixed reward signals that can be used off-the-shelf to learn better policies.
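A mixed reward signal can be as simple as a convex blend of the global and per-router local signals; the blend below is my assumption about what "mixed" means here, not the paper's exact design.
```python
# Assumed convex blend of the global (team) reward with each router's
# local reward; alpha trades off shared vs. per-agent signal.
import numpy as np

def mixed_rewards(global_reward, local_rewards, alpha=0.5):
    """Returns one blended reward per agent."""
    return alpha * global_reward + (1 - alpha) * np.asarray(local_rewards)

print(mixed_rewards(global_reward=1.0, local_rewards=[0.2, 0.8, 0.5]))
```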
arXiv Detail & Related papers (2020-03-05T02:27:46Z)