Agent-Temporal Attention for Reward Redistribution in Episodic
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2201.04612v1
- Date: Wed, 12 Jan 2022 18:35:46 GMT
- Title: Agent-Temporal Attention for Reward Redistribution in Episodic
Multi-Agent Reinforcement Learning
- Authors: Baicen Xiao, Bhaskar Ramasubramanian, Radha Poovendran
- Abstract summary: This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal.
We introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address the twin challenges of credit assignment along time and among agents.
AREL results in higher rewards in Particle World, and improved win rates in StarCraft compared to three state-of-the-art reward redistribution methods.
- Score: 9.084006156825632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers multi-agent reinforcement learning (MARL) tasks where
agents receive a shared global reward at the end of an episode. The delayed
nature of this reward affects the ability of the agents to assess the quality
of their actions at intermediate time-steps. This paper focuses on developing
methods to learn a temporal redistribution of the episodic reward to obtain a
dense reward signal. Solving such MARL problems requires addressing two
challenges: identifying (1) relative importance of states along the length of
an episode (along time), and (2) relative importance of individual agents'
states at any single time-step (among agents). In this paper, we introduce
Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent
Reinforcement Learning (AREL) to address these two challenges. AREL uses
attention mechanisms to characterize the influence of actions on state
transitions along trajectories (temporal attention), and how each agent is
affected by other agents at each time-step (agent attention). The redistributed
rewards predicted by AREL are dense, and can be integrated with any given MARL
algorithm. We evaluate AREL on challenging tasks from the Particle World
environment and the StarCraft Multi-Agent Challenge. AREL results in higher
rewards in Particle World, and improved win rates in StarCraft compared to
three state-of-the-art reward redistribution methods. Our code is available at
https://github.com/baicenxiao/AREL.
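To make the two attention stages concrete, here is a minimal sketch of an agent-temporal attention module that predicts a dense per-step reward and is trained so the predicted rewards sum to the episodic return. The layer sizes, the mean-pooling over agents, and the return-matching regression loss are illustrative assumptions, not the authors' exact architecture.
```python
# Minimal sketch of agent-temporal attention for reward redistribution
# (illustrative only: layer sizes, the mean-pooling over agents, and the
# return-matching loss are assumptions, not the authors' exact design).
import torch
import torch.nn as nn

class AgentTemporalAttention(nn.Module):
    def __init__(self, obs_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        # Agent attention: mixes information across agents at each time-step.
        self.agent_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        # Temporal attention: mixes information across time-steps per agent.
        self.time_attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.reward_head = nn.Linear(embed_dim, 1)

    def forward(self, obs):                                # obs: (B, T, N, obs_dim)
        B, T, N, _ = obs.shape
        x = self.embed(obs)                                # (B, T, N, E)
        xa = x.reshape(B * T, N, -1)                       # attend among agents
        xa, _ = self.agent_attn(xa, xa, xa)
        xt = xa.reshape(B, T, N, -1).transpose(1, 2).reshape(B * N, T, -1)
        xt, _ = self.time_attn(xt, xt, xt)                 # attend along time
        x = xt.reshape(B, N, T, -1).transpose(1, 2)        # (B, T, N, E)
        # One dense reward per time-step, pooled over agents.
        return self.reward_head(x).squeeze(-1).mean(-1)    # (B, T)

model = AgentTemporalAttention(obs_dim=8)
obs = torch.randn(2, 25, 3, 8)             # 2 episodes, 25 steps, 3 agents
r_hat = model(obs)                         # dense per-step rewards
episodic_return = torch.tensor([1.0, 0.0])
# Train the redistribution so the dense rewards sum to the episodic return.
loss = ((r_hat.sum(dim=1) - episodic_return) ** 2).mean()
loss.backward()
```
Because the predicted rewards are dense, they can be handed to any MARL learner in place of the single end-of-episode reward, as the abstract notes.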
Related papers
- Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement
Learning [36.93626032028901]
Sparse and delayed rewards pose a challenge to single-agent reinforcement learning, a challenge that is amplified in the multi-agent setting.
We propose Agent-Time Attention (ATA), a neural network model with auxiliary losses for redistributing sparse and delayed rewards.
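The summary mentions auxiliary losses without giving their form; a common pattern (assumed here, not necessarily ATA's exact objective) is a main return-matching term plus a regularizer on the redistributed rewards:
```python
# Hedged sketch of a redistribution objective with an auxiliary loss. The
# exact ATA losses are not given in the summary; the return-matching term
# and the smoothness regularizer below are assumptions.
import torch

def redistribution_loss(r_hat, episodic_return, aux_weight=0.1):
    """r_hat: (T,) predicted dense rewards; episodic_return: scalar tensor."""
    match = (r_hat.sum() - episodic_return) ** 2        # main return-matching term
    smooth = (r_hat[1:] - r_hat[:-1]).pow(2).mean()     # auxiliary regularizer
    return match + aux_weight * smooth

r_hat = torch.randn(25, requires_grad=True)             # one 25-step episode
loss = redistribution_loss(r_hat, torch.tensor(1.0))
loss.backward()
```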
arXiv Detail & Related papers (2022-10-31T17:54:51Z)
- ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose a self-supervised intrinsic reward ELIGN - expectation alignment.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
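A toy version of the expectation-alignment idea: an agent receives intrinsic reward for matching what its neighbors predicted it would do. The negative-L2 form and the source of the predictions below are illustrative assumptions.
```python
# Toy expectation-alignment intrinsic reward: an agent is rewarded when its
# actual next observation matches what nearby agents predicted for it. The
# negative-L2 form and the prediction inputs are illustrative assumptions.
import numpy as np

def elign_intrinsic_reward(next_obs, neighbor_predictions):
    """next_obs: (d,); neighbor_predictions: (k, d) from k nearby agents."""
    if len(neighbor_predictions) == 0:
        return 0.0                      # no neighbors, no alignment signal
    errors = np.linalg.norm(np.asarray(neighbor_predictions) - next_obs, axis=1)
    return -errors.mean()               # closer to expectations => higher reward

next_obs = np.array([0.1, 0.2])
preds = [[0.1, 0.25], [0.0, 0.2]]       # two neighbors' predictions
print(elign_intrinsic_reward(next_obs, preds))
```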
arXiv Detail & Related papers (2022-10-09T22:24:44Z)
- Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions (actions with pre-set execution durations that take effect asynchronously) are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
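The summary does not describe LeGEM's internal structure; the generic episodic memory below, which remembers the best return observed after each state and can later serve as a credit-assignment target, is purely an assumed stand-in for how such a memory might be organized.
```python
# Generic episodic-memory sketch. LeGEM's actual structure is not described
# in this summary; the key -> best-return table below is a common pattern
# and purely an assumed stand-in.
from collections import defaultdict

class EpisodicMemory:
    def __init__(self):
        self.best_return = defaultdict(float)

    def update(self, trajectory, episodic_return):
        # Remember the best return ever achieved after visiting each state.
        for state_key in trajectory:
            self.best_return[state_key] = max(
                self.best_return[state_key], episodic_return)

    def lookup(self, state_key):
        return self.best_return[state_key]

mem = EpisodicMemory()
mem.update(trajectory=["s0", "s1", "s2"], episodic_return=1.0)
print(mem.lookup("s1"))   # 1.0; usable as a credit-assignment target
```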
arXiv Detail & Related papers (2022-05-27T02:21:04Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
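One way to picture ability-based subtask selection (the exact parameterization is an assumption, not LDSA's published architecture): score each agent-subtask pair by the match between an ability vector and a subtask representation, then sample assignments from the resulting softmax.
```python
# Hedged sketch of ability-based subtask selection; the dot-product scoring
# and softmax sampling are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_subtasks, d = 4, 3, 8
agent_ability = rng.normal(size=(n_agents, d))   # assumed ability vectors
subtask_repr = rng.normal(size=(n_subtasks, d))  # assumed subtask representations

logits = agent_ability @ subtask_repr.T          # (n_agents, n_subtasks) match scores
logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
assignment = [int(rng.choice(n_subtasks, p=p)) for p in probs]
print(assignment)                                # one subtask id per agent
```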
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment.
In many practical problems, an agent may terminate before their teammates.
We present a novel architecture for an existing state-of-the-art MARL algorithm that uses attention in place of a fully connected layer with absorbing states.
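The core idea can be sketched with a key-padding mask: terminated agents are simply masked out of the attention instead of being represented by absorbing-state placeholders fed through a fully connected layer. The pooling below is illustrative, not the paper's exact model.
```python
# Sketch of attending only over still-active agents instead of feeding
# absorbing-state placeholders through a fully connected layer. The
# mask-based pooling is illustrative, not the paper's exact model.
import torch
import torch.nn as nn

embed_dim, n_agents = 32, 5
attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

agent_feats = torch.randn(1, n_agents, embed_dim)
terminated = torch.tensor([[False, False, True, False, True]])  # agents 2 and 4 done

# key_padding_mask: True entries are simply ignored by the attention,
# so terminated agents need no absorbing-state stand-in.
out, _ = attn(agent_feats, agent_feats, agent_feats, key_padding_mask=terminated)
print(out.shape)   # (1, 5, 32)
```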
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
- AoI-Aware Resource Allocation for Platoon-Based C-V2X Networks via Multi-Agent Multi-Task Reinforcement Learning [22.890835786710316]
This paper investigates the problem of age of information (AoI) aware radio resource management for a platooning system.
Multiple autonomous platoons exploit the cellular wireless vehicle-to-everything (C-V2X) communication technology to disseminate the cooperative awareness messages (CAMs) to their followers.
We exploit a distributed resource allocation framework based on multi-agent reinforcement learning (MARL), where each platoon leader (PL) acts as an agent and interacts with the environment to learn its optimal policy.
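The age of information itself follows a standard recursion: it grows by one each slot and resets when a fresh CAM is delivered (the slot granularity and reset-to-zero convention are assumptions for illustration).
```python
# Standard age-of-information (AoI) recursion: the age grows each slot and
# resets when a fresh CAM is delivered. The reset-to-zero convention is an
# illustrative assumption.
def aoi_step(aoi, delivered):
    return 0 if delivered else aoi + 1

aoi, trace = 0, []
deliveries = [False, False, True, False, True, False]
for d in deliveries:
    aoi = aoi_step(aoi, d)
    trace.append(aoi)
print(trace)   # [1, 2, 0, 1, 0, 1]
```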
arXiv Detail & Related papers (2021-05-10T08:39:56Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance on the StarCraft Multi-Agent Challenge.
CollaQ is evaluated on various StarCraft maps and shown to outperform existing state-of-the-art techniques.
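The title's reward attribution decomposition suggests a Q-value split into a self term plus a teammate-dependent correction; the two-head sketch below illustrates that shape. The specific heads, and the omission of any regularization, are my assumptions, not CollaQ's published architecture.
```python
# Hedged sketch of a Q-value split into a self term plus a collaboration
# term driven by teammates' observations (assumed, simplified structure).
import torch
import torch.nn as nn

class DecomposedQ(nn.Module):
    def __init__(self, obs_dim, n_actions, n_agents):
        super().__init__()
        self.q_alone = nn.Linear(obs_dim, n_actions)
        self.q_collab = nn.Linear(obs_dim * n_agents, n_actions)

    def forward(self, own_obs, all_obs):
        # Total Q = what the agent would do alone + a teammate correction.
        return self.q_alone(own_obs) + self.q_collab(all_obs.flatten(-2))

q = DecomposedQ(obs_dim=6, n_actions=4, n_agents=3)
own, team = torch.randn(1, 6), torch.randn(1, 3, 6)
print(q(own, team).shape)   # (1, 4)
```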
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
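The linear decomposition of universal successor features means action-values for any task weight vector reduce to a dot product, Q(s, a; w) = psi(s, a) . w. A minimal numeric illustration (the feature dimension and weights are invented):
```python
# Successor-features illustration: with rewards decomposing linearly as
# r = phi(s, a) . w, the action-value for any task weight vector w is
# Q(s, a; w) = psi(s, a) . w, where psi accumulates discounted phi.
import numpy as np

n_actions, d = 3, 4
psi = np.random.default_rng(1).normal(size=(n_actions, d))  # psi(s, a) per action
w_task = np.array([1.0, 0.0, -0.5, 0.2])                    # task weights

q_values = psi @ w_task              # one dot product per action
best_action = int(q_values.argmax())
print(q_values, best_action)
```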
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
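A minimal sketch of agents giving rewards to one another through a learned incentive function: each agent's effective reward is its environment reward plus the incentives it receives. The linear incentive function here is a stand-in assumption, not the paper's learned model.
```python
# Sketch of reward-giving between agents via an (assumed) linear incentive
# function; a recipient's effective reward is its environment reward plus
# what it receives from the others.
import numpy as np

rng = np.random.default_rng(2)
n_agents, obs_dim = 3, 5
W = rng.normal(scale=0.1, size=(n_agents, n_agents, obs_dim))  # giver x recipient

def incentives(obs):
    """obs: (obs_dim,) shared observation -> (giver, recipient) reward matrix."""
    inc = W @ obs                      # (n_agents, n_agents)
    np.fill_diagonal(inc, 0.0)         # agents do not pay themselves
    return inc

env_rewards = np.array([1.0, 0.0, 0.0])
inc = incentives(rng.normal(size=obs_dim))
effective = env_rewards + inc.sum(axis=0)   # add what each recipient receives
print(effective)
```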
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing [8.021402935358488]
We study the reward design problem in cooperative multi-agent reinforcement learning (MARL), using packet routing environments.
We show that the commonly used global and local reward signals are both prone to producing suboptimal policies.
We design mixed reward signals that can be used off-the-shelf to learn better policies.
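A mixed reward signal can be as simple as a convex blend of the global and per-router local signals; the blend below is my assumption about what "mixed" means here, not the paper's exact design.
```python
# Assumed convex blend of the global (team) reward with each router's
# local reward; alpha trades off shared vs. per-agent signal.
import numpy as np

def mixed_rewards(global_reward, local_rewards, alpha=0.5):
    """Returns one blended reward per agent."""
    return alpha * global_reward + (1 - alpha) * np.asarray(local_rewards)

print(mixed_rewards(global_reward=1.0, local_rewards=[0.2, 0.8, 0.5]))
```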
arXiv Detail & Related papers (2020-03-05T02:27:46Z)