Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2210.17540v1
- Date: Mon, 31 Oct 2022 17:54:51 GMT
- Title: Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement
Learning
- Authors: Jennifer She, Jayesh K. Gupta, Mykel J. Kochenderfer
- Abstract summary: Sparse and delayed rewards pose a challenge to single agent reinforcement learning.
We propose Agent-Time Attention (ATA), a neural network model with auxiliary losses for redistributing sparse and delayed rewards.
- Score: 36.93626032028901
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sparse and delayed rewards pose a challenge to single agent reinforcement
learning. This challenge is amplified in multi-agent reinforcement learning
(MARL) where credit assignment of these rewards needs to happen not only across
time, but also across agents. We propose Agent-Time Attention (ATA), a neural
network model with auxiliary losses for redistributing sparse and delayed
rewards in collaborative MARL. We provide a simple example that demonstrates
how providing agents with their own local redistributed rewards and shared
global redistributed rewards motivates different policies. We extend several
MiniGrid environments, specifically MultiRoom and DoorKey, to the multi-agent
sparse delayed rewards setting. We demonstrate that ATA outperforms various
baselines on many instances of these environments. Source code of the
experiments is available at https://github.com/jshe/agent-time-attention.
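The abstract describes ATA as an attention model over both agents and timesteps, trained with auxiliary losses so that a sparse, delayed team reward is redistributed into dense per-agent, per-step rewards. The sketch below illustrates one way such a module could be wired up in PyTorch; the module layout, names, and the sum-to-return auxiliary loss are illustrative assumptions, not the released implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn


class AgentTimeAttention(nn.Module):
    """Sketch: attend over timesteps and agents to redistribute a sparse episodic reward."""

    def __init__(self, obs_dim, hidden_dim=64, n_heads=4):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # one attention block across time (per agent), one across agents (per timestep)
        self.time_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.agent_attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.reward_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs):
        # obs: (batch, n_agents, T, obs_dim)
        B, N, T, _ = obs.shape
        h = self.encoder(obs)                                   # (B, N, T, H)
        h = h.reshape(B * N, T, -1)
        h, _ = self.time_attn(h, h, h)                          # attend over time
        h = h.reshape(B, N, T, -1).transpose(1, 2).reshape(B * T, N, -1)
        h, _ = self.agent_attn(h, h, h)                         # attend over agents
        h = h.reshape(B, T, N, -1).transpose(1, 2)              # (B, N, T, H)
        return self.reward_head(h).squeeze(-1)                  # per-agent, per-step rewards


def redistribution_loss(pred_rewards, episodic_return):
    # assumed auxiliary loss: redistributed rewards should sum back to the
    # sparse episodic return actually observed for the episode
    return ((pred_rewards.sum(dim=(1, 2)) - episodic_return) ** 2).mean()
```

In a full training loop, the redistributed output would replace the sparse environment reward in each agent's policy update, using either the agent's own local rewards or the sum across agents as a shared global reward; the abstract notes that these two choices motivate different policies.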
Related papers
- GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems [2.867517731896504]
We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in multi-agent reinforcement learning systems.
We also introduce governance kernels, which exploit the underlying structure in either state or joint action space for assigning meaningful agent reward distributions.
Our experiments demonstrate that these meaningful reward priors robustly jump-start learning across different MARL problems.
arXiv Detail & Related papers (2024-04-01T14:19:00Z)
- ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose ELIGN (expectation alignment), a self-supervised intrinsic reward.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries; a minimal illustrative sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-10-09T22:24:44Z)
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning [122.47938710284784]
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
arXiv Detail & Related papers (2022-05-05T10:46:16Z)
- Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning [9.084006156825632]
This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal.
We introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address credit assignment across both time and agents.
AREL results in higher rewards in Particle World, and improved win rates in StarCraft compared to three state-of-the-art reward redistribution methods.
arXiv Detail & Related papers (2022-01-12T18:35:46Z)
- The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution [6.396567712417841]
We introduce Align-RUDDER, which employs reward redistribution effectively and drastically improves learning on few demonstrations.
On the Minecraft ObtainDiamond task, Align-RUDDER is able to mine a diamond, though not frequently.
arXiv Detail & Related papers (2020-09-29T15:48:02Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
- Reward Design in Cooperative Multi-agent Reinforcement Learning for Packet Routing [8.021402935358488]
We study the reward design problem in cooperative multi-agent reinforcement learning (MARL) using packet routing environments.
We show that both the global and local reward signals are prone to produce suboptimal policies.
We design mixed reward signals that can be used off-the-shelf to learn better policies.
arXiv Detail & Related papers (2020-03-05T02:27:46Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
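As a companion illustration, the ELIGN entry above describes a self-supervised intrinsic reward that pushes agents to behave as their nearby neighbors expect. The sketch below is one possible reading of that summary, with all names and modeling choices being illustrative assumptions rather than the paper's actual formulation: each neighbor keeps a forward model of the agent, and the agent's intrinsic reward is the negative error between those predictions and what actually happened.

```python
import torch
import torch.nn as nn


class NeighborExpectationModel(nn.Module):
    """Illustrative forward model: a neighbor predicts another agent's next observation."""

    def __init__(self, obs_dim, act_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs, act):
        # predict the observed agent's next observation from its current obs and action
        return self.net(torch.cat([obs, act], dim=-1))


def expectation_alignment_reward(neighbor_models, obs, act, next_obs):
    # intrinsic reward is highest when the agent's actual transition matches what
    # its neighbors expected, rewarding behavior that aligns with their expectations
    errors = [(m(obs, act) - next_obs).pow(2).mean(dim=-1) for m in neighbor_models]
    return -torch.stack(errors).mean(dim=0)
```

Under this reading, the alignment reward would be added to the environment reward during training, while the neighbor models themselves are fit with an ordinary supervised prediction loss on observed transitions.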