DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2312.05783v1
- Date: Sun, 10 Dec 2023 06:03:57 GMT
- Title: DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement
Learning
- Authors: Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou,
Mingkui Tan, Chuang Gan
- Abstract summary: We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
- Score: 84.22561239481901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning optimal behavior policy for each agent in multi-agent systems is an
essential yet difficult problem. Despite fruitful progress in multi-agent
reinforcement learning, the challenge of addressing the dynamics of whether two
agents should exhibit consistent behaviors is still under-explored. In this
paper, we propose a new approach that enables agents to learn whether their
behaviors should be consistent with that of other agents by utilizing intrinsic
rewards to learn the optimal policy for each agent. We begin by defining
behavior consistency as the divergence in output actions between two agents
when provided with the same observation. Subsequently, we introduce dynamic
consistency intrinsic reward (DCIR) to stimulate agents to be aware of others'
behaviors and determine whether to be consistent with them. Lastly, we devise a
dynamic scale network (DSN) that provides learnable scale factors for the agent
at every time step to dynamically ascertain whether to award consistent
behavior and the magnitude of rewards. We evaluate DCIR in multiple
environments including Multi-agent Particle, Google Research Football and
StarCraft II Micromanagement, demonstrating its efficacy.
Related papers
- On Multi-Agent Inverse Reinforcement Learning [8.284137254112848]
We extend the Inverse Reinforcement Learning (IRL) framework to the multi-agent setting, assuming to observe agents who are following Nash Equilibrium (NE) policies.
We provide an explicit characterization of the feasible reward set and analyze how errors in estimating the transition dynamics and expert behavior impact the recovered rewards.
arXiv Detail & Related papers (2024-11-22T16:31:36Z) - Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent
Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors.
Jim rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments.
Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
arXiv Detail & Related papers (2024-02-06T13:02:00Z) - Learning From Good Trajectories in Offline Multi-Agent Reinforcement
Learning [98.07495732562654]
offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z) - Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z) - Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as the time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
arXiv Detail & Related papers (2022-03-07T17:32:35Z) - Learning Latent Representations to Influence Multi-Agent Interaction [65.44092264843538]
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy.
We show that our approach outperforms the alternatives and learns to influence the other agent.
arXiv Detail & Related papers (2020-11-12T19:04:26Z) - A Policy Gradient Algorithm for Learning to Learn in Multiagent
Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z) - Individual specialization in multi-task environments with multiagent
reinforcement learners [0.0]
There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as the first steps towards building general intelligent agents.
Previous results point us towards increased conditions for coordination, efficiency/fairness, and common-pool resource sharing.
We further study coordination in multi-task environments where several rewarding tasks can be performed and thus agents don't necessarily need to perform well in all tasks, but under certain conditions may specialize.
arXiv Detail & Related papers (2019-12-29T15:20:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.