A Policy Gradient Algorithm for Learning to Learn in Multiagent
Reinforcement Learning
- URL: http://arxiv.org/abs/2011.00382v5
- Date: Fri, 11 Jun 2021 22:21:49 GMT
- Title: A Policy Gradient Algorithm for Learning to Learn in Multiagent
Reinforcement Learning
- Authors: Dong-Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa
Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan P. How
- Abstract summary: We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
- Score: 47.154539984501895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A fundamental challenge in multiagent reinforcement learning is to learn
beneficial behaviors in a shared environment with other simultaneously learning
agents. In particular, each agent perceives the environment as effectively
non-stationary due to the changing policies of other agents. Moreover, each
agent is itself constantly learning, leading to natural non-stationarity in the
distribution of experiences encountered. In this paper, we propose a novel
meta-multiagent policy gradient theorem that directly accounts for the
non-stationary policy dynamics inherent to multiagent learning settings. This
is achieved by modeling our gradient updates to consider both an agent's own
non-stationary policy dynamics and the non-stationary policy dynamics of other
agents in the environment. We show that our theoretically grounded approach
provides a general solution to the multiagent learning problem, one that
inherently encompasses all key aspects of previous state-of-the-art approaches
on this topic. We test our method on a diverse suite of multiagent benchmarks
and demonstrate that it adapts more efficiently than baseline methods to new
agents as they learn, across the full spectrum of mixed-incentive, competitive,
and cooperative domains.
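The core mechanism, differentiating a gradient update through the anticipated
learning steps of other agents, can be illustrated with a minimal sketch. The
code below is not the authors' implementation: it uses exact expected returns
in a toy 2x2 matrix game rather than sampled trajectories, and it models only
the peer's learning dynamics (the theorem additionally accounts for the agent's
own non-stationary learning).

```python
import torch

# Toy 2x2 cooperative matrix game: both agents receive R[a_ego, a_peer].
R = torch.tensor([[3.0, 0.0], [0.0, 1.0]])

def expected_return(theta_ego, theta_peer):
    # Exact expected payoff under the two softmax policies.
    return torch.softmax(theta_ego, 0) @ R @ torch.softmax(theta_peer, 0)

theta_ego = torch.zeros(2, requires_grad=True)
theta_peer = torch.zeros(2)
inner_lr, meta_lr = 1.0, 0.5

for step in range(100):
    # 1) Anticipate the peer's next learning step, keeping the graph so the
    #    ego's parameters are credited for how they shape that step.
    peer = theta_peer.clone().requires_grad_(True)
    g_peer = torch.autograd.grad(
        expected_return(theta_ego, peer), peer, create_graph=True)[0]
    peer_next = peer + inner_lr * g_peer  # peer's policy after it learns

    # 2) Meta-objective: the ego's return against the *updated* peer,
    #    differentiated through the peer's learning step.
    g_ego = torch.autograd.grad(
        expected_return(theta_ego, peer_next), theta_ego)[0]
    with torch.no_grad():
        theta_ego += meta_lr * g_ego

    # 3) The peer actually learns, so the environment the ego faces is
    #    non-stationary from one iteration to the next.
    theta_peer = peer_next.detach()
```

In this toy game, the meta-gradient credits the ego's current policy for
steering the peer toward the high-payoff joint action.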
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
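As a rough illustration of the step-wise reward idea only (the `expert_action`
oracle below is a hypothetical stand-in, not StepAgent's actual mechanism):
each step is scored individually instead of waiting for one sparse
episode-level reward.

```python
from typing import Callable, List, Tuple

def stepwise_rewards(trajectory: List[Tuple[str, str]],
                     expert_action: Callable[[str], str]) -> List[float]:
    """Score every (state, action) step against an expert oracle rather
    than assigning a single sparse reward at the end of the episode."""
    return [1.0 if action == expert_action(state) else -1.0
            for state, action in trajectory]
```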
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Contrastive learning-based agent modeling for deep reinforcement learning [31.293496061727932]
Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems.
We devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution.
CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode.
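A minimal sketch of the kind of contrastive objective involved; this is
generic InfoNCE rather than CLAM's exact loss, assuming embeddings of the ego
agent's observations from the same episode serve as positives and embeddings
from other episodes as negatives.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Pull same-episode embeddings together and push other episodes'
    embeddings away. anchor, positive: (d,); negatives: (n, d)."""
    pos = F.cosine_similarity(anchor, positive, dim=-1) / temperature
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1) / temperature
    logits = torch.cat([pos.unsqueeze(0), neg]).unsqueeze(0)  # (1, n + 1)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```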
arXiv Detail & Related papers (2023-12-30T03:44:12Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
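A much-simplified sketch of that idea, not DCIR's actual reward: score the ego
agent's behavioral similarity to each other agent, and let a learned, possibly
negative, per-agent scale decide whether consistency is rewarded or penalized
at this step.

```python
import numpy as np

def consistency_intrinsic_reward(own_probs, other_probs_list, scales):
    """own_probs: ego action distribution; other_probs_list: the other
    agents' action distributions; scales: learned per-agent weights
    whose sign decides whether consistency is encouraged."""
    sims = np.array([float(own_probs @ p) for p in other_probs_list])
    return float(np.dot(scales, sims))
```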
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose Ranked Policy Memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, boosting performance by up to 402% on average.
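A minimal sketch of the ranking-and-sampling mechanism as described; details
such as the bucket width are invented here for illustration.

```python
import random
from collections import defaultdict

class RankedPolicyMemory:
    """Bucket saved policy checkpoints by coarsened training score, then
    sample across buckets so agents train against behaviors spanning
    many skill levels, not just the latest policy."""
    def __init__(self, bucket_size=10.0):
        self.bucket_size = bucket_size
        self.buckets = defaultdict(list)

    def add(self, score, policy_ckpt):
        self.buckets[int(score // self.bucket_size)].append(policy_ckpt)

    def sample(self):
        rank = random.choice(list(self.buckets))  # uniform over ranks
        return random.choice(self.buckets[rank])
```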
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
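In hedged form (the notation here is mine, not the paper's), the stated
objective is an average-reward maximization against the limiting joint policy
that the other learning agents converge to in response to the ego policy:

```latex
% Agent i's farsighted objective: average reward under the limiting
% joint policy \pi_{-i}^{\infty}(\pi_i) induced by its own policy.
\max_{\pi_i} \; \liminf_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{t=0}^{T-1} r_i(s_t, a_t)
  \,\middle|\, \pi_i,\ \pi_{-i}^{\infty}(\pi_i)\right]
```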
arXiv Detail & Related papers (2022-03-07T17:32:35Z)
- Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions [17.129962954873587]
In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent trying to learn a good policy independently.
We propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions.
We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.
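One plausible realization, sketched below; the GRU encoder is an assumption
for illustration, not necessarily the paper's architecture: observed joint
actions are summarized into a fixed-size embedding that conditions the ego
agent's policy.

```python
import torch
import torch.nn as nn

class PolicyEncoder(nn.Module):
    """Encode a history of one-hot joint actions into an embedding that
    represents the other agents' current policies."""
    def __init__(self, n_joint_actions: int, embed_dim: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_joint_actions, embed_dim, batch_first=True)

    def forward(self, joint_action_onehots):
        # joint_action_onehots: (batch, time, n_joint_actions)
        _, h = self.gru(joint_action_onehots)
        return h[-1]  # (batch, embed_dim) policy representation
```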
arXiv Detail & Related papers (2021-06-10T15:09:33Z)
- Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
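For intuition, the exact belief update that the meta-learned inference
approximates looks like this over a finite set of opponent types (a finite
type set is a simplifying assumption made here, not by the paper):

```python
import numpy as np

def belief_update(belief: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """One step of exact Bayesian filtering over opponent types:
    belief[k] = P(type k), likelihoods[k] = P(observation | type k)."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()
```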
arXiv Detail & Related papers (2021-01-11T13:25:13Z)
- Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games [21.46148507577606]
This paper introduces an information-theoretic constraint on learned policy complexity in the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) reinforcement learning algorithm.
Results from experimentation in multi-agent cooperative and competitive tasks demonstrate that the capacity-limited approach is a good candidate for improving learning performance in these environments.
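A hedged sketch of such a constraint for a discrete stochastic policy; this is
not the paper's exact MADDPG formulation, only the shape of the idea: penalize
the actor objective by the KL divergence from a uniform prior, which bounds
how much information the learned policy can encode.

```python
import torch
from torch.distributions import Categorical

def capacity_limited_actor_loss(q_values, logits, beta=0.01):
    """Maximize critic value while paying beta * KL(pi || uniform);
    since KL(pi || uniform) = log(n) - H(pi), low-entropy (information-
    dense) policies incur a larger penalty."""
    dist = Categorical(logits=logits)
    n = logits.shape[-1]
    kl = torch.log(torch.tensor(float(n))) - dist.entropy()
    return (-q_values + beta * kl).mean()
```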
arXiv Detail & Related papers (2020-11-23T16:28:27Z)