Informative Policy Representations in Multi-Agent Reinforcement Learning
via Joint-Action Distributions
- URL: http://arxiv.org/abs/2106.05802v1
- Date: Thu, 10 Jun 2021 15:09:33 GMT
- Title: Informative Policy Representations in Multi-Agent Reinforcement Learning
via Joint-Action Distributions
- Authors: Yifan Yu, Haobin Jiang, Zongqing Lu
- Abstract summary: In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent to learn a good policy independently.
We propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions.
We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.
- Score: 17.129962954873587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In multi-agent reinforcement learning, the inherent non-stationarity of the
environment caused by other agents' actions poses significant difficulties for
an agent to learn a good policy independently. One way to deal with
non-stationarity is agent modeling, by which the agent takes into consideration
the influence of other agents' policies. Most existing work relies on
predicting other agents' actions or goals, or discriminating between their
policies. However, such modeling fails to capture the similarities and
differences between policies simultaneously and thus cannot provide useful
information when generalizing to unseen policies. To address this, we propose a
general method to learn representations of other agents' policies via the
joint-action distributions sampled in interactions. The similarities and
differences between policies are naturally captured by the policy distance
inferred from the joint-action distributions and deliberately reflected in the
learned representations. Agents conditioned on the policy representations can
generalize well to unseen agents. We empirically demonstrate that our method
outperforms existing work in multi-agent tasks when facing unseen agents.
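The abstract does not specify the distance measure or the training objective, so the following is a minimal sketch of the idea under stated assumptions: actions are discrete, the policy distance is taken to be the Jensen-Shannon distance between empirical joint-action distributions (an assumed choice), and representations are fitted so that their pairwise distances mirror the policy distances (a metric-MDS-style objective). All function names are illustrative.

```python
import numpy as np

def joint_action_distribution(joint_actions, n_agents, n_actions):
    """Empirical joint-action distribution from sampled interactions.
    joint_actions: (T, n_agents) array of discrete action indices."""
    flat = np.ravel_multi_index(joint_actions.T, (n_actions,) * n_agents)
    counts = np.bincount(flat, minlength=n_actions ** n_agents).astype(float)
    return counts / counts.sum()

def js_distance(p, q, eps=1e-12):
    """Jensen-Shannon distance between two joint-action distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * (np.log(a + eps) - np.log(b + eps)))
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

def embed_policies(dists, dim=2, lr=0.1, steps=2000, seed=0):
    """Fit representations whose Euclidean distances match the policy
    distances inferred from the joint-action distributions."""
    n = len(dists)
    D = np.array([[js_distance(dists[i], dists[j]) for j in range(n)]
                  for i in range(n)])
    X = 0.01 * np.random.default_rng(seed).standard_normal((n, dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]      # pairwise differences
        E = np.sqrt((diff ** 2).sum(-1) + 1e-9)   # embedding distances
        grad = (((E - D) / E)[:, :, None] * diff).sum(1)  # stress gradient
        X -= lr * grad / n
    return X
```

Representations fitted this way place behaviorally similar policies near one another while separating dissimilar ones, which is the property the abstract claims for the learned representations.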
Related papers
- Contrastive learning-based agent modeling for deep reinforcement learning [31.293496061727932]
Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems.
We devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution.
CLAM generates consistent, high-quality policy representations in real time, right from the beginning of each episode.
arXiv Detail & Related papers (2023-12-30T03:44:12Z)
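The excerpt gives no architectural details for CLAM, so the sketch below shows only the generic contrastive pattern it names: encode local-observation segments from the ego agent and apply an InfoNCE loss, where two segments collected against the same opponent policy form a positive pair and the other segments in the batch serve as negatives. All class and parameter names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentEncoder(nn.Module):
    """Encodes an ego-agent local-observation segment into a policy embedding."""
    def __init__(self, obs_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))
    def forward(self, segment):           # segment: (batch, T, obs_dim)
        return self.net(segment.mean(1))  # mean-pool over time (a simplification)

def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE: row i of `positive` is the positive for row i of `anchor`;
    all other rows in the batch act as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)
```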
- Fact-based Agent modeling for Multi-Agent Reinforcement Learning [6.431977627644292]
A Fact-based Agent Modeling (FAM) method is proposed, in which a fact-based belief inference (FBI) network models other agents in a partially observable environment using only the ego agent's local information.
We evaluate FAM on various Multi-agent Particle Environment (MPE) tasks and compare the results with several state-of-the-art MARL algorithms.
arXiv Detail & Related papers (2023-10-18T19:43:38Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
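The summary mentions stacking policy embeddings into a dispersion matrix but not the exact objective; one plausible reading, sketched below, is a pairwise-distance matrix over the stacked embeddings combined with a log-determinant diversity bonus (the DPP-style bonus is an assumption, not the paper's stated loss).

```python
import torch

def dispersion_matrix(policy_embeddings):
    """Pairwise-distance matrix over stacked policy embeddings.
    policy_embeddings: (n_policies, emb_dim), e.g. produced by a
    transformer encoder over state-action histories."""
    return torch.cdist(policy_embeddings, policy_embeddings)

def diversity_bonus(policy_embeddings, eps=1e-4):
    """Log-determinant of a similarity kernel built from the dispersion
    matrix: near-duplicate policies make the kernel nearly singular, so
    maximizing this term pushes the policy set apart."""
    K = torch.exp(-dispersion_matrix(policy_embeddings))
    n = K.size(0)
    return torch.logdet(K + eps * torch.eye(n, dtype=K.dtype, device=K.device))
```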
- Influencing Long-Term Behavior in Multiagent Reinforcement Learning [59.98329270954098]
We propose a principled framework for considering the limiting policies of other agents as time approaches infinity.
Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will take on.
Thanks to our farsighted evaluation, we demonstrate better long-term performance than state-of-the-art baselines in various domains.
arXiv Detail & Related papers (2022-03-07T17:32:35Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Learning Latent Representations to Influence Multi-Agent Interaction [65.44092264843538]
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy.
We show that our approach outperforms the alternatives and learns to influence the other agent.
arXiv Detail & Related papers (2020-11-12T19:04:26Z)
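No architecture is given in this excerpt; the sketch below shows one minimal way to realize the idea: a recurrent encoder summarizes the previous interaction trajectory into a latent describing the other agent, and the ego policy conditions on that latent. Module names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class OpponentEncoder(nn.Module):
    """Maps the previous interaction trajectory to a latent vector that
    summarizes the other agent's (possibly changing) policy."""
    def __init__(self, traj_dim, z_dim=8):
        super().__init__()
        self.gru = nn.GRU(traj_dim, 32, batch_first=True)
        self.head = nn.Linear(32, z_dim)
    def forward(self, traj):              # traj: (batch, T, traj_dim)
        _, h = self.gru(traj)
        return self.head(h[-1])

class ConditionedPolicy(nn.Module):
    """Ego policy conditioned on the inferred latent, so it can adapt to,
    and potentially influence, the other agent's behavior."""
    def __init__(self, obs_dim, z_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))  # action logits
```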
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
- Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning [14.017603575774361]
We formalize the notion of agent indication and prove, for the first time, that it enables convergence to optimal policies.
Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces.
arXiv Detail & Related papers (2020-05-27T20:14:28Z)
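Agent indication has a simple concrete form: append an agent identifier to each observation so a single shared network can still specialize per agent. A minimal sketch follows (the one-hot encoding is the usual choice; the paper's exact construction is not shown in this excerpt).

```python
import numpy as np

def with_agent_indication(obs, agent_id, n_agents):
    """Append a one-hot agent ID so one shared policy network can act
    differently for each agent ("agent indication")."""
    one_hot = np.zeros(n_agents, dtype=obs.dtype)
    one_hot[agent_id] = 1.0
    return np.concatenate([obs, one_hot])

# With heterogeneous observation spaces, observations would first need to be
# padded or projected to a common size before the shared network sees them.
```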
- Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework.
We develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL).
Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to those of the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)