Related papers: IDRL: Identifying Identities in Multi-Agent Reinforcement Learning with Ambiguous Identities

URL: http://arxiv.org/abs/2210.12896v1
Date: Mon, 24 Oct 2022 00:54:59 GMT
Title: IDRL: Identifying Identities in Multi-Agent Reinforcement Learning with Ambiguous Identities
Authors: Shijie Han, Peng Liu, Siyuan Li
Abstract summary: We develop a novel MARL framework, IDRL, which identifies the identities of the agents dynamically and then chooses the corresponding policy to perform the task. Using the poker game red-10 as the experimental environment, experiments show that IDRL achieves superior performance compared to other MARL methods.
Score: 14.440273322731446
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-agent reinforcement learning (MARL) is a prevalent learning paradigm for solving stochastic games. In previous studies, the agents in a game are defined as teammates or enemies beforehand, and the relations among agents stay fixed throughout the game. Such methods struggle in games where the competitive and collaborative relationships are not public and change dynamically, being determined by the identities of the agents. How to learn a successful policy when the identities of agents are ambiguous remains an open problem. To address it, we develop a novel MARL framework, IDRL, which identifies the identities of the agents dynamically and then chooses the corresponding policy to perform the task. In the IDRL framework, a relation network is constructed to deduce the identities of the agents by sensing the kindness and hostility shown by other agents; a dangerous network is built to estimate the risk of the identification. We also propose an intrinsic reward that helps train the relation network and the dangerous network, trading off the need to maximize external reward against the accuracy of identification. After identifying the cooperation-competition pattern among the agents, IDRL applies an off-the-shelf MARL method to learn the policy. Using the poker game red-10 as the experimental environment, experiments show that IDRL achieves superior performance compared to other MARL methods. Notably, the relation network identifies agents' identities on par with top human players, and the dangerous network reasonably avoids the risk of imperfect identification.
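The identify-then-select step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `IdentityBelief` container, the `select_policy` function, the two thresholds, and the "cautious" fallback policy are all hypothetical names and assumptions introduced here. The sketch only shows the control flow in which a relation-network output (teammate probabilities) and a dangerous-network output (risk of misidentification) jointly decide which policy to run.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class IdentityBelief:
    # Hypothetical stand-in for the relation network's output:
    # estimated probability that each other agent is a teammate.
    teammate_prob: Dict[str, float]
    # Hypothetical stand-in for the dangerous network's output:
    # estimated risk (0..1) that the identification is wrong.
    risk: float


def select_policy(
    belief: IdentityBelief,
    risk_threshold: float = 0.5,      # assumed value, not from the paper
    teammate_threshold: float = 0.5,  # assumed value, not from the paper
) -> str:
    """Return the name of the policy to run for the current belief."""
    # If the identification is judged too risky, do not commit to it.
    # The "cautious" fallback is an assumption made for this sketch.
    if belief.risk > risk_threshold:
        return "cautious"
    # Otherwise act on the identified cooperation-competition pattern.
    has_teammate = any(
        p >= teammate_threshold for p in belief.teammate_prob.values()
    )
    return "cooperative" if has_teammate else "competitive"


if __name__ == "__main__":
    belief = IdentityBelief(teammate_prob={"p2": 0.9, "p3": 0.1}, risk=0.2)
    print(select_policy(belief))
```

In a full system each returned name would map to a policy trained with an off-the-shelf MARL method, as the abstract states; the hard thresholds here merely stand in for whatever learned decision rule the paper actually uses.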

Related papers

MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search. We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
ID-RAG: Identity Retrieval-Augmented Generation for Long-Horizon Persona Coherence in Generative Agents [3.7798414207779296]
Identity Retrieval-Augmented Generation (ID-RAG) is a novel mechanism designed to ground an agent's persona and persistent preferences in a structured identity model. During the agent's decision loop, this model is queried to retrieve relevant identity context, which directly informs action selection. We demonstrate this approach by introducing and implementing a new class of ID-RAG enabled agents called Human-AI Agents.
arXiv Detail & Related papers (2025-09-29T16:54:51Z)
Generalizable Agent Modeling for Agent Collaboration-Competition Adaptation with Multi-Retrieval and Dynamic Generation [19.74776726500979]
Adapting a single agent to a new multi-agent system brings challenges, necessitating adjustments across various tasks, environments, and interactions with unknown teammates and opponents. We propose a more comprehensive setting, Agent Collaborative-Competitive Adaptation (ACCA), which evaluates an agent's ability to generalize across diverse scenarios. In ACCA, agents adjust to task and environmental changes, collaborate with unseen teammates, and compete against unknown opponents.
arXiv Detail & Related papers (2025-06-20T03:28:18Z)
Collaborative AI Teaming in Unknown Environments via Active Goal Deduction [22.842601384114058]
Existing approaches for training collaborative agents often require defined and known reward signals. We propose a framework for teaming with unknown agents, which leverages a kernel density Bayesian inverse learning method for active goal deduction. We prove that unbiased reward estimates in our framework are sufficient for optimal teaming with unknown agents.
arXiv Detail & Related papers (2024-03-22T16:50:56Z)
Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning [0.0]
We propose an approach for rewarding strategies where agents collectively exhibit novel behaviors. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.
arXiv Detail & Related papers (2024-02-06T13:02:00Z)
DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents. ProAgent can analyze the present state, and infer the intentions of teammates from observations. ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
Unsupervised Domain Adaptation on Person Re-Identification via Dual-level Asymmetric Mutual Learning [108.86940401125649]
This paper proposes a Dual-level Asymmetric Mutual Learning method (DAML) to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. The knowledge transfer between two networks is based on an asymmetric mutual learning manner. Experiments in Market-1501, CUHK-SYSU, and MSMT17 public datasets verified the superiority of DAML over state-of-the-arts.
arXiv Detail & Related papers (2023-01-29T12:36:17Z)
Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets. One agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team. We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition [31.877237996738252]
Value Decomposition (VD) aims to deduce the contributions of agents for decentralized policies in the presence of only global rewards. One of the main challenges in VD is to promote diverse behaviors among agents, while existing methods directly encourage the diversity of learned agent networks. We propose a novel Contrastive Identity-Aware learning (CIA) method, explicitly boosting the credit-level distinguishability of the VD network.
arXiv Detail & Related papers (2022-11-23T05:18:42Z)
On the Use and Misuse of Absorbing States in Multi-agent Reinforcement Learning [55.95253619768565]
Current MARL algorithms assume that the number of agents within a group remains fixed throughout an experiment. In many practical problems, an agent may terminate before their teammates. We present a novel architecture for an existing state-of-the-art MARL algorithm which uses attention instead of a fully connected layer with absorbing states.
arXiv Detail & Related papers (2021-11-10T23:45:08Z)
Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn). UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
Networked Multi-Agent Reinforcement Learning with Emergent Communication [18.47483427884452]
Multi-Agent Reinforcement Learning (MARL) methods find optimal policies for agents that operate in the presence of other learning agents. One way to coordinate is by learning to communicate with each other. Can the agents develop a language while learning to perform a common task?
arXiv Detail & Related papers (2020-04-06T16:13:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.