Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games
- URL: http://arxiv.org/abs/2511.11083v3
- Date: Tue, 18 Nov 2025 10:20:13 GMT
- Title: Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games
- Authors: Bingyu Hui, Lebin Yu, Quanming Yao, Yunpeng Qu, Xudong Zhang, Jian Wang,
- Abstract summary: Zero-shot coordination is a key challenge in multi-agent game theory.<n>Population-based training has been proven to provide good zero-shot coordination performance.
- Score: 30.01934395713042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot coordination(ZSC), a key challenge in multi-agent game theory, has become a hot topic in reinforcement learning (RL) research recently, especially in complex evolving games. It focuses on the generalization ability of agents, requiring them to coordinate well with collaborators from a diverse, potentially evolving, pool of partners that are not seen before without any fine-tuning. Population-based training, which approximates such an evolving partner pool, has been proven to provide good zero-shot coordination performance; nevertheless, existing methods are limited by computational resources, mainly focusing on optimizing diversity in small populations while neglecting the potential performance gains from scaling population size. To address this issue, this paper proposes the Scalable Population Training (ScaPT), an efficient RL training framework comprising two key components: a meta-agent that efficiently realizes a population by selectively sharing parameters across agents, and a mutual information regularizer that guarantees population diversity. To empirically validate the effectiveness of ScaPT, this paper evaluates it along with representational frameworks in Hanabi cooperative game and confirms its superiority.
Related papers
- Heterogeneous Agent Collaborative Reinforcement Learning [52.99813668995983]
Heterogeneous Agent Collaborative Reinforcement Learning (HACRL)<n>Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer.<n>Experiments across diverse heterogeneous model combinations and reasoning benchmarks show that HACPO consistently improves all participating agents, outperforming GSPO by an average of 3.3% while using only half the rollout cost.
arXiv Detail & Related papers (2026-03-03T05:09:49Z) - CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards [80.78748457530718]
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining.<n>We introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions.
arXiv Detail & Related papers (2025-10-09T17:50:26Z) - C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation Systems [7.548682352355034]
We show how the attention mechanism can play a key role in factorization machines for shared embedding selection.<n>We propose to address this challenge by analyzing the substructures in the dataset and exposing those with strong distributional contrast through auxiliary learning.<n>This approach customizes the learning process of attention layers to preserve mutual information with minority cohorts while improving global performance.
arXiv Detail & Related papers (2025-10-02T17:00:17Z) - Enhancing Multi-Agent Collaboration with Attention-Based Actor-Critic Policies [0.0]
Team-Attention-Actor-Critic (TAAC) is a learning algorithm designed to enhance multi-agent collaboration in cooperative environments.<n>We evaluate TAAC in a simulated soccer environment against benchmark algorithms.
arXiv Detail & Related papers (2025-07-30T15:48:38Z) - Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z) - Cooperation Dynamics in Multi-Agent Systems: Exploring Game-Theoretic Scenarios with Mean-Field Equilibria [0.0]
This paper investigates strategies to invoke cooperation in game-theoretic scenarios, namely the Iterated Prisoner's Dilemma.
Existing cooperative strategies are analyzed for their effectiveness in promoting group-oriented behavior in repeated games.
The study extends to scenarios with exponentially growing agent populations.
arXiv Detail & Related papers (2023-09-28T08:57:01Z) - ProAgent: Building Proactive Cooperative Agents with Large Language
Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - AgentVerse: Facilitating Multi-Agent Collaboration and Exploring
Emergent Behaviors [93.38830440346783]
We propose a multi-agent framework framework that can collaboratively adjust its composition as a greater-than-the-sum-of-its-parts system.
Our experiments demonstrate that framework framework can effectively deploy multi-agent groups that outperform a single agent.
In view of these behaviors, we discuss some possible strategies to leverage positive ones and mitigate negative ones for improving the collaborative potential of multi-agent groups.
arXiv Detail & Related papers (2023-08-21T16:47:11Z) - Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In
the Game of Hanabi [15.917861586043813]
We show that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods.
We create a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods.
arXiv Detail & Related papers (2023-08-20T14:44:50Z) - Serverless Federated AUPRC Optimization for Multi-Party Collaborative
Imbalanced Data Mining [119.89373423433804]
Area Under Precision-Recall (AUPRC) was introduced as an effective metric.
Serverless multi-party collaborative training can cut down the communications cost by avoiding the server node bottleneck.
We propose a new ServerLess biAsed sTochastic gradiEnt (SLATE) algorithm to directly optimize the AUPRC.
arXiv Detail & Related papers (2023-08-06T06:51:32Z) - Enhancing Worker Recruitment in Collaborative Mobile Crowdsourcing: A Graph Neural Network Trust Evaluation Approach [7.883218966932225]
Collaborative Mobile Crowdsourcing (CMCS) allows platforms to recruit worker teams to collaboratively execute complex sensing tasks.
To obtain the asymmetric trust values among all workers in the social network, the Trust Reinforcement Evaluation Framework (TREF) is proposed in this paper.
arXiv Detail & Related papers (2023-06-07T11:59:45Z) - Reweighted Mixup for Subpopulation Shift [63.1315456651771]
Subpopulation shift exists in many real-world applications, which refers to the training and test distributions that contain the same subpopulation groups but with different subpopulation proportions.
Importance reweighting is a classical and effective way to handle the subpopulation shift.
We propose a simple yet practical framework, called reweighted mixup, to mitigate the overfitting issue.
arXiv Detail & Related papers (2023-04-09T03:44:50Z) - A Reinforcement Learning-assisted Genetic Programming Algorithm for Team
Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z) - GENIUS: A Novel Solution for Subteam Replacement with Clustering-based
Graph Neural Network [34.510076775330795]
Subteam replacement is defined as finding the optimal candidate set of people who can best function as an unavailable subset of members.
We propose GENIUS, a novel clustering-based graph neural network (GNN) framework that can capture team network knowledge for flexible subteam replacement.
arXiv Detail & Related papers (2022-11-08T09:02:59Z) - Contextual Squeeze-and-Excitation for Efficient Few-Shot Image
Classification [57.36281142038042]
We present a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance.
We also present a new training protocol based on Coordinate-Descent called UpperCaSE that exploits meta-trained CaSE blocks and fine-tuning routines for efficient adaptation.
arXiv Detail & Related papers (2022-06-20T15:25:08Z) - Group-Agent Reinforcement Learning [12.915860504511523]
It can largely benefit the reinforcement learning process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively.
We propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL)
arXiv Detail & Related papers (2022-02-10T16:40:59Z) - Evaluating Generalization and Transfer Capacity of Multi-Agent
Reinforcement Learning Across Variable Number of Agents [0.0]
Multi-agent Reinforcement Learning (MARL) problems often require cooperation among agents in order to solve a task.
Centralization and decentralization are two approaches used for cooperation in MARL.
We adopt centralized training with decentralized execution paradigm and investigate the generalization and transfer capacity of the trained models across variable number of agents.
arXiv Detail & Related papers (2021-11-28T15:29:46Z) - ACP++: Action Co-occurrence Priors for Human-Object Interaction
Detection [102.9428507180728]
A common problem in the task of human-object interaction (HOI) detection is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
arXiv Detail & Related papers (2021-09-09T06:02:50Z) - Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication-efficient overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.