Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
- URL: http://arxiv.org/abs/2509.23462v1
- Date: Sat, 27 Sep 2025 19:23:38 GMT
- Title: Generative Evolutionary Meta-Solver (GEMS): Scalable Surrogate-Free Multi-Agent Learning
- Authors: Alakh Sharma, Gaurish Trivedi, Kartikey Bhandari, Yash Sinha, Dhruv Kumar, Pratik Narang, Jagat Sesh Challa
- Abstract summary: We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with a compact set of latent anchors and a single amortized generator. GEMS relies on unbiased Monte Carlo rollouts, multiplicative-weights meta-dynamics, and a model-free empirical oracle to adaptively expand the policy set. We find that GEMS is up to 6x faster and uses 1.3x less memory than PSRO, while also reaping higher rewards.
- Score: 5.217618511306204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scalable multi-agent reinforcement learning (MARL) remains a central challenge for AI. Existing population-based methods, such as Policy-Space Response Oracles (PSRO), require storing explicit policy populations and constructing full payoff matrices, incurring quadratic computation and linear memory costs. We present Generative Evolutionary Meta-Solver (GEMS), a surrogate-free framework that replaces explicit populations with a compact set of latent anchors and a single amortized generator. Instead of exhaustively constructing the payoff matrix, GEMS relies on unbiased Monte Carlo rollouts, multiplicative-weights meta-dynamics, and a model-free empirical-Bernstein UCB oracle to adaptively expand the policy set. Best responses are trained within the generator using an advantage-based trust-region objective, eliminating the need to store and train separate actors. We evaluated GEMS in a variety of two-player and multi-player games, such as the Deceptive Messages Game, Kuhn Poker, and the Multi-Particle environment. We find that GEMS is up to ~6x faster and uses 1.3x less memory than PSRO, while also achieving higher rewards. These results demonstrate that GEMS retains the game-theoretic guarantees of PSRO while overcoming its fundamental inefficiencies, thereby enabling scalable multi-agent learning in multiple domains.
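The two meta-level components named in the abstract, multiplicative-weights meta-dynamics and an empirical-Bernstein UCB oracle, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation; `eta` and `delta` are assumed hyperparameters, and the Maurer-Pontil form of the empirical-Bernstein bound is one common choice.

```python
import numpy as np

def mw_update(weights, payoffs, eta=0.1):
    """One multiplicative-weights step over the anchor set:
    anchors with higher estimated payoff gain meta-weight."""
    w = weights * np.exp(eta * np.asarray(payoffs, dtype=float))
    return w / w.sum()

def eb_ucb(rewards, delta=0.05):
    """Empirical-Bernstein UCB (Maurer-Pontil form) on the mean payoff
    estimated from n >= 2 Monte Carlo rollouts; variance-sensitive,
    so low-variance candidates get tighter bounds."""
    r = np.asarray(rewards, dtype=float)
    n = len(r)
    mean, var = r.mean(), r.var(ddof=1)
    log_term = np.log(2.0 / delta)
    return mean + np.sqrt(2.0 * var * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))
```

In a PSRO-style loop, one would score candidate expansions with `eb_ucb` from rollout returns and re-weight the restricted game's meta-strategy with `mw_update` instead of solving a full payoff matrix.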
Related papers
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z)
- Discovering Multiagent Learning Algorithms with Large Language Models [8.649235365712004]
We propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning.
arXiv Detail & Related papers (2026-02-18T22:41:00Z)
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search. We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
- Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery [37.96481049421407]
Large language models (LLMs) have enabled rapid progress in automatic discovery. We propose a game-theoretic framework that reframes discovery as program-level co-evolution between a solver and an instance generator.
arXiv Detail & Related papers (2026-01-30T12:14:52Z)
- scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration [53.683726781791385]
We introduce a scalable and flexible generative framework called single-cell Multi-omics Regularized Disentangled Representations (scMRDR) for unpaired multi-omics integration. Our method achieves excellent performance on benchmark datasets in terms of batch correction, modality alignment, and biological signal preservation.
arXiv Detail & Related papers (2025-10-28T21:28:39Z)
- Multi-Agent Tool-Integrated Policy Optimization [67.12841355267678]
Large language models (LLMs) increasingly rely on multi-turn tool-integrated planning for knowledge-intensive and complex reasoning tasks. Existing implementations typically rely on a single agent, but they suffer from limited context length and noisy tool responses. No existing method supports effective reinforcement learning post-training of tool-integrated multi-agent frameworks.
arXiv Detail & Related papers (2025-10-06T10:44:04Z)
- JoyAgents-R1: Joint Evolution Dynamics for Versatile Multi-LLM Agents with Reinforcement Learning [6.81021875668872]
We propose JoyAgents-R1, which first applies Group Relative Policy Optimization to the joint training of heterogeneous multi-agents. We show that JoyAgents-R1 achieves performance comparable to that of larger LLMs while built on smaller open-source models.
arXiv Detail & Related papers (2025-06-24T17:59:31Z)
- Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL. Two notorious and open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency. In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents.
arXiv Detail & Related papers (2024-09-30T08:09:41Z)
- Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players [17.55330497310932]
Markov Potential Games (MPGs) form an important sub-class of Markov games.
MPGs include as a special case the identical-interest setting where all the agents share the same reward function.
Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems.
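The identical-interest special case mentioned above can be made concrete with a small sketch: when all players share the same reward, the shared payoff matrix itself is an exact potential. The 2x2 coordination game below is a hypothetical example, not taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 identical-interest (coordination) game:
# both players receive the same payoff, so this matrix is
# an exact potential function for the game.
coord = np.array([[3.0, 0.0],
                  [0.0, 1.0]])

def is_exact_potential(u1, u2, phi):
    """Check the exact-potential property for a 2-player bimatrix game:
    any unilateral deviation changes the deviator's payoff by exactly
    the change in the potential phi."""
    n, m = u1.shape
    for b in range(m):                      # player 1 deviations
        for a in range(n):
            for ap in range(n):
                if not np.isclose(u1[ap, b] - u1[a, b], phi[ap, b] - phi[a, b]):
                    return False
    for a in range(n):                      # player 2 deviations
        for b in range(m):
            for bp in range(m):
                if not np.isclose(u2[a, bp] - u2[a, b], phi[a, bp] - phi[a, b]):
                    return False
    return True
```

A zero-sum game such as matching pennies fails this check, which is one way to see that MPGs are a genuine sub-class of Markov games rather than the general case.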
arXiv Detail & Related papers (2024-08-15T11:02:05Z)
- Fleet of Agents: Coordinated Problem Solving with Large Language Models [10.167121757937062]
Fleet of Agents (FoA) is a principled framework utilizing large language models as agents to navigate through dynamic tree searches. FoA spawns a multitude of agents, each exploring the search space autonomously, followed by a selection phase. FoA achieves the best cost-quality trade-off among all benchmarked methods, and FoA + Llama3.2-11B surpasses the Llama3.2-90B model.
arXiv Detail & Related papers (2024-05-07T09:36:23Z)
- Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune a generalization of online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets. An agent learned by offline MARL often inherits a random policy present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, is often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
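The consensus ADMM approach described in the entry above can be sketched for the simplest case, distributed least squares, as follows. This is an illustrative baseline, not the paper's coded mini-batch variant; `rho` and `iters` are assumed hyperparameters.

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=200):
    """Consensus ADMM for distributed least squares:
    minimize sum_i ||A_i x - b_i||^2, where node i holds (A_i, b_i)
    and keeps a local copy x_i constrained to agree with a global z."""
    d = As[0].shape[1]
    n = len(As)
    z = np.zeros(d)
    us = [np.zeros(d) for _ in range(n)]  # scaled dual variables
    for _ in range(iters):
        # local x-updates: each node solves a small ridge-regularized system
        xs = [np.linalg.solve(A.T @ A + rho * np.eye(d),
                              A.T @ b + rho * (z - u))
              for A, b, u in zip(As, bs, us)]
        # z-update: average the local estimates plus duals (the consensus step)
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # dual updates: accumulate each node's disagreement with the consensus
        us = [u + x - z for x, u in zip(xs, us)]
    return z
```

Only `x_i + u_i` vectors travel between nodes and the consensus step, which is what makes the scheme decentralized: raw data `(A_i, b_i)` never leaves its device.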
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.