Discovering Multiagent Learning Algorithms with Large Language Models
- URL: http://arxiv.org/abs/2602.16928v2
- Date: Sat, 21 Feb 2026 19:59:56 GMT
- Title: Discovering Multiagent Learning Algorithms with Large Language Models
- Authors: Zun Li, John Schultz, Daniel Hennes, Marc Lanctot,
- Abstract summary: We propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms.<n>We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning.
- Score: 8.649235365712004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Much of the advancement of Multi-Agent Reinforcement Learning (MARL) in imperfect-information games has historically depended on manual iterative refinement of baselines. While foundational families like Counterfactual Regret Minimization (CFR) and Policy Space Response Oracles (PSRO) rest on solid theoretical ground, the design of their most effective variants often relies on human intuition to navigate a vast algorithmic design space. In this work, we propose the use of AlphaEvolve, an evolutionary coding agent powered by large language models, to automatically discover new multiagent learning algorithms. We demonstrate the generality of this framework by evolving novel variants for two distinct paradigms of game-theoretic learning. First, in the domain of iterative regret minimization, we evolve the logic governing regret accumulation and policy derivation, discovering a new algorithm, Volatility-Adaptive Discounted (VAD-)CFR. VAD-CFR employs novel, non-intuitive mechanisms-including volatility-sensitive discounting, consistency-enforced optimism, and a hard warm-start policy accumulation schedule-to outperform state-of-the-art baselines like Discounted Predictive CFR+. Second, in the regime of population based training algorithms, we evolve training-time and evaluation-time meta strategy solvers for PSRO, discovering a new variant, Smoothed Hybrid Optimistic Regret (SHOR-)PSRO. SHOR-PSRO introduces a hybrid meta-solver that linearly blends Optimistic Regret Matching with a smoothed, temperature-controlled distribution over best pure strategies. By dynamically annealing this blending factor and diversity bonuses during training, the algorithm automates the transition from population diversity to rigorous equilibrium finding, yielding superior empirical convergence compared to standard static meta-solvers.
Related papers
- AdaEvolve: Adaptive LLM Driven Zeroth-Order Optimization [61.535567824938205]
We introduce AdaEvolve, a framework that reformulates LLM-driven evolution as a hierarchical adaptive optimization problem.<n>AdaEvolve consistently outperforms the open-ended baselines across 185 different open-ended optimization problems.
arXiv Detail & Related papers (2026-02-23T18:45:31Z) - Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery [37.96481049421407]
Large language models (LLMs) have enabled rapid progress in automatic discovery.<n>We propose a game-theoretic framework that reframes discovery as a program level co-evolution between solver and instance generator.
arXiv Detail & Related papers (2026-01-30T12:14:52Z) - Beyond Algorithm Evolution: An LLM-Driven Framework for the Co-Evolution of Swarm Intelligence Optimization Algorithms and Prompts [2.7320188728052064]
This paper proposes a novel framework for the collaborative evolution of both swarm intelligence algorithms and guiding prompts.<n>The framework was rigorously evaluated on a range of NP problems, where it demonstrated superior performance.<n>Our work establishes a new paradigm for swarm intelligence optimization algorithms, underscoring the indispensable role of prompt evolution.
arXiv Detail & Related papers (2025-12-10T00:37:16Z) - A Hierarchical Hybrid AI Approach: Integrating Deep Reinforcement Learning and Scripted Agents in Combat Simulations [0.0]
This paper introduces a novel hierarchical hybrid artificial intelligence (AI) approach that synergizes the reliability and predictability of scripted agents with the dynamic, adaptive learning capabilities of RL.<n>By structuring the AI system hierarchically, the proposed approach aims to utilize scripted agents for routine, tactical-level decisions and RL agents for higher-level, strategic decision-making.
arXiv Detail & Related papers (2025-11-28T23:50:29Z) - Experience-Guided Reflective Co-Evolution of Prompts and Heuristics for Automatic Algorithm Design [124.54166764570972]
Combinatorial optimization problems are traditionally tackled with handcrafted algorithms.<n>Recent progress has highlighted the potential of automatics design powered by large language models.<n>We propose the Experience-Evolution Reflective Co-Guided of Prompt and Heuristics (EvoPH) for automatic algorithm design.
arXiv Detail & Related papers (2025-09-29T09:24:09Z) - RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large-language models.<n>AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model.<n>We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - An Efficient Application of Neuroevolution for Competitive Multiagent
Learning [0.0]
NEAT is a popular evolutionary strategy used to obtain the best performing neural network architecture.
This paper utilizes the NEAT algorithm to achieve competitive multiagent learning on a modified pong game environment.
arXiv Detail & Related papers (2021-05-23T10:34:48Z) - Epistocracy Algorithm: A Novel Hyper-heuristic Optimization Strategy for
Solving Complex Optimization Problems [1.471992435706872]
This paper proposes a novel evolutionary algorithm called Epistocracy which incorporates human socio-political behavior and intelligence to solve complex optimization problems.
The inspiration of the Epistocracy algorithm originates from a political regime where educated people have more voting power than the uneducated or less educated.
Experimental results show that the Epistocracy algorithm outperforms the tested state-of-the-art evolutionary and swarm intelligence algorithms in terms of performance, precision, and robustness.
arXiv Detail & Related papers (2021-01-30T19:07:09Z) - Phase Retrieval using Expectation Consistent Signal Recovery Algorithm
based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up a new possibility for robust and fast PR.
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.