Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale
- URL: http://arxiv.org/abs/2511.04904v1
- Date: Fri, 07 Nov 2025 01:09:36 GMT
- Title: Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale
- Authors: Bassel Al Omari, Michael Matthews, Alexander Rutherford, Jakob Nicolaus Foerster,
- Abstract summary: textitCraftax-MA is an extension of the popular open-ended RL environment, Craftax.<n>textitCraftax-Coop introduces heterogeneous agents, trading and more mechanics that require complex cooperation among agents for success.
- Score: 53.08403177911567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Progress in multi-agent reinforcement learning (MARL) requires challenging benchmarks that assess the limits of current methods. However, existing benchmarks often target narrow short-horizon challenges that do not adequately stress the long-term dependencies and generalization capabilities inherent in many multi-agent systems. To address this, we first present \textit{Craftax-MA}: an extension of the popular open-ended RL environment, Craftax, that supports multiple agents and evaluates a wide range of general abilities within a single environment. Written in JAX, \textit{Craftax-MA} is exceptionally fast with a training run using 250 million environment interactions completing in under an hour. To provide a more compelling challenge for MARL, we also present \textit{Craftax-Coop}, an extension introducing heterogeneous agents, trading and more mechanics that require complex cooperation among agents for success. We provide analysis demonstrating that existing algorithms struggle with key challenges in this benchmark, including long-horizon credit assignment, exploration and cooperation, and argue for its potential to drive long-term research in MARL.
Related papers
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2)<n>We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search.<n>We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z) - Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning [112.16686518063456]
We introduce textbfMulti-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time.<n>MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making.<n>Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines
arXiv Detail & Related papers (2026-01-14T17:57:43Z) - UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce textbfUltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges.<n>Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules.<n>Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z) - The Heterogeneous Multi-Agent Challenge [0.4199844472131922]
Multi-Agent Reinforcement Learning (MARL) is a growing research area which gained significant traction in recent years.<n> Heterogeneous Multi-Agent Reinforcement Learning (HeMARL) is where agents with different sensors, resources, or capabilities must cooperate based on local information.
arXiv Detail & Related papers (2025-09-23T19:30:30Z) - O-MAPL: Offline Multi-agent Preference Learning [5.4482836906033585]
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL)<n>We introduce a novel end-to-end preference-based learning framework for cooperative MARL.<n>Our algorithm outperforms existing methods across various tasks.
arXiv Detail & Related papers (2025-01-31T08:08:20Z) - Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z) - JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z) - FoX: Formation-aware exploration in multi-agent reinforcement learning [10.554220876480297]
We propose a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations.
Numerical results show that the proposed FoX framework significantly outperforms the state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse Starcraft II multi-agent challenge (SMAC) tasks.
arXiv Detail & Related papers (2023-08-22T08:39:44Z) - Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.