Benchmarking Robustness and Generalization in Multi-Agent Systems: A
Case Study on Neural MMO
- URL: http://arxiv.org/abs/2308.15802v1
- Date: Wed, 30 Aug 2023 07:16:11 GMT
- Title: Benchmarking Robustness and Generalization in Multi-Agent Systems: A
Case Study on Neural MMO
- Authors: Yangkun Chen, Joseph Suarez, Junjie Zhang, Chenghui Yu, Bo Wu, Hanmo
Chen, Hengman Zhu, Rui Du, Shanliang Qian, Shuai Liu, Weijun Hong, Jinke He,
Yibing Zhang, Liang Zhao, Clare Zhu, Julian Togelius, Sharada Mohanty, Jiaxin
Chen, Xiu Li, Xiaolong Zhu, Phillip Isola
- Abstract summary: We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
- Score: 50.58083807719749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the results of the second Neural MMO challenge, hosted at IJCAI
2022, which received 1600+ submissions. This competition targets robustness and
generalization in multi-agent systems: participants train teams of agents to
complete a multi-task objective against opponents not seen during training. The
competition combines relatively complex environment design with large numbers
of agents in the environment. The top submissions demonstrate strong success on
this task using mostly standard reinforcement learning (RL) methods combined
with domain-specific engineering. We summarize the competition design and
results and suggest that, for the academic community, competitions may be a
powerful approach to solving hard problems and establishing a solid benchmark
for algorithms. We will open-source our benchmark including the environment
wrapper, baselines, a visualization tool, and selected policies for further
research.
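The evaluation setting above can be pictured as a standard multi-agent rollout in which a trained team faces opponents it never saw during training. The following is a minimal, self-contained sketch of that loop, assuming a PettingZoo-style parallel interface with dict-keyed observations and rewards; ToyTeamEnv, the placeholder policies, and the team size are illustrative stand-ins, not the released environment wrapper or baselines.

```python
# Sketch of evaluating a trained team against unseen opponents.
# ToyTeamEnv only mimics a parallel multi-agent API; it is NOT the
# official Neural MMO environment wrapper.
import random

TEAM_SIZE = 8  # illustrative team size

class ToyTeamEnv:
    """Stand-in multi-agent env: dict-keyed observations and rewards per agent."""
    def __init__(self, my_team, opponent_team):
        self.agents = my_team + opponent_team
        self.t = 0

    def reset(self):
        self.t = 0
        return {a: random.random() for a in self.agents}

    def step(self, actions):
        self.t += 1
        obs = {a: random.random() for a in self.agents}
        rewards = {a: random.random() for a in self.agents}
        done = self.t >= 32  # short fixed-length episode for the sketch
        return obs, rewards, done, {}

def trained_policy(obs):    # placeholder for a learned team policy
    return 0

def unseen_opponent(obs):   # placeholder for a held-out opponent policy
    return 1

my_team = [f"ally_{i}" for i in range(TEAM_SIZE)]
opp_team = [f"opp_{i}" for i in range(TEAM_SIZE)]
env = ToyTeamEnv(my_team, opp_team)

obs = env.reset()
score, done = 0.0, False
while not done:
    actions = {a: (trained_policy(obs[a]) if a in my_team else unseen_opponent(obs[a]))
               for a in env.agents}
    obs, rewards, done, _ = env.step(actions)
    score += sum(rewards[a] for a in my_team)
print(f"team return vs. unseen opponents: {score:.2f}")
```

With the open-sourced benchmark, the stub environment and placeholder policies would be replaced by the actual Neural MMO tasks, baselines, and submitted policies.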
Related papers
- FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning [25.857375787748715]
We present FightLadder, a real-time fighting game platform, to empower competitive MARL research.
We provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics.
We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode.
arXiv Detail & Related papers (2024-06-04T08:04:23Z) - CompeteSMoE -- Effective Training of Sparse Mixture of Experts via
Competition [52.2034494666179]
Sparse mixture of experts (SMoE) offers an appealing solution to scale up model complexity beyond the means of increasing the network's depth or width.
We propose a competition mechanism to address this fundamental challenge of representation collapse.
By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator.
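To make the routing idea concrete, here is a small numpy sketch, under the assumption that "neural response" is measured by the norm of each expert's output; the names and the top-k combination rule are illustrative, not the CompeteSMoE reference implementation.

```python
# Hypothetical sketch of competition-based expert routing: every expert
# computes a response and the input is routed only to the expert(s) with
# the strongest response.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 1

# Toy experts: random linear maps followed by a ReLU nonlinearity.
expert_weights = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def competition_route(x: np.ndarray) -> np.ndarray:
    """Route x to the top_k experts with the highest neural response."""
    responses = [np.maximum(x @ W, 0.0) for W in expert_weights]
    scores = np.array([np.linalg.norm(r) for r in responses])  # response strength
    winners = np.argsort(scores)[-top_k:]                      # competition winners
    weights = scores[winners] / scores[winners].sum()          # normalize over winners
    return sum(w * responses[i] for w, i in zip(weights, winners))

y = competition_route(rng.standard_normal(d_model))
print(y.shape)  # -> (16,)
```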
arXiv Detail & Related papers (2024-02-04T15:17:09Z) - The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent
Competition with Specialization and Trade [41.639843908635875]
The NeurIPS-2022 Neural MMO Challenge attracted 500 participants and received over 1,600 submissions.
This year's competition runs on the latest v1.6 Neural MMO, which introduces new equipment, combat, trading, and a better scoring system.
This paper summarizes the design and results of the challenge, explores the potential of this environment as a benchmark for learning methods, and presents some practical reinforcement learning approaches for complex tasks with sparse rewards.
arXiv Detail & Related papers (2023-11-07T04:14:45Z) - DIAMBRA Arena: a New Reinforcement Learning Platform for Research and
Experimentation [91.3755431537592]
This work presents DIAMBRA Arena, a new platform for reinforcement learning research and experimentation.
It features a collection of high-quality environments exposing a Python API fully compliant with the OpenAI Gym standard.
They are episodic tasks with discrete actions and observations composed of raw pixels plus additional numerical values.
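As a rough illustration of that interface shape (not the DIAMBRA Arena API itself), the stub below follows the classic OpenAI Gym reset/step pattern with a discrete action space and observations made of a raw pixel frame plus extra numeric values; ToyFightEnv and all of its details are assumptions.

```python
# Stub environment mimicking the described interface: episodic, discrete
# actions, observations = raw pixels + additional numeric values.
import numpy as np

class ToyFightEnv:
    n_actions = 8  # discrete action space, e.g. movement/attack combinations

    def reset(self):
        return self._obs()

    def step(self, action: int):
        assert 0 <= action < self.n_actions
        obs = self._obs()
        reward = float(np.random.rand() - 0.5)  # placeholder reward signal
        done = np.random.rand() < 0.05          # episodic: terminates eventually
        return obs, reward, done, {}

    def _obs(self):
        return {
            "frame": np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8),
            "extra": np.random.rand(8).astype(np.float32),  # e.g. health bars, timer
        }

env = ToyFightEnv()
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = int(np.random.randint(env.n_actions))  # random policy for illustration
    obs, reward, done, info = env.step(action)
    episode_return += reward
print(f"episode return: {episode_return:.2f}")
```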
arXiv Detail & Related papers (2022-10-19T14:39:10Z) - Retrospective on the 2021 BASALT Competition on Learning from Human
Feedback [92.37243979045817]
The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks.
Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft.
Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types.
arXiv Detail & Related papers (2022-04-14T17:24:54Z) - Towards robust and domain agnostic reinforcement learning competitions [12.731614722371376]
Reinforcement learning competitions have formed the basis for standard research benchmarks.
Despite this, a majority of challenges suffer from the same fundamental problems.
We present a new framework of competition design that promotes the development of algorithms that overcome these barriers.
arXiv Detail & Related papers (2021-06-07T16:15:46Z) - The MineRL 2020 Competition on Sample Efficient Reinforcement Learning
using Human Priors [62.9301667732188]
We propose a second iteration of the MineRL Competition.
The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations.
The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment.
At the end of each round, competitors submit containerized versions of their learning algorithms to the AIcrowd platform.
arXiv Detail & Related papers (2021-01-26T20:32:30Z) - Forgetful Experience Replay in Hierarchical Reinforcement Learning from
Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm outperforms all other solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
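The goal-oriented buffer idea can be pictured as a replay store partitioned by sub-goal, as in the rough sketch below; the class, the sub-goal labels, and the forgetting rule are illustrative assumptions rather than the paper's algorithm.

```python
# Hypothetical sketch: demonstration transitions are grouped by the sub-goal
# they achieve, so batches can be drawn per sub-goal when training a
# hierarchical agent. Not the paper's implementation.
import random
from collections import defaultdict

class GoalOrientedReplayBuffer:
    def __init__(self, capacity_per_goal: int = 10_000):
        self.capacity = capacity_per_goal
        self.buffers = defaultdict(list)  # sub-goal id -> list of transitions

    def add(self, transition: dict, subgoal: str):
        buf = self.buffers[subgoal]
        buf.append(transition)
        if len(buf) > self.capacity:      # "forgetful": drop the oldest transitions
            buf.pop(0)

    def sample(self, subgoal: str, batch_size: int):
        buf = self.buffers[subgoal]
        return random.sample(buf, min(batch_size, len(buf)))

# Usage with a fake demonstration stream; the sub-goal names are made up.
buffer = GoalOrientedReplayBuffer()
for step in range(100):
    subgoal = "gather_log" if step < 50 else "craft_table"
    buffer.add({"obs": step, "action": 0, "reward": 0.0}, subgoal)
batch = buffer.sample("gather_log", batch_size=8)
print(len(batch), "transitions sampled for sub-goal 'gather_log'")
```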
arXiv Detail & Related papers (2020-06-17T15:38:40Z)