BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2312.01472v2
- Date: Fri, 5 Jul 2024 09:31:43 GMT
- Title: BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- Authors: Matteo Bettini, Amanda Prorok, Vincent Moens
- Abstract summary: BenchMARL is the first training library created to enable standardized benchmarking across different algorithms, models, and environments.
BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations.
- Score: 8.130948896195878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL
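To make the abstract's claim of "complex benchmarks from simple one-line inputs" concrete, here is a minimal sketch of BenchMARL's Python API based on the usage shown in the project's README; exact class and method names may differ across library versions, so treat this as illustrative rather than authoritative.

```python
# Sketch: a benchmark over the cross product of algorithms x tasks x seeds,
# following the README of https://github.com/facebookresearch/BenchMARL.
from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

benchmark = Benchmark(
    algorithm_configs=[          # algorithms to compare
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[                      # environments/tasks to evaluate on
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},                # random seeds for each combination
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),        # actor network
    critic_model_config=MlpConfig.get_from_yaml(), # critic network
)
benchmark.run_sequential()  # runs every (algorithm, task, seed) combination
```

A single experiment can reportedly also be launched from the command line via the library's Hydra entry point, e.g. `python benchmarl/run.py algorithm=mappo task=vmas/balance`, which is the "one-line input" pattern the abstract refers to.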
Related papers
- MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems [43.19298196163617]
We develop MIRAGE-Bench, a standardized arena-based multilingual RAG benchmark for 18 diverse languages on Wikipedia.
arXiv Detail & Related papers (2024-10-17T16:18:49Z)
- LiveBench: A Challenging, Contamination-Free LLM Benchmark [101.21578097087699]
We release LiveBench, the first benchmark that contains frequently-updated questions from recent information sources.
We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size.
Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time.
arXiv Detail & Related papers (2024-06-27T16:47:42Z)
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures [57.886592207948844]
We propose MixEval, a new paradigm for establishing efficient, gold-standard evaluation by strategically mixing off-the-shelf benchmarks.
It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks.
arXiv Detail & Related papers (2024-06-03T05:47:05Z)
- JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- RLTF: Reinforcement Learning from Unit Test Feedback [17.35361167578498]
Reinforcement Learning from Unit Test Feedback (RLTF) is a novel online RL framework that exploits multi-granularity unit test feedback to refine code LLMs.
Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code.
arXiv Detail & Related papers (2023-07-10T05:18:18Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning [4.159549932951023]
Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from pre-recorded datasets.
However, offline MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines.
OG-MARL is a growing repository of high-quality datasets with baselines for cooperative offline MARL research.
arXiv Detail & Related papers (2023-02-01T15:41:27Z)
- MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library [82.77446613763809]
We present MARLlib, a library designed to offer fast development for multi-agent tasks and algorithm combinations.
MARLlib effectively disentangles the multi-agent task from the algorithm's learning process.
The library's source code is publicly accessible on GitHub.
arXiv Detail & Related papers (2022-10-11T03:11:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.