BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2312.01472v2
- Date: Fri, 5 Jul 2024 09:31:43 GMT
- Title: BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- Authors: Matteo Bettini, Amanda Prorok, Vincent Moens
- Abstract summary: BenchMARL is the first training library created to enable standardized benchmarking across different algorithms, models, and environments.
BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations.
- Score: 8.130948896195878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of Multi-Agent Reinforcement Learning (MARL) is currently facing a reproducibility crisis. While solutions for standardized reporting have been proposed to address the issue, we still lack a benchmarking tool that enables standardization and reproducibility, while leveraging cutting-edge Reinforcement Learning (RL) implementations. In this paper, we introduce BenchMARL, the first MARL training library created to enable standardized benchmarking across different algorithms, models, and environments. BenchMARL uses TorchRL as its backend, granting it high performance and maintained state-of-the-art implementations while addressing the broad community of MARL PyTorch users. Its design enables systematic configuration and reporting, thus allowing users to create and run complex benchmarks from simple one-line inputs. BenchMARL is open-sourced on GitHub: https://github.com/facebookresearch/BenchMARL
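To make the abstract's claim of "complex benchmarks from simple one-line inputs" concrete, here is a minimal sketch of BenchMARL's Python API based on the usage shown in the project's README; exact class and method names may differ across library versions, so treat this as illustrative rather than authoritative.

```python
# Sketch: a benchmark over the cross product of algorithms x tasks x seeds,
# following the README of https://github.com/facebookresearch/BenchMARL.
from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

benchmark = Benchmark(
    algorithm_configs=[          # algorithms to compare
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[                      # environments/tasks to evaluate on
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},                # random seeds for each combination
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),        # actor network
    critic_model_config=MlpConfig.get_from_yaml(), # critic network
)
benchmark.run_sequential()  # runs every (algorithm, task, seed) combination
```

A single experiment can reportedly also be launched from the command line via the library's Hydra entry point, e.g. `python benchmarl/run.py algorithm=mappo task=vmas/balance`, which is the "one-line input" pattern the abstract refers to.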
Related papers
- MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems [43.19298196163617]
We develop MIRAGE-Bench, a standardized arena-based multilingual RAG benchmark for 18 diverse languages on Wikipedia.
arXiv Detail & Related papers (2024-10-17T16:18:49Z)
- LiveBench: A Challenging, Contamination-Free LLM Benchmark [101.21578097087699]
We release LiveBench, the first benchmark that contains frequently-updated questions from recent information sources.
We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size.
Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time.
arXiv Detail & Related papers (2024-06-27T16:47:42Z)
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures [57.886592207948844]
We propose MixEval, a new paradigm for establishing efficient, gold-standard evaluation by strategically mixing off-the-shelf benchmarks.
It bridges (1) comprehensive and well-distributed real-world user queries and (2) efficient and fairly-graded ground-truth-based benchmarks, by matching queries mined from the web with similar queries from existing benchmarks.
arXiv Detail & Related papers (2024-06-03T05:47:05Z)
- JaxMARL: Multi-Agent RL Environments and Algorithms in JAX [105.343918678781]
We present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments.
Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches.
We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2023-11-16T18:58:43Z)
- AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation [65.4532392602682]
One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy.
This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation.
We introduce AlberDICE, an offline MARL algorithm that performs centralized training of individual agents based on stationary distribution optimization.
arXiv Detail & Related papers (2023-11-03T18:56:48Z)
- RLTF: Reinforcement Learning from Unit Test Feedback [17.35361167578498]
Reinforcement Learning from Unit Test Feedback (RLTF) is a novel online RL framework that exploits multi-granularity unit test feedback to refine code LLMs.
Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code.
arXiv Detail & Related papers (2023-07-10T05:18:18Z)
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
- Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning [4.159549932951023]
Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from pre-recorded datasets.
However, offline MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines.
OG-MARL is a growing repository of high-quality datasets with baselines for cooperative offline MARL research.
arXiv Detail & Related papers (2023-02-01T15:41:27Z)
- MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library [82.77446613763809]
We present MARLlib, a library designed to offer fast development for multi-agent tasks and algorithm combinations.
MARLlib effectively disentangles the multi-agent task from the algorithm's learning process.
The library's source code is publicly accessible on GitHub.
arXiv Detail & Related papers (2022-10-11T03:11:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.