MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2304.06011v2
- Date: Tue, 13 Feb 2024 19:50:54 GMT
- Title: MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- Authors: Aravind Venugopal, Stephanie Milani, Fei Fang, Balaraman Ravindran
- Abstract summary: We propose a novel model-based MARL algorithm, MABL, that learns a bi-level latent-variable world model from high-dimensional inputs.
For each agent, MABL learns a global latent state at the upper level, which is used to inform the learning of an agent latent state at the lower level.
MABL surpasses SOTA multi-agent latent-variable world models in both sample efficiency and overall performance.
- Score: 43.30657890400801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) methods often suffer from high
sample complexity, limiting their use in real-world problems where data is
sparse or expensive to collect. Although latent-variable world models have been
employed to address this issue by generating abundant synthetic data for MARL
training, most of these models cannot encode vital global information available
during training into their latent states, which hampers learning efficiency.
The few exceptions that incorporate global information assume centralized
execution of their learned policies, which is impractical in many applications
with partial observability.
We propose a novel model-based MARL algorithm, MABL (Multi-Agent Bi-Level
world model), that learns a bi-level latent-variable world model from
high-dimensional inputs. Unlike existing models, MABL is capable of encoding
essential global information into the latent states during training while
guaranteeing the decentralized execution of learned policies. For each agent,
MABL learns a global latent state at the upper level, which is used to inform
the learning of an agent latent state at the lower level. During execution,
agents exclusively use lower-level latent states and act independently.
Crucially, MABL can be combined with any model-free MARL algorithm for policy
learning. In our empirical evaluation with complex discrete and continuous
multi-agent tasks including SMAC, Flatland, and MAMuJoCo, MABL surpasses SOTA
multi-agent latent-variable world models in both sample efficiency and overall
performance.
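The bi-level scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: module names, dimensions, and the use of random projections in place of learned encoders are all illustrative assumptions. It only shows the structural point that the global latent informs agent latents during training, while execution uses the lower-level latents alone.

```python
import numpy as np

rng = np.random.default_rng(0)

class BiLevelWorldModel:
    """Toy sketch of a bi-level latent-variable world model.

    Upper level: a global latent z_g inferred from the global state,
    which is available only during training.
    Lower level: per-agent latents z_i inferred from each agent's own
    observation, conditioned on z_g when it is available.
    """

    def __init__(self, obs_dim, global_dim, latent_dim):
        # Random projections stand in for learned encoder weights.
        self.W_global = rng.normal(size=(global_dim, latent_dim))
        self.W_agent = rng.normal(size=(obs_dim, latent_dim))
        self.W_cond = rng.normal(size=(latent_dim, latent_dim))

    def upper_level(self, global_state):
        # Global latent state: training-time only (CTDE setting).
        return np.tanh(global_state @ self.W_global)

    def lower_level(self, obs_i, z_global=None):
        # Agent latent state; the global latent informs it when given.
        z = np.tanh(obs_i @ self.W_agent)
        if z_global is not None:
            z = np.tanh(z + z_global @ self.W_cond)
        return z

    def training_latents(self, global_state, observations):
        # Training: upper level informs every agent's lower level.
        z_g = self.upper_level(global_state)
        return [self.lower_level(o, z_g) for o in observations]

    def execution_latents(self, observations):
        # Decentralized execution: no global information is used.
        return [self.lower_level(o) for o in observations]

model = BiLevelWorldModel(obs_dim=4, global_dim=6, latent_dim=3)
obs = [rng.normal(size=4) for _ in range(2)]
s_global = rng.normal(size=6)

train_z = model.training_latents(s_global, obs)
exec_z = model.execution_latents(obs)
```

Because the policies consume only the lower-level latents, any model-free MARL algorithm can be plugged in for policy learning, which is the combinability property the abstract claims.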
Related papers
- Text2World: Benchmarking Large Language Models for Symbolic World Model Generation [41.02446816970586]
We introduce a novel benchmark, Text2World, based on the Planning Domain Definition Language (PDDL).
We find that reasoning models trained with large-scale reinforcement learning outperform others.
Building on these insights, we examine several promising strategies to enhance the world modeling capabilities of LLMs.
arXiv Detail & Related papers (2025-02-18T17:59:48Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.
Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.
We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs [19.331803578031188]
We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process.
Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs).
Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences.
arXiv Detail & Related papers (2024-09-16T20:05:57Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Probing Multimodal Large Language Models for Global and Local Semantic Representations [57.25949445963422]
We study which layers of Multimodal Large Language Models contribute most to encoding global image information.
In this study, we find that the intermediate layers of models can encode more global semantic information.
We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.
arXiv Detail & Related papers (2024-02-27T08:27:15Z)
- Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation at the single-modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL).
arXiv Detail & Related papers (2023-02-17T14:17:44Z)
- Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning [4.159549932951023]
Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from such datasets.
Offline MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines.
OG-MARL is a growing repository of high-quality datasets with baselines for cooperative offline MARL research.
arXiv Detail & Related papers (2023-02-01T15:41:27Z)
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.