MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2304.06011v2
- Date: Tue, 13 Feb 2024 19:50:54 GMT
- Title: MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- Authors: Aravind Venugopal, Stephanie Milani, Fei Fang, Balaraman Ravindran
- Abstract summary: We propose a novel model-based MARL algorithm, MABL, that learns a bi-level latent-variable world model from high-dimensional inputs.
For each agent, MABL learns a global latent state at the upper level, which is used to inform the learning of an agent latent state at the lower level.
MABL surpasses SOTA multi-agent latent-variable world models in both sample efficiency and overall performance.
- Score: 43.30657890400801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) methods often suffer from high
sample complexity, limiting their use in real-world problems where data is
sparse or expensive to collect. Although latent-variable world models have been
employed to address this issue by generating abundant synthetic data for MARL
training, most of these models cannot encode vital global information available
during training into their latent states, which hampers learning efficiency.
The few exceptions that incorporate global information assume centralized
execution of their learned policies, which is impractical in many applications
with partial observability.
We propose a novel model-based MARL algorithm, MABL (Multi-Agent Bi-Level
world model), that learns a bi-level latent-variable world model from
high-dimensional inputs. Unlike existing models, MABL is capable of encoding
essential global information into the latent states during training while
guaranteeing the decentralized execution of learned policies. For each agent,
MABL learns a global latent state at the upper level, which is used to inform
the learning of an agent latent state at the lower level. During execution,
agents exclusively use lower-level latent states and act independently.
Crucially, MABL can be combined with any model-free MARL algorithm for policy
learning. In our empirical evaluation with complex discrete and continuous
multi-agent tasks including SMAC, Flatland, and MAMuJoCo, MABL surpasses SOTA
multi-agent latent-variable world models in both sample efficiency and overall
performance.
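The bi-level structure described above can be illustrated with a minimal sketch. This is not MABL's actual architecture (which the abstract does not specify); random linear maps stand in for learned encoders, and all dimensions and function names are hypothetical. The key property shown is structural: the lower-level agent latent is conditioned on the upper-level global latent during training, but is computable from local observations alone at execution time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; illustrative only.
GLOBAL_DIM, OBS_DIM, UPPER_DIM, LOWER_DIM, N_AGENTS = 12, 6, 4, 3, 2

# Random linear maps stand in for the learned encoder networks.
W_upper = rng.normal(size=(UPPER_DIM, GLOBAL_DIM))      # global state -> upper latent
W_lower_obs = rng.normal(size=(LOWER_DIM, OBS_DIM))     # local obs    -> lower latent
W_lower_ctx = rng.normal(size=(LOWER_DIM, UPPER_DIM))   # upper latent -> lower latent

def upper_latent(global_state):
    """Per-agent global latent; available only during training."""
    return np.tanh(W_upper @ global_state)

def lower_latent(obs, z_upper=None):
    """Agent latent: informed by the upper-level latent during training,
    computed from local observations alone at execution time."""
    z = W_lower_obs @ obs
    if z_upper is not None:  # training: inject global information
        z = z + W_lower_ctx @ z_upper
    return np.tanh(z)

global_state = rng.normal(size=GLOBAL_DIM)
obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]

# Training-time latents use both levels; execution uses only local inputs,
# so the learned policies can be executed in a decentralized way.
train_latents = [lower_latent(o, upper_latent(global_state)) for o in obs]
exec_latents = [lower_latent(o) for o in obs]
```

Because `lower_latent` never requires the global state at execution, any model-free MARL policy trained on these latents remains decentralized, matching the property the abstract emphasizes.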
Related papers
- Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs [19.331803578031188]
We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process.
Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs).
Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences.
arXiv Detail & Related papers (2024-09-16T20:05:57Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
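The Perceiver-style aggregation mentioned above can be sketched briefly. This is a hedged, simplified stand-in, not the paper's implementation: a small fixed set of learned latent queries cross-attends over per-agent tokens, so the centralized representation has a size independent of the number of agents. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N_AGENTS, TOKEN_DIM, N_LATENTS = 4, 8, 2

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Per-agent tokens, as if produced by decentralized local encoders.
agent_tokens = rng.normal(size=(N_AGENTS, TOKEN_DIM))

# Perceiver-style cross-attention: a fixed set of learned latent queries
# attends over all agent tokens; cost scales linearly in N_AGENTS.
latent_queries = rng.normal(size=(N_LATENTS, TOKEN_DIM))
attn = softmax(latent_queries @ agent_tokens.T / np.sqrt(TOKEN_DIM))
centralized_repr = attn @ agent_tokens  # shape (N_LATENTS, TOKEN_DIM)
```

The design choice being illustrated: because the query set is fixed, adding agents only adds keys/values, which is what makes this kind of centralized aggregation scalable.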
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- Probing Multimodal Large Language Models for Global and Local Semantic Representations [57.25949445963422]
We study which layers of Multimodal Large Language Models contribute most to encoding global image information.
In this study, we find that the intermediate layers of models can encode more global semantic information.
We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.
arXiv Detail & Related papers (2024-02-27T08:27:15Z)
- Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation at the level of a single modality.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL).
arXiv Detail & Related papers (2023-02-17T14:17:44Z)
- Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning [4.159549932951023]
Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from such datasets.
MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines.
OG-MARL is a growing repository of high-quality datasets with baselines for cooperative offline MARL research.
arXiv Detail & Related papers (2023-02-01T15:41:27Z)
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
- Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning [104.58874584354787]
In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning.
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
arXiv Detail & Related papers (2023-01-27T18:59:01Z)
- Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation [6.09506921406322]
We propose a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner.
InforMARL aggregates information about the local neighborhood of agents for both the actor and the critic using a graph neural network and can be used in conjunction with any standard MARL algorithm.
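The neighborhood aggregation described above can be sketched with a toy example. This is a simplified stand-in for InforMARL's graph neural network, not its actual model: a single round of mean message passing over each agent's local neighborhood, with an assumed sensing radius defining the graph edges.

```python
import numpy as np

# Toy setup: 2D agent positions and per-agent feature vectors.
# The sensing RADIUS defining neighborhoods is an illustrative assumption.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
RADIUS = 2.0

def aggregate(i):
    """One round of mean message passing over agent i's neighborhood
    (including itself), a simplified stand-in for a GNN layer."""
    dists = np.linalg.norm(positions - positions[i], axis=1)
    nbrs = dists <= RADIUS
    return features[nbrs].mean(axis=0)

# The aggregated vector augments agent i's local observation, and the
# same idea can feed both the actor and the critic, as the summary notes.
agg = [aggregate(i) for i in range(len(positions))]
```

Agents 0 and 1 are mutual neighbors and so mix their features, while the distant agent 2 falls back to its own features, which is the decentralized, locality-respecting behavior the summary describes.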
arXiv Detail & Related papers (2022-11-03T20:02:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.