MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2304.06011v2
- Date: Tue, 13 Feb 2024 19:50:54 GMT
- Title: MABL: Bi-Level Latent-Variable World Model for Sample-Efficient
Multi-Agent Reinforcement Learning
- Authors: Aravind Venugopal, Stephanie Milani, Fei Fang, Balaraman Ravindran
- Abstract summary: We propose a novel model-based MARL algorithm, MABL, that learns a bi-level latent-variable world model from high-dimensional inputs.
For each agent, MABL learns a global latent state at the upper level, which is used to inform the learning of an agent latent state at the lower level.
MABL surpasses SOTA multi-agent latent-variable world models in both sample efficiency and overall performance.
- Score: 43.30657890400801
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) methods often suffer from high
sample complexity, limiting their use in real-world problems where data is
sparse or expensive to collect. Although latent-variable world models have been
employed to address this issue by generating abundant synthetic data for MARL
training, most of these models cannot encode vital global information available
during training into their latent states, which hampers learning efficiency.
The few exceptions that incorporate global information assume centralized
execution of their learned policies, which is impractical in many applications
with partial observability.
We propose a novel model-based MARL algorithm, MABL (Multi-Agent Bi-Level
world model), that learns a bi-level latent-variable world model from
high-dimensional inputs. Unlike existing models, MABL is capable of encoding
essential global information into the latent states during training while
guaranteeing the decentralized execution of learned policies. For each agent,
MABL learns a global latent state at the upper level, which is used to inform
the learning of an agent latent state at the lower level. During execution,
agents exclusively use lower-level latent states and act independently.
Crucially, MABL can be combined with any model-free MARL algorithm for policy
learning. In our empirical evaluation with complex discrete and continuous
multi-agent tasks including SMAC, Flatland, and MAMuJoCo, MABL surpasses SOTA
multi-agent latent-variable world models in both sample efficiency and overall
performance.
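The bi-level scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: module names, dimensions, and the use of random projections in place of learned encoders are all illustrative assumptions. It only shows the structural point that the global latent informs agent latents during training, while execution uses the lower-level latents alone.

```python
import numpy as np

rng = np.random.default_rng(0)

class BiLevelWorldModel:
    """Toy sketch of a bi-level latent-variable world model.

    Upper level: a global latent z_g inferred from the global state,
    which is available only during training.
    Lower level: per-agent latents z_i inferred from each agent's own
    observation, conditioned on z_g when it is available.
    """

    def __init__(self, obs_dim, global_dim, latent_dim):
        # Random projections stand in for learned encoder weights.
        self.W_global = rng.normal(size=(global_dim, latent_dim))
        self.W_agent = rng.normal(size=(obs_dim, latent_dim))
        self.W_cond = rng.normal(size=(latent_dim, latent_dim))

    def upper_level(self, global_state):
        # Global latent state: training-time only (CTDE setting).
        return np.tanh(global_state @ self.W_global)

    def lower_level(self, obs_i, z_global=None):
        # Agent latent state; the global latent informs it when given.
        z = np.tanh(obs_i @ self.W_agent)
        if z_global is not None:
            z = np.tanh(z + z_global @ self.W_cond)
        return z

    def training_latents(self, global_state, observations):
        # Training: upper level informs every agent's lower level.
        z_g = self.upper_level(global_state)
        return [self.lower_level(o, z_g) for o in observations]

    def execution_latents(self, observations):
        # Decentralized execution: no global information is used.
        return [self.lower_level(o) for o in observations]

model = BiLevelWorldModel(obs_dim=4, global_dim=6, latent_dim=3)
obs = [rng.normal(size=4) for _ in range(2)]
s_global = rng.normal(size=6)

train_z = model.training_latents(s_global, obs)
exec_z = model.execution_latents(obs)
```

Because the policies consume only the lower-level latents, any model-free MARL algorithm can be plugged in for policy learning, which is the combinability property the abstract claims.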
Related papers
- Text2World: Benchmarking Large Language Models for Symbolic World Model Generation [41.02446816970586]
We introduce a novel benchmark, Text2World, based on the Planning Domain Definition Language (PDDL).
We find that reasoning models trained with large-scale reinforcement learning outperform others.
Building on these insights, we examine several promising strategies to enhance the world modeling capabilities of LLMs.
arXiv Detail & Related papers (2025-02-18T17:59:48Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Ranked from Within: Ranking Large Multimodal Models for Visual Question Answering Without Labels [64.94853276821992]
Large multimodal models (LMMs) are increasingly deployed across diverse applications.
Traditional evaluation methods are largely dataset-centric, relying on fixed, labeled datasets and supervised metrics.
We explore unsupervised model ranking for LMMs by leveraging their uncertainty signals, such as softmax probabilities.
arXiv Detail & Related papers (2024-12-09T13:05:43Z)
- Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs [19.331803578031188]
We propose the Model-in-the-Loop (MILO) framework, which integrates AI/ML models into the annotation process.
Our research introduces a collaborative paradigm that leverages the strengths of both professional human annotators and large language models (LLMs).
Three empirical studies on multimodal data annotation demonstrate MILO's efficacy in reducing handling time, improving data quality, and enhancing annotator experiences.
arXiv Detail & Related papers (2024-09-16T20:05:57Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- Probing Multimodal Large Language Models for Global and Local Semantic Representations [57.25949445963422]
We study which layers of Multimodal Large Language Models contribute most to encoding global image information.
In this study, we find that the intermediate layers of models can encode more global semantic information.
We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.
arXiv Detail & Related papers (2024-02-27T08:27:15Z)
- Multimodal Federated Learning via Contrastive Representation Ensemble [17.08211358391482]
Federated learning (FL) serves as a privacy-conscious alternative to centralized machine learning.
Existing FL methods all rely on model aggregation at the single-modality level.
We propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL).
arXiv Detail & Related papers (2023-02-17T14:17:44Z)
- Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning [4.159549932951023]
Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective decentralised controllers from such datasets.
Offline MARL is still in its infancy and therefore lacks standardised benchmark datasets and baselines.
OG-MARL is a growing repository of high-quality datasets with baselines for cooperative offline MARL research.
arXiv Detail & Related papers (2023-02-01T15:41:27Z)
- Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM).
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.