Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
- URL: http://arxiv.org/abs/2410.02664v1
- Date: Thu, 3 Oct 2024 16:49:59 GMT
- Title: Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
- Authors: Zeyang Liu, Xinrui Yang, Shiguang Sun, Long Qian, Lipeng Wan, Xingyu Chen, Xuguang Lan
- Abstract summary: Generative models often produce sketchy and misleading solutions for complex multi-agent decision-making problems.
We show a paradigm that integrates a language-guided simulator into the multi-agent reinforcement learning pipeline to enhance the generated answer.
In particular, it can generate consistent interaction sequences and explainable reward functions at interaction states, opening the path for training generative models of the future.
- Score: 27.263093790379024
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in generative models has stimulated significant innovations in many fields, such as image generation and chatbots. Despite their success, these models often produce sketchy and misleading solutions for complex multi-agent decision-making problems because they lack the trial-and-error experience and reasoning of humans. To address this limitation, we explore a paradigm that integrates a language-guided simulator into the multi-agent reinforcement learning pipeline to enhance the generated answer. The simulator is a world model that separately learns dynamics and reward: the dynamics model comprises an image tokenizer and a causal transformer that generates interaction transitions autoregressively, while the reward model is a bidirectional transformer learned by maximizing the likelihood of trajectories in the expert demonstrations under language guidance. Given an image of the current state and the task description, we use the world model to train the joint policy and produce the image sequence as the answer by running the converged policy on the dynamics model. The empirical results demonstrate that this framework can improve the answers for multi-agent decision-making problems, showing superior performance on the training and unseen tasks of the StarCraft Multi-Agent Challenge benchmark. In particular, it can generate consistent interaction sequences and explainable reward functions at interaction states, opening the path for training generative models of the future.
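The pipeline described in the abstract (tokenize the state image, roll the dynamics model forward autoregressively, score states with a language-conditioned reward model, then run the converged policy to emit an image sequence as the answer) can be sketched schematically as below. This is a minimal illustrative toy, not the authors' implementation: every class, method, and value here is a hypothetical stand-in for the real tokenizer, causal transformer, and bidirectional-transformer reward model.

```python
# Hypothetical sketch of the language-guided world-model pipeline.
# All names and internals are illustrative stand-ins, not the paper's code.

class ImageTokenizer:
    """Stand-in for a VQ-style tokenizer mapping an image to discrete tokens."""
    def encode(self, image):
        return [pixel % 16 for pixel in image]   # toy discretization into 16 codes
    def decode(self, tokens):
        return list(tokens)                      # toy reconstruction back to a "frame"

class CausalDynamics:
    """Stand-in for the causal transformer: predicts next-state tokens
    autoregressively from the current tokens and the joint action."""
    def step(self, tokens, joint_action):
        return [(t + sum(joint_action)) % 16 for t in tokens]

class LanguageRewardModel:
    """Stand-in for the bidirectional-transformer reward model: scores a
    state under a natural-language task description."""
    def reward(self, tokens, task_description):
        target = len(task_description) % 16      # toy language conditioning
        return -min(abs(t - target) for t in tokens)

def rollout(policy, tokenizer, dynamics, reward_model, image, task, horizon):
    """Run a (converged) joint policy inside the world model and return the
    generated image sequence as the answer, plus the accumulated reward."""
    tokens, frames, total = tokenizer.encode(image), [], 0.0
    for _ in range(horizon):
        joint_action = policy(tokens)            # one action per agent
        tokens = dynamics.step(tokens, joint_action)
        total += reward_model.reward(tokens, task)
        frames.append(tokenizer.decode(tokens))  # each step yields one frame
    return frames, total

policy = lambda tokens: [1, 2]                   # fixed 2-agent joint action for the demo
frames, ret = rollout(policy, ImageTokenizer(), CausalDynamics(),
                      LanguageRewardModel(),
                      image=[7, 23, 42], task="defeat enemy", horizon=3)
print(len(frames))   # one generated frame per simulated interaction step
```

In the actual method the policy would be trained with multi-agent RL against the learned simulator before this final answer-producing rollout; the sketch only shows the inference-time loop.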
Related papers
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators.
To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module.
Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning [0.41783829807634765]
Multi-Agent Deep Reinforcement Learning (MADRL) has proven effective at solving complex problems in robotics and games.
This paper advocates for direct interpretability, generating post hoc explanations directly from trained models.
We explore modern methods, including relevance backpropagation, knowledge edition, model steering, activation patching, sparse autoencoders and circuit discovery.
arXiv Detail & Related papers (2025-02-02T09:15:27Z)
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on the StarCraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning [18.651307543537655]
We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
arXiv Detail & Related papers (2023-09-08T22:12:43Z)
- Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks.
We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling.
Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
- DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z)
- Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models [10.053377705165786]
We take a first step towards building interacting generative models (GANs) that reflect the interactions of the real world.
We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs.
We present a technique of using feedback from the higher-level GAN to improve performance of lower-level GANs.
arXiv Detail & Related papers (2022-01-24T13:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.