Leveraging World Model Disentanglement in Value-Based Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2309.04615v1
- Date: Fri, 8 Sep 2023 22:12:43 GMT
- Title: Leveraging World Model Disentanglement in Value-Based Multi-Agent
Reinforcement Learning
- Authors: Zhizun Wang and David Meger
- Abstract summary: We propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model.
We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.
- Score: 18.651307543537655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel model-based multi-agent reinforcement
learning approach named Value Decomposition Framework with Disentangled World
Model to address the challenge of multiple agents pursuing a common goal in a
shared environment with reduced sample complexity. Due to
scalability and non-stationarity problems posed by multi-agent systems,
model-free methods rely on a considerable number of samples for training. In
contrast, we use a modularized world model, composed of action-conditioned,
action-free, and static branches, to unravel the environment dynamics and
produce imagined outcomes based on past experience, without sampling directly
from the real environment. We employ variational auto-encoders and variational
graph auto-encoders to learn the latent representations for the world model,
which is merged with a value-based framework to predict the joint action-value
function and optimize the overall training objective. We present experimental
results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges
to demonstrate that our method achieves high sample efficiency and exhibits
superior performance in defeating the enemy armies compared to other baselines.
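To make the architecture above concrete, here is a minimal PyTorch sketch of a world model with action-conditioned, action-free, and static branches and a VAE-style latent. All module names, layer sizes, and the additive way the branches are combined are illustrative assumptions, not the paper's exact design (which additionally uses variational graph auto-encoders).

```python
# Minimal sketch of a disentangled world model with action-conditioned,
# action-free, and static branches, in the spirit of the abstract above.
# All module names, sizes, and the way branches are combined are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

class DisentangledWorldModel(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        # VAE-style encoder: observation -> Gaussian latent (mu, logvar).
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 2 * latent_dim))
        # Action-conditioned branch: dynamics that depend on joint actions.
        self.action_cond = nn.Sequential(nn.Linear(latent_dim + act_dim, 128),
                                         nn.ReLU(), nn.Linear(128, latent_dim))
        # Action-free branch: dynamics that evolve regardless of actions.
        self.action_free = nn.Sequential(nn.Linear(latent_dim, 128),
                                         nn.ReLU(), nn.Linear(128, latent_dim))
        # Static branch: time-invariant features of the environment.
        self.static = nn.Parameter(torch.zeros(latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, obs_dim))

    def encode(self, obs):
        mu, logvar = self.encoder(obs).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return z, mu, logvar

    def imagine_step(self, z, action):
        # Next latent = action-conditioned + action-free + static components.
        z_next = (self.action_cond(torch.cat([z, action], dim=-1))
                  + self.action_free(z) + self.static)
        return z_next, self.decoder(z_next)

model = DisentangledWorldModel(obs_dim=16, act_dim=4)
obs, act = torch.randn(8, 16), torch.randn(8, 4)
z, mu, logvar = model.encode(obs)
z_next, recon = model.imagine_step(z, act)  # imagined outcome, no env sample
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
```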
Related papers
- Decentralized Transformers with Centralized Aggregation are Sample-Efficient Multi-Agent World Models [106.94827590977337]
We propose a novel world model for Multi-Agent RL (MARL) that learns decentralized local dynamics for scalability.
We also introduce a Perceiver Transformer as an effective solution to enable centralized representation aggregation.
Results on Starcraft Multi-Agent Challenge (SMAC) show that it outperforms strong model-free approaches and existing model-based methods in both sample efficiency and overall performance.
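A hypothetical sketch of the Perceiver-style aggregation idea: a small, fixed set of learned latents cross-attends over per-agent tokens, so the centralized summary stays cheap as the number of agents grows. Shapes and names are assumptions for illustration only.

```python
# Perceiver-style centralized aggregation: learned latent queries attend
# over per-agent tokens; cost is linear in the number of agents.
import torch
import torch.nn as nn

class CentralizedAggregator(nn.Module):
    def __init__(self, token_dim=64, n_latents=8, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, token_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(token_dim, n_heads,
                                                batch_first=True)

    def forward(self, agent_tokens):
        # agent_tokens: (batch, n_agents, token_dim); n_agents may vary.
        batch = agent_tokens.shape[0]
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # Latents (queries) attend to all agents' local representations.
        pooled, _ = self.cross_attn(queries, agent_tokens, agent_tokens)
        return pooled  # (batch, n_latents, token_dim) centralized summary

agg = CentralizedAggregator()
out = agg(torch.randn(2, 5, 64))  # works for any number of agents
```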
arXiv Detail & Related papers (2024-06-22T12:40:03Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
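The contrastive objective the summary refers to is commonly an InfoNCE-style loss; a generic sketch follows (the paper's intervention-invariant regularizer is not reproduced here).

```python
# Minimal InfoNCE-style contrastive loss over two augmented views.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same inputs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature   # pairwise similarities
    labels = torch.arange(z1.shape[0])   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
```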
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
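A rough sketch of the general idea, a causal transformer predicting the next stochastic latent from a history of latent-action tokens; the Gaussian latent and layer sizes are simplifying assumptions, and the actual STORM design differs in detail.

```python
# Causal transformer over latent-action tokens predicting a stochastic
# next latent; a simplified stand-in for a transformer world model.
import torch
import torch.nn as nn

class StochasticTransformerWM(nn.Module):
    def __init__(self, latent_dim=32, act_dim=4, d_model=64):
        super().__init__()
        self.embed = nn.Linear(latent_dim + act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 2 * latent_dim)  # mu, logvar

    def forward(self, latents, actions):
        # latents: (B, T, latent_dim), actions: (B, T, act_dim)
        x = self.embed(torch.cat([latents, actions], dim=-1))
        mask = nn.Transformer.generate_square_subsequent_mask(x.shape[1])
        h = self.backbone(x, mask=mask)  # causal self-attention over time
        mu, logvar = self.head(h).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

wm = StochasticTransformerWM()
next_z = wm(torch.randn(2, 10, 32), torch.randn(2, 10, 4))
```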
arXiv Detail & Related papers (2023-10-14T16:42:02Z)
- HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
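A generic sketch of automatic loss balancing with learnable coefficients (in the style of uncertainty weighting); HarmonyDream's exact harmonizer formulation may differ, so treat this purely as an illustration.

```python
# Learnable per-task loss coefficients: each loss is rescaled by 1/sigma,
# and the log term keeps sigma from growing without bound.
import torch
import torch.nn as nn

class LossHarmonizer(nn.Module):
    def __init__(self, n_tasks):
        super().__init__()
        # One learnable log-scale per loss term (e.g. observation, reward).
        self.log_sigma = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, losses):
        sigma = self.log_sigma.exp()
        return sum(l / s + torch.log(1 + s) for l, s in zip(losses, sigma))

harmonizer = LossHarmonizer(n_tasks=2)
total = harmonizer([torch.tensor(2.5), torch.tensor(0.01)])
total.backward()  # gradients flow into log_sigma as well
```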
arXiv Detail & Related papers (2023-09-30T11:38:13Z)
- Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning [15.12491397254381]
We propose an implicit model-based multi-agent reinforcement learning method built on value decomposition.
Under this method, agents can interact with the learned virtual environment and evaluate the current state value according to imagined future states.
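A toy sketch of that evaluation step: roll a learned dynamics model forward for a few imagined steps and bootstrap the value estimate, with placeholder networks standing in for the learned components.

```python
# Estimate the current state value from an imagined rollout in latent
# space; dynamics, reward, value, and policy are placeholders.
import torch

def imagined_value(z, dynamics, reward_fn, value_fn, policy, horizon=3,
                   gamma=0.99):
    """Estimate V(z) by imagining `horizon` steps, then bootstrapping."""
    ret, discount = torch.zeros(z.shape[0]), 1.0
    for _ in range(horizon):
        a = policy(z)
        ret = ret + discount * reward_fn(z, a)
        z = dynamics(z, a)  # imagined transition, no real env samples
        discount *= gamma
    return ret + discount * value_fn(z)  # bootstrap with the learned value

# Placeholder components, just to show the call shape.
dyn = lambda z, a: z + 0.1 * a
rew = lambda z, a: -(z * z).sum(-1)
val = lambda z: -(z * z).sum(-1)
pol = lambda z: -z
v = imagined_value(torch.randn(4, 8), dyn, rew, val, pol)
```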
arXiv Detail & Related papers (2022-04-20T12:16:27Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
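A conceptual sketch of the "hallucinated" optimism idea: agents may pick any transition within the model's confidence interval, rendered here as mean + beta * std * tanh(eta). This is a generic rendering, not H-MARL's actual algorithm.

```python
# Hallucinated control: choose a transition inside the model's posterior
# confidence interval via an auxiliary variable eta in [-1, 1].
import torch

def optimistic_step(mean, std, eta, beta=1.0):
    """mean, std: model's posterior over the next state."""
    return mean + beta * std * torch.tanh(eta)

mean, std = torch.randn(4, 8), torch.rand(4, 8)
eta = torch.zeros(4, 8, requires_grad=True)  # optimized with the policy
next_state = optimistic_step(mean, std, eta)
```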
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Multiscale Generative Models: Improving Performance of a Generative Model Using Feedback from Other Dependent Generative Models [10.053377705165786]
We take a first step towards building interacting generative models (GANs) that reflect the interactions of the real world.
We build and analyze a hierarchical set-up where a higher-level GAN is conditioned on the output of multiple lower-level GANs.
We present a technique of using feedback from the higher-level GAN to improve performance of lower-level GANs.
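A toy sketch of the two-level set-up: a higher-level generator conditioned on the outputs of two lower-level generators, with the higher-level discriminator's signal backpropagating into the lower levels as feedback. All architectures here are stand-ins.

```python
# Two-level GAN hierarchy with feedback from the higher level.
import torch
import torch.nn as nn

low_a = nn.Linear(16, 32)   # lower-level generator A
low_b = nn.Linear(16, 32)   # lower-level generator B
high = nn.Linear(64, 32)    # higher-level generator (conditioned on A, B)
disc_high = nn.Linear(32, 1)  # higher-level discriminator

z = torch.randn(8, 16)
out_a, out_b = low_a(z), low_b(z)
out_high = high(torch.cat([out_a, out_b], dim=-1))
# Feedback: the higher-level discriminator's loss backpropagates through
# out_a and out_b, so the lower-level generators receive gradients too.
g_loss = -disc_high(out_high).mean()
g_loss.backward()
```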
arXiv Detail & Related papers (2022-01-24T13:05:56Z)
- HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning [14.412066456583917]
We propose a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples.
Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal.
We extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.
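A simplified sketch of the weight-generation idea: a transformer encodes the support set and emits the kernel of a tiny target CNN (here a single 3x3 conv layer, an assumption for brevity).

```python
# Hypernetwork sketch: a transformer turns embedded support samples into
# the weights of a small task-specific conv layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    def __init__(self, feat_dim=64, out_ch=8, in_ch=3, k=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_weights = nn.Linear(feat_dim, out_ch * in_ch * k * k)
        self.shape = (out_ch, in_ch, k, k)

    def forward(self, support_feats):
        # support_feats: (batch, n_support, feat_dim) embedded support set.
        h = self.encoder(support_feats).mean(dim=1)  # pool over supports
        return self.to_weights(h).view(-1, *self.shape)

gen = WeightGenerator()
weights = gen(torch.randn(1, 5, 64))[0]    # one task's conv kernel
query = torch.randn(4, 3, 32, 32)
out = F.conv2d(query, weights, padding=1)  # task-specific CNN layer
```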
arXiv Detail & Related papers (2022-01-11T20:15:35Z)
- Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts [52.844741540236285]
This paper investigates model-based methods in multi-agent reinforcement learning (MARL).
We propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
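An illustrative sketch of the opponent-wise rollout idea: each opponent's rollout length is adapted to how well that opponent is modeled. The error-to-length rule below is invented for illustration and is not AORPO's actual schedule.

```python
# Adapt per-opponent rollout lengths to opponent-model quality:
# better-modeled opponents are rolled out longer.
def adaptive_rollout_lengths(opponent_errors, max_len=10):
    """opponent_errors: validation error of each opponent model in [0, 1]."""
    return {name: max(1, int(max_len * (1.0 - err)))
            for name, err in opponent_errors.items()}

lengths = adaptive_rollout_lengths({"opp_1": 0.1, "opp_2": 0.6})
# {'opp_1': 9, 'opp_2': 4}: the well-modeled opponent gets longer rollouts.
```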
arXiv Detail & Related papers (2021-05-07T16:20:22Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task relevant information, enabling the model to be aware of the current task and encouraging it to only model relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
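One way to read "directing prediction towards task-relevant information" is to measure prediction error in a goal-conditioned feature space; the sketch below does exactly that, with the encoder and loss form as assumptions.

```python
# Goal-aware prediction loss: penalize errors only as seen through a
# goal-conditioned encoder, so task-irrelevant detail is not modeled.
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(16 + 4, 64), nn.ReLU(), nn.Linear(64, 32))

def goal_aware_loss(pred_next, true_next, goal):
    """Compare predicted and true next states through goal-conditioned phi."""
    f_pred = phi(torch.cat([pred_next, goal], dim=-1))
    f_true = phi(torch.cat([true_next, goal], dim=-1))
    return (f_pred - f_true).pow(2).mean()

loss = goal_aware_loss(torch.randn(8, 16), torch.randn(8, 16),
                       torch.randn(8, 4))
```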
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.