SVDE: Scalable Value-Decomposition Exploration for Cooperative
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2303.09058v1
- Date: Thu, 16 Mar 2023 03:17:20 GMT
- Title: SVDE: Scalable Value-Decomposition Exploration for Cooperative
Multi-Agent Reinforcement Learning
- Authors: Shuhan Qi, Shuhao Zhang, Qiang Wang, Jiajia Zhang, Jing Xiao, Xuan
Wang
- Abstract summary: We propose a scalable value-decomposition exploration (SVDE) method, which includes a scalable training mechanism, intrinsic reward design, and explorative experience replay.
Our method achieves the best performance on almost all maps compared to other popular algorithms in a set of StarCraft II micromanagement games.
- Score: 22.389803019100423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value-decomposition methods, which reduce the difficulty of a multi-agent
system by decomposing the joint state-action space into local
observation-action spaces, have become popular in cooperative multi-agent
reinforcement learning (MARL). However, value-decomposition methods still have
the problems of tremendous sample consumption for training and lack of active
exploration. In this paper, we propose a scalable value-decomposition
exploration (SVDE) method, which includes a scalable training mechanism,
intrinsic reward design, and explorative experience replay. The scalable
training mechanism asynchronously decouples strategy learning from
environmental interaction, so as to accelerate sample generation in a MapReduce
manner. To address the lack of active exploration, an intrinsic reward design
and explorative experience replay are proposed, which enhance exploration to
produce diverse samples and filter out non-novel samples, respectively.
Empirically, our method achieves the best performance on almost all maps
compared to other popular algorithms in a set of StarCraft II micromanagement
games. A data-efficiency experiment also shows the acceleration of SVDE for
sample collection and policy convergence, and we demonstrate the effectiveness
of factors in SVDE through a set of ablation experiments.
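To make the three ingredients above concrete, the following is a minimal, single-process sketch of how asynchronous acting, an intrinsic reward, and novelty-filtered replay can fit together. It is not the authors' implementation: the toy environment, the count-based intrinsic bonus, the novelty threshold, and the tabular Q-learner are all assumptions chosen only to illustrate where each SVDE component would plug in.

```python
import random
import threading
import queue
from collections import defaultdict

# Toy stand-ins for illustration only: a tiny discrete "environment" and a
# tabular learner. SVDE itself targets StarCraft II micromanagement with deep
# value-decomposition networks.
N_STATES, N_ACTIONS = 20, 4
visit_counts = defaultdict(int)   # for a count-based intrinsic bonus (assumption)
sample_queue = queue.Queue()      # actors push, learner pulls (asynchronous decoupling)
replay_buffer = []                # explorative replay: keeps only novel samples
NOVELTY_THRESHOLD = 0.1           # assumed filtering rule, not from the paper

def intrinsic_reward(state):
    # 1/sqrt(count) bonus: one common intrinsic-reward choice, used as a placeholder.
    visit_counts[state] += 1
    return 1.0 / (visit_counts[state] ** 0.5)

def actor(worker_id, n_steps=500):
    # Each actor interacts with its own copy of the environment and streams
    # (s, a, r + r_int, s') samples without waiting for the learner.
    state = random.randrange(N_STATES)
    for _ in range(n_steps):
        action = random.randrange(N_ACTIONS)         # placeholder behaviour policy
        next_state = random.randrange(N_STATES)      # placeholder dynamics
        extrinsic = 1.0 if next_state == 0 else 0.0  # placeholder reward
        r = extrinsic + intrinsic_reward(next_state)
        sample_queue.put((state, action, r, next_state))
        state = next_state

def learner(n_updates=1000, lr=0.1, gamma=0.95):
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(n_updates):
        s, a, r, s2 = sample_queue.get()
        # Explorative replay: keep the sample only if its novelty is high enough.
        if 1.0 / (visit_counts[s2] ** 0.5) > NOVELTY_THRESHOLD:
            replay_buffer.append((s, a, r, s2))
        # Simple Q-learning update as a stand-in for the value-decomposition learner.
        q[s][a] += lr * (r + gamma * max(q[s2]) - q[s][a])
        # Additionally replay one stored novel transition, if any.
        if replay_buffer:
            rs, ra, rr, rs2 = random.choice(replay_buffer)
            q[rs][ra] += lr * (rr + gamma * max(q[rs2]) - q[rs][ra])
    return q

threads = [threading.Thread(target=actor, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
q_table = learner()
for t in threads:
    t.join()
print("kept", len(replay_buffer), "novel samples out of", sum(visit_counts.values()))
```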
Related papers
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
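A hedged sketch of the Teacher/Student idea described in this entry: track the Student's recent loss per region of the sample space and let the Teacher propose training samples from high-loss regions more often. The region partition, softmax temperature, and toy loss function below are assumptions; the paper's actual method targets amortized samplers rather than this tabular toy.

```python
import math
import random

# Illustrative only: the "Teacher" keeps a running loss estimate per region of the
# sample space and proposes training samples from high-loss regions more often.
N_REGIONS = 10                   # assumed coarse partition of the sample space
region_loss = [1.0] * N_REGIONS  # optimistic initial loss so every region gets tried
TEMPERATURE = 0.5                # assumed softmax temperature for prioritization

def teacher_sample():
    # Sample a region with probability proportional to exp(loss / T).
    weights = [math.exp(l / TEMPERATURE) for l in region_loss]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return N_REGIONS - 1

def student_loss(region):
    # Placeholder for the Student's training loss on a sample from `region`;
    # here we simply pretend some regions are intrinsically harder than others.
    difficulty = (region + 1) / N_REGIONS
    return max(0.0, difficulty - random.random() * 0.1)

for step in range(2000):
    region = teacher_sample()
    loss = student_loss(region)                                   # "train" the Student
    region_loss[region] = 0.9 * region_loss[region] + 0.1 * loss  # update the Teacher
print("final loss estimates per region:", [round(l, 2) for l in region_loss])
```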
- Imagine, Initialize, and Explore: An Effective Exploration Method in Multi-Agent Reinforcement Learning [27.81925751697255]
We propose a novel method for efficient multi-agent exploration in complex scenarios.
We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively.
By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important underexplored regions.
arXiv Detail & Related papers (2024-02-28T01:45:01Z)
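The entry above combines imagination (autoregressive sequence modeling) with initialization at critical states; only the second part is easy to sketch compactly. Below is a hedged illustration in which "critical" states are approximated by high-TD-error states kept in a small priority buffer, and episodes are occasionally restarted from them. The TD-error criterion, buffer size, and restart probability are assumptions, not the paper's definitions.

```python
import random
import heapq

# Illustrative only. IIE generates critical states with an autoregressive sequence
# model ("imagination"); here we simply keep a small priority buffer of previously
# visited states ranked by a surrogate "criticality" score (assumed: |TD error|),
# and occasionally reset episodes from one of them instead of the default start.
critical_states = []   # min-heap of (score, state)
BUFFER_SIZE = 32
RESTART_PROB = 0.5     # assumed probability of restarting from a critical state

def record_state(state, td_error):
    # Keep the top-BUFFER_SIZE states by |TD error|.
    heapq.heappush(critical_states, (abs(td_error), state))
    if len(critical_states) > BUFFER_SIZE:
        heapq.heappop(critical_states)

def initial_state(default_state=0):
    if critical_states and random.random() < RESTART_PROB:
        return random.choice(critical_states)[1]  # explore from an underexplored region
    return default_state

# Toy usage: pretend states are integers and TD errors are random.
for episode in range(100):
    state = initial_state()
    for _ in range(20):
        next_state = (state + random.choice([-1, 1, 2])) % 100
        record_state(next_state, td_error=random.gauss(0, 1))
        state = next_state
print("restart candidates:", sorted(s for _, s in critical_states)[:10])
```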
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- Strangeness-driven Exploration in Multi-Agent Reinforcement Learning [0.0]
We introduce a new exploration method based on strangeness that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithm.
The exploration bonus is obtained from the strangeness, and the proposed exploration method is not much affected by the transitions commonly observed in MARL tasks.
arXiv Detail & Related papers (2022-12-27T11:08:49Z)
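A minimal sketch of where such an exploration bonus enters a CTDE training loop. The paper computes strangeness with a learned model of observations; here a visit-count surrogate over joint observations stands in for it, and the bonus scale is an assumption.

```python
from collections import defaultdict

# Illustrative only: a CTDE-style helper that adds a shared exploration bonus
# computed from how "strange" (rarely seen) the joint observation is. A visit-count
# surrogate is used here purely to show where the bonus enters the team reward.
joint_obs_counts = defaultdict(int)
BONUS_SCALE = 0.1   # assumed weighting of the exploration bonus

def strangeness_bonus(joint_obs):
    key = tuple(joint_obs)   # joint observation of all agents
    joint_obs_counts[key] += 1
    return BONUS_SCALE / (joint_obs_counts[key] ** 0.5)

def shaped_team_reward(env_reward, observations):
    # Centralized training can add the bonus to the shared reward; decentralized
    # execution is unchanged because the bonus is only used during training.
    joint_obs = [o for obs in observations for o in obs]
    return env_reward + strangeness_bonus(joint_obs)

# Toy usage with two agents observing small integer grids.
print(shaped_team_reward(0.0, [[1, 2], [3, 4]]))  # novel: larger bonus
print(shaped_team_reward(0.0, [[1, 2], [3, 4]]))  # repeated: smaller bonus
```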
- MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling [2.501153467354696]
State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent.
However, they do not incorporate uncertainty in the Q-value estimation.
This paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off.
arXiv Detail & Related papers (2022-10-24T18:55:41Z)
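A hedged sketch of uncertainty-aware buffer sampling in the spirit of this entry: each stored transition is scored by a mix of its estimated value (exploitation) and the disagreement of a Q-ensemble (exploration), and minibatches are drawn with softmax probabilities over that score. The ensemble-standard-deviation proxy, the beta weight, and the temperature are assumptions rather than the paper's Monte Carlo formulation.

```python
import math
import random
import statistics

# Illustrative only: score each stored transition by a mix of estimated value
# (exploitation) and disagreement between Q-ensemble members (exploration).
BETA = 0.5          # assumed exploration weight
TEMPERATURE = 1.0   # assumed softmax temperature

def transition_score(q_estimates):
    # q_estimates: Q-values for the stored (s, a) from several ensemble heads.
    exploitation = statistics.mean(q_estimates)
    exploration = statistics.pstdev(q_estimates)   # uncertainty proxy
    return exploitation + BETA * exploration

def sample_batch(buffer, batch_size):
    scores = [transition_score(t["q_ensemble"]) for t in buffer]
    weights = [math.exp(s / TEMPERATURE) for s in scores]
    return random.choices(buffer, weights=weights, k=batch_size)

# Toy usage: transitions whose ensembles disagree are drawn more often.
buffer = [
    {"id": 0, "q_ensemble": [1.0, 1.0, 1.0]},   # certain, medium value
    {"id": 1, "q_ensemble": [0.2, 1.8, 0.9]},   # uncertain
    {"id": 2, "q_ensemble": [2.0, 2.1, 1.9]},   # certain, high value
]
batch = sample_batch(buffer, batch_size=5)
print([t["id"] for t in batch])
```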
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
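A simplified sketch of an episodic visitation-discrepancy bonus. REVD estimates a Rényi divergence-based discrepancy between consecutive episodes; the nearest-neighbor distance used below is only a crude surrogate meant to show where the intrinsic reward is computed, and k and the reward scale are assumptions.

```python
import math

# Illustrative only: reward states in the current episode for lying far (in feature
# space) from what was visited in the previous episode. The k-NN distance below is
# a simplified surrogate for a divergence estimate.
K = 3
SCALE = 0.05

def knn_distance(x, reference, k=K):
    dists = sorted(math.dist(x, r) for r in reference)
    return dists[min(k, len(dists)) - 1]

def intrinsic_rewards(current_episode, previous_episode):
    # One bonus per visited state: larger when the state lies in a region the
    # previous episode did not cover.
    if not previous_episode:
        return [0.0] * len(current_episode)
    return [SCALE * knn_distance(s, previous_episode) for s in current_episode]

# Toy usage with 2-D "state embeddings".
prev = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
curr = [(0.05, 0.05), (1.0, 1.0)]
print(intrinsic_rewards(curr, prev))   # the far-away state gets the larger bonus
```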
- Sampling Through the Lens of Sequential Decision Making [9.101505546901999]
We propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR).
Our approach adaptively adjusts the sampling process to achieve optimal performance.
Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets.
arXiv Detail & Related papers (2022-08-17T04:01:29Z)
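A hedged sketch of reward-guided sampling in the spirit of ASR: keep a running "reward" per training sample and draw the next minibatch with softmax probabilities over those rewards. The utility signal, momentum update, and temperature below are assumptions; the paper formulates the sampler as a full sequential decision process, which is not reproduced here.

```python
import math
import random

# Illustrative only: maintain a running reward per sample (an assumed utility
# signal, e.g. its recent contribution to the objective) and sample minibatches
# with probabilities given by a softmax over those rewards.
TEMPERATURE = 0.2

def draw_minibatch(sample_rewards, batch_size):
    weights = [math.exp(r / TEMPERATURE) for r in sample_rewards]
    return random.choices(range(len(sample_rewards)), weights=weights, k=batch_size)

def update_reward(sample_rewards, idx, observed_utility, momentum=0.9):
    sample_rewards[idx] = momentum * sample_rewards[idx] + (1 - momentum) * observed_utility

# Toy usage: samples whose observed utility stays high are revisited more often.
rewards = [0.0] * 8
for step in range(200):
    for idx in draw_minibatch(rewards, batch_size=2):
        utility = 1.0 if idx in (3, 5) else 0.1   # pretend samples 3 and 5 are most useful
        update_reward(rewards, idx, utility)
print([round(r, 2) for r in rewards])
```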
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
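A small sketch of the object-pyramid idea: crop a ground-truth object and resize it to several scales so that each scale contributes an extra positive sample. The nearest-neighbour resize and the scale set are assumptions chosen to keep the example dependency-free; a real pipeline would use an image library and feed the pyramid into the detector's positive branches.

```python
# Illustrative only: turn each object instance into an "object pyramid" of
# positive samples at several scales, as a stand-in for MPSR's enrichment step.
SCALES = [32, 64, 128, 256]   # assumed target sizes for the pyramid

def nn_resize(patch, size):
    # patch: 2-D list of pixel values; nearest-neighbour resize to size x size.
    h, w = len(patch), len(patch[0])
    return [[patch[int(i * h / size)][int(j * w / size)] for j in range(size)]
            for i in range(size)]

def crop(image, box):
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def object_pyramid(image, box):
    # One positive sample per scale for the same ground-truth object.
    patch = crop(image, box)
    return [nn_resize(patch, s) for s in SCALES]

# Toy usage: a 100x100 "image" with an object in a 30x20 box.
image = [[(i + j) % 255 for j in range(100)] for i in range(100)]
pyramid = object_pyramid(image, box=(10, 40, 40, 60))
print([(len(p), len(p[0])) for p in pyramid])   # [(32, 32), (64, 64), (128, 128), (256, 256)]
```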
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
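A hedged sketch of a goal-structured, "forgetful" replay buffer in the spirit of the entry above: transitions are grouped by sub-goal, and the fraction of demonstration data in each batch is annealed towards zero so the agent's own experience gradually replaces low-quality demonstrations. The sub-goal keys, buffer size, and annealing schedule are assumptions used only to illustrate the idea.

```python
import random
from collections import defaultdict, deque

# Illustrative only: a replay buffer organized by sub-goal, in which demonstration
# transitions are gradually "forgotten" (replaced by the agent's own experience).
MAX_PER_GOAL = 1000   # assumed per-goal capacity

class GoalStructuredReplay:
    def __init__(self):
        self.demo = defaultdict(deque)    # sub-goal -> demonstration transitions
        self.agent = defaultdict(lambda: deque(maxlen=MAX_PER_GOAL))

    def add_demo(self, subgoal, transition):
        self.demo[subgoal].append(transition)

    def add_agent(self, subgoal, transition):
        self.agent[subgoal].append(transition)

    def sample(self, subgoal, batch_size, demo_ratio):
        # demo_ratio is annealed towards 0 over training: "forgetting" demonstrations.
        n_demo = min(int(batch_size * demo_ratio), len(self.demo[subgoal]))
        n_agent = min(batch_size - n_demo, len(self.agent[subgoal]))
        return (random.sample(list(self.demo[subgoal]), n_demo)
                + random.sample(list(self.agent[subgoal]), n_agent))

# Toy usage: demonstrations dominate early batches, agent experience later ones.
buf = GoalStructuredReplay()
for i in range(50):
    buf.add_demo("get_wood", ("demo", i))
    buf.add_agent("get_wood", ("agent", i))
for step, ratio in [(0, 0.9), (1000, 0.5), (10000, 0.1)]:
    batch = buf.sample("get_wood", batch_size=10, demo_ratio=ratio)
    print(step, sum(1 for src, _ in batch if src == "demo"), "demo samples")
```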