MAC-PO: Multi-Agent Experience Replay via Collective Priority
Optimization
- URL: http://arxiv.org/abs/2302.10418v1
- Date: Tue, 21 Feb 2023 03:11:21 GMT
- Title: MAC-PO: Multi-Agent Experience Replay via Collective Priority
Optimization
- Authors: Yongsheng Mei, Hanhan Zhou, Tian Lan, Guru Venkataramani, Peng Wei
- Abstract summary: We propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions.
By minimizing the resulting policy regret, we can narrow the gap between the current policy and a nominal optimal policy.
- Score: 12.473095790918347
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Experience replay is crucial for off-policy reinforcement learning (RL)
methods. By remembering and reusing the experiences from past different
policies, experience replay significantly improves the training efficiency and
stability of RL algorithms. Many decision-making problems in practice naturally
involve multiple agents and require multi-agent reinforcement learning (MARL)
under the centralized training, decentralized execution paradigm. Nevertheless,
existing MARL algorithms often adopt standard experience replay where the
transitions are uniformly sampled regardless of their importance. Finding
prioritized sampling weights that are optimized for MARL experience replay has
yet to be explored. To this end, we propose MAC-PO, which formulates optimal
prioritized experience replay for multi-agent problems as a regret minimization
over the sampling weights of transitions. This optimization is relaxed and
solved using the Lagrangian multiplier approach to obtain the closed-form
optimal sampling weights. By minimizing the resulting policy regret, we can
narrow the gap between the current policy and a nominal optimal policy, thus
acquiring an improved prioritization scheme for multi-agent tasks. Our
experimental results on the Predator-Prey and StarCraft Multi-Agent Challenge
environments demonstrate the effectiveness of our method, which replays
important transitions more effectively and outperforms other state-of-the-art
baselines.
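The weighting recipe described in the abstract can be illustrated with a short sketch. The snippet below is an assumption-laden illustration rather than the paper's exact derivation: it treats the magnitude of each transition's joint TD error as its priority score and normalizes the scores with a softmax, which is the closed-form Lagrangian solution of an entropy-regularized weight optimization under the constraint that the weights sum to one; the temperature beta and all function names are hypothetical.

```python
import numpy as np

def closed_form_weights(td_errors: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Illustrative closed-form prioritized sampling weights (not MAC-PO's exact formula).

    Treats |TD error| as each transition's priority score s_i and solves
        max_w  sum_i w_i * s_i - (1/beta) * sum_i w_i * log(w_i)   s.t.  sum_i w_i = 1,
    whose Lagrangian solution is a softmax over beta * s_i.
    """
    scores = np.abs(td_errors)
    logits = beta * (scores - scores.max())          # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def sample_prioritized(weights: np.ndarray, batch_size: int,
                       rng: np.random.Generator) -> np.ndarray:
    """Draw replay-buffer indices according to the prioritized weights."""
    return rng.choice(len(weights), size=batch_size, p=weights, replace=True)

# Toy usage: prioritize a buffer of 1,000 transitions with synthetic TD errors.
rng = np.random.default_rng(0)
td_errors = rng.normal(size=1000)
w = closed_form_weights(td_errors, beta=2.0)
batch_idx = sample_prioritized(w, batch_size=32, rng=rng)
```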
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection.
Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
arXiv Detail & Related papers (2024-10-22T03:59:53Z)
- ROER: Regularized Optimal Experience Replay [34.462315999611256]
Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error (a minimal PER sketch follows this list).
We show the connection between experience prioritization and occupancy optimization.
Regularized optimal experience replay (ROER) achieves a noticeable improvement on the difficult AntMaze environments.
arXiv Detail & Related papers (2024-07-04T15:14:57Z)
- CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms [5.331052581441265]
We develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which samples stored experiences while accounting for fairness among all of them.
CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and policy stability during training.
arXiv Detail & Related papers (2024-06-13T12:03:40Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that pre-trains the agent with unsupervised model-based RL.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Whether offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly remains a critical challenge.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Off-Policy Correction for Deep Deterministic Policy Gradient Algorithms via Batch Prioritized Experience Replay [0.0]
We develop a novel algorithm, Batch Prioritizing Experience Replay via KL Divergence, which prioritizes batches of transitions.
We combine our algorithm with Deep Deterministic Policy Gradient and Twin Delayed Deep Deterministic Policy Gradient and evaluate it on various continuous control tasks.
arXiv Detail & Related papers (2021-11-02T19:51:59Z)
- Large Batch Experience Replay [22.473676537463607]
We introduce new theoretical foundations of Prioritized Experience Replay.
LaBER is an easy-to-code and efficient method for sampling the replay buffer.
arXiv Detail & Related papers (2021-10-04T15:53:13Z)
- Regret Minimization Experience Replay [14.233842517210437]
Prioritized sampling is a promising technique to improve the performance of RL agents.
In this work, we theoretically analyze the optimal prioritization strategy that can minimize the regret of the RL policy.
We propose two practical algorithms, RM-DisCor and RM-TCE.
arXiv Detail & Related papers (2021-05-15T16:08:45Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
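For reference, the TD-error prioritization mentioned in the ROER entry above follows the standard proportional PER recipe: priorities p_i = |delta_i| + eps, sampling probabilities P(i) proportional to p_i^alpha, and importance-sampling corrections (N * P(i))^(-beta). The sketch below is a minimal illustration with hypothetical class and parameter names, not code from any of the listed papers.

```python
import numpy as np

class ProportionalPER:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6):
        self.alpha = alpha                      # how strongly TD error skews sampling
        self.eps = eps                          # keeps zero-error transitions samplable
        self.priorities = np.zeros(capacity)
        self.size = 0

    def update(self, index: int, td_error: float) -> None:
        """Store (or refresh) a transition's priority from its TD error."""
        self.priorities[index] = (abs(td_error) + self.eps) ** self.alpha
        self.size = max(self.size, index + 1)

    def sample(self, batch_size: int, beta: float, rng: np.random.Generator):
        """Return sampled indices and normalized importance-sampling weights."""
        probs = self.priorities[:self.size] / self.priorities[:self.size].sum()
        indices = rng.choice(self.size, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        is_weights = (self.size * probs[indices]) ** (-beta)
        return indices, is_weights / is_weights.max()
```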