On Solving Cooperative MARL Problems with a Few Good Experiences
- URL: http://arxiv.org/abs/2001.07993v1
- Date: Wed, 22 Jan 2020 12:53:53 GMT
- Title: On Solving Cooperative MARL Problems with a Few Good Experiences
- Authors: Rajiv Ranjan Kumar, Pradeep Varakantham
- Abstract summary: Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for cooperative decentralized decision learning.
In many domains such as search and rescue, drone surveillance, package delivery and fire fighting problems, a key challenge is learning with a few good experiences.
We provide a novel fictitious self imitation approach that is able to simultaneously handle non-stationarity and sparse good experiences.
- Score: 8.596915685049511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for
cooperative decentralized decision learning in many domains such as search and
rescue, drone surveillance, package delivery and fire fighting problems. In
these domains, a key challenge is learning with a few good experiences, i.e.,
positive reinforcements are obtained only in a few situations (e.g., on
extinguishing a fire or tracking a crime or delivering a package) and in most
other situations there is zero or negative reinforcement. Learning decisions
with a few good experiences is extremely challenging in cooperative MARL
problems for three reasons. First, compared to the single-agent case,
exploration is harder as multiple agents have to be coordinated to receive a
good experience. Second, the environment is not stationary, as all the agents are
learning at the same time (and hence changing policies). Third, the scale of the
problem increases significantly with every additional agent.
Relevant existing work is extensive and has focussed on dealing with a few
good experiences in single-agent RL problems or on scalable approaches for
handling non-stationarity in MARL problems. Unfortunately, neither of these
approaches (or their extensions) is able to address the problem of sparse good
experiences effectively. Therefore, we provide a novel fictitious self
imitation approach that is able to simultaneously handle non-stationarity and
sparse good experiences in a scalable manner. Finally, we provide a thorough
comparison (experimental or descriptive) against relevant cooperative MARL
algorithms to demonstrate the utility of our approach.
Related papers
- MO-MIX: Multi-Objective Multi-Agent Cooperative Decision-Making With Deep Reinforcement Learning [68.91090643731987]
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems.
Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective.
We propose MO-mix to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem.
arXiv Detail & Related papers (2026-02-28T16:25:22Z)
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search.
We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
- SWE-Exp: Experience-Driven Software Issue Resolution [19.525080502900785]
We introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories.
Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts.
Experiments show that SWE-Exp achieves state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified.
arXiv Detail & Related papers (2025-07-31T09:13:42Z)
- When Disagreements Elicit Robustness: Investigating Self-Repair Capabilities under LLM Multi-Agent Disagreements [56.29265568399648]
We argue that disagreements prevent premature consensus and expand the explored solution space.
Disagreements on task-critical steps can derail collaboration depending on the topology of solution paths.
arXiv Detail & Related papers (2025-02-21T02:24:43Z)
- O-MAPL: Offline Multi-agent Preference Learning [5.4482836906033585]
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL).
We introduce a novel end-to-end preference-based learning framework for cooperative MARL.
Our algorithm outperforms existing methods across various tasks.
arXiv Detail & Related papers (2025-01-31T08:08:20Z)
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents.
Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations.
While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z)
- Learning Independently from Causality in Multi-Agent Environments [0.0]
Multi-Agent Reinforcement Learning (MARL) comprises an area of growing interest in the field of machine learning.
The lazy agent pathology is a famous problem in MARL that denotes the event when some of the agents in a MARL team do not contribute to the common goal.
We study a fully decentralised MARL setup where agents need to learn cooperation strategies and show that there is a causal relation between individual observations and the team reward.
arXiv Detail & Related papers (2023-11-05T19:12:08Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency [65.28061634546577]
Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem.
In this paper, we propose bidirectional action-dependent Q-learning (ACE).
ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2022-11-29T10:22:55Z)
- Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning [11.91425153754564]
We show that in environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases.
We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
arXiv Detail & Related papers (2022-06-15T13:03:05Z)
- Off-Beat Multi-Agent Reinforcement Learning [62.833358249873704]
We investigate model-free multi-agent reinforcement learning (MARL) in environments where off-beat actions are prevalent.
We propose a novel episodic memory, LeGEM, for model-free MARL algorithms.
We evaluate LeGEM on various multi-agent scenarios with off-beat actions, including Stag-Hunter Game, Quarry Game, Afforestation Game, and StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2022-05-27T02:21:04Z)
- MA-Dreamer: Coordination and communication through shared imagination [5.253168177256072]
We present MA-Dreamer, a model-based method that uses both agent-centric and global differentiable models of the environment.
Our experiments show that in long-term speaker-listener tasks and in cooperative games with strong partial-observability, MA-Dreamer finds a solution that makes effective use of coordination.
arXiv Detail & Related papers (2022-04-10T13:54:26Z)
- Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest utilizing a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
- Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning [11.292086312664383]
Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework.
We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms.
arXiv Detail & Related papers (2020-06-12T13:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.