Factored Online Planning in Many-Agent POMDPs
- URL: http://arxiv.org/abs/2312.11434v3
- Date: Fri, 23 Feb 2024 17:35:41 GMT
- Title: Factored Online Planning in Many-Agent POMDPs
- Authors: Maris F.L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen
- Abstract summary: In centralized multi-agent systems, the action and observation spaces grow exponentially with the number of agents.
First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs.
Second, we present a scalable approximation of the belief.
- Score: 8.728372851272727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In centralized multi-agent systems, often modeled as multi-agent partially
observable Markov decision processes (MPOMDPs), the action and observation
spaces grow exponentially with the number of agents, making the value and
belief estimation of single-agent online planning ineffective. Prior work
partially tackles value estimation by exploiting the inherent structure of
multi-agent settings via so-called coordination graphs. Additionally, belief
estimation methods have been improved by incorporating the likelihood of
observations into the approximation. However, the challenges of value
estimation and belief estimation have only been tackled individually, which
prevents existing methods from scaling to settings with many agents. Therefore,
we address these challenges simultaneously. First, we introduce weighted
particle filtering to a sample-based online planner for MPOMDPs. Second, we
present a scalable approximation of the belief. Third, we bring an approach
that exploits the typical locality of agent interactions to novel online
planning algorithms for MPOMDPs operating on a so-called sparse particle filter
tree. Our experimental evaluation against several state-of-the-art baselines
shows that our methods (1) are competitive in settings with only a few agents
and (2) improve over the baselines in the presence of many agents.
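The first contribution, weighted particle filtering for belief estimation, can be illustrated with a minimal sketch. The code below is a generic likelihood-weighted (bootstrap) particle filter over joint actions and joint observations; the toy simulator, observation model, particle count, and resampling threshold are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, joint_action):
    """Hypothetical MPOMDP simulator: sample a next state given the joint action."""
    return state + joint_action.sum() + rng.normal(scale=0.5)

def obs_likelihood(joint_obs, next_state):
    """Hypothetical likelihood of the received joint observation given the next state."""
    return np.prod([np.exp(-0.5 * (o - next_state) ** 2) for o in joint_obs])

def weighted_particle_update(particles, weights, joint_action, joint_obs):
    """One importance-weighted belief update: propagate every particle, reweight by the
    observation likelihood, normalize, and resample when the effective sample size drops."""
    particles = np.array([step(s, joint_action) for s in particles])
    weights = weights * np.array([obs_likelihood(joint_obs, s) for s in particles])
    if weights.sum() == 0.0:              # every particle inconsistent with the observation
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    ess = 1.0 / np.sum(weights ** 2)      # effective sample size
    if ess < 0.5 * len(weights):          # resampling threshold (an assumption)
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles, weights = particles[idx], np.full(len(weights), 1.0 / len(weights))
    return particles, weights

# Toy usage: 100 particles, 3 agents, one joint observation.
particles = rng.normal(size=100)
weights = np.full(100, 1.0 / 100)
particles, weights = weighted_particle_update(
    particles, weights, joint_action=np.array([1, 0, 1]), joint_obs=np.array([2.1, 1.9, 2.0]))
```

In the paper, such weighted belief updates are combined with coordination-graph-based value estimation inside a sparse particle filter tree; the sketch above covers only the belief side.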
Related papers
- Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning [54.788422270960496]
We propose a novel multi-agent offline RL algorithm named CounterFactual Conservative Q-Learning (CFCQL).
CFCQL calculates conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation.
We prove that it still enjoys the underestimation property and performance guarantee of single-agent conservative methods, while the induced regularization and safe policy improvement bound are independent of the number of agents.
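The mechanism described above, a conservative penalty computed per agent in a counterfactual way and then linearly combined, can be made concrete with a toy tabular sketch. The Q-table, the two-agent single-state setup, and the equal weights below are illustrative assumptions, not CFCQL's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 2 agents, 3 actions each, a joint Q-table for a single state (illustrative).
n_actions = 3
q_joint = rng.normal(size=(n_actions, n_actions))   # Q(s, a1, a2)
a_data = (0, 2)                                      # joint action seen in the offline dataset

def counterfactual_penalty(agent):
    """Per-agent conservative term: log-sum-exp over this agent's counterfactual actions
    (the other agent keeps its dataset action) minus the Q-value of the dataset action."""
    q_cf = q_joint[:, a_data[1]] if agent == 0 else q_joint[a_data[0], :]
    return np.log(np.sum(np.exp(q_cf))) - q_joint[a_data]

# Linear combination of the per-agent penalties (equal weights here, an assumption).
weights = [0.5, 0.5]
total_penalty = sum(w * counterfactual_penalty(i) for i, w in enumerate(weights))
print(float(total_penalty))
```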
arXiv Detail & Related papers (2023-09-22T08:10:25Z)
- SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding [3.4260993997836753]
We propose a novel multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA).
SACHA learns a neural network for each agent to selectively pay attention to the shortest path guidance from multiple agents within its field of view.
We demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
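At a high level, "selectively paying attention to shortest-path guidance from visible agents" can be pictured as a standard scaled dot-product attention over per-agent heuristic features. The sketch below uses generic attention with made-up feature shapes and is not SACHA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical inputs: heuristic (shortest-path) feature vectors of 4 agents visible to agent i.
k, d = 4, 6
guidance = rng.normal(size=(k, d))   # one guidance vector per visible agent (made up)
query = rng.normal(size=d)           # agent i's own embedding (made up)

# Scaled dot-product attention: weigh each neighbour's guidance by its relevance to agent i.
scores = guidance @ query / np.sqrt(d)
attended_guidance = softmax(scores) @ guidance   # convex combination, shape (d,)
```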
arXiv Detail & Related papers (2023-07-05T23:36:33Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction [73.25645602768158]
IPCC-TP is a novel relevance-aware module based on the Incremental Pearson Correlation Coefficient that improves multi-agent interaction modeling.
Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders.
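The statistic in the module's name, an incrementally maintained Pearson correlation coefficient, can be computed with the standard streaming update below. This is the generic online formulation, not the paper's module, and the toy displacement series is made up.

```python
class IncrementalPearson:
    """Streaming Pearson correlation between two scalar series (standard online update)."""

    def __init__(self):
        self.n = 0
        self.mean_x = self.mean_y = 0.0
        self.m2_x = self.m2_y = self.cov = 0.0   # running (co)variance accumulators

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x
        self.mean_x += dx / self.n
        dy = y - self.mean_y
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.cov += dx * (y - self.mean_y)

    def corr(self):
        denom = (self.m2_x * self.m2_y) ** 0.5
        return self.cov / denom if denom > 0 else 0.0

# Toy usage: correlate the displacements of two agents over a few time steps.
ip = IncrementalPearson()
for x, y in [(0.1, 0.2), (0.3, 0.5), (0.2, 0.4), (0.6, 0.9)]:
    ip.update(x, y)
print(ip.corr())
```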
arXiv Detail & Related papers (2023-03-01T15:16:56Z)
- Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction [12.94372063457462]
Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms.
It suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state.
We propose an enhancement tool that accommodates any actor-critic MARL method.
arXiv Detail & Related papers (2022-09-02T13:44:00Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
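The role of the permutation-invariant architecture can be seen in a minimal mean-pooling critic: a symmetric aggregation over agent embeddings makes the value independent of agent ordering. The layer shapes, tanh embedding, and plain-numpy implementation below are assumptions for illustration, not MF-PPO's network.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical shapes: each of the N agents contributes a d-dimensional observation.
d, hidden = 4, 8
W_embed = rng.normal(size=(d, hidden))   # shared per-agent embedding weights (illustrative)
w_out = rng.normal(size=hidden)          # value head

def permutation_invariant_value(obs):
    """Critic V(obs): embed every agent with shared weights, mean-pool, apply a head.
    Mean pooling is symmetric, so any reordering of the agents yields the same value."""
    per_agent = np.tanh(obs @ W_embed)    # (N, hidden)
    pooled = per_agent.mean(axis=0)       # symmetric aggregation over agents
    return float(pooled @ w_out)

obs = rng.normal(size=(5, d))             # 5 agents
shuffled = obs[rng.permutation(5)]
assert np.isclose(permutation_invariant_value(obs), permutation_invariant_value(shuffled))
```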
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication [20.891460617583302]
The paper considers independent reinforcement learning (IRL) for collaborative decision-making in the paradigm of federated learning (FL).
FL generates excessive communication overheads between agents and a remote central server.
This paper proposes two advanced optimization schemes to improve the system's utility value.
arXiv Detail & Related papers (2021-03-24T07:21:43Z)
- Off-Policy Multi-Agent Decomposed Policy Gradients [30.389041305278045]
We investigate causes that hinder the performance of multi-agent policy gradient (MAPG) algorithms and present a multi-agent decomposed policy gradient method (DOP).
DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment.
In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that DOP significantly outperforms both state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms.
arXiv Detail & Related papers (2020-07-24T02:21:55Z)
- Multi-Agent Determinantal Q-Learning [39.79718674655209]
We propose multi-agent determinantal Q-learning (Q-DPP), which promotes agents to acquire diverse behavioral models.
We demonstrate that Q-DPP generalizes major solutions including VDN, QMIX, and QTRAN on decentralizable cooperative tasks.
arXiv Detail & Related papers (2020-06-02T09:32:48Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.