Learning Decentralized Partially Observable Mean Field Control for
Artificial Collective Behavior
- URL: http://arxiv.org/abs/2307.06175v2
- Date: Thu, 22 Feb 2024 22:55:14 GMT
- Title: Learning Decentralized Partially Observable Mean Field Control for
Artificial Collective Behavior
- Authors: Kai Cui, Sascha Hauck, Christian Fabian, Heinz Koeppl
- Abstract summary: We propose novel models for decentralized partially observable MFC (Dec-POMFC).
We provide rigorous theoretical results, including a dynamic programming principle.
Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.
- Score: 28.313779052437134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent reinforcement learning (RL) methods have achieved success in various
domains. However, multi-agent RL (MARL) remains a challenge in terms of
decentralization, partial observability and scalability to many agents.
Meanwhile, collective behavior requires resolution of the aforementioned
challenges, and remains of importance to many state-of-the-art applications
such as active matter physics, self-organizing systems, opinion dynamics, and
biological or robotic swarms. Here, MARL via mean field control (MFC) offers a
potential solution to scalability, but fails to consider decentralized and
partially observable systems. In this paper, we enable decentralized behavior
of agents under partial information by proposing novel models for decentralized
partially observable MFC (Dec-POMFC), a broad class of problems with
permutation-invariant agents, which allows reduction to tractable single-agent
Markov decision processes (MDPs) solvable by single-agent RL. We provide
rigorous theoretical results, including a dynamic programming principle,
together with optimality guarantees for Dec-POMFC solutions applied to finite
swarms of interest. Algorithmically, we propose Dec-POMFC-based policy gradient
methods for MARL via centralized training and decentralized execution, together
with policy gradient approximation guarantees. In addition, we improve upon
state-of-the-art histogram-based MFC by kernel methods, which is also of
independent interest for fully observable MFC. We evaluate numerically on
representative collective behavior tasks such as adapted Kuramoto and Vicsek
swarming models, performing on par with state-of-the-art MARL. Overall, our
framework takes a step towards RL-based engineering of artificial collective
behavior via MFC.
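As a purely illustrative aside, the abstract names two concrete technical ingredients: an adapted Kuramoto swarming model as an evaluation task, and a kernel-based representation of the mean field replacing histogram binning. The sketch below is not the paper's implementation; the function names, grid resolution, kernel bandwidth, and coupling constants are assumptions. It contrasts a histogram with a Gaussian-kernel estimate of the empirical phase distribution and runs Euler steps of the classical (uncontrolled) Kuramoto dynamics.

```python
import numpy as np

def histogram_mean_field(thetas, n_bins=16):
    """Histogram representation of the empirical mean field over phases in [0, 2*pi)."""
    hist, _ = np.histogram(thetas % (2 * np.pi), bins=n_bins, range=(0.0, 2 * np.pi))
    return hist / len(thetas)  # fraction of agents per bin

def kernel_mean_field(thetas, grid, bandwidth=0.25):
    """Kernel-based (smoothed) representation: Gaussian KDE evaluated on a phase grid.
    Hypothetical stand-in for the paper's kernel parametrization."""
    diffs = grid[:, None] - thetas[None, :]
    # wrap differences to (-pi, pi] so the kernel respects the circular phase space
    diffs = (diffs + np.pi) % (2 * np.pi) - np.pi
    weights = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    density = weights.mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return density / density.sum()  # normalize to a distribution over grid points

def kuramoto_step(thetas, omegas, coupling=1.0, dt=0.05):
    """One Euler step of the classical Kuramoto dynamics:
    d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)."""
    interaction = np.sin(thetas[None, :] - thetas[:, None]).mean(axis=1)
    return thetas + dt * (omegas + coupling * interaction)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0.0, 2 * np.pi, 200)            # 200 agents, random initial phases
    omegas = rng.normal(0.0, 0.5, 200)                    # heterogeneous natural frequencies
    grid = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
    for _ in range(100):
        thetas = kuramoto_step(thetas, omegas)
    print("histogram mean field:", histogram_mean_field(thetas)[:5])
    print("kernel mean field:   ", kernel_mean_field(thetas, grid)[:5])
```

Unlike histogram bins, the kernel estimate varies smoothly as agents move, which is one intuition for why a kernel-based mean-field representation can be friendlier to policy gradient methods; the paper's actual construction and guarantees are in the full text.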
Related papers
- QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z) - Monte-Carlo Search for an Equilibrium in Dec-POMDPs [11.726372393432195]
Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents.
Seeking a Nash equilibrium, where each agent's policy is a best response to the other agents' policies, is a more accessible objective.
We show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available.
arXiv Detail & Related papers (2023-05-19T16:47:46Z) - Major-Minor Mean Field Multi-Agent Reinforcement Learning [29.296206774925388]
Multi-agent reinforcement learning (MARL) remains difficult to scale to many agents.
Recent MARL using Mean Field Control (MFC) provides a tractable and rigorous approach to otherwise difficult cooperative MARL.
We generalize MFC to simultaneously model many similar agents and a few complex agents.
arXiv Detail & Related papers (2023-03-19T14:12:57Z) - Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under
Partial Observability [4.111899441919164]
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems.
We first propose a group of value-based RL approaches for MacDec-POMDPs.
We formulate a set of macro-action-based policy gradient algorithms under the three training paradigms.
arXiv Detail & Related papers (2022-09-20T21:13:51Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in
Edge Industrial IoT [106.83952081124195]
Reinforcement learning (RL) has been widely investigated and shown to be a promising solution for decision-making and optimal control processes.
We propose an adaptive ADMM (asI-ADMM) algorithm and apply it to decentralized RL with edge-computing-empowered IIoT networks.
Experimental results show that our proposed algorithms outperform the state of the art in terms of communication costs and scalability, and adapt well to complex IoT environments.
arXiv Detail & Related papers (2021-06-30T16:49:07Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z) - F2A2: Flexible Fully-decentralized Approximate Actor-critic for
Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability for large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)