A Novel Point-based Algorithm for Multi-agent Control Using the Common
Information Approach
- URL: http://arxiv.org/abs/2304.04346v1
- Date: Mon, 10 Apr 2023 01:27:43 GMT
- Title: A Novel Point-based Algorithm for Multi-agent Control Using the Common
Information Approach
- Authors: Dengwang Tang, Ashutosh Nayyar, Rahul Jain
- Abstract summary: We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI).
The algorithm combines the CI approach and point-based POMDP algorithms for large action spaces.
We demonstrate the algorithm through optimally solving several benchmark problems.
- Score: 8.733794945008562
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Common Information (CI) approach provides a systematic way to transform a
multi-agent stochastic control problem into a single-agent partially observable
Markov decision process (POMDP) called the coordinator's POMDP. However, such a
POMDP can be hard to solve due to its extraordinarily large action space. We
propose a new algorithm for multi-agent stochastic control problems, called
coordinator's heuristic search value iteration (CHSVI), that combines the CI
approach and point-based POMDP algorithms for large action spaces. We
demonstrate the algorithm through optimally solving several benchmark problems.
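As a concrete illustration of the point-based POMDP machinery that HSVI-family methods build on, the sketch below runs a minimal PBVI-style alpha-vector backup on the classic tiger POMDP. This is an assumption-laden toy, not the authors' CHSVI implementation: the model parameters are the standard tiger-problem values, and the belief set, initialisation, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

gamma = 0.95
S, A, O = 2, 3, 2                      # states, actions, observations
# actions: 0 = listen, 1 = open-left, 2 = open-right
R = np.array([[-1.0, -100.0,   10.0],  # tiger behind the left door
              [-1.0,   10.0, -100.0]]) # tiger behind the right door
# transitions: listening keeps the state; opening resets to uniform
T = np.zeros((A, S, S))
T[0] = np.eye(S)
T[1] = T[2] = np.full((S, S), 0.5)
# observations: listening is 85% accurate; opening is uninformative
Z = np.zeros((A, S, O))
Z[0] = np.array([[0.85, 0.15], [0.15, 0.85]])
Z[1] = Z[2] = 0.5

def backup(b, alphas):
    """One point-based Bellman backup at belief b -> (alpha-vector, action)."""
    best_val, best_alpha, best_a = -np.inf, None, None
    for a in range(A):
        g = R[:, a].copy()
        for o in range(O):
            # g_{a,o}^i(s) = sum_{s'} T(s,a,s') Z(s',a,o) alpha_i(s')
            cand = np.array([T[a] @ (Z[a, :, o] * al) for al in alphas])
            g = g + gamma * cand[np.argmax(cand @ b)]
        if g @ b > best_val:
            best_val, best_alpha, best_a = g @ b, g, a
    return best_alpha, best_a

# back up over a small fixed belief set until the value function stabilises
beliefs = [np.array([p, 1.0 - p]) for p in (0.02, 0.25, 0.5, 0.75, 0.98)]
alphas = [np.full(S, R.min() / (1.0 - gamma))]  # pessimistic initialisation
for _ in range(200):
    alphas = [backup(b, alphas)[0] for b in beliefs]

alpha0, act = backup(np.array([0.5, 0.5]), alphas)
print(act)  # greedy action at the uniform belief: 0 (listen)
```

The backup is the standard point-based one: per action, pick the best candidate vector for each observation at the chosen belief, then keep the best action's vector. HSVI-style methods additionally maintain upper bounds and select beliefs by heuristic search rather than using a fixed set, which is where CHSVI's handling of the coordinator's large action space comes in.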
Related papers
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
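A minimal sketch of what UCB-driven search over a discrete set of candidate weight vectors could look like is below. The candidate weights, payoffs, and noise model are invented for illustration, and `pull` stands in for the expensive step of training a policy under a weight vector and measuring its scalarised return; this is not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical candidate weight vectors for a 2-objective problem
weights = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5],
                    [0.25, 0.75], [0.0, 1.0]])

def pull(i):
    """Stand-in for training a policy under weights[i] and measuring its
    scalarised return; a noisy fixed payoff keeps the sketch self-contained."""
    true_utility = np.array([0.2, 0.5, 0.9, 0.6, 0.3])
    return true_utility[i] + 0.05 * rng.standard_normal()

counts = np.zeros(len(weights))
sums = np.zeros(len(weights))
for t in range(1, 2001):
    if t <= len(weights):
        i = t - 1                                   # pull each arm once
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        i = int(np.argmax(ucb))
    sums[i] += pull(i)
    counts[i] += 1

best = int(np.argmax(counts))
print(best, weights[best])  # the most promising weight vector wins the budget
```

This is plain UCB1 over arms indexed by weight vectors: the exploration bonus shrinks as an arm is pulled, so the budget concentrates on the most promising weights while still occasionally revisiting the rest.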
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Decentralised Q-Learning for Multi-Agent Markov Decision Processes with
a Satisfiability Criterion [0.0]
We propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP).
The goal is to lower the time average cost of each agent to below a pre-specified agent-specific bound.
arXiv Detail & Related papers (2023-11-21T13:56:44Z) - Deep Reinforcement Learning for Uplink Scheduling in NOMA-URLLC Networks [7.182684187774442]
This article addresses the problem of Ultra Reliable Low Latency Communications (URLLC) in wireless networks, a framework with particularly stringent constraints imposed by many Internet of Things (IoT) applications from diverse sectors.
We propose a novel Deep Reinforcement Learning (DRL) scheduling algorithm, to solve the Non-Orthogonal Multiple Access (NOMA) uplink URLLC scheduling problem involving strict deadlines.
arXiv Detail & Related papers (2023-08-28T12:18:02Z) - Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control
Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle using new reinforcement learning methods.
For the Wordle puzzle, these methods yield on-line solution strategies that are very close to optimal at relatively modest computational cost.
arXiv Detail & Related papers (2022-11-15T03:46:41Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Emergence of Theory of Mind Collaboration in Multiagent Systems [65.97255691640561]
We propose an adaptive training algorithm to develop effective collaboration between agents with ToM.
We evaluate our algorithm on two games, where it surpasses all previous decentralized-execution algorithms without ToM modeling.
arXiv Detail & Related papers (2021-09-30T23:28:00Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
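The permutation-invariance idea behind such an actor-critic architecture can be sketched with a DeepSets-style mean-pooled critic. The code below is a hypothetical toy (random weights, invented dimensions), not the MF-PPO architecture itself, but it shows why symmetric pooling makes the output independent of agent ordering.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, d_in, d_h = 4, 3, 8             # invented toy dimensions
W_phi = rng.standard_normal((d_in, d_h))  # shared per-agent encoder weights
w_rho = rng.standard_normal(d_h)          # read-out weights

def critic(obs):
    """Value estimate that is invariant to the ordering of agents:
    encode every agent with the SAME phi, mean-pool, then read out."""
    h = np.tanh(obs @ W_phi)              # (n_agents, d_h)
    return float(w_rho @ h.mean(axis=0))  # symmetric pooling

obs = rng.standard_normal((n_agents, d_in))
perm = rng.permutation(n_agents)
print(np.isclose(critic(obs), critic(obs[perm])))  # True: order-invariant
```

Because the per-agent encoder is shared and the pooling operator is symmetric, permuting the rows of `obs` cannot change the pooled representation, which is the inductive bias the summary refers to.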
arXiv Detail & Related papers (2021-05-18T04:35:41Z) - Scalable, Decentralized Multi-Agent Reinforcement Learning Methods
Inspired by Stigmergy and Ant Colonies [0.0]
We investigate a novel approach to decentralized multi-agent learning and planning.
In particular, this method is inspired by the cohesion, coordination, and behavior of ant colonies.
The approach combines single-agent RL and an ant-colony-inspired decentralized, stigmergic algorithm for multi-agent path planning and environment modification.
arXiv Detail & Related papers (2021-05-08T01:04:51Z) - Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z) - A Novel Multi-Agent System for Complex Scheduling Problems [2.294014185517203]
This paper presents the conception and implementation of a multi-agent system that is applicable in various problem domains.
We simulate an NP-hard scheduling problem to demonstrate the validity of our approach.
This paper highlights the advantages of the agent-based approach, such as reduced layout complexity, improved control of complicated systems, and extensibility.
arXiv Detail & Related papers (2020-04-20T14:04:58Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.