Monte-Carlo Search for an Equilibrium in Dec-POMDPs
- URL: http://arxiv.org/abs/2305.11811v1
- Date: Fri, 19 May 2023 16:47:46 GMT
- Title: Monte-Carlo Search for an Equilibrium in Dec-POMDPs
- Authors: Yang You, Vincent Thomas, Francis Colas, Olivier Buffet
- Abstract summary: Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents.
Seeking a Nash equilibrium -- each agent's policy being a best response to the other agents' -- is more accessible than seeking a global optimum.
We show that this approach can be adapted to cases where only a generative model (a simulator) of the Dec-POMDP is available.
- Score: 11.726372393432195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs)
formalize the problem of designing individual controllers for a group of
collaborative agents under stochastic dynamics and partial observability.
Seeking a global optimum is difficult (NEXP complete), but seeking a Nash
equilibrium -- each agent policy being a best response to the other agents --
is more accessible, and has allowed addressing infinite-horizon problems with
solutions in the form of finite-state controllers (FSCs). In this paper, we show that
this approach can be adapted to cases where only a generative model (a
simulator) of the Dec-POMDP is available. This requires relying on a
simulation-based POMDP solver to construct an agent's FSC node by node. A
related process is used to heuristically derive initial FSCs. Experiments with
benchmarks show that MC-JESP is competitive with existing Dec-POMDP solvers,
and even outperforms many offline methods that use explicit models.
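To make the approach concrete, here is a minimal Python sketch of the JESP-style alternating best-response loop the abstract describes, assuming hypothetical helpers `evaluate` (Monte-Carlo evaluation of a joint FSC through the simulator) and `solve_best_response_fsc` (a simulation-based POMDP solver that grows one agent's FSC node by node); it illustrates the equilibrium-seeking scheme, not the authors' actual implementation.

```python
def mc_jesp(simulator, n_agents, init_fscs, max_sweeps=20, tol=1e-3):
    """Alternate best responses until no agent can improve: the resulting joint
    policy is a Nash equilibrium in the space of finite-state controllers.

    `simulator` is a generative model of the Dec-POMDP; `init_fscs` are the
    heuristically derived initial controllers (one per agent).
    """
    fscs = list(init_fscs)
    value = evaluate(simulator, fscs)   # hypothetical Monte-Carlo joint-value estimate
    for _ in range(max_sweeps):
        improved = False
        for i in range(n_agents):
            # Fixing the other agents' FSCs turns agent i's problem into a POMDP;
            # a simulation-based solver builds its best-response FSC node by node.
            candidate = solve_best_response_fsc(simulator, fscs, agent=i)
            new_fscs = fscs[:i] + [candidate] + fscs[i + 1:]
            new_value = evaluate(simulator, new_fscs)
            if new_value > value + tol:
                fscs, value, improved = new_fscs, new_value, True
        if not improved:                # every policy is a best response: equilibrium
            break
    return fscs, value
```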
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
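For context, the sketch below shows one generic MCTS iteration (UCT selection, expansion, rollout, backpropagation); it is a simplified illustration of the search component mentioned above, not LLaMA-Berry's pairwise Self-Refine variant, and `expand`/`rollout` are user-supplied placeholders.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    # Upper confidence bound: exploit high average value, explore rarely visited nodes.
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts_iteration(root, expand, rollout):
    """One MCTS iteration. `expand(state)` returns successor states;
    `rollout(state)` returns a scalar reward estimate."""
    # 1. Selection: follow UCT while every child has been visited at least once.
    node = root
    while node.children and all(ch.visits > 0 for ch in node.children):
        node = max(node.children, key=uct)
    # 2. Expansion: create children for a leaf, then pick an unvisited one.
    if not node.children:
        node.children = [Node(s, parent=node) for s in expand(node.state)]
    unvisited = [ch for ch in node.children if ch.visits == 0]
    if unvisited:
        node = random.choice(unvisited)
    # 3. Simulation: estimate the value of the new node with a rollout.
    reward = rollout(node.state)
    # 4. Backpropagation: update visit counts and values up to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```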
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance [0.7046417074932257]
We propose a combined framework for inference and robust solution of POMDPs via deep RL.
First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model.
The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization.
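A hedged sketch of the second stage of that pipeline: each training episode draws one parameter sample from the MCMC posterior, so the policy learned by deep RL is robust across the inferred parameter uncertainty (domain randomization). `make_pomdp_env` and the `agent` interface are hypothetical placeholders, not the paper's code.

```python
import random

def train_robust_policy(posterior_samples, agent, n_episodes=10_000):
    """Domain randomization: sample a plausible model per episode so the agent
    learns a policy that works across the whole posterior over parameters."""
    for _ in range(n_episodes):
        # Draw one set of transition/observation parameters from the MCMC posterior.
        params = random.choice(posterior_samples)
        env = make_pomdp_env(params)          # hypothetical environment factory
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            obs, reward, done = env.step(action)
            agent.observe(obs, reward, done)  # e.g. store transition, update networks
    return agent
```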
arXiv Detail & Related papers (2023-07-16T15:44:58Z) - Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior [28.313779052437134]
We propose novel models for decentralized partially observable MFC (Dec-POMFC).
We provide rigorous theoretical results, including a dynamic programming principle.
Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.
arXiv Detail & Related papers (2023-07-12T14:02:03Z) - Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z) - A multilevel reinforcement learning framework for PDE based control [0.2538209532048867]
Reinforcement learning (RL) is a promising method to solve control problems.
Model-free RL algorithms are sample inefficient and require thousands if not millions of samples to learn optimal control policies.
We propose a multilevel RL framework in order to ease this cost by exploiting sublevel models that correspond to coarser scale discretization.
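One way to read the multilevel idea, sketched below under assumptions: spend most of the sample budget on coarse, cheap PDE discretizations and progressively less on finer ones, transferring the policy between levels. `make_pde_env` and `train` are hypothetical placeholders rather than the paper's algorithm.

```python
def multilevel_training(resolutions, budgets, agent):
    """Train mostly on coarse, cheap discretizations and fine-tune on finer,
    more expensive ones, carrying the policy from level to level."""
    for resolution, budget in zip(resolutions, budgets):
        env = make_pde_env(resolution)      # hypothetical PDE environment factory
        agent = train(agent, env, n_samples=budget)
    return agent

# Example allocation: many coarse-grid samples, few fine-grid samples.
# agent = multilevel_training(resolutions=[32, 64, 128],
#                             budgets=[100_000, 20_000, 5_000],
#                             agent=agent)
```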
arXiv Detail & Related papers (2022-10-15T23:52:48Z) - Centralized Model and Exploration Policy for Multi-Agent RL [13.661446184763117]
Reinforcement learning in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can be used to address many real-world challenges.
Current RL algorithms for Dec-POMDPs suffer from poor sample complexity.
We propose a model-based algorithm, MARCO, and evaluate it in three cooperative communication tasks, where it improves sample efficiency by up to 20x.
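The sample-efficiency gain in model-based methods of this kind typically comes from learning a centralized world model and training the policies largely on imagined rollouts; the sketch below is a generic Dyna-style reading under that assumption, not MARCO's exact algorithm, and `collect`, `world_model`, and the policy interface are hypothetical.

```python
def model_based_marl(real_env, world_model, policies, n_iters=100,
                     real_steps=1_000, imagined_steps=10_000):
    """Alternate between a little real experience and a lot of cheap, imagined
    experience generated by the learned centralized model."""
    for _ in range(n_iters):
        # 1. Collect real transitions with the current joint policy.
        real_data = collect(real_env, policies, n_steps=real_steps)
        # 2. Fit the centralized model on the data gathered so far.
        world_model.update(real_data)
        # 3. Roll out imagined trajectories and train the decentralized policies on them.
        imagined = collect(world_model, policies, n_steps=imagined_steps)
        for policy in policies:
            policy.update(real_data + imagined)
    return policies
```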
arXiv Detail & Related papers (2021-07-14T00:34:08Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, nonconvex optimal control problems.
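The core ingredient of Stein-variational methods is the SVGD update, which moves a set of particles (here, candidate control sequences) toward a target distribution while a kernel term keeps them diverse. Below is a minimal NumPy sketch of that update, assuming a user-supplied `grad_log_p` (gradient of the log target, e.g. derived from a differentiable trajectory cost); it is not the paper's full MPC loop.

```python
import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Pairwise RBF kernel K[i, j] = k(x_i, x_j) and its gradient w.r.t. x_i."""
    diffs = X[:, None, :] - X[None, :, :]               # diffs[i, j] = x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))
    grad_K = -diffs * K[..., None] / bandwidth ** 2     # grad_K[i, j] = d k(x_i, x_j) / d x_i
    return K, grad_K

def svgd_step(particles, grad_log_p, step_size=1e-2, bandwidth=1.0):
    """One SVGD update over a set of particles (e.g. candidate control sequences).
    `grad_log_p(X)` must return the gradient of the log target at each particle."""
    n = particles.shape[0]
    K, grad_K = rbf_kernel(particles, bandwidth)
    # Attraction toward high-probability regions plus kernel-based repulsion
    # that keeps the particle set spread out.
    phi = (K @ grad_log_p(particles) + grad_K.sum(axis=0)) / n
    return particles + step_size * phi
```

In an MPC setting, `svgd_step` would typically be applied a few times per control step to a population of candidate action sequences, with the best particle (or a weighted average) executed on the system.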
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Planning in Markov Decision Processes with Gap-Dependent Sample Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z)