Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based
Reasoning in Partially Observable Environments
- URL: http://arxiv.org/abs/2306.06067v1
- Date: Fri, 9 Jun 2023 17:43:49 GMT
- Title: Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based
Reasoning in Partially Observable Environments
- Authors: Jonathon Schwartz, Hanna Kurniawati, Marcus Hutter
- Abstract summary: We propose an online Monte-Carlo Tree Search based planning method for type-based reasoning in large partially observable environments.
POTMMCP incorporates a novel meta-policy for guiding search and evaluating beliefs, allowing it to search more effectively to longer horizons.
We show that our method converges to the optimal solution in the limit and empirically demonstrate that it effectively adapts online to diverse sets of other agents.
- Score: 21.548271801592907
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The design of autonomous agents that can interact effectively with other
agents without prior coordination is a core problem in multi-agent systems.
Type-based reasoning methods achieve this by maintaining a belief over a set of
potential behaviours for the other agents. However, current methods are limited
in that they assume full observability of the state and actions of the other
agent or do not scale efficiently to larger problems with longer planning
horizons. Addressing these limitations, we propose Partially Observable
Type-based Meta Monte-Carlo Planning (POTMMCP) - an online Monte-Carlo Tree
Search based planning method for type-based reasoning in large partially
observable environments. POTMMCP incorporates a novel meta-policy for guiding
search and evaluating beliefs, allowing it to search more effectively to longer
horizons using less planning time. We show that our method converges to the
optimal solution in the limit and empirically demonstrate that it effectively
adapts online to diverse sets of other agents across a range of environments.
Comparisons with the state-of-the-art method on problems with up to $10^{14}$
states and $10^8$ observations indicate that POTMMCP is able to compute better
solutions significantly faster.
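The belief over potential behaviours that type-based reasoning maintains can be sketched as a Bayesian posterior over candidate opponent types. The following is an illustrative sketch, not the authors' code; the type names and probabilities are hypothetical.

```python
# Bayesian belief update over a set of candidate opponent types, the core
# bookkeeping of type-based reasoning. Types and numbers are hypothetical.

def update_belief(belief, likelihoods):
    """One update: P(type | action) is proportional to P(action | type) * P(type)."""
    posterior = {t: belief[t] * likelihoods[t] for t in belief}
    z = sum(posterior.values())
    if z == 0.0:
        return dict(belief)  # action impossible under every type: keep prior
    return {t: p / z for t, p in posterior.items()}

# Uniform prior over three hypothetical opponent types.
belief = {"aggressive": 1/3, "defensive": 1/3, "random": 1/3}

# Probability that each type's policy would have taken the observed action.
likelihoods = {"aggressive": 0.80, "defensive": 0.10, "random": 0.25}

belief = update_belief(belief, likelihoods)
```

A meta-policy in the spirit of the paper could then weight per-type response policies by this belief, for instance to guide node selection and evaluate leaf beliefs during Monte-Carlo search.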
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules, including an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding [49.730902939565986]
Multi-Agent Pathfinding problem involves finding a set of conflict-free paths for a group of agents confined to a graph.
In this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally.
We propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks.
arXiv Detail & Related papers (2023-12-26T06:57:22Z)
- Factored Online Planning in Many-Agent POMDPs [8.728372851272727]
In centralized multi-agent systems, the action and observation spaces grow exponentially with the number of agents.
We introduce weighted particle filtering to a sample-based online planner for MPOMDPs.
We also present a scalable approximation of the belief.
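The weighted particle filtering mentioned above can be sketched as follows. This is a hedged illustration of the general technique, not that paper's implementation; the toy transition and observation models are hypothetical stand-ins.

```python
# One step of a weighted particle filter for belief tracking in a POMDP:
# propagate each particle through the transition model, then reweight by
# the likelihood of the received observation.

def particle_filter_step(particles, weights, action, observation,
                         transition, obs_likelihood):
    new_particles = [transition(s, action) for s in particles]
    new_weights = [w * obs_likelihood(s, observation)
                   for s, w in zip(new_particles, weights)]
    z = sum(new_weights)
    if z == 0.0:
        # All particles inconsistent with the observation: reset to uniform.
        n = len(new_particles)
        return new_particles, [1.0 / n] * n
    return new_particles, [w / z for w in new_weights]

# Toy 1-D chain: the action shifts the state; observations are noisy reads.
transition = lambda s, a: s + a
obs_likelihood = lambda s, o: 1.0 if s == o else 0.1

particles, weights = [0, 1, 2], [1/3, 1/3, 1/3]
particles, weights = particle_filter_step(particles, weights,
                                          action=1, observation=2,
                                          transition=transition,
                                          obs_likelihood=obs_likelihood)
# Probability mass concentrates on the particle matching the observation.
```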
arXiv Detail & Related papers (2023-12-18T18:35:30Z)
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations [37.29355942795658]
Real-world planning problems have been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods.
To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons.
We propose BetaZero, a belief-state planning algorithm for high-dimensional POMDPs.
arXiv Detail & Related papers (2023-05-31T23:47:31Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
- Rule-based Shielding for Partially Observable Monte-Carlo Planning [78.05638156687343]
We propose two contributions to Partially Observable Monte-Carlo Planning (POMCP).
The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task.
The second is a shielding approach that prevents POMCP from selecting unexpected actions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to velocity regulation in mobile robot navigation.
arXiv Detail & Related papers (2021-04-28T14:23:38Z)
- Scalable Anytime Planning for Multi-Agent MDPs [37.69939216970677]
We present a scalable tree search planning algorithm for large multi-agent sequential decision problems that require dynamic collaboration.
Our algorithm comprises three elements: online planning with Monte Carlo Tree Search (MCTS), factored representations of local agent interactions with coordination graphs, and the iterative Max-Plus method for joint action selection.
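The Max-Plus method for joint action selection mentioned above can be sketched with message passing on a small coordination graph. This is an illustration of the general Max-Plus algorithm under hypothetical agents, graph, and payoffs, not that paper's code.

```python
# Max-Plus message passing for joint action selection on a coordination
# graph. Exact on tree-structured graphs; cyclic graphs typically need
# message normalization, omitted here for brevity.

def max_plus(actions, edges, payoffs, iterations=10):
    """actions: {agent: [action, ...]}; edges: [(i, j), ...];
    payoffs: {(i, j): f(a_i, a_j) -> float}. Returns a joint action dict."""
    neighbors = {i: [] for i in actions}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def f(i, j, ai, aj):  # edge payoff regardless of message direction
        return payoffs[(i, j)](ai, aj) if (i, j) in payoffs else payoffs[(j, i)](aj, ai)

    # mu[(i, j)][a_j]: message from agent i scoring each of j's actions.
    mu = {(i, j): {a: 0.0 for a in actions[j]}
          for u, v in edges for i, j in ((u, v), (v, u))}
    for _ in range(iterations):
        mu = {(i, j): {aj: max(f(i, j, ai, aj)
                               + sum(mu[(k, i)][ai]
                                     for k in neighbors[i] if k != j)
                               for ai in actions[i])
                       for aj in actions[j]}
              for (i, j) in mu}

    # Each agent picks the action maximizing the sum of incoming messages.
    return {i: max(actions[i],
                   key=lambda ai: sum(mu[(k, i)][ai] for k in neighbors[i]))
            for i in actions}

# Toy 3-agent chain 0-1-2: agents 0 and 1 want to match (matching on action 1
# pays extra); agents 1 and 2 want to differ.
actions = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
edges = [(0, 1), (1, 2)]
payoffs = {(0, 1): lambda a, b: 2.0 if a == b == 1 else (1.0 if a == b else 0.0),
           (1, 2): lambda a, b: 1.0 if a != b else 0.0}
joint = max_plus(actions, edges, payoffs)
```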
arXiv Detail & Related papers (2021-01-12T22:50:17Z)
- Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.