Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction
- URL: http://arxiv.org/abs/2406.00614v1
- Date: Sun, 2 Jun 2024 04:31:30 GMT
- Title: Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction
- Authors: Yunhyeok Kwak, Inwoo Hwang, Dooyoung Kim, Sanghack Lee, Byoung-Tak Zhang
- Abstract summary: We propose an action abstraction based on the compositional structure between a state and sub-actions.
Our method learns a latent dynamics model with an auxiliary network that captures sub-actions relevant to the transition from the current state.
- Score: 27.53460927687747
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monte Carlo Tree Search (MCTS) has showcased its efficacy across a broad spectrum of decision-making problems. However, its performance often degrades under a vast combinatorial action space, especially where an action is composed of multiple sub-actions. In this work, we propose an action abstraction based on the compositional structure between a state and sub-actions for improving the efficiency of MCTS under a factored action space. Our method learns a latent dynamics model with an auxiliary network that captures sub-actions relevant to the transition from the current state, which we call state-conditioned action abstraction. Notably, it infers such compositional relationships from high-dimensional observations without a known environment model. During the tree traversal, our method constructs the state-conditioned action abstraction for each node on-the-fly, reducing the search space by discarding the exploration of redundant sub-actions. Experimental results demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from the expansive action space.
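To make the pruning idea concrete, here is a minimal sketch of expanding a node over a factored action space while skipping sub-action slots that a relevance predictor marks as irrelevant. The names (relevance_fn, abstract_actions) and the stubbed predictor are illustrative assumptions, not the paper's implementation, which learns the relevance mask jointly with a latent dynamics model.

```python
import itertools
import numpy as np

# Hypothetical sketch: prune a factored action space during MCTS node expansion.
# `relevance_fn` stands in for the paper's auxiliary network; here it is a stub
# that marks a sub-action slot as relevant with a fixed probability.

def relevance_fn(latent_state, num_slots, threshold=0.5):
    """Stub for the auxiliary network: returns a boolean mask over sub-action slots."""
    rng = np.random.default_rng(abs(hash(latent_state.tobytes())) % (2**32))
    scores = rng.random(num_slots)          # pretend per-slot relevance scores
    return scores > threshold               # True = slot affects the next transition

def abstract_actions(latent_state, sub_action_sets, default=0):
    """Enumerate only combinations of sub-actions in relevant slots; irrelevant
    slots are pinned to a default value, shrinking the branching factor."""
    mask = relevance_fn(latent_state, len(sub_action_sets))
    choices = [s if keep else [default] for s, keep in zip(sub_action_sets, mask)]
    return list(itertools.product(*choices))

# Example: 4 sub-action slots with 3 options each -> 81 joint actions in total,
# but only the relevant slots are expanded at this node.
latent = np.ones(8, dtype=np.float32)
sub_action_sets = [[0, 1, 2]] * 4
print(len(abstract_actions(latent, sub_action_sets)))
```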
Related papers
- Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
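As a rough illustration of the chunking step, the sketch below extracts the most frequent fixed-length action subsequence from a set of high-reward trajectories and appends it to the action space as a macro-action. The helper names and the fixed chunk length are assumptions; the paper's procedure is iterative and integrated with policy optimization.

```python
from collections import Counter

# Hypothetical sketch of the chunking idea: find the action subsequence that
# appears most often across high-reward trajectories and add it to the action
# space as a single macro-action.

def most_common_chunk(trajectories, length=2):
    counts = Counter()
    for traj in trajectories:
        for i in range(len(traj) - length + 1):
            counts[tuple(traj[i:i + length])] += 1
    return counts.most_common(1)[0][0] if counts else None

def extend_action_space(action_space, trajectories, length=2):
    chunk = most_common_chunk(trajectories, length)
    return action_space + [chunk] if chunk and chunk not in action_space else action_space

# Toy usage: action 1 followed by 2 is common, so (1, 2) becomes a macro-action.
high_reward_trajs = [[0, 1, 2, 3], [1, 2, 3, 1, 2], [3, 1, 2, 0]]
print(extend_action_space([0, 1, 2, 3], high_reward_trajs))   # [0, 1, 2, 3, (1, 2)]
```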
arXiv Detail & Related papers (2024-10-19T19:22:50Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models that are fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Adaptive Discretization using Voronoi Trees for Continuous POMDPs [7.713622698801596]
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
It uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization to efficiently sample high-dimensional continuous action spaces.
ADVT scales substantially better to high-dimensional continuous action spaces, compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-02-21T04:47:34Z)
- Adaptive Discretization using Voronoi Trees for Continuous-Action POMDPs [7.713622698801596]
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
ADVT uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization.
Experiments on simulations of four types of benchmark problems indicate that ADVT outperforms state-of-the-art methods and scales substantially better to high-dimensional continuous action spaces.
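The adaptive-discretization idea behind ADVT can be pictured as follows: sampled actions implicitly partition the continuous action space into Voronoi cells, and the cell of the most promising action is refined by sampling a new action inside it. The sketch below is a simplification under those assumptions; it omits ADVT's optimistic upper bounds and cell-diameter estimates, and all names are illustrative.

```python
import numpy as np

# Simplified sketch of adaptive action discretization with Voronoi cells.
# Each sampled action implicitly owns the region of action space closest to it;
# refining means sampling a new action inside the most promising cell.

rng = np.random.default_rng(0)

def nearest_owner(actions, point):
    """Index of the sampled action whose Voronoi cell contains `point`."""
    return int(np.argmin([np.linalg.norm(point - a) for a in actions]))

def refine_cell(actions, values, bounds, num_probes=64):
    """Sample a new action inside the Voronoi cell of the best action so far."""
    best = int(np.argmax(values))
    lo, hi = bounds
    for _ in range(num_probes):                 # rejection-sample inside the best cell
        probe = rng.uniform(lo, hi, size=actions[0].shape)
        if nearest_owner(actions, probe) == best:
            return probe
    return actions[best]                        # fall back if the cell is very small

# Toy usage in a 2-D continuous action space [0, 1]^2.
actions = [rng.uniform(0, 1, size=2) for _ in range(4)]
values = [0.1, 0.7, 0.3, 0.2]                   # e.g. Monte Carlo value estimates
print(refine_cell(actions, values, bounds=(0.0, 1.0)))
```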
arXiv Detail & Related papers (2022-09-13T05:04:49Z)
- Hierarchical Compositional Representations for Few-shot Action Recognition [51.288829293306335]
We propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition.
We divide a complicated action into several sub-actions by carefully designed hierarchical clustering.
We also adopt the Earth Mover's Distance, cast as a transportation problem, to measure the similarity between video samples in terms of their sub-action representations.
arXiv Detail & Related papers (2022-08-19T16:16:59Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a causal dynamics model, with theoretical guarantees, that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
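One way to picture deriving an abstraction from a learned causal dynamics model: given a boolean dependency matrix over state variables and the action, keep only the state variables reachable from the action in the causal graph. This is a hedged illustration of that reachability idea, not CDL's exact variable categorization; the function and variable names are assumptions.

```python
import numpy as np

# Hedged sketch: once causal dependencies among state variables and the action
# have been learned, keep only the variables the action can influence, directly
# or through other variables.

def action_relevant_variables(dependency, action_index):
    """dependency[i, j] == True means variable j (or the action) affects the next
    value of state variable i. Returns state variables reachable from the action."""
    num_vars = dependency.shape[0]
    relevant = set(np.flatnonzero(dependency[:, action_index]))
    changed = True
    while changed:                               # transitive closure over one-step edges
        changed = False
        for i in range(num_vars):
            if i not in relevant and any(j in relevant for j in np.flatnonzero(dependency[i, :num_vars])):
                relevant.add(i)
                changed = True
    return sorted(int(i) for i in relevant)

# Toy graph: 3 state variables, action at index 3.
# Variable 0 depends on the action, variable 1 depends on variable 0, variable 2 on nothing.
dep = np.array([[0, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 0, 0, 0]], dtype=bool)
print(action_relevant_variables(dep, action_index=3))   # [0, 1]
```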
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- Modeling Multi-Label Action Dependencies for Temporal Action Localization [53.53490517832068]
Real-world videos contain many complex actions with inherent relationships between action classes.
We propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos.
We show improved performance over state-of-the-art methods on multi-label action localization benchmarks.
arXiv Detail & Related papers (2021-03-04T13:37:28Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariant state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
- Improved POMDP Tree Search Planning with Prioritized Action Branching [33.94599291823342]
This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree.
Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
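A minimal sketch of the action-subset idea follows, assuming a hypothetical priority score that mixes an exploitation estimate with an exploration bonus and keeps the top-k actions for the tree; PA-POMCPOW's actual priority function differs.

```python
import numpy as np

# Hedged sketch of prioritized action branching: score each candidate action
# with a mixture of an exploitation estimate and an exploration bonus, then add
# only the top-k scored actions to the search tree.

rng = np.random.default_rng(1)

def prioritized_subset(actions, value_estimate, novelty, k=5, weight=0.5):
    scores = [weight * value_estimate(a) + (1 - weight) * novelty(a) for a in actions]
    order = np.argsort(scores)[::-1]            # highest priority first
    return [actions[i] for i in order[:k]]

# Toy usage over 100 scalar candidate actions.
candidates = list(rng.uniform(-1, 1, size=100))
subset = prioritized_subset(
    candidates,
    value_estimate=lambda a: -abs(a - 0.3),     # pretend values peak near a = 0.3
    novelty=lambda a: rng.random(),             # pretend exploration bonus
    k=5,
)
print(subset)
```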
arXiv Detail & Related papers (2020-10-07T18:33:57Z)
- Bayesian Optimized Monte Carlo Planning [34.8909579244631]
Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree.
We present a general method for efficient action sampling based on Bayesian optimization.
We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning.
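The Bayesian-optimized sampling step can be sketched with a Gaussian-process surrogate: fit the GP to the (action, value) pairs already evaluated at a node and widen with the candidate that maximizes an upper confidence bound. scikit-learn's GaussianProcessRegressor is used purely for illustration here and is an assumption, not the paper's implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hedged sketch of Bayesian-optimized action sampling for progressive widening:
# fit a Gaussian-process surrogate to the (action, value) pairs already tried at
# a node, then add the candidate action with the highest upper confidence bound.

rng = np.random.default_rng(2)

def next_action_to_widen(tried_actions, tried_values, candidates, beta=2.0):
    gp = GaussianProcessRegressor().fit(np.asarray(tried_actions), np.asarray(tried_values))
    mean, std = gp.predict(np.asarray(candidates), return_std=True)
    ucb = mean + beta * std                     # optimism in the face of uncertainty
    return candidates[int(np.argmax(ucb))]

# Toy usage: 1-D continuous action space, a few previously evaluated actions.
tried = [[-0.8], [0.0], [0.9]]
values = [0.2, 0.7, 0.1]                        # e.g. rollout value estimates
cands = rng.uniform(-1, 1, size=(64, 1)).tolist()
print(next_action_to_widen(tried, values, cands))
```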
arXiv Detail & Related papers (2020-10-07T18:29:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.