Related papers: Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm

URL: http://arxiv.org/abs/2510.25388v1
Date: Wed, 29 Oct 2025 11:03:44 GMT
Title: Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm
Authors: Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn
Abstract summary: A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs. We break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values. We call this abstraction framework Known Value Difference Abstractions (KVDA), which infers the value differences by analysis of the immediate rewards.
Score: 64.43268969806098
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT) is the state-of-the-art MCTS abstraction algorithm for deterministic environments. It builds its abstraction using the Abstractions of State-Action Pairs (ASAP) framework, which aims to detect states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, which is a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency. In this paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values as long as the difference between their values can be inferred. We call this abstraction framework Known Value Difference Abstractions (KVDA); it infers the value differences by analysis of the immediate rewards. We modify OGA-UCT to use this framework instead of ASAP. The modification is called KVDA-UCT, which detects significantly more abstractions than OGA-UCT, introduces no additional parameter, and outperforms OGA-UCT on a variety of deterministic environments and parameter settings.
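The idea of pooling statistics across nodes whose values differ by a known amount can be illustrated with a small sketch. This is not the paper's KVDA-UCT implementation. The class `OffsetGroup`, the function `ucb_score`, and the assumption that the offsets are already known are all hypothetical and serve only to show how shared statistics could serve several state-action pairs at once.

```python
import math


class OffsetGroup:
    """Shared statistics for state-action pairs whose values differ
    by known constants (hypothetical illustration, not the paper's code)."""

    def __init__(self):
        self.visits = 0
        self.shifted_return_sum = 0.0

    def update(self, sampled_return, offset):
        # Subtract the member's known offset so that all samples
        # estimate the same reference value.
        self.visits += 1
        self.shifted_return_sum += sampled_return - offset

    def value_estimate(self, offset):
        # Shift the pooled mean back to the scale of the queried member.
        if self.visits == 0:
            raise ValueError("group has no samples yet")
        return self.shifted_return_sum / self.visits + offset


def ucb_score(group, offset, parent_visits, exploration=math.sqrt(2)):
    # Standard UCB1 form, fed with the group's aggregate statistics.
    # Using the group visit count here is an assumption of this sketch.
    bonus = exploration * math.sqrt(math.log(parent_visits) / group.visits)
    return group.value_estimate(offset) + bonus


# Two actions assumed to satisfy Q(b) = Q(a) + 2, for example because
# they lead to the same successor but b yields 2 more immediate reward.
group = OffsetGroup()
group.update(5.0, offset=0.0)  # sample from action a
group.update(7.0, offset=0.0)  # sample from action a
group.update(8.0, offset=2.0)  # sample from action b

print(group.value_estimate(0.0))  # estimate for a
print(group.value_estimate(2.0))  # estimate for b
```

In this sketch each action benefits from all three samples, whereas an abstraction restricted to equal immediate rewards could not have grouped the two actions at all.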

Related papers

Discovering State Equivalences in UCT Search Trees By Action Pruning [64.43268969806098]
We show that Ideal Pruning Abstractions in UCT (IPA-UCT) outperforms OGA-UCT across a large range of test domains and iteration budgets. We also show that both IPA and ASAP are special cases of a more general framework that we call p-ASAP.
arXiv Detail & Related papers (2025-10-30T10:54:43Z)
Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms [64.43268969806098]
One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency. We propose and empirically evaluate several alternative intra-abstraction policies.
arXiv Detail & Related papers (2025-10-28T11:00:30Z)
AUPO - Abstracted Until Proven Otherwise: A Reward Distribution Based Abstraction Algorithm [64.43268969806098]
We introduce a novel, drop-in modification to Monte Carlo Tree Search's (MCTS) decision policy that we call AUPO. Comparisons based on a range of IPPC benchmark problems show that AUPO clearly outperforms MCTS.
arXiv Detail & Related papers (2025-10-27T11:04:22Z)
Time-critical and confidence-based abstraction dropping methods [44.99833362998488]
Non-exact abstractions make convergence to the optimal action in the abstract space impossible. We propose two novel abstraction dropping schemes, namely OGA-IAAD and OGA-CAD.
arXiv Detail & Related papers (2025-07-03T15:12:05Z)
Hierarchical State Abstraction Based on Structural Information Principles [70.24495170921075]
We propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performances further.
arXiv Detail & Related papers (2023-04-24T11:06:52Z)
ISTR: End-to-End Instance Segmentation with Transformers [147.14073165997846]
We propose an instance segmentation Transformer, termed ISTR, which is the first end-to-end framework of its kind. ISTR predicts low-dimensional mask embeddings, and matches them with ground truth mask embeddings for the set loss. Benefiting from the proposed end-to-end mechanism, ISTR demonstrates state-of-the-art performance even with approximation-based suboptimal embeddings.
arXiv Detail & Related papers (2021-05-03T06:00:09Z)
Randomized Value Functions via Posterior State-Abstraction Sampling [21.931580762349096]
We argue that an agent seeking out latent task structure must explicitly represent and maintain its uncertainty over that structure. We introduce a practical algorithm for doing this using two posterior distributions over state abstractions and abstract-state values. In empirically validating our approach, we find that substantial performance gains lie in the multi-task setting.
arXiv Detail & Related papers (2020-10-05T23:04:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.