TransZero: Parallel Tree Expansion in MuZero using Transformer Networks
- URL: http://arxiv.org/abs/2509.11233v1
- Date: Sun, 14 Sep 2025 12:20:38 GMT
- Title: TransZero: Parallel Tree Expansion in MuZero using Transformer Networks
- Authors: Emil Malmsten, Wendelin Böhmer
- Abstract summary: We present TransZero, a model-based reinforcement learning algorithm that removes the sequential bottleneck in Monte Carlo Tree Search. We show that TransZero achieves up to an eleven-fold speedup in wall-clock time compared to MuZero.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present TransZero, a model-based reinforcement learning algorithm that removes the sequential bottleneck in Monte Carlo Tree Search (MCTS). Unlike MuZero, which constructs its search tree step by step using a recurrent dynamics model, TransZero employs a transformer-based network to generate multiple latent future states simultaneously. Combined with the Mean-Variance Constrained (MVC) evaluator that eliminates dependence on inherently sequential visitation counts, our approach enables the parallel expansion of entire subtrees during planning. Experiments in MiniGrid and LunarLander show that TransZero achieves up to an eleven-fold speedup in wall-clock time compared to MuZero while maintaining sample efficiency. These results demonstrate that parallel tree construction can substantially accelerate model-based reinforcement learning, bringing real-time decision-making in complex environments closer to practice. The code is publicly available on GitHub.
Related papers
- DAG Learning from Zero-Inflated Count Data Using Continuous Optimization [2.0443308797642965]
ZICO achieves superior performance with faster runtimes on simulated data. ZICO is fully vectorized and mini-batched, enabling learning on larger variable sets with practical runtimes in a wide range of domains.
arXiv Detail & Related papers (2025-12-18T06:26:43Z)
- Trajectory-aware Shifted State Space Models for Online Video Super-Resolution [57.87099307245989]
This paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba). TS-Mamba first constructs the trajectories within a video to select the most similar tokens from the previous frames. TS-Mamba achieves state-of-the-art performance in most cases, with an over 22.7% reduction in complexity (in MACs).
arXiv Detail & Related papers (2025-08-14T08:42:15Z)
- MesaNet: Sequence Modeling by Locally Optimal Test-Time Training [67.45211108321203]
We introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer. We show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs.
arXiv Detail & Related papers (2025-06-05T16:50:23Z)
- DeMo: Decoupled Momentum Optimization [6.169574689318864]
Training large neural networks typically requires sharing gradients between accelerators through specialized high-speed interconnects. We introduce Decoupled Momentum (DeMo), a fused optimizer and data parallel algorithm that reduces inter-accelerator communication requirements. Empirical results show that models trained with DeMo match or exceed the performance of equivalent models trained with AdamW.
arXiv Detail & Related papers (2024-11-29T17:31:47Z)
- NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z)
- UniZero: Generalized and Efficient Planning with Scalable Latent World Models [29.648382211926364]
UniZero is a novel approach that employs a modular transformer-based world model to effectively learn a shared latent space. We show that UniZero significantly outperforms existing baselines in benchmarks that require long-term memory. In standard single-task RL settings, such as Atari and DMControl, UniZero matches or even surpasses the performance of current state-of-the-art methods.
arXiv Detail & Related papers (2024-06-15T15:24:15Z)
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze [5.671696366787522]
We propose a general approach named ReZero to boost tree search operations for Monte Carlo Tree Search (MCTS) algorithms. Specifically, we reanalyze training samples through a backward-view reuse technique which uses the value estimation of a certain child node to save the corresponding sub-tree search time. Experiments conducted on Atari environments, DMControl suites and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency.
arXiv Detail & Related papers (2024-04-25T07:02:07Z)
- Is Mamba Effective for Time Series Forecasting? [30.85990093479062]
We propose a Mamba-based model named Simple-Mamba (S-Mamba) for time series forecasting.
Specifically, we tokenize the time points of each variate autonomously via a linear layer.
Experiments on thirteen public datasets prove that S-Mamba maintains low computational overhead and achieves leading performance.
arXiv Detail & Related papers (2024-03-17T08:50:44Z)
- Improving Token-Based World Models with Parallel Observation Prediction [55.41770427527391]
Token-based world models (TBWMs) were recently proposed as sample-efficient methods.
During imagination, the sequential token-by-token generation of next observations results in a severe bottleneck.
We devise a novel Parallel Observation Prediction (POP) mechanism to resolve this bottleneck.
POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting.
arXiv Detail & Related papers (2024-02-08T12:58:07Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel spatio-temporal convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- Continuous-Time Bayesian Networks with Clocks [33.774970857450086]
We introduce a set of node-wise clocks to construct a collection of graph-coupled semi-Markov chains.
We provide algorithms for parameter and structure inference, which make use of local dependencies.
arXiv Detail & Related papers (2020-07-01T09:33:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.