Related papers: ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze

URL: http://arxiv.org/abs/2404.16364v4
Date: Mon, 26 Aug 2024 02:28:14 GMT
Title: ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
Authors: Chunyu Xuan, Yazhe Niu, Yuan Pu, Shuai Hu, Yu Liu, Jing Yang
Abstract summary: We propose a general approach named ReZero to boost tree search operations for Monte Carlo Tree Search algorithms. Specifically, we reanalyze training samples through a backward-view reuse technique which obtains the value estimation of a certain child node in advance. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency.
Score: 5.671696366787522
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Monte Carlo Tree Search (MCTS)-based algorithms, such as MuZero and its derivatives, have achieved widespread success in various decision-making domains. These algorithms employ the reanalyze process to enhance sample efficiency from stale data, albeit at the expense of significant wall-clock time consumption. To address this issue, we propose a general approach named ReZero to boost tree search operations for MCTS-based algorithms. Specifically, drawing inspiration from the one-armed bandit model, we reanalyze training samples through a backward-view reuse technique which obtains the value estimation of a certain child node in advance. To further adapt to this design, we periodically reanalyze the entire buffer instead of frequently reanalyzing the mini-batch. The synergy of these two designs can significantly reduce the search cost and meanwhile guarantee or even improve performance, simplifying both data collecting and reanalyzing. Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency. The code is available as part of the LightZero benchmark at https://github.com/opendilab/LightZero.
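The backward-view reuse idea in the abstract can be illustrated with a toy sketch. This is not the LightZero implementation: the action set, `child_value`, `search_root`, and `reanalyze_backward` are hypothetical names, the "search" is a one-step maximum over children, and rewards and discounting are ignored. It only shows how reanalyzing a trajectory from its last state to its first lets the root value already computed for a later state stand in for the value of the child node that the earlier state reaches through the action actually taken.

```python
from typing import Dict, List, Optional, Tuple

ACTIONS = (0, 1, 2)  # hypothetical discrete action set


def child_value(state: int, action: int) -> float:
    """Toy stand-in for the expensive sub-tree search below one child node."""
    return float((state * 7 + action * 3) % 5)


def search_root(state: int, hint: Optional[Tuple[int, float]],
                counter: Dict[str, int]) -> float:
    """Return a root value as the max over child values.

    If `hint` = (action, value) is given, that child's value is reused
    instead of being searched again.
    """
    values = []
    for action in ACTIONS:
        if hint is not None and hint[0] == action:
            values.append(hint[1])  # reused estimate: no search cost
        else:
            counter["child_searches"] += 1
            values.append(child_value(state, action))
    return max(values)


def reanalyze_backward(states: List[int],
                       actions: List[int]) -> Tuple[List[float], int]:
    """Reanalyze one trajectory from the last state to the first.

    `actions[t]` is the action taken in `states[t]`, leading to `states[t + 1]`.
    """
    counter = {"child_searches": 0}
    root_values = [0.0] * len(states)
    hint: Optional[Tuple[int, float]] = None
    for t in reversed(range(len(states))):
        root_values[t] = search_root(states[t], hint, counter)
        # The root value just computed for states[t] serves as the estimate
        # for the child that states[t - 1] reaches via actions[t - 1].
        if t > 0:
            hint = (actions[t - 1], root_values[t])
    return root_values, counter["child_searches"]


values, searches = reanalyze_backward(states=[0, 1, 2, 3], actions=[1, 0, 2])
print(values, searches)
```

In this toy setting, only the last state searches all of its children; every earlier state skips the one child whose estimate was obtained in advance. A forward pass over the same trajectory would search every child of every state.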

Related papers

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning [60.67176246634741]
We formalize the problem of optimizing test-time compute as a meta-reinforcement learning (RL) problem. We show that state-of-the-art models do not minimize regret, but one can do so by maximizing a dense reward bonus in conjunction with the outcome 0/1 reward RL.
arXiv Detail & Related papers (2025-03-10T17:40:43Z)
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding [64.2888389315149]
Test-time scaling improves large language model performance by adding extra compute during decoding. Best-of-N sampling serves as a common scaling technique, broadening the search space for finding better solutions. We propose Self-Truncation Best-of-N (ST-BoN), a novel decoding method that avoids fully generating all samplings.
arXiv Detail & Related papers (2025-03-03T11:21:01Z)
Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search [0.0]
We present a tensor graph rewriting approach that uses Monte Carlo tree search to build a superior representation. Our approach improves the inference speedup of neural networks by up to 11% compared to existing methods.
arXiv Detail & Related papers (2024-10-07T22:22:02Z)
Efficient NeRF Optimization -- Not All Samples Remain Equally Hard [9.404889815088161]
We propose an application of online hard sample mining for efficient training of Neural Radiance Fields (NeRF). NeRF models produce state-of-the-art quality for many 3D reconstruction and rendering tasks but require substantial computational resources.
arXiv Detail & Related papers (2024-08-06T13:49:01Z)
Cascade Reward Sampling for Efficient Decoding-Time Alignment [17.278488115500615]
We introduce Cascade Reward Sampling (CARDS) to address efficiency in decoding-time alignment. CARDS minimizes redundant computations of both large language models (LLMs) and reward models (RMs).
arXiv Detail & Related papers (2024-06-24T04:08:35Z)
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation [62.969796245827006]
Delayed-PSVI is an optimistic value-based algorithm that explores the value function space via noise perturbation with posterior sampling. We show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3T} + d^2H^2\mathbb{E}[\tau])$ worst-case regret in the presence of unknown delays. We incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI.
arXiv Detail & Related papers (2023-10-29T06:12:43Z)
RLSAC: Reinforcement Learning enhanced Sample Consensus for End-to-End Robust Estimation [74.47709320443998]
We propose RLSAC, a novel Reinforcement Learning enhanced SAmple Consensus framework for end-to-end robust estimation. RLSAC employs a graph neural network to utilize both data and memory features to guide exploring directions for sampling the next minimum set. Our experimental results demonstrate that RLSAC can learn from features to gradually explore a better hypothesis.
arXiv Detail & Related papers (2023-08-10T03:14:19Z)
ReBotNet: Fast Real-time Video Enhancement [59.08038313427057]
Most restoration networks are slow, suffer from a high computational bottleneck, and can't be used for real-time video enhancement. In this work, we design an efficient and fast framework to perform real-time enhancement for practical use-cases like live video calls and video streams. To evaluate our method, we emulate two new datasets that simulate real-world video call and streaming scenarios, and show extensive results on multiple datasets where ReBotNet outperforms existing approaches with lower computations, reduced memory requirements, and faster inference time.
arXiv Detail & Related papers (2023-03-23T17:58:05Z)
A Faster, Lighter and Stronger Deep Learning-Based Approach for Place Recognition [7.9400442516053475]
We propose a faster, lighter, and stronger approach that generates models with fewer parameters and shorter inference time. We design RepVGG-lite as the backbone network in our architecture; it is more discriminative than other general networks in the place recognition task. Our system has 14 times fewer parameters than Patch-NetVLAD, 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively.
arXiv Detail & Related papers (2022-11-27T15:46:53Z)
Boosting Tail Neural Network for Realtime Custom Keyword Spotting [2.5137859989323537]
We propose a Boosting Tail Neural Network (BTNN) for improving the performance of Realtime Custom Keyword Spotting (RCKS). Inspired by the brain-science observation that a brain is only partly activated for a nerve simulation, numerous machine learning algorithms have been developed that use a batch of weak classifiers to resolve arduous problems.
arXiv Detail & Related papers (2022-05-24T13:26:39Z)
Mastering Atari Games with Limited Data [73.6189496825209]
We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience. This is the first time an algorithm achieves super-human performance on Atari games with such little data.
arXiv Detail & Related papers (2021-10-30T09:13:39Z)
Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations [14.432131909590824]
Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, health care and others. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and synchronization overheads. We propose a framework for generating scalable reinforcement learning implementations on multicore systems.
arXiv Detail & Related papers (2021-10-03T21:00:53Z)
Circa: Stochastic ReLUs for Private Deep Learning [6.538025863698682]
We re-think the ReLU computation and propose optimizations for private inference (PI) tailored to neural networks. Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test. We demonstrate improvements of up to 4.7x storage and 3x runtime over baseline implementations.
arXiv Detail & Related papers (2021-06-15T22:52:45Z)
FNAS: Uncertainty-Aware Fast Neural Architecture Search [54.49650267859032]
Reinforcement learning (RL)-based neural architecture search (NAS) generally guarantees better convergence yet suffers from the requirement of huge computational resources. We propose a general pipeline to accelerate the convergence of the rollout process as well as the RL process in NAS. Experiments on the Mobile Neural Architecture Search (MNAS) search space show the proposed Fast Neural Architecture Search (FNAS) accelerates the standard RL-based NAS process by 10x.
arXiv Detail & Related papers (2021-05-25T06:32:52Z)
Combined Depth Space based Architecture Search For Person Re-identification [70.86236888223569]
We aim to design a lightweight and suitable network for person re-identification (ReID). We propose a novel search space called Combined Depth Space (CDS), based on which we search for an efficient network architecture, which we call CDNet. We then propose a low-cost search strategy named the Top-k Sample Search strategy to make full use of the search space and avoid being trapped in a local optimum.
arXiv Detail & Related papers (2021-04-09T02:40:01Z)
Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples [67.11669996924671]
We introduce a simple (one line of code) modification to the Generative Adversarial Network (GAN) training algorithm. When updating the generator parameters, we zero out the gradient contributions from the elements of the batch that the critic scores as 'least realistic'. We show that this 'top-k update' procedure is a generally applicable improvement.
arXiv Detail & Related papers (2020-02-14T19:27:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.