LightZero: A Unified Benchmark for Monte Carlo Tree Search in General
Sequential Decision Scenarios
- URL: http://arxiv.org/abs/2310.08348v1
- Date: Thu, 12 Oct 2023 14:18:09 GMT
- Title: LightZero: A Unified Benchmark for Monte Carlo Tree Search in General
Sequential Decision Scenarios
- Authors: Yazhe Niu, Yuan Pu, Zhenjie Yang, Xueyan Li, Tong Zhou, Jiyuan Ren,
Shuai Hu, Hongsheng Li, Yu Liu
- Abstract summary: Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari.
It has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications.
In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios.
- Score: 32.83545787965431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building agents based on tree-search planning capabilities with learned
models has achieved remarkable success in classic decision-making problems,
such as Go and Atari. However, it has been deemed challenging or even
infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse
real-world applications, especially when these environments involve complex
action spaces and significant simulation costs, or inherent stochasticity. In
this work, we introduce LightZero, the first unified benchmark for deploying
MCTS/MuZero in general sequential decision scenarios. Specificially, we
summarize the most critical challenges in designing a general MCTS-style
decision-making solver, then decompose the tightly-coupled algorithm and system
design of tree-search RL methods into distinct sub-modules. By incorporating
more appropriate exploration and optimization strategies, we can significantly
enhance these sub-modules and construct powerful LightZero agents to tackle
tasks across a wide range of domains, such as board games, Atari, MuJoCo,
MiniGrid and GoBigger. Detailed benchmark results reveal the significant
potential of such methods in building scalable and efficient decision
intelligence. The code is available as part of OpenDILab at
https://github.com/opendilab/LightZero.
Related papers
- Multi-Agent Environments for Vehicle Routing Problems [1.0179489519625304]
We propose a library composed of multi-agent environments that simulates classic vehicle routing problems.
The library, built on PyTorch, provides a flexible modular architecture design that allows easy customization and incorporation of new routing problems.
arXiv Detail & Related papers (2024-11-21T18:46:23Z) - Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research.
We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z) - CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models [106.11371409170818]
Large language models (LLMs) can act as agents with capabilities to self-refine and improve generated code autonomously.
We propose CodeTree, a framework for LLM agents to efficiently explore the search space in different stages of the code generation process.
Specifically, we adopted a unified tree structure to explicitly explore different coding strategies, generate corresponding coding solutions, and subsequently refine the solutions.
arXiv Detail & Related papers (2024-11-07T00:09:54Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS [0.0]
This paper develops a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS)
We show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning are ranked in second and third place, respectively.
arXiv Detail & Related papers (2024-05-26T15:11:45Z) - Efficient Multi-agent Reinforcement Learning by Planning [33.51282615335009]
Multi-agent reinforcement learning (MARL) algorithms have accomplished remarkable breakthroughs in solving large-scale decision-making tasks.
Most existing MARL algorithms are model-free, limiting sample efficiency and hindering their applicability in more challenging scenarios.
We propose the MAZero algorithm, which combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search.
arXiv Detail & Related papers (2024-05-20T04:36:02Z) - ArchGym: An Open-Source Gymnasium for Machine Learning Assisted
Architecture Design [52.57999109204569]
ArchGym is an open-source framework that connects diverse search algorithms to architecture simulators.
We evaluate ArchGym across multiple vanilla and domain-specific search algorithms in designing custom memory controller, deep neural network accelerators, and custom SOC for AR/VR workloads.
arXiv Detail & Related papers (2023-06-15T06:41:23Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX.
textttMEX integrates estimation and planning components while balancing exploration exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z) - Monte Carlo Tree Search: A Review of Recent Modifications and
Applications [0.17205106391379024]
Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems.
The method relies on intelligent tree search that balances exploration and exploitation.
The method has become a state-of-the-art technique for games, however, in more complex games.
arXiv Detail & Related papers (2021-03-08T17:44:15Z) - Unlucky Explorer: A Complete non-Overlapping Map Exploration [0.949996206597248]
We introduce the Maze Dash puzzle as an exploration problem where the agent must find a Hamiltonian Path visiting all the cells.
An optimization has been applied to the proposed Monte-Carlo Tree Search (MCTS) algorithm to obtain a promising result.
Our comparison indicates that the MCTS-based approach is an up-and-coming method that could cope with the test cases with small and medium sizes.
arXiv Detail & Related papers (2020-05-28T17:19:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.