Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks
- URL: http://arxiv.org/abs/2502.00633v1
- Date: Sun, 02 Feb 2025 02:45:20 GMT
- Title: Lipschitz Lifelong Monte Carlo Tree Search for Mastering Non-Stationary Tasks
- Authors: Zuyuan Zhang, Tian Lan
- Abstract summary: This paper presents LiZero for Lipschitz lifelong planning using Monte Carlo Tree Search (MCTS).
We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task.
Experimental results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines, converging to optimal rewards much faster.
- Score: 19.42056439537988
- Abstract: Monte Carlo Tree Search (MCTS) has proven highly effective in solving complex planning tasks by balancing exploration and exploitation using Upper Confidence Bound for Trees (UCT). However, existing work has not considered MCTS-based lifelong planning, where an agent faces a non-stationary series of tasks -- e.g., with varying transition probabilities and rewards -- that are drawn sequentially throughout the operational lifetime. This paper presents LiZero for Lipschitz lifelong planning using MCTS. We propose a novel concept of adaptive UCT (aUCT) to transfer knowledge from a source task to the exploration/exploitation of a new task, depending on both the Lipschitz continuity between tasks and the confidence of knowledge in Monte Carlo action sampling. We analyze LiZero's acceleration factor in terms of improved sampling efficiency and also develop efficient algorithms to compute aUCT in an online fashion by both data-driven and model-based approaches, whose sampling complexity and error bounds are also characterized. Experimental results show that LiZero significantly outperforms existing MCTS and lifelong learning baselines in terms of much faster convergence (3$\sim$4x) to optimal rewards. Our results highlight the potential of LiZero to advance decision-making and planning in dynamic real-world environments.
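The abstract does not spell out the aUCT rule, but its ingredients (a standard UCT score, a Lipschitz distance between tasks, and a sampling-confidence term) suggest a sketch along the following lines. Everything below is an assumption for illustration: the names q_src, task_dist, lip_const, and conf are not the paper's notation, and clipping against a transferred bound is one plausible way to let source-task knowledge temper exploration on a new task.

```python
import math

def uct_score(q, n_parent, n_action, c=1.414):
    """Standard UCT: empirical value plus an exploration bonus."""
    return q + c * math.sqrt(math.log(n_parent) / n_action)

def auct_score(q, n_parent, n_action, q_src, task_dist, lip_const, conf, c=1.414):
    """Illustrative aUCT-style score (assumed, not the paper's formula):
    the source-task estimate q_src, inflated by the Lipschitz term
    lip_const * task_dist plus a sampling-confidence radius conf, acts
    as an upper bound on the new task's value, and the UCT score is
    clipped against it so transferred knowledge caps over-exploration."""
    transfer_bound = q_src + lip_const * task_dist + conf
    return min(uct_score(q, n_parent, n_action, c), transfer_bound)
```

Under this reading, closer tasks (small task_dist) and better-sampled source estimates (small conf) yield a tighter bound and prune exploration more aggressively, which is consistent with the claimed sampling-efficiency gains.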
Related papers
- MASTER: A Multi-Agent System with LLM Specialized MCTS [11.780059513577848]
Large Language Models (LLMs) are increasingly being explored for problem-solving tasks.
MCTS relies on extensive sampling simulations to approximate the true reward distribution.
We present a novel framework that coordinates agent recruitment and communication through LLM specialized MCTS.
arXiv Detail & Related papers (2025-01-24T08:01:11Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment [66.80143024475635]
We propose VinePPO, a straightforward approach to compute unbiased Monte Carlo-based estimates.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
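Per the abstract, VinePPO's credit assignment rests on unbiased Monte Carlo estimates rather than a learned value function. A minimal sketch of that idea is below; rollout is a hypothetical callable that samples one completion from a state and returns its reward.

```python
def mc_value_estimate(state, rollout, num_samples=8):
    """Unbiased Monte Carlo value estimate: average the returns of
    independent rollouts launched from the same intermediate state."""
    return sum(rollout(state) for _ in range(num_samples)) / num_samples
```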
- Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment [0.0]
Monte Carlo Tree Search (MCTS) is a powerful algorithm for solving complex decision-making problems.
This paper presents an optimized MCTS implementation applied to the FrozenLake environment, a classic reinforcement learning task.
arXiv Detail & Related papers (2024-09-25T05:04:53Z)
- Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we uncover a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z)
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [48.45472563225202]
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS).
The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation.
Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems.
arXiv Detail & Related papers (2024-06-11T16:01:07Z)
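The four phases named above follow the standard MCTS skeleton. Below is a minimal, self-contained sketch of one iteration; refine and evaluate stand in for MCTSr's LLM-driven self-refine and self-evaluation steps and are hypothetical names, not the paper's API.

```python
import math

class Node:
    """Minimal tree node holding a state, visit count, and value sum."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def add_child(self, state):
        child = Node(state, parent=self)
        self.children.append(child)
        return child

def mcts_iteration(root, refine, evaluate, c=1.414):
    # Selection: descend by UCT until a leaf is reached.
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda ch: ch.value / (ch.visits + 1e-9)
                   + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))
    # Expansion / self-refine: generate an improved candidate answer.
    child = node.add_child(refine(node.state))
    # Self-evaluation: score the refined candidate.
    reward = evaluate(child.state)
    # Backpropagation: push visit counts and the reward up to the root.
    while child is not None:
        child.visits += 1
        child.value += reward
        child = child.parent
```

Starting from root = Node(problem), repeated calls to mcts_iteration grow the tree of progressively refined answers.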
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while balancing exploration and exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
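"Maximize both entropy and return" refers to the maximum-entropy objective that Soft Actor-Critic optimizes; schematically, $J(\pi) = \sum_t \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\big[\, r(s_t, a_t) + \alpha\, \mathcal{H}(\pi(\cdot \mid s_t)) \,\big]$, where the temperature $\alpha$ trades the entropy bonus $\mathcal{H}$ off against the reward. Keeping policy entropy high preserves exploration under shifting traffic patterns while the return term drives throughput.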
- Learn-Prune-Share for Lifelong Learning [25.678753894026357]
We propose a learn-prune-share (LPS) algorithm which addresses the challenges of catastrophic forgetting, parsimony, and knowledge reuse simultaneously.
LPS splits the network into task-specific partitions via an ADMM-based pruning strategy. This leads to no forgetting, while maintaining parsimony.
arXiv Detail & Related papers (2020-12-13T04:05:16Z)
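A pruning-based split can be pictured as one binary mask per task over a shared backbone. The toy sketch below is purely illustrative: it swaps the paper's ADMM-based pruning for simple magnitude pruning, but shows why disjoint task partitions rule out forgetting.

```python
import numpy as np

def prune_to_mask(weights, keep_ratio):
    """Magnitude pruning: keep the largest-|w| fraction as a task's
    partition. Stand-in for ADMM-based pruning, not the paper's method."""
    k = max(1, int(weights.size * keep_ratio))
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return np.abs(weights) >= threshold

# One shared backbone; each task owns the weights its mask selects.
# Training only inside the current task's mask leaves earlier
# partitions untouched (no forgetting) and keeps the network sparse.
backbone = np.random.randn(4, 4)
task_mask = prune_to_mask(backbone, keep_ratio=0.5)
task_weights = backbone * task_mask
```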
- Lipschitz Lifelong Reinforcement Learning [40.36085483977208]
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks.
We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions.
These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate.
arXiv Detail & Related papers (2020-01-15T16:29:30Z)
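The value-transfer result behind this line of work (which LiZero's aUCT builds on) has roughly the following shape, stated schematically with $d$ the pseudometric between MDPs and $L$ the associated Lipschitz constant: $\big| Q^*_{M}(s,a) - Q^*_{\bar{M}}(s,a) \big| \le L \, d(M, \bar{M})$ for all $(s,a)$. A value function learned on a nearby source task, inflated by $L\,d(M,\bar{M})$, therefore upper-bounds the target task's optimal values and can safely initialize and prune learning on the target.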
- Hierarchical Reinforcement Learning as a Model of Human Task Interleaving [60.95424607008241]
We develop a hierarchical model of supervisory control driven by reinforcement learning.
The model reproduces known empirical effects of task interleaving.
The results support hierarchical RL as a plausible model of task interleaving.
arXiv Detail & Related papers (2020-01-04T17:53:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.