Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design
- URL: http://arxiv.org/abs/2501.08603v3
- Date: Fri, 31 Jan 2025 05:28:15 GMT
- Title: Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design
- Authors: Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, Bryan Hooi
- Abstract summary: Large Language Model (LLM)-based automatic heuristic design (AHD) methods have shown promise in generating high-quality heuristics without manual intervention. This paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution.
- Score: 33.58608225370497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handcrafting heuristics for solving complex optimization tasks (e.g., route planning and task allocation) is a common practice but requires extensive domain knowledge. Recently, Large Language Model (LLM)-based automatic heuristic design (AHD) methods have shown promise in generating high-quality heuristics without manual intervention. Existing LLM-based AHD methods employ a population to maintain a fixed number of top-performing LLM-generated heuristics and introduce evolutionary computation (EC) to iteratively enhance the population. However, these population-based procedures cannot fully develop the potential of each heuristic and are prone to converging to local optima. To explore the space of heuristics more comprehensively, this paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution. The proposed MCTS-AHD method organizes all LLM-generated heuristics in a tree structure and can better develop the potential of temporarily underperforming heuristics. In experiments, MCTS-AHD delivers significantly higher-quality heuristics on various complex tasks. Our code is available.
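To make the abstract's procedure concrete, the following is a minimal, illustrative sketch of an MCTS loop over LLM-generated heuristics, not the authors' released implementation: the `Node` class and the helpers `llm_propose_heuristic` and `evaluate_heuristic` are hypothetical placeholders standing in for a tree node, an LLM call, and a task-specific benchmark.

```python
import math
import random

# Illustrative sketch only: an MCTS loop over LLM-generated heuristics in the
# spirit of MCTS-AHD. llm_propose_heuristic() and evaluate_heuristic() are
# hypothetical placeholders for an LLM call and a task-specific benchmark.

class Node:
    def __init__(self, heuristic_code: str, parent=None):
        self.heuristic_code = heuristic_code  # LLM-generated heuristic (e.g. source code)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def uct(self, c: float = 1.4) -> float:
        # UCT score: mean reward plus an exploration bonus, so temporarily
        # underperforming heuristics can still be revisited later.
        if self.visits == 0:
            return float("inf")
        return (self.total_reward / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def llm_propose_heuristic(parent_code: str) -> str:
    # Placeholder: in practice, prompt an LLM to refine or mutate parent_code.
    return parent_code + "  # refined"

def evaluate_heuristic(code: str) -> float:
    # Placeholder: in practice, run the heuristic on benchmark instances.
    return random.random()

def all_nodes(node):
    yield node
    for child in node.children:
        yield from all_nodes(child)

def mcts_ahd(root: Node, iterations: int = 100) -> Node:
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.uct())
        # 2. Expansion: let the LLM derive a new heuristic from the leaf.
        child = Node(llm_propose_heuristic(node.heuristic_code), parent=node)
        node.children.append(child)
        # 3. Evaluation: score the new heuristic on the target task.
        reward = evaluate_heuristic(child.heuristic_code)
        # 4. Backpropagation: update statistics along the path to the root.
        while child is not None:
            child.visits += 1
            child.total_reward += reward
            child = child.parent
    # Return the node with the best mean reward found anywhere in the tree.
    return max(all_nodes(root), key=lambda n: n.total_reward / max(n.visits, 1))

best = mcts_ahd(Node("def heuristic(x): return x"), iterations=20)
print(best.heuristic_code)
```

The exploration term in `uct()` is what allows a temporarily underperforming heuristic to be revisited and refined later, which is the behavior the abstract contrasts with fixed-size population methods.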
Related papers
- Complex LLM Planning via Automated Heuristics Discovery [48.07520536415374]
We consider enhancing large language models (LLMs) for complex planning tasks.
We propose automated heuristics discovery (AutoHD), a novel approach that enables LLMs to explicitly generate heuristic functions to guide inference-time search.
Our proposed method requires no additional model training or finetuning, and the explicit definition of the heuristic functions generated by the LLMs provides interpretability and insights into the reasoning process.
arXiv Detail & Related papers (2025-02-26T16:52:31Z) - Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization [7.755152930120769]
Planning of Heuristics (PoH) is an optimization method that integrates the self-reflection of LLMs with Monte Carlo Tree Search (MCTS).
PoH iteratively refines the generated plans by evaluating their performance and providing improvement suggestions.
arXiv Detail & Related papers (2025-02-17T04:35:01Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - LLM-based Optimization of Compound AI Systems: A Survey [64.39860384538338]
In a compound AI system, components such as an LLM call, a retriever, a code interpreter, or tools are interconnected.
Recent advancements enable end-to-end optimization of these parameters using an LLM.
This paper presents a survey of the principles and emerging trends in LLM-based optimization of compound AI systems.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - On the Design and Analysis of LLM-Based Algorithms [74.7126776018275]
Large language models (LLMs) are used as sub-routines in algorithms.
LLMs have achieved remarkable empirical success.
Our proposed framework holds promise for advancing LLM-based algorithms.
arXiv Detail & Related papers (2024-07-20T07:39:07Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [48.45472563225202]
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS).
The algorithm constructs a Monte Carlo search tree through iterative processes of selection, self-refine, self-evaluation, and backpropagation.
Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems.
arXiv Detail & Related papers (2024-06-11T16:01:07Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning [28.077228879886402]
Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is further pronounced in the case of stochastic transitions.
To improve sample efficiency, reward shaping is a well-studied approach that introduces intrinsic rewards to help the RL agent converge to an optimal policy faster (a minimal potential-based shaping sketch follows after this list).
arXiv Detail & Related papers (2024-05-24T03:53:57Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model [22.64392837434924]
EoH represents the ideas of heuristics in natural language, termed thoughts.
These thoughts are then translated into executable code by Large Language Models (LLMs).
EoH significantly outperforms widely-used human hand-crafted baseline algorithms for the online bin packing problem.
arXiv Detail & Related papers (2024-01-04T04:11:59Z) - Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training [37.79247073276239]
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment the reasoning capabilities of LLMs.
We present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM).
We show how tree-search with a learned value function can guide LLM decoding.
arXiv Detail & Related papers (2023-09-29T12:20:19Z) - Learning to Schedule Heuristics for the Simultaneous Stochastic Optimization of Mining Complexes [2.538209532048867]
The proposed learn-to-perturb (L2P) hyper-heuristic is a multi-neighborhood simulated annealing algorithm.
L2P is tested on several real-world mining complexes, with an emphasis on efficiency, robustness, and generalization capacity.
Results show a reduction in the number of iterations by 30-50% and in the computational time by 30-45%.
arXiv Detail & Related papers (2022-02-25T18:20:14Z)
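As a companion to the reward-shaping entry above ("Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning"), here is a minimal sketch of potential-based reward shaping, where the shaped bonus F(s, s') = γ·Φ(s') − Φ(s) is added to the environment reward and leaves the optimal policy unchanged. The potential function `phi`, the `GOAL` constant, and the discount `GAMMA` below are illustrative assumptions; in the spirit of that paper, one can imagine `phi` being supplied by an LLM-extracted heuristic.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999):
# the shaped reward adds gamma * phi(s') - phi(s) to the environment reward,
# which preserves the optimal policy. phi here is a hypothetical stand-in for
# a heuristic potential, e.g. one extracted from an LLM.

GAMMA = 0.99  # illustrative discount factor
GOAL = 10     # illustrative goal state on a 1-D chain

def phi(state: int) -> float:
    # Placeholder heuristic potential: closer to the goal means higher potential.
    return -abs(GOAL - state)

def shaped_reward(state: int, next_state: int, env_reward: float) -> float:
    return env_reward + GAMMA * phi(next_state) - phi(state)

# Moving from state 3 to state 4 (toward the goal) earns a small positive bonus
# even when the environment reward is sparse (zero here).
print(shaped_reward(3, 4, 0.0))
```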