Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization
- URL: http://arxiv.org/abs/2502.11422v3
- Date: Fri, 20 Jun 2025 07:14:59 GMT
- Title: Planning of Heuristics: Strategic Planning on Large Language Models with Monte Carlo Tree Search for Automating Heuristic Optimization
- Authors: Hui Wang, Xufeng Zhang, Chaoxu Mu
- Abstract summary: Planning of Heuristics (PoH) is an optimization method that integrates LLM self-reflection with Monte Carlo Tree Search. PoH iteratively refines generated heuristics by evaluating their performance and providing improvement suggestions. In this paper, we apply PoH to solve the Traveling Salesman Problem and the Flow Shop Scheduling Problem.
- Score: 7.755152930120769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heuristics have achieved great success in solving combinatorial optimization problems (COPs). However, heuristics designed by humans require substantial domain knowledge and testing time. Since Large Language Models (LLMs) possess strong capabilities to understand and generate content, with a knowledge base that covers various domains, they offer a potential way to automatically optimize heuristics. To this end, we propose Planning of Heuristics (PoH), an optimization method that integrates LLM self-reflection with Monte Carlo Tree Search, a well-known planning algorithm. PoH iteratively refines generated heuristics by evaluating their performance and providing improvement suggestions. Our method iteratively evaluates the generated heuristics (states) and improves them based on the improvement suggestions (actions) and evaluation results (rewards), effectively simulating future states to search for paths with higher rewards. In this paper, we apply PoH to solve the Traveling Salesman Problem and the Flow Shop Scheduling Problem. The experimental results show that PoH outperforms hand-crafted heuristics and other LLM-based Automatic Heuristic Design methods, achieving state-of-the-art performance in automating heuristic optimization with LLMs on the tested COPs, especially at large problem sizes.
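Read as a loop, the abstract's mapping of heuristics to states, improvement suggestions to actions, and evaluation results to rewards fits a standard MCTS skeleton. Below is a minimal Python sketch of that loop; the `llm` and `evaluate` hooks are hypothetical stand-ins, not the authors' actual interfaces.

```python
# Minimal sketch of the PoH loop described in the abstract: states are
# candidate heuristics, actions are LLM-generated improvement suggestions,
# and rewards come from evaluating a heuristic on problem instances.
import math
import random

def llm(prompt: str) -> str:
    """Hypothetical LLM call returning text (a suggestion or revised code)."""
    raise NotImplementedError

def evaluate(heuristic_code: str) -> float:
    """Hypothetical evaluator: runs the heuristic on COP instances
    (e.g., TSP tours) and returns a normalized reward in [0, 1]."""
    raise NotImplementedError

class Node:
    def __init__(self, heuristic, parent=None):
        self.heuristic = heuristic      # state: heuristic source code
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return (self.total_reward / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def poh_search(seed_heuristic, iterations=50, num_suggestions=3):
    root = Node(seed_heuristic)
    for _ in range(iterations):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: self-reflection proposes improvement suggestions
        # (actions); each yields a refined heuristic (a new state).
        reward = evaluate(node.heuristic)
        for _ in range(num_suggestions):
            suggestion = llm(f"Critique and suggest one improvement:\n"
                             f"{node.heuristic}\nReward: {reward}")
            revised = llm(f"Rewrite the heuristic applying:\n{suggestion}\n"
                          f"{node.heuristic}")
            node.children.append(Node(revised, parent=node))
        # Simulation + backpropagation: score one child, push reward to root.
        child = random.choice(node.children)
        r = evaluate(child.heuristic)
        while child:
            child.visits += 1
            child.total_reward += r
            child = child.parent
    best = max(root.children, key=lambda n: n.total_reward / max(n.visits, 1))
    return best.heuristic
```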
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast combinatorial action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
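A minimal sketch of the reward-to-preference conversion the summary describes; the Bradley-Terry style comparison below is an illustrative choice of statistical comparison model, not necessarily the paper's exact formulation.

```python
# Sketch: instead of training on raw (possibly vanishing) reward magnitudes,
# compare pairs of sampled solutions and keep only the qualitative ordering.
import math

def preference_signal(reward_a: float, reward_b: float, beta: float = 1.0) -> float:
    """Probability that solution A is preferred over solution B
    (Bradley-Terry form, used here purely as an illustration)."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))

# Usage: two tours for the same TSP instance, shorter tour = higher reward.
p = preference_signal(reward_a=-10.2, reward_b=-11.7)  # ~0.82: prefer A
```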
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - Leveraging Large Language Models to Develop Heuristics for Emerging Optimization Problems [0.0]
Combinatorial optimization problems often rely on heuristic algorithms to generate efficient solutions. Recent advances in artificial intelligence have demonstrated the potential to automate heuristic generation through evolutionary frameworks. We propose the Contextual Evolution of Heuristics framework, which incorporates problem-specific descriptions to enhance in-context learning.
arXiv Detail & Related papers (2025-03-05T10:22:49Z) - Complex LLM Planning via Automated Heuristics Discovery [48.07520536415374]
We consider enhancing large language models (LLMs) for complex planning tasks.
We propose Automated Heuristics Discovery (AutoHD), a novel approach that enables LLMs to explicitly generate heuristic functions to guide inference-time search.
Our proposed method requires no additional model training or fine-tuning, and the explicit definition of the heuristic functions generated by the LLMs provides interpretability and insights into the reasoning process.
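For illustration, here is the kind of explicit heuristic function such an approach might have the LLM emit, plugged into a simple inference-time search; both the heuristic body and the search hookup are assumptions of this sketch, not the paper's code.

```python
# Sketch: an explicit, human-readable heuristic (here for grid path-finding)
# guiding a greedy best-first search at inference time.
import heapq

def llm_generated_heuristic(state, goal):
    """Example LLM-emitted heuristic: Manhattan distance to the goal."""
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

def best_first_search(start, goal, neighbors, heuristic):
    """Greedy best-first search ranked by the generated heuristic."""
    frontier = [(heuristic(start, goal), start)]
    came_from = {start: None}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            return came_from              # path recoverable by backtracking
        for nxt in neighbors(current):
            if nxt not in came_from:
                came_from[nxt] = current
                heapq.heappush(frontier, (heuristic(nxt, goal), nxt))
    return None
```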
arXiv Detail & Related papers (2025-02-26T16:52:31Z) - Improving Existing Optimization Algorithms with LLMs [0.9668407688201361]
This paper investigates how Large Language Models (LLMs) can enhance existing optimization algorithms. Using their pre-trained knowledge, we demonstrate their ability to propose innovative heuristic variations and implementation strategies. Our results show that an alternative heuristic proposed by GPT-4o outperforms the expert-designed heuristic of CMSA.
arXiv Detail & Related papers (2025-02-12T10:58:57Z) - Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design [33.58608225370497]
Large Language Model (LLM)-based automatic heuristic design (AHD) methods have shown promise in generating high-quality heuristics without manual intervention. This paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution.
arXiv Detail & Related papers (2025-01-15T06:00:50Z) - Deep Insights into Automated Optimization with Large Language Models and Evolutionary Algorithms [3.833708891059351]
Large Language Models (LLMs) and Evolutionary Algorithms (EAs) offer a promising new approach to overcoming the limitations of traditional methods and making optimization more automated.
LLMs act as dynamic agents that can generate, refine, and interpret optimization strategies.
EAs efficiently explore complex solution spaces through evolutionary operators.
arXiv Detail & Related papers (2024-10-28T09:04:49Z) - LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The LLaMA-Berry framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
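A minimal sketch of what blending logistic and exponential losses on a preference margin can look like; the fixed mixing weight below is an illustrative simplification of DiscoPOP's adaptive blend.

```python
# Sketch: blend a logistic (DPO-style) loss with an exponential loss on the
# preference margin rho = beta * (chosen log-ratio - rejected log-ratio).
import torch

def blended_preference_loss(rho: torch.Tensor, mix: float = 0.5) -> torch.Tensor:
    logistic = torch.nn.functional.softplus(-rho)  # -log sigmoid(rho)
    exponential = torch.exp(-rho)                  # exponential margin loss
    return ((1 - mix) * logistic + mix * exponential).mean()
```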
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning [28.077228879886402]
Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is further pronounced in the case of stochastic transitions.
To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster.
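As one concrete, well-known instance of such intrinsic rewards, potential-based shaping (Ng et al., 1999) adds a potential difference to the environment reward without changing the optimal policy; using an LLM-derived score as the potential is an assumption of this sketch.

```python
# Classic potential-based reward shaping: r' = r + gamma * phi(s') - phi(s).
# `potential` could be any state-scoring function, e.g. one distilled from
# LLM-provided heuristics (an assumption here, not the paper's exact method).
def shaped_reward(r: float, s, s_next, potential, gamma: float = 0.99) -> float:
    """Shaped reward; preserves the optimal policy of the original MDP."""
    return r + gamma * potential(s_next) - potential(s)
```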
arXiv Detail & Related papers (2024-05-24T03:53:57Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
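A minimal sketch of the standard DPO objective applied to step-level preferences; the per-step log-probabilities and the pairing of MCTS-preferred versus dispreferred steps are assumed to be precomputed.

```python
# Sketch: step-level DPO over MCTS-derived preferences. For a shared prefix,
# the step with the higher MCTS value is "chosen", the other "rejected".
import torch.nn.functional as F

def step_dpo_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective applied at the step level (tensor inputs)."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -F.logsigmoid(margin).mean()
```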
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language model (LLM)-based prompt optimizers. We identify two pivotal factors in model parameter learning: the update direction and the update method. We develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO.
arXiv Detail & Related papers (2024-02-27T15:05:32Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks, and we observe similarly improved performance on code generation tasks.
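A sketch of PRM-guided greedy decoding as the summary describes it; `propose_steps` and `prm_score` are hypothetical hooks standing in for the step sampler and the process-supervised reward model.

```python
# Sketch: at each reasoning step, sample k candidate next steps and greedily
# keep the one the PRM scores highest.
def prm_greedy_search(question, propose_steps, prm_score,
                      max_steps=10, k=4, is_final=lambda s: s.endswith("[END]")):
    path = [question]
    for _ in range(max_steps):
        candidates = propose_steps(path, k)              # k sampled next steps
        best = max(candidates, key=lambda step: prm_score(path, step))
        path.append(best)
        if is_final(best):
            break
    return path
```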
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the search spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
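A sketch of the two filters named here, rejection sampling and equivalent-form detection, applied while drawing candidate expressions from a search space; using sympy simplification as the canonicalizer is an assumption of this sketch.

```python
# Sketch: discard invalid candidates (rejection sampling) and prune
# expressions that canonicalize to an already-seen form (equivalent-form
# detection). `sample_expr` is a hypothetical hook returning sympy
# expressions drawn from the super-tree.
import sympy

def propose_unique_candidates(sample_expr, is_valid, budget=100):
    seen, kept = set(), []
    for _ in range(budget):
        expr = sample_expr()                    # draw a candidate update rule
        if not is_valid(expr):                  # rejection sampling
            continue
        canonical = sympy.srepr(sympy.simplify(expr))
        if canonical in seen:                   # equivalent-form detection
            continue
        seen.add(canonical)
        kept.append(expr)
    return kept
```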
arXiv Detail & Related papers (2022-09-27T17:51:31Z)