Efficient Heuristics Generation for Solving Combinatorial Optimization Problems Using Large Language Models
- URL: http://arxiv.org/abs/2505.12627v2
- Date: Wed, 11 Jun 2025 10:21:31 GMT
- Title: Efficient Heuristics Generation for Solving Combinatorial Optimization Problems Using Large Language Models
- Authors: Xuan Wu, Di Wang, Chunguo Wu, Lijie Wen, Chunyan Miao, Yubin Xiao, You Zhou
- Abstract summary: Recent studies exploited Large Language Models (LLMs) to autonomously generate heuristics for solving Combinatorial Optimization Problems (COPs). The absence of task-specific knowledge in prompts often leads LLMs to provide unspecific search directions, obstructing the derivation of well-performing heuristics. We propose the Hercules algorithm, which leverages our designed Core Abstraction Prompting (CAP) method to abstract the core components from elite heuristics and incorporate them as prior knowledge in prompts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies exploited Large Language Models (LLMs) to autonomously generate heuristics for solving Combinatorial Optimization Problems (COPs), by prompting LLMs to first provide search directions and then derive heuristics accordingly. However, the absence of task-specific knowledge in prompts often leads LLMs to provide unspecific search directions, obstructing the derivation of well-performing heuristics. Moreover, evaluating the derived heuristics remains resource-intensive, especially for those semantically equivalent ones, often requiring omissible resource expenditure. To enable LLMs to provide specific search directions, we propose the Hercules algorithm, which leverages our designed Core Abstraction Prompting (CAP) method to abstract the core components from elite heuristics and incorporate them as prior knowledge in prompts. We theoretically prove the effectiveness of CAP in reducing unspecificity and provide empirical results in this work. To reduce computing resources required for evaluating the derived heuristics, we propose few-shot Performance Prediction Prompting (PPP), a first-of-its-kind method for the Heuristic Generation (HG) task. PPP leverages LLMs to predict the fitness values of newly derived heuristics by analyzing their semantic similarity to previously evaluated ones. We further develop two tailored mechanisms for PPP to enhance predictive accuracy and determine unreliable predictions, respectively. The use of PPP makes Hercules more resource-efficient and we name this variant Hercules-P. Extensive experiments across four HG tasks, five COPs, and eight LLMs demonstrate that Hercules outperforms the state-of-the-art LLM-based HG algorithms, while Hercules-P excels at minimizing required computing resources. In addition, we illustrate the effectiveness of CAP, PPP, and the other proposed mechanisms by conducting relevant ablation studies.
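To make the two proposed mechanisms concrete, the following is a minimal Python sketch of the loop the abstract describes: CAP abstracts the core components of elite heuristics into prior knowledge for the next prompt, while PPP predicts a new heuristic's fitness from its similarity to previously evaluated ones and falls back to real evaluation when the prediction is unreliable. The `llm` stub, prompt wording, `difflib`-based similarity, and threshold are illustrative assumptions, not the paper's actual implementation.

```python
import difflib
from typing import Callable, List, Optional, Tuple

def llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., an API client); not implemented here."""
    raise NotImplementedError

def core_abstraction_prompt(elite_heuristics: List[str]) -> str:
    # CAP (sketch): ask the LLM to abstract the core components shared
    # by elite heuristics, to be injected into later prompts as prior knowledge.
    listing = "\n\n".join(
        f"Heuristic {i}:\n{h}" for i, h in enumerate(elite_heuristics, 1)
    )
    return (
        "Abstract the core algorithmic components shared by these elite "
        f"heuristics:\n\n{listing}\n\nCore components:"
    )

def derive_heuristic(core_components: str, task_desc: str) -> str:
    # Condition generation on the CAP output so the search direction
    # is specific rather than generic.
    return llm(
        f"Task: {task_desc}\n"
        f"Prior knowledge (core components):\n{core_components}\n"
        "Write an improved heuristic as a Python function:"
    )

def predict_fitness(
    candidate: str,
    evaluated: List[Tuple[str, float]],
    threshold: float = 0.9,
) -> Optional[float]:
    # PPP (sketch): predict fitness from similarity to previously evaluated
    # heuristics; return None when no sufficiently similar reference exists,
    # i.e., the prediction is deemed unreliable. difflib stands in for the
    # LLM's semantic-similarity judgment.
    if not evaluated:
        return None
    sim, fitness = max(
        (difflib.SequenceMatcher(None, candidate, h).ratio(), f)
        for h, f in evaluated
    )
    return fitness if sim >= threshold else None

def hercules_p(
    task_desc: str,
    seeds: List[str],
    evaluate: Callable[[str], float],
    generations: int = 5,
    elite_k: int = 3,
) -> Tuple[str, float]:
    # Lower fitness is better (e.g., tour length for a routing COP).
    pool = [(h, evaluate(h)) for h in seeds]
    for _ in range(generations):
        elites = sorted(pool, key=lambda hf: hf[1])[:elite_k]
        core = llm(core_abstraction_prompt([h for h, _ in elites]))
        child = derive_heuristic(core, task_desc)
        fit = predict_fitness(child, pool)  # PPP first (cheap) ...
        if fit is None:                     # ... real evaluation as fallback
            fit = evaluate(child)
        pool.append((child, fit))
    return min(pool, key=lambda hf: hf[1])
```

Returning `None` from `predict_fitness` mirrors the abstract's mechanism for flagging unreliable predictions, so only those candidates consume real evaluation budget.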
Related papers
- Agentic Reinforced Policy Optimization
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models
We propose a novel end-to-end framework, named RedAHD, that enables these LLM-based design methods to operate without the need for human intervention. More specifically, RedAHD employs LLMs to automate the process of reduction, i.e., transforming the COP at hand into similar COPs that are better understood. Our experimental results, evaluated on six COPs, show that RedAHD is capable of designing heuristics that achieve competitive or improved results over the state-of-the-art methods with minimal human involvement.
arXiv Detail & Related papers (2025-05-26T17:21:16Z)
- Ignite Forecasting with SPARK: An Efficient Generative Framework for Refining LLMs in Temporal Knowledge Graph Forecasting
We introduce SPARK, a Sequence-level Proxy framework for refining Large Language Models in TKG forecasting. Inspired by inference-time algorithms, SPARK offers a cost-effective, plug-and-play solution through two key innovations. Experiments across diverse datasets validate SPARK's forecasting performance, robust generalization capabilities, and high efficiency.
arXiv Detail & Related papers (2025-03-27T03:02:02Z)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
- Efficient Reinforcement Learning with Large Language Model Priors
Large language models (LLMs) have recently emerged as powerful general-purpose tools.
We propose treating LLMs as prior action distributions and integrating them into RL frameworks.
We show that incorporating LLM-based action priors significantly reduces exploration and optimization complexity (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-10-10T13:54:11Z)
- VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
We propose VinePPO, a straightforward approach that computes unbiased Monte Carlo-based value estimates for refined credit assignment.
We show that VinePPO consistently outperforms PPO and other RL-free baselines across MATH and GSM8K datasets.
arXiv Detail & Related papers (2024-10-02T15:49:30Z)
- A Training Data Recipe to Accelerate A* Search with Language Models
Combining Large Language Models (LLMs) with search algorithms like A* holds the promise of enhanced reasoning and scalable inference.
We empirically disentangle the requirements of the A* search algorithm from the requirements of the LLM to generalise on this task.
Our technique reduces the number of iterations required to find the solutions by up to 15x, with a wall-clock speed-up of search up to 5x.
arXiv Detail & Related papers (2024-07-13T19:21:44Z)
- Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning
Reinforcement Learning (RL) suffers from sample inefficiency in sparse reward domains, and the problem is further pronounced in the case of stochastic transitions.
To improve the sample efficiency, reward shaping is a well-studied approach to introduce intrinsic rewards that can help the RL agent converge to an optimal policy faster.
arXiv Detail & Related papers (2024-05-24T03:53:57Z)
- How Can LLM Guide RL? A Value-Based Approach
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- LLM Performance Predictors are good initializers for Architecture Search
We construct Performance Predictors (PP) that estimate the performance of specific deep neural network architectures on downstream tasks.
In machine translation (MT) tasks, GPT-4 with our PP prompts (LLM-PP) achieves a SoTA mean absolute error and a slight degradation in rank correlation coefficient compared to baseline predictors.
For Neural Architecture Search (NAS), we introduce a Hybrid-Search algorithm (HS-NAS) employing LLM-Distill-PP for the initial search stages and reverting to the baseline predictor later.
arXiv Detail & Related papers (2023-10-25T15:34:30Z)
- Secrets of RLHF in Large Language Models Part I: PPO
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence.
Reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit.
In this report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training.
arXiv Detail & Related papers (2023-07-11T01:55:24Z)
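The last entry above re-evaluates the inner workings of PPO. For reference, here is a minimal NumPy sketch of the standard clipped surrogate objective that PPO-style RLHF training optimizes (a textbook formulation, not code from that report):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Standard PPO clipped surrogate loss (to be minimized).
    # ratio = pi_new(a|s) / pi_old(a|s); clipping keeps the update
    # close to the policy that collected the data.
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Toy check: once the probability ratio exceeds 1 + eps for a
# positive-advantage action, the clipped term caps the objective.
adv = np.array([1.0])
print(ppo_clip_loss(np.log([0.9]), np.log([0.5]), adv))  # ratio 1.8 -> clipped at 1.2
```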
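The entry on Efficient Reinforcement Learning with Large Language Model Priors proposes treating LLMs as prior action distributions inside an RL framework. One simple way to realize that idea is sketched below, assuming a discrete action space and a softmax combination of the prior with the agent's Q-values; both choices are illustrative, not that paper's method.

```python
import numpy as np

def sample_action(llm_prior_probs, q_values, temperature=1.0, rng=None):
    # Combine an LLM-derived prior over actions with the RL agent's
    # Q-values: posterior ∝ prior * exp(Q / temperature).
    rng = rng or np.random.default_rng()
    prior = np.asarray(llm_prior_probs, dtype=float)
    logits = np.log(prior + 1e-12) + np.asarray(q_values, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Example: three actions; the LLM prior favors action 0,
# while the Q-values favor action 2.
a = sample_action([0.7, 0.2, 0.1], [0.0, 0.1, 1.5], temperature=0.5)
print(a)
```

A prior like this concentrates exploration on actions the LLM deems plausible, which is one way the reported reduction in exploration complexity can arise.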