Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
- URL: http://arxiv.org/abs/2510.11121v1
- Date: Mon, 13 Oct 2025 08:08:58 GMT
- Title: Refining Hybrid Genetic Search for CVRP via Reinforcement Learning-Finetuned LLM
- Authors: Rongjie Zhu, Cong Zhang, Zhiguang Cao,
- Abstract summary: Large language models (LLMs) are increasingly used as automated designers for vehicle routing problems (VRPs)<n>This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafteds within advanced solvers.<n>We propose RFTHGS, a novel Reinforcement learning framework for Fine-Tuning a small LLM to generate high-performance crossover operators.
- Score: 32.938753667649074
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: While large language models (LLMs) are increasingly used as automated heuristic designers for vehicle routing problems (VRPs), current state-of-the-art methods predominantly rely on prompting massive, general-purpose models like GPT-4. This work challenges that paradigm by demonstrating that a smaller, specialized LLM, when meticulously fine-tuned, can generate components that surpass expert-crafted heuristics within advanced solvers. We propose RFTHGS, a novel Reinforcement learning (RL) framework for Fine-Tuning a small LLM to generate high-performance crossover operators for the Hybrid Genetic Search (HGS) solver, applied to the Capacitated VRP (CVRP). Our method employs a multi-tiered, curriculum-based reward function that progressively guides the LLM to master generating first compilable, then executable, and finally, superior-performing operators that exceed human expert designs. This is coupled with an operator caching mechanism that discourages plagiarism and promotes diversity during training. Comprehensive experiments show that our fine-tuned LLM produces crossover operators which significantly outperform the expert-designed ones in HGS. The performance advantage remains consistent, generalizing from small-scale instances to large-scale problems with up to 1000 nodes. Furthermore, RFTHGS exceeds the performance of leading neuro-combinatorial baselines, prompt-based methods, and commercial LLMs such as GPT-4o and GPT-4o-mini.
Related papers
- G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design [7.681872137995253]
We propose a generative evolutionary framework that extends Large Neighborhood Search (LNS) operators.<n>G-LNS leverages Large Language Models (LLMs) to co-evolve tightly coupled pairs of destroy and repair operators.<n>Experiments show that G-LNS significantly outperforms LLM-based AHD methods as well as strong classical solvers.
arXiv Detail & Related papers (2026-02-09T04:13:35Z) - Less Finetuning, Better Retrieval: Rethinking LLM Adaptation for Biomedical Retrievers via Synthetic Data and Model Merging [6.761707146228743]
Retrieval-augmented generation (RAG) has become the backbone of grounding Large Language Models (LLMs)<n>We present Synthesize-Merge-Train (STM), a modular framework that enhances decoder-only LLMs with synthetic hard negatives, retrieval prompt optimization, and model merging.<n> Experiments on a subset of 12 medical and general tasks from the MTEB benchmark show STM boosts task-specific experts by up to 23.5% (average 7.5%) and produces merged models that outperform both single experts and strong baselines without extensive pretraining.
arXiv Detail & Related papers (2026-02-04T16:36:00Z) - MENTOR: A Reinforcement Learning Framework for Enabling Tool Use in Small Models via Teacher-Optimized Rewards [8.645370827540996]
Distilling the tool-using capabilities of large language models (LLMs) into smaller, more efficient small language models (SLMs) is a key challenge for their practical application.<n>The predominant approach, supervised fine-tuning (SFT), suffers from poor generalization as it trains models to imitate a static set of teacher trajectories rather than learn a robust methodology.<n>We propose MENTOR, a framework that combines reinforcement learning with teacher-guided distillation.
arXiv Detail & Related papers (2025-10-21T08:03:14Z) - MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems.<n>We build a benchmark consisting 1,260 samples of 42 challenging synthetic tasks.<n>We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z) - Accelerating Reinforcement Learning Algorithms Convergence using Pre-trained Large Language Models as Tutors With Advice Reusing [5.414308305392762]
Large Language Models (LLMs) as tutors in a student-teacher architecture withReinforcement Learning (RL) algorithms.<n>Our results demonstrate that LLM tutoring significantly accelerates RL convergence while maintaining comparable optimal performance.<n>Advice reuse mechanism shows a further improvement in training duration but also results in less stable convergence dynamics.
arXiv Detail & Related papers (2025-09-10T07:08:04Z) - Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks.<n>Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions.<n>We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z) - Omni-Thinker: Scaling Multi-Task RL in LLMs with Hybrid Reward and Task Scheduling [66.0871543682453]
We present Omni-Thinker, a unified reinforcement learning framework that scales large language models across diverse tasks.<n>Our scheduler orders tasks according to accuracy backward transfer (BWT), reducing forgetting and improving multi-task performance.
arXiv Detail & Related papers (2025-07-20T01:50:16Z) - Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking [61.61356842567952]
We propose STeP, a novel method for improving LLM-based agent training.<n>We synthesize self-reflected trajectories that include reflections and corrections of error steps.<n>Experiments demonstrate that our method improves agent performance across three representative tasks.
arXiv Detail & Related papers (2025-05-26T14:11:12Z) - CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design [11.639825726501659]
Large language models (LLMs) can autonomously discover high-performings at a fraction of the traditional cost.<n>We propose a hybrid framework that combines verbal and numerical guidance.<n>Our method outperforms state-of-the-art (SOTA) baselines across various optimization tasks.
arXiv Detail & Related papers (2025-05-18T07:48:47Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - MALT: Improving Reasoning with Multi-Agent LLM Training [67.76186488361685]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps.<n>On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.