Related papers: Hybrid Learning and Optimization methods for solving Capacitated Vehicle Routing Problem

Hybrid Learning and Optimization methods for solving Capacitated Vehicle Routing Problem

URL: http://arxiv.org/abs/2509.15262v1
Date: Thu, 18 Sep 2025 08:38:29 GMT
Title: Hybrid Learning and Optimization methods for solving Capacitated Vehicle Routing Problem
Authors: Monit Sharma, Hoong Chuin Lau,
Abstract summary: The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem in logistics.<n>We propose a hybrid optimization approach that integrates deep reinforcement learning (RL) to automate the selection of penalty parameter values within both classical (RL-C-ALM) and quantum-enhanced (RL-Q-ALM) ALM solvers.
Score: 3.652509571098291
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Capacitated Vehicle Routing Problem (CVRP) is a fundamental NP-hard problem in logistics. Augmented Lagrangian Methods (ALM) for solving CVRP performance depends heavily on well-tuned penalty parameters. In this paper, we propose a hybrid optimization approach that integrates deep reinforcement learning (RL) to automate the selection of penalty parameter values within both classical (RL-C-ALM) and quantum-enhanced (RL-Q-ALM) ALM solvers. Using Soft Actor-Critic, our approach learns penalty values from CVRP instance features and constraint violations. In RL-Q-ALM, subproblems are encoded as QUBOs and solved using Variational Quantum Eigensolvers (VQE). The agent learns across episodes by maximizing solution feasibility and minimizing cost. Experiments show that RL-C-ALM outperforms manually tuned ALM on synthetic and benchmark CVRP instances, achieving better solutions with fewer iterations. Also, RL-Q-ALM matches classical solution quality on small instances but incurs higher runtimes due to quantum overhead. Our results highlight the potential of combining RL with classical and quantum solvers for scalable, adaptive combinatorial optimization.

Related papers

Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design [16.7839584177637]
This study presents AILS-AHD, a novel approach that leverages Large Language Models (LLMs) to revolutionize CVRP solving.<n>Our methodology integrates an evolutionary search framework with LLMs to dynamically generate and optimize ruins within the AILS method.<n>Our approach establishes new best-known solutions for 8 out of 10 instances in the CVRPLib large-scale benchmark.
arXiv Detail & Related papers (2026-02-26T15:12:23Z)
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL)<n>Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z)
Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models learns that solve complex problems without requiring expert knowledge.<n>Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces.<n>We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
arXiv Detail & Related papers (2025-05-13T16:47:00Z)
Reinforcement learning for anisotropic p-adaptation and error estimation in high-order solvers [0.37109226820205005]
We present a novel approach to automate and optimize anisotropic p-adaptation in high-order h/p using Reinforcement Learning (RL) We develop an offline training approach, decoupled from the main solver, which shows minimal overcost when performing simulations. We derive an inexpensive RL-based error estimation approach that enables the quantification of local discretization errors.
arXiv Detail & Related papers (2024-07-26T17:55:23Z)
Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II [52.083337333478674]
This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW) The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL.
arXiv Detail & Related papers (2024-07-18T02:46:06Z)
Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning [18.579378919155864]
We propose Adaptive $Q$Network (AdaQN) to take into account the non-stationarity of the optimization procedure without requiring additional samples.<n>AdaQN is theoretically sound and empirically validate it in MuJoCo control problems and Atari $2600 games.
arXiv Detail & Related papers (2024-05-25T11:57:43Z)
Quantum-Assisted Solution Paths for the Capacitated Vehicle Routing Problem [0.0]
We discuss the Capacitated Vehicle Problem (CVRP) or its reduced variant, the Travelling Salesperson Problem (TSP) Even with today's most powerful classical algorithms, the CVRP is challenging to solve classically. Quantum computing may offer a way to improve the time to solution.
arXiv Detail & Related papers (2023-04-19T13:03:50Z)
Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time. To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations. We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Cross Entropy Hyperparameter Optimization for Constrained Problem Hamiltonians Applied to QAOA [68.11912614360878]
Hybrid quantum-classical algorithms such as Quantum Approximate Optimization Algorithm (QAOA) are considered as one of the most encouraging approaches for taking advantage of near-term quantum computers in practical applications. Such algorithms are usually implemented in a variational form, combining a classical optimization method with a quantum machine to find good solutions to an optimization problem. In this study we apply a Cross-Entropy method to shape this landscape, which allows the classical parameter to find better parameters more easily and hence results in an improved performance.
arXiv Detail & Related papers (2020-03-11T13:52:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.