POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem
- URL: http://arxiv.org/abs/2508.08493v1
- Date: Mon, 11 Aug 2025 21:55:16 GMT
- Title: POMO+: Leveraging starting nodes in POMO for solving Capacitated Vehicle Routing Problem
- Authors: Szymon Jakubicz, Karol Kuźniak, Jan Wawszczak, Paweł Gora,
- Abstract summary: In this work, we improved POMO, creating a method (textbfPOMO+) that leverages the initial nodes to find a solution in a more informed way.<n>We validated our models on the CVLIBRP dataset and noticed improvements in problem instances with up to 100 customers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, reinforcement learning (RL) methods have emerged as a promising approach for solving combinatorial problems. Among RL-based models, POMO has demonstrated strong performance on a variety of tasks, including variants of the Vehicle Routing Problem (VRP). However, there is room for improvement for these tasks. In this work, we improved POMO, creating a method (\textbf{POMO+}) that leverages the initial nodes to find a solution in a more informed way. We ran experiments on our new model and observed that our solution converges faster and achieves better results. We validated our models on the CVRPLIB dataset and noticed improvements in problem instances with up to 100 customers. We hope that our research in this project can lead to further advancements in the field.
Related papers
- Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models [54.29243291958429]
We develop a problem generator that reasons explicitly to plan problem directions before synthesis.<n>We treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty.<n>Our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models.
arXiv Detail & Related papers (2025-11-13T03:08:51Z) - Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving [37.94354699202412]
We show that higher solution divergence is positively related to better problem-solving abilities across various models.<n>We propose solution divergence as a novel metric that can support both SFT and RL strategies.
arXiv Detail & Related papers (2025-09-26T15:27:50Z) - Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding [59.60915947702282]
Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in enhancing the reasoning capabilities of large language models (LLMs)<n>Existing RLVR methods often suffer from exploration inefficiency due to mismatches between the training data's difficulty and the model's capability.<n>We propose SEELE, a novel supervision-aided RLVR framework that dynamically adjusts problem difficulty to stay within the high-efficiency region.
arXiv Detail & Related papers (2025-09-08T17:36:21Z) - Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning [3.0711362702464684]
We introduce a novel learning framework driven by Large Language Models (LLMs)<n>Unlike prevailing techniques that necessitate joint training with the neural network, our approach operates exclusively during the inference phase.<n>Our method enables a backbone model (trained on 100-node instances) to achieve superior performance on large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) of up to 100K nodes from diverse distributions.
arXiv Detail & Related papers (2025-06-03T03:15:22Z) - Improving Generalization of Alignment with Human Preferences through
Group Invariant Learning [56.19242260613749]
Reinforcement Learning from Human Feedback (RLHF) enables the generation of responses more aligned with human preferences.
Previous work shows that Reinforcement Learning (RL) often exploits shortcuts to attain high rewards and overlooks challenging samples.
We propose a novel approach that can learn a consistent policy via RL across various data groups or domains.
arXiv Detail & Related papers (2023-10-18T13:54:15Z) - DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization [51.517956081644186]
We introduce a new graph-based diffusion framework, namely DIFUSCO.
Our framework casts NPC problems as discrete 0, 1-vector optimization problems.
For the MIS problem, DIFUSCO outperforms the previous state-of-the-art neural solver on the challenging SATLIB benchmark.
arXiv Detail & Related papers (2023-02-16T11:13:36Z) - DIMES: A Differentiable Meta Solver for Combinatorial Optimization
Problems [41.57773395100222]
Deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization problems.
This paper addresses the scalability challenge in large-scale optimization by proposing a novel approach, namely, DIMES.
Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions.
Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
arXiv Detail & Related papers (2022-10-08T23:24:37Z) - Solving the Traveling Salesperson Problem with Precedence Constraints by
Deep Reinforcement Learning [59.14935871979047]
This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL)
Common to these approaches is the use of graph models based on multi-head attention layers.
arXiv Detail & Related papers (2022-07-04T14:31:47Z) - Neural Improvement Heuristics for Graph Combinatorial Optimization
Problems [49.85111302670361]
We introduce a novel Neural Improvement (NI) model capable of handling graph-based problems where information is encoded in the nodes, edges, or both.
The presented model serves as a fundamental component for hill-climbing-based algorithms that guide the selection of neighborhood operations for each.
arXiv Detail & Related papers (2022-06-01T10:35:29Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy.
In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Towards Explainable Metaheuristic: Mining Surrogate Fitness Models for
Importance of Variables [69.02115180674885]
We use four benchmark problems to train a surrogate model and investigate the learning of the search space by the surrogate model.
We show that the surrogate model picks out key characteristics of the problem as it is trained on population data from each generation.
arXiv Detail & Related papers (2022-05-31T09:16:18Z) - An Overview and Experimental Study of Learning-based Optimization
Algorithms for Vehicle Routing Problem [49.04543375851723]
Vehicle routing problem (VRP) is a typical discrete optimization problem.
Many studies consider learning-based optimization algorithms to solve VRP.
This paper reviews recent advances in this field and divides relevant approaches into end-to-end approaches and step-by-step approaches.
arXiv Detail & Related papers (2021-07-15T02:13:03Z) - Learning to Delegate for Large-scale Vehicle Routing [4.425982186154401]
Vehicle routing problems (VRPs) are a class of problems with wide practical applications.
Previous or learning-based works achieve decent solutions on small problem instances of up to 100 customers.
This article presents a novel learning-augmented local search algorithm to solve large-scale VRP.
arXiv Detail & Related papers (2021-07-08T22:51:58Z) - POMO: Policy Optimization with Multiple Optima for Reinforcement
Learning [8.819672165548477]
We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution.
We demonstrate the effectiveness of POMO by solving three popular NP-hard problems, namely, traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP)
arXiv Detail & Related papers (2020-10-30T00:57:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.