Related papers: Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem

URL: http://arxiv.org/abs/2602.23579v1
Date: Fri, 27 Feb 2026 01:07:59 GMT
Title: Construct, Merge, Solve & Adapt with Reinforcement Learning for the min-max Multiple Traveling Salesman Problem
Authors: Guillem Rodríguez-Corominas, Maria J. Blesa, Christian Blum,
Abstract summary: Multiple Traveling Salesman Problem (mTSP) extends Traveling Salesman Problem to m tours that start and end at a common depot.<n>In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance.<n>We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP.
Score: 1.388970969945297
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Multiple Traveling Salesman Problem (mTSP) extends the Traveling Salesman Problem to m tours that start and end at a common depot and jointly visit all customers exactly once. In the min-max variant, the objective is to minimize the longest tour, reflecting workload balance. We propose a hybrid approach, Construct, Merge, Solve & Adapt with Reinforcement Learning (RL-CMSA), for the symmetric single-depot min-max mTSP. The method iteratively constructs diverse solutions using probabilistic clustering guided by learned pairwise q-values, merges routes into a compact pool, solves a restricted set-covering MILP, and refines solutions via inter-route remove, shift, and swap moves. The q-values are updated by reinforcing city-pair co-occurrences in high-quality solutions, while the pool is adapted through ageing and pruning. This combination of exact optimization and reinforcement-guided construction balances exploration and exploitation. Computational results on random and TSPLIB instances show that RL-CMSA consistently finds (near-)best solutions and outperforms a state-of-the-art hybrid genetic algorithm under comparable time limits, especially as instance size and the number of salesmen increase.

Related papers

Assignment-Routing Optimization: Solvers for Problems Under Constraints [0.0]
We study the Joint Routing-Assignment (JRA) problem in which items must be assigned one-to-one to placeholders.<n>We develop a solver tailored for practical packaging-planning scenarios with richer constraints.<n>Results highlight the practical applicability of MIP-based JRA optimization for robotic packaging, motion planning, and complex logistics.
arXiv Detail & Related papers (2025-12-21T06:32:31Z)
Don't Be Greedy, Just Relax! Pruning LLMs via Frank-Wolfe [61.68406997155879]
State-of-the-art Large Language Model (LLM) pruning methods operate layer-wise, minimizing the per-layer pruning error on a small dataset to avoid full retraining.<n>Existing methods hence rely on greedy convexs that ignore the weight interactions in the pruning objective.<n>Our method drastically reduces the per-layer pruning error, outperforms strong baselines on state-of-the-art GPT architectures, and remains memory-efficient.
arXiv Detail & Related papers (2025-10-15T16:13:44Z)
Solving the Min-Max Multiple Traveling Salesmen Problem via Learning-Based Path Generation and Optimal Splitting [8.941535841606584]
Min-Max Multiple Traveling Salesmen Problem aims to coordinate tours for multiple salesmen so that the length of the longest tour is minimized.<n>Due to its NP-hard nature, exact solvers become impractical under the assumption that $P ne NP$.<n>We propose a novel two-stage framework named textbfGenerate-and-Split (GaS), which integrates reinforcement learning with an optimal splitting algorithm in a joint training process.
arXiv Detail & Related papers (2025-08-23T17:00:57Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.<n>Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.<n>We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
Reinforcement learning with combinatorial actions for coupled restless bandits [62.89013331120493]
We propose SEQUOIA, an RL algorithm that directly optimize for long-term reward over the feasible action space.<n>We empirically validate SEQUOIA on four novel restless bandit problems with constraints: multiple interventions, path constraints, bipartite matching, and capacity constraints.
arXiv Detail & Related papers (2025-03-01T21:25:21Z)
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path. The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
A Hybrid Genetic Algorithm for the min-max Multiple Traveling Salesman Problem [0.9790236766474201]
This paper proposes a hybrid genetic algorithm for solving the Multiple Traveling Salesman Problem (mTSP) A novel crossover operator is designed to combine similar tours from two parents and offers great diversity for the population. Our algorithm outperforms all existing algorithms on average, with similar cutoff time thresholds, when tested against benchmark sets.
arXiv Detail & Related papers (2023-07-14T01:57:10Z)
Symmetric Tensor Networks for Generative Modeling and Constrained Combinatorial Optimization [72.41480594026815]
Constrained optimization problems abound in industry, from portfolio optimization to logistics. One of the major roadblocks in solving these problems is the presence of non-trivial hard constraints which limit the valid search space. In this work, we encode arbitrary integer-valued equality constraints of the form Ax=b, directly into U(1) symmetric networks (TNs) and leverage their applicability as quantum-inspired generative models.
arXiv Detail & Related papers (2022-11-16T18:59:54Z)
DIMES: A Differentiable Meta Solver for Combinatorial Optimization Problems [41.57773395100222]
Deep reinforcement learning (DRL) models have shown promising results in solving NP-hard Combinatorial Optimization problems. This paper addresses the scalability challenge in large-scale optimization by proposing a novel approach, namely, DIMES. Unlike previous DRL methods which suffer from costly autoregressive decoding or iterative refinements of discrete solutions, DIMES introduces a compact continuous space for parameterizing the underlying distribution of candidate solutions. Extensive experiments show that DIMES outperforms recent DRL-based methods on large benchmark datasets for Traveling Salesman Problems and Maximal Independent Set problems.
arXiv Detail & Related papers (2022-10-08T23:24:37Z)
Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems. Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
Anytime Behavior of Inexact TSP Solvers and Perspectives for Automated Algorithm Selection [0.0]
Traveling-Salesperson-Problem (TSP) is arguably one of the best-known NP-hard optimization problems. We extend existing benchmarking studies by addressing anytime behaviour of inexact TSP solvers. It turns out that performance ranking of solvers is highly dependent on the focused approximation quality.
arXiv Detail & Related papers (2020-05-27T11:36:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.