Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman
Problems
- URL: http://arxiv.org/abs/2207.03876v1
- Date: Fri, 15 Jul 2022 06:43:33 GMT
- Title: Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman
Problems
- Authors: Jiongzhi Zheng and Kun He and Jianrong Zhou and Yan Jin and Chu-Min Li
- Abstract summary: The Lin-Kernighan-Helsgaun (LKH) algorithm is one of the state-of-the-art local search algorithms for the Traveling Salesman Problem (TSP)
LKH and LKH-3 associate a candidate set to each city to improve the algorithm efficiency, and have two different methods, called $alpha$-measure and POPMUSIC, to decide the candidate sets.
In this work, we first propose a Variable Strategy Reinforced LKH (VSR-LKH) algorithm, which incorporates three reinforcement learning methods (Q-learning, Sarsa, and Monte Carlo)
- Score: 17.564543997481508
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: TSP is a classical NP-hard combinatorial optimization problem with many
practical variants. LKH is one of the state-of-the-art local search algorithms
for the TSP. LKH-3 is a powerful extension of LKH that can solve many TSP
variants. Both LKH and LKH-3 associate a candidate set to each city to improve
the efficiency, and have two different methods, $\alpha$-measure and POPMUSIC,
to decide the candidate sets. In this work, we first propose a Variable
Strategy Reinforced LKH (VSR-LKH) algorithm, which incorporates three
reinforcement learning methods (Q-learning, Sarsa, Monte Carlo) with LKH, for
the TSP. We further propose a new algorithm called VSR-LKH-3 that combines the
variable strategy reinforcement learning method with LKH-3 for typical TSP
variants, including the TSP with time windows (TSPTW) and Colored TSP (CTSP).
The proposed algorithms replace the inflexible traversal operations in LKH and
LKH-3 and let the algorithms learn to make a choice at each search step by
reinforcement learning. Both LKH and LKH-3, with either $\alpha$-measure or
POPMUSIC, can be significantly improved by our methods. Extensive experiments
on 236 widely-used TSP benchmarks with up to 85,900 cities demonstrate the
excellent performance of VSR-LKH. VSR-LKH-3 also significantly outperforms the
state-of-the-art heuristics for TSPTW and CTSP.
Related papers
- Is Optimal Transport Necessary for Inverse Reinforcement Learning? [0.0]
Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations.<n>We propose two simple, alternatives to Optimal Transport (OT) in IRL.<n>We show that our simple rewards match or outperform recent OT-based approaches.
arXiv Detail & Related papers (2025-06-07T13:29:37Z) - Bandit based Dynamic Candidate Edge Selection in Solving Traveling Salesman Problems [13.620917861669607]
We introduce a bandit-based method for the Lin-Kernighan-Helsgaun (LKH) algorithm for the Traveling Salesman Problem (TSP)<n>We incorporate multi-armed bandit models to dynamically select the most suitable candidate edges in each iteration, enabling LKH to make smarter choices and lead to improved solutions.
arXiv Detail & Related papers (2025-05-21T06:11:00Z) - DualOpt: A Dual Divide-and-Optimize Algorithm for the Large-scale Traveling Salesman Problem [16.841700899374654]
The paper proposes a dual divide-and-optimize algorithm (DualOpt) for solving the large-scale traveling salesman problem (T)
DualOpt combines two complementary strategies to improve both solution quality and computational efficiency.
The proposed DualOpt achieves highly competitive results compared to 10 state-of-the-art algorithms in the literature.
arXiv Detail & Related papers (2025-01-15T04:16:28Z) - Multi-armed Bandit and Backbone boost Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problems [9.035853156510507]
Lin-Kernighan-Helsguan (LKH) is a classic local search algorithm for the Traveling Salesman Problem (TSP)
LKH introduces an $alpha$-value to replace the traditional distance metric for evaluating the edge quality.
We propose a novel way to extract backbone information during the TSP local search process, which is dynamic and can be updated once a local optimal solution is found.
arXiv Detail & Related papers (2025-01-07T16:45:41Z) - Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies.
The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z) - H-TSP: Hierarchically Solving the Large-Scale Travelling Salesman
Problem [11.310986634385742]
We propose an end-to-end learning framework based on hierarchical reinforcement learning, called H-TSP, for addressing the large-scale Travelling Salesman Problem (TSP)
We show that H-TSP can achieve comparable results as SOTA search-based approaches, and we reduce the time consumption up to two orders of magnitude (3.32s vs. 395.85s)
Although there are still gaps to SOTA results with respect to solution quality, we believe H-TSP will be useful for practical applications, particularly those that are time-sensitive e.g., on-call routing and ride
arXiv Detail & Related papers (2023-04-19T03:10:30Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Improving Generalization of Deep Reinforcement Learning-based TSP
Solvers [19.29028564568974]
We propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method.
Our architecture, which integrates a multilayer perceptron, a graph neural network, and an attention model, defines a policy that sequentially generates a traveling salesman solution.
Our training method includes several innovations: (1) we interleave DRL policy updates with local search (using a new local search technique), (2) we use a novel simple baseline, and (3) we apply gradient learning.
arXiv Detail & Related papers (2021-10-06T15:16:19Z) - Uniform-PAC Bounds for Reinforcement Learning with Linear Function
Approximation [92.3161051419884]
We study reinforcement learning with linear function approximation.
Existing algorithms only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees.
We propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability.
arXiv Detail & Related papers (2021-06-22T08:48:56Z) - Online Sub-Sampling for Reinforcement Learning with General Function
Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with complexity-bounded function class, we show that the policy only needs to be updated for $proptooperatornamepolylog(K)$ times.
In contrast to existing approaches that update the policy for at least $Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z) - Sublinear Least-Squares Value Iteration via Locality Sensitive Hashing [49.73889315176884]
We present the first provable Least-Squares Value Iteration (LSVI) algorithms that have runtime complexity sublinear in the number of actions.
We build the connections between the theory of approximate maximum inner product search and the regret analysis of reinforcement learning.
arXiv Detail & Related papers (2021-05-18T05:23:53Z) - Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based ZO algorithm (ZO-RL) with learning the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variances of ZO gradient by learning a sampling policy, and converge faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z) - Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm
for the Traveling Salesman Problem [12.851149098610096]
We propose a variable strategy reinforced approach, denoted as VSR-LKH.
It combines three reinforcement learning methods (Q-learning, Sarsa and Monte Carlo) with the well-known TSP algorithm, called Lin-Kernighan-Helsgaun (LKH)
VSR-LKH replaces the inflexible operation in LKH, and lets the program learn to make choice at each search step by reinforcement learning.
arXiv Detail & Related papers (2020-12-08T14:58:36Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep
Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.