An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones
- URL: http://arxiv.org/abs/2511.05265v1
- Date: Fri, 07 Nov 2025 14:26:29 GMT
- Title: An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones
- Authors: Taihelong Zeng, Yun Lin, Yuhe Shi, Yan Li, Zhiqing Wei, Xuanru Ji,
- Abstract summary: This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the Traveling Salesman Problem with Drones (TSP-D). The architecture consists of two primary components: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder. The entire framework operates within an asynchronous advantage actor-critic paradigm.
- Score: 12.385878815004283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of truck-drone collaborative systems in last-mile logistics has positioned the Traveling Salesman Problem with Drones (TSP-D) as a pivotal extension of classical routing optimization, where synchronized vehicle coordination promises substantial operational efficiency and reduced environmental impact, yet introduces NP-hard combinatorial complexity beyond the reach of conventional optimization paradigms. Deep reinforcement learning offers a theoretically grounded framework to address TSP-D's inherent challenges through self-supervised policy learning and adaptive decision-making. This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the TSP-D problem. The architecture consists of two primary components: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder. The encoder incorporates a novel, optimized k-nearest neighbors sparse attention mechanism specifically for focusing on relevant spatial relationships, further enhanced by the integration of global node features. The Minimal Gated Unit decoder processes these encoded representations to efficiently generate solution sequences. The entire framework operates within an asynchronous advantage actor-critic paradigm. Experimental results show that, on benchmark TSP-D instances of various scales (N=10 to 100), the proposed model can obtain competitive or even superior solutions in shorter average computation times compared to high-performance heuristic algorithms and existing reinforcement learning methods. Moreover, compared to advanced reinforcement learning algorithm benchmarks, the proposed framework significantly reduces the total training time required while achieving superior final performance, highlighting its notable advantage in training efficiency.
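The abstract names two building blocks, a k-nearest-neighbors sparse attention encoder and a Minimal Gated Unit (MGU) decoder, without giving their equations. The following is a minimal NumPy sketch of both ideas, not the authors' implementation. The function names (`knn_mask`, `sparse_attention`, `mgu_step`), the single-head attention without learned projections, and all shapes are assumptions made for illustration; the MGU update follows the standard formulation with one forget gate, and the paper's global node features and actor-critic training loop are omitted.

```python
import numpy as np


def knn_mask(coords: np.ndarray, k: int) -> np.ndarray:
    """Boolean (N, N) mask: True where node j is one of the k nearest nodes to i (i itself included)."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Force each node to rank itself first, even if two nodes share coordinates.
    np.fill_diagonal(dists, -1.0)
    nearest = np.argsort(dists, axis=1)[:, :k]
    mask = np.zeros(dists.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    return mask


def sparse_attention(h: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention where each node attends only to masked-in nodes.

    Assumption: queries, keys, and values are the raw embeddings h (no learned projections).
    """
    scores = (h @ h.T) / np.sqrt(h.shape[1])
    scores = np.where(mask, scores, -np.inf)      # masked-out pairs get zero weight
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ h


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def mgu_step(x, h_prev, W_f, b_f, W_h, b_h):
    """One Minimal Gated Unit step: a single forget gate f controls the state update."""
    f = sigmoid(W_f @ np.concatenate([h_prev, x]) + b_f)
    h_tilde = np.tanh(W_h @ np.concatenate([f * h_prev, x]) + b_h)
    return (1.0 - f) * h_prev + f * h_tilde


# Toy usage: 6 random customer locations, 4-dimensional embeddings, k = 3.
rng = np.random.default_rng(0)
coords = rng.random((6, 2))
mask = knn_mask(coords, k=3)
h = rng.standard_normal((6, 4))
context = sparse_attention(h, mask)

W_f = 0.1 * rng.standard_normal((4, 8))
W_h = 0.1 * rng.standard_normal((4, 8))
h_next = mgu_step(context[0], np.zeros(4), W_f, np.zeros(4), W_h, np.zeros(4))
```

Restricting attention to k neighbors reduces the number of attended pairs per node from N to k, which is one plausible reading of why the abstract reports shorter computation times. The MGU uses one gate where a GRU uses two, so a decoder step needs fewer parameters.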
Related papers
- CAMP-HiVe: Cyclic Pair Merging based Efficient DNN Pruning with Hessian-Vector Approximation for Resource-Constrained Systems [3.343542849202802]
We introduce CAMP-HiVe, a cyclic pair merging-based pruning with Hessian Vector approximation. Our experimental results demonstrate that our proposed method achieves significant reductions in computational requirements. It outperforms the existing state-of-the-art neural pruning methods.
arXiv Detail & Related papers (2025-11-09T07:58:36Z)
- XQC: Well-conditioned Optimization Accelerates Deep Reinforcement Learning [26.063477716451512]
We introduce XQC: a well-motivated, sample-efficient deep actor-critic algorithm built upon soft actor-critic. We achieve state-of-the-art sample efficiency across 55 proprioception and 15 vision-based continuous control tasks.
arXiv Detail & Related papers (2025-09-29T17:58:53Z)
- Efficient Split Federated Learning for Large Language Models over Communication Networks [45.02252893286613]
Fine-tuning pre-trained large language models (LLMs) in a distributed manner poses significant challenges on resource-constrained edge networks. We propose SflLLM, a novel framework that integrates split federated learning with parameter-efficient fine-tuning techniques. By leveraging model splitting and low-rank adaptation (LoRA), SflLLM reduces the computational burden on edge devices.
arXiv Detail & Related papers (2025-04-20T16:16:54Z)
- Towards Constraint-Based Adaptive Hypergraph Learning for Solving Vehicle Routing: An End-to-End Solution [4.965709007367529]
Vehicle routing problems are characterized by vast solution spaces and intricate constraints. This study introduces a novel end-to-end framework that combines constraint-oriented hypergraphs with reinforcement learning.
arXiv Detail & Related papers (2025-03-13T14:42:44Z)
- Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach [51.63921041249406]
Non-orthogonal multiple access (NOMA) enables multiple users to share the same frequency band, and simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)
Deploying STAR-RIS indoors presents challenges in interference mitigation, power consumption, and real-time configuration.
A novel network architecture utilizing multiple access points (APs), STAR-RISs, and NOMA is proposed for indoor communication.
arXiv Detail & Related papers (2024-06-19T07:17:04Z)
- An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose a direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z)
- Collaborative Multidisciplinary Design Optimization with Neural Networks [1.2691047660244335]
We show that, in the case of Collaborative Optimization, faster and more reliable convergence can be obtained by solving an interesting instance of binary classification.
We propose to train a neural network with an asymmetric loss function, a structure that guarantees Lipschitz continuity, and a regularization towards respecting basic distance function properties.
arXiv Detail & Related papers (2021-06-11T00:03:47Z)
- Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles [52.79089414630366]
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL).
We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv Detail & Related papers (2020-07-27T12:38:37Z)
- Iterative Algorithm Induced Deep-Unfolding Neural Networks: Precoding Design for Multiuser MIMO Systems [59.804810122136345]
We propose a framework for deep-unfolding, where a general form of iterative algorithm induced deep-unfolding neural network (IAIDNN) is developed.
An efficient IAIDNN based on the structure of the classic weighted minimum mean-square error (WMMSE) iterative algorithm is developed.
We show that the proposed IAIDNN efficiently achieves the performance of the iterative WMMSE algorithm with reduced computational complexity.
arXiv Detail & Related papers (2020-06-15T02:57:57Z)