Learning Collaborative Policies to Solve NP-hard Routing Problems
- URL: http://arxiv.org/abs/2110.13987v1
- Date: Tue, 26 Oct 2021 19:46:21 GMT
- Title: Learning Collaborative Policies to Solve NP-hard Routing Problems
- Authors: Minsu Kim, Jinkyoo Park and Joungho Kim
- Abstract summary: This paper proposes a novel hierarchical problem-solving strategy called learning collaborative policies (LCP)
It can effectively find the near-optimum solution using two iterative DRL policies: the seeder and reviser.
Extensive experiments demonstrate that the proposed two-policies collaboration scheme improves over single-policy DRL framework on various NP-hard routing problems.
- Score: 13.13675711285772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep reinforcement learning (DRL) frameworks have shown potential
for solving NP-hard routing problems such as the traveling salesman problem
(TSP) without problem-specific expert knowledge. Although DRL can be used to
solve complex problems, DRL frameworks still struggle to compete with
state-of-the-art heuristics showing a substantial performance gap. This paper
proposes a novel hierarchical problem-solving strategy, termed learning
collaborative policies (LCP), which can effectively find the near-optimum
solution using two iterative DRL policies: the seeder and reviser. The seeder
generates as diversified candidate solutions as possible (seeds) while being
dedicated to exploring over the full combinatorial action space (i.e., sequence
of assignment action). To this end, we train the seeder's policy using a simple
yet effective entropy regularization reward to encourage the seeder to find
diverse solutions. On the other hand, the reviser modifies each candidate
solution generated by the seeder; it partitions the full trajectory into
sub-tours and simultaneously revises each sub-tour to minimize its traveling
distance. Thus, the reviser is trained to improve the candidate solution's
quality, focusing on the reduced solution space (which is beneficial for
exploitation). Extensive experiments demonstrate that the proposed two-policies
collaboration scheme improves over single-policy DRL framework on various
NP-hard routing problems, including TSP, prize collecting TSP (PCTSP), and
capacitated vehicle routing problem (CVRP).
Related papers
- Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II [52.083337333478674]
This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW)
The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL.
arXiv Detail & Related papers (2024-07-18T02:46:06Z) - Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning [51.00472376469131]
We propose an algorithm that learns multiple solutions from a single task in offline reinforcement learning.
Our experimental results show that the proposed algorithm learns multiple qualitatively and quantitatively distinctive solutions in offline RL.
arXiv Detail & Related papers (2024-06-10T03:25:49Z) - Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
arXiv Detail & Related papers (2023-11-13T12:24:54Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on a problem with orienteering weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
arXiv Detail & Related papers (2022-11-01T21:33:46Z) - Solving the Traveling Salesperson Problem with Precedence Constraints by
Deep Reinforcement Learning [59.14935871979047]
This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL)
Common to these approaches is the use of graph models based on multi-head attention layers.
arXiv Detail & Related papers (2022-07-04T14:31:47Z) - Reinforcement Learning to Solve NP-hard Problems: an Application to the
CVRP [0.0]
We evaluate the use of Reinforcement Learning (RL) to solve a classic optimization problem.
We compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances.
We find that despite not returning the best solution, the RL approach has many advantages over traditional solvers.
arXiv Detail & Related papers (2022-01-14T11:16:17Z) - Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless
Cellular Networks [82.02891936174221]
Collaborative deep reinforcement learning (CDRL) algorithms in which multiple agents can coordinate over a wireless network is a promising approach.
In this paper, a novel semantic-aware CDRL method is proposed to enable a group of untrained agents with semantically-linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network.
arXiv Detail & Related papers (2021-11-23T18:24:47Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Learning Vehicle Routing Problems using Policy Optimisation [4.093722933440819]
State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo solver.
These approaches have demonstrated good performance in some cases, but given the large search space typical of routing problem, they can converge too quickly to poor policy.
We propose entropy regularised reinforcement learning (ERRL) that supports exploration by providing more policies.
arXiv Detail & Related papers (2020-12-24T14:18:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.