RP-DQN: An application of Q-Learning to Vehicle Routing Problems
- URL: http://arxiv.org/abs/2104.12226v1
- Date: Sun, 25 Apr 2021 18:28:35 GMT
- Title: RP-DQN: An application of Q-Learning to Vehicle Routing Problems
- Authors: Ahmad Bdeir, Simon Boeder, Tim Dernedde, Kirill Tkachuk, Jonas K.
Falkner, Lars Schmidt-Thieme
- Abstract summary: We show that our approach achieves state-of-the-art performance for autoregressive policies that sequentially insert nodes to construct solutions on the CVRP.
We are the first to tackle the MDVRP with machine learning methods and demonstrate that this problem type greatly benefits from our approach over other ML methods.
- Score: 2.6750287043724303
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper we present a new approach to tackle complex routing problems
with an improved state representation that utilizes the model complexity better
than previous methods. We enable this by training from temporal differences.
Specifically Q-Learning is employed. We show that our approach achieves
state-of-the-art performance for autoregressive policies that sequentially
insert nodes to construct solutions on the CVRP. Additionally, we are the first
to tackle the MDVRP with machine learning methods and demonstrate that this
problem type greatly benefits from our approach over other ML methods.
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the
Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Solving the vehicle routing problem with deep reinforcement learning [0.0]
This paper focuses on the application of RL for the Vehicle Routing Problem (VRP), a famous problem that belongs to the class of NP-Hard problems.
In a second phase, the neural architecture behind the Actor and Critic has been established, choosing to adopt a neural architecture based on the Convolutional neural networks.
Experiments performed on a wide range of instances show that the algorithm has good generalization capabilities and can reach good solutions in a short time.
arXiv Detail & Related papers (2022-07-30T12:34:26Z) - Solving the Traveling Salesperson Problem with Precedence Constraints by
Deep Reinforcement Learning [59.14935871979047]
This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL)
Common to these approaches is the use of graph models based on multi-head attention layers.
arXiv Detail & Related papers (2022-07-04T14:31:47Z) - Sample-Efficient, Exploration-Based Policy Optimisation for Routing
Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy.
In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return.
We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z) - Reinforcement Learning to Solve NP-hard Problems: an Application to the
CVRP [0.0]
We evaluate the use of Reinforcement Learning (RL) to solve a classic optimization problem.
We compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances.
We find that despite not returning the best solution, the RL approach has many advantages over traditional solvers.
arXiv Detail & Related papers (2022-01-14T11:16:17Z) - SOLO: Search Online, Learn Offline for Combinatorial Optimization
Problems [4.777801093677586]
We study problems with real world applications such as machine scheduling, routing, and assignment.
We propose a method that combines Reinforcement Learning (RL) and planning.
This method can equally be applied to both the offline, as well as online, variants of the problem, in which the problem components are not known in advance, but rather arrive during the decision-making process.
arXiv Detail & Related papers (2021-04-04T17:12:24Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Learning Vehicle Routing Problems using Policy Optimisation [4.093722933440819]
State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo solver.
These approaches have demonstrated good performance in some cases, but given the large search space typical of routing problem, they can converge too quickly to poor policy.
We propose entropy regularised reinforcement learning (ERRL) that supports exploration by providing more policies.
arXiv Detail & Related papers (2020-12-24T14:18:56Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Deep Unfolding Network for Image Super-Resolution [159.50726840791697]
This paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods.
The proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model.
arXiv Detail & Related papers (2020-03-23T17:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.