Related papers: RP-DQN: An application of Q-Learning to Vehicle Routing Problems

Related papers

Learning for routing: A guided review of recent developments and future directions [3.3629991374416477]
We focus on routing problems such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP)<n>Due to the inherent complexity of these problems, exact algorithms often require excessive computational time to find optimal solutions.<n>We propose a taxonomy ML-based routing methods into construction-based and improvement-based approaches, highlighting their applicability to various problem characteristics.
arXiv Detail & Related papers (2025-06-30T19:39:11Z)
TuneNSearch: a hybrid transfer learning and local search approach for solving vehicle routing problems [43.89334324926175]
TuneNSearch is a hybrid transfer learning and local search approach for addressing different variants of vehicle routing problems (VRP) We first pre-train a reinforcement learning model on the multi-depot VRP, followed by a short fine-tuning phase to adapt it to different variants. Results show that TuneNSearch outperforms many existing state-of-the-art models trained for each VRP variant, requiring only one-fifth of the training epochs.
arXiv Detail & Related papers (2025-03-16T21:34:11Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics. First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process. Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
Solving the vehicle routing problem with deep reinforcement learning [0.0]
This paper focuses on the application of RL for the Vehicle Routing Problem (VRP), a famous problem that belongs to the class of NP-Hard problems. In a second phase, the neural architecture behind the Actor and Critic has been established, choosing to adopt a neural architecture based on the Convolutional neural networks. Experiments performed on a wide range of instances show that the algorithm has good generalization capabilities and can reach good solutions in a short time.
arXiv Detail & Related papers (2022-07-30T12:34:26Z)
Solving the Traveling Salesperson Problem with Precedence Constraints by Deep Reinforcement Learning [59.14935871979047]
This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL) Common to these approaches is the use of graph models based on multi-head attention layers.
arXiv Detail & Related papers (2022-07-04T14:31:47Z)
Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems [2.6782615615913348]
This paper presents a new reinforcement learning approach that is based on entropy. In addition, we design an off-policy-based reinforcement learning technique that maximises the expected return. We show that our model can generalise to various route problems.
arXiv Detail & Related papers (2022-05-31T09:51:48Z)
Reinforcement Learning to Solve NP-hard Problems: an Application to the CVRP [0.0]
We evaluate the use of Reinforcement Learning (RL) to solve a classic optimization problem. We compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances. We find that despite not returning the best solution, the RL approach has many advantages over traditional solvers.
arXiv Detail & Related papers (2022-01-14T11:16:17Z)
SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems [4.777801093677586]
We study problems with real world applications such as machine scheduling, routing, and assignment. We propose a method that combines Reinforcement Learning (RL) and planning. This method can equally be applied to both the offline, as well as online, variants of the problem, in which the problem components are not known in advance, but rather arrive during the decision-making process.
arXiv Detail & Related papers (2021-04-04T17:12:24Z)
Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms. D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z)
Learning Vehicle Routing Problems using Policy Optimisation [4.093722933440819]
State-of-the-art approaches learn a policy using reinforcement learning, and the learnt policy acts as a pseudo solver. These approaches have demonstrated good performance in some cases, but given the large search space typical of routing problem, they can converge too quickly to poor policy. We propose entropy regularised reinforcement learning (ERRL) that supports exploration by providing more policies.
arXiv Detail & Related papers (2020-12-24T14:18:56Z)
Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
Deep Unfolding Network for Image Super-Resolution [159.50726840791697]
This paper proposes an end-to-end trainable unfolding network which leverages both learning-based methods and model-based methods. The proposed network inherits the flexibility of model-based methods to super-resolve blurry, noisy images for different scale factors via a single model.
arXiv Detail & Related papers (2020-03-23T17:55:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.