Deep Reinforcement Learning for Traveling Purchaser Problems
- URL: http://arxiv.org/abs/2404.02476v5
- Date: Mon, 14 Oct 2024 13:33:42 GMT
- Title: Deep Reinforcement Learning for Traveling Purchaser Problems
- Authors: Haofeng Yuan, Rongping Zhu, Wanlu Yang, Shiji Song, Keyou You, Wei Fan, C. L. Philip Chen,
- Abstract summary: The traveling purchaser problem (TPP) is an important optimization problem with broad applications.
We propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately.
By introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances.
- Score: 63.37136587778153
- License:
- Abstract: The traveling purchaser problem (TPP) is an important combinatorial optimization problem with broad applications. Due to the coupling between routing and purchasing, existing works on TPPs commonly address route construction and purchase planning simultaneously, which, however, leads to exact methods with high computational cost and heuristics with sophisticated design but limited performance. In sharp contrast, we propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately, while evaluating and optimizing the solution from a global perspective. The key components of our approach include a bipartite graph representation for TPPs to capture the market-product relations, and a policy network that extracts information from the bipartite graph and uses it to sequentially construct the route. One significant benefit of our framework is that we can efficiently construct the route using the policy network, and once the route is determined, the associated purchasing plan can be easily derived through linear programming, while, leveraging DRL, we can train the policy network to optimize the global solution objective. Furthermore, by introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances, and generalize well across instances of varying sizes and distributions, even to much larger instances that are never seen during training. Experiments on various synthetic TPP instances and the TPPLIB benchmark demonstrate that our DRL-based approach can significantly outperform well-established TPP heuristics, reducing the optimality gap by 40%-90%, and also showing an advantage in runtime, especially on large-sized instances.
Related papers
- OffRIPP: Offline RL-based Informative Path Planning [12.705099730591671]
IPP is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment.
We propose an offline RL-based IPP framework that optimized information gain without requiring real-time interaction during training.
We validate the framework through extensive simulations and real-world experiments.
arXiv Detail & Related papers (2024-09-25T11:30:59Z) - An Efficient Learning-based Solver Comparable to Metaheuristics for the
Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z) - Pointerformer: Deep Reinforced Multi-Pointer Transformer for the
Traveling Salesman Problem [67.32731657297377]
Traveling Salesman Problem (TSP) is a classic routing optimization problem originally arising in the domain of transportation and logistics.
Recently, Deep Reinforcement Learning has been increasingly employed to solve TSP due to its high inference efficiency.
We propose a novel end-to-end DRL approach, referred to as Pointerformer, based on multi-pointer Transformer.
arXiv Detail & Related papers (2023-04-19T03:48:32Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Higher-Order Generalization Bounds: Learning Deep Probabilistic Programs
via PAC-Bayes Objectives [0.0]
We offer a framework for representing PAC-Bayes generalization bounds as programs using DPP-based methods.
In particular, we show that DPP techniques may be leveraged to derive generalization bounds that draw on the compositionality of DPP representations.
In turn, the bounds we introduce offer principled training objectives for higher-order probabilistic programs.
arXiv Detail & Related papers (2022-03-30T01:14:56Z) - Sequential Information Design: Markov Persuasion Process and Its
Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs)
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z) - Towards Deployment-Efficient Reinforcement Learning: Lower Bound and
Optimality [141.89413461337324]
Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL)
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
arXiv Detail & Related papers (2022-02-14T01:31:46Z) - A Deep Reinforcement Learning Approach for Constrained Online Logistics
Route Assignment [4.367543599338385]
It is crucial for the logistics industry on how to assign a candidate logistics route for each shipping parcel properly.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges for route assignment (RA)
arXiv Detail & Related papers (2021-09-08T07:27:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.