A Deep Reinforcement Learning Approach for Constrained Online Logistics
Route Assignment
- URL: http://arxiv.org/abs/2109.03467v1
- Date: Wed, 8 Sep 2021 07:27:39 GMT
- Title: A Deep Reinforcement Learning Approach for Constrained Online Logistics
Route Assignment
- Authors: Hao Zeng, Yangdong Liu, Dandan Zhang, Kunpeng Han, Haoyuan Hu
- Abstract summary: It is crucial for the logistics industry on how to assign a candidate logistics route for each shipping parcel properly.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges for route assignment (RA)
- Score: 4.367543599338385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As online shopping prevails and e-commerce platforms emerge, there is a
tremendous number of parcels being transported every day. Thus, it is crucial
for the logistics industry on how to assign a candidate logistics route for
each shipping parcel properly as it leaves a significant impact on the total
logistics cost optimization and business constraints satisfaction such as
transit hub capacity and delivery proportion of delivery providers. This online
route-assignment problem can be viewed as a constrained online decision-making
problem. Notably, the large amount (beyond ${10^5}$) of daily parcels, the
variability and non-Markovian characteristics of parcel information impose
difficulties on attaining (near-) optimal solution without violating
constraints excessively. In this paper, we develop a model-free DRL approach
named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with
dedicated techniques to address the challenges for route assignment (RA). The
actor and critic networks use attention mechanism and parameter sharing to
accommodate each incoming parcel with varying numbers and identities of
candidate routes, without modeling non-Markovian parcel arriving dynamics since
we make assumption of i.i.d. parcel arrival. We use recorded delivery parcel
data to evaluate the performance of PPO-RA by comparing it with widely-used
baselines via simulation. The results show the capability of the proposed
approach to achieve considerable cost savings while satisfying most
constraints.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.