A Deep Reinforcement Learning Approach for Constrained Online Logistics
Route Assignment
- URL: http://arxiv.org/abs/2109.03467v1
- Date: Wed, 8 Sep 2021 07:27:39 GMT
- Title: A Deep Reinforcement Learning Approach for Constrained Online Logistics
Route Assignment
- Authors: Hao Zeng, Yangdong Liu, Dandan Zhang, Kunpeng Han, Haoyuan Hu
- Abstract summary: It is crucial for the logistics industry to properly assign a candidate logistics route to each shipping parcel.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges for route assignment (RA).
- Score: 4.367543599338385
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As online shopping prevails and e-commerce platforms emerge, there is a
tremendous number of parcels being transported every day. Thus, it is crucial
for the logistics industry to properly assign a candidate logistics route to
each shipping parcel, as this choice significantly affects both the total
logistics cost and the satisfaction of business constraints such as
transit hub capacity and the delivery proportion of delivery providers. This online
route-assignment problem can be viewed as a constrained online decision-making
problem. Notably, the large number (beyond $10^5$) of daily parcels and the
variability and non-Markovian characteristics of parcel information make it
difficult to attain a (near-)optimal solution without violating
constraints excessively. In this paper, we develop a model-free DRL approach
named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with
dedicated techniques to address the challenges for route assignment (RA). The
actor and critic networks use an attention mechanism and parameter sharing to
accommodate each incoming parcel with varying numbers and identities of
candidate routes, without modeling non-Markovian parcel arrival dynamics, since
we assume i.i.d. parcel arrivals. We use recorded delivery parcel
data to evaluate the performance of PPO-RA by comparing it with widely-used
baselines via simulation. The results show the capability of the proposed
approach to achieve considerable cost savings while satisfying most
constraints.
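The abstract's point about accommodating a varying number of candidate routes with shared parameters can be illustrated with a small sketch. This is not the PPO-RA architecture from the paper; it assumes a single-head dot-product attention score between one parcel and each of its candidate routes, and all names (`make_shared_params`, `route_probabilities`), feature dimensions, and feature values are hypothetical. The same two weight matrices are reused for every route, so the policy output has one probability per candidate regardless of how many candidates a parcel has.

```python
import math
import random


def make_shared_params(parcel_dim, route_dim, hidden_dim, seed=0):
    """Create one set of weights that is reused for every parcel and route.

    Hypothetical helper: the dimensions and the random initialization are
    assumptions for illustration only.
    """
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_q": matrix(hidden_dim, parcel_dim),  # projects parcel features to a query
        "W_k": matrix(hidden_dim, route_dim),   # projects each route to a key
    }


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def route_probabilities(parcel_features, route_features, params):
    """Return a softmax distribution over the candidate routes of one parcel.

    Works for any number of candidate routes because the key projection
    W_k is shared across routes.
    """
    query = matvec(params["W_q"], parcel_features)
    scale = math.sqrt(len(query))
    scores = []
    for route in route_features:
        key = matvec(params["W_k"], route)
        scores.append(sum(q * k for q, k in zip(query, key)) / scale)
    # Numerically stable softmax over the attention scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


params = make_shared_params(parcel_dim=3, route_dim=2, hidden_dim=4)

# A parcel with two candidate routes and another with three, same parameters.
probs_two = route_probabilities([0.5, 1.0, 0.2], [[1.0, 0.3], [0.4, 0.9]], params)
probs_three = route_probabilities(
    [0.1, 0.7, 0.3], [[1.0, 0.3], [0.4, 0.9], [0.2, 0.2]], params
)
```

Because each route is scored independently by the same weights, reordering the candidate list only reorders the probabilities, which is one way to handle candidates whose identities change from parcel to parcel.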
Related papers
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - Deep Reinforcement Learning for Traveling Purchaser Problems [63.37136587778153]
The traveling purchaser problem (TPP) is an important optimization problem with broad applications.
We propose a novel approach based on deep reinforcement learning (DRL), which addresses route construction and purchase planning separately.
By introducing a meta-learning strategy, the policy network can be trained stably on large-sized TPP instances.
arXiv Detail & Related papers (2024-04-03T05:32:10Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Individually Rational Collaborative Vehicle Routing through
Give-And-Take Exchanges [4.266376725904727]
We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality.
By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide.
arXiv Detail & Related papers (2023-08-31T07:18:37Z) - AI-aided Traffic Control Scheme for M2M Communications in the Internet
of Vehicles [61.21359293642559]
The dynamics of traffic and the heterogeneous requirements of different IoV applications are not considered in most existing studies.
We consider a hybrid traffic control scheme and use the proximal policy optimization (PPO) method to tackle it.
arXiv Detail & Related papers (2022-03-05T10:54:05Z) - Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z) - Learning to Optimize Industry-Scale Dynamic Pickup and Delivery Problems [17.076557377480444]
The Dynamic Pickup and Delivery Problem (DPDP) is aimed at dynamically scheduling vehicles among multiple sites in order to minimize the cost when delivery orders are not known a priori.
We propose a data-driven approach, Spatial-Temporal Aided Double Deep Graph Network (ST-DDGN), to solve industry-scale DPDP.
Our method is entirely data driven and thus adaptive, i.e., the relational representation of adjacent vehicles can be learned and corrected by ST-DDGN from data periodically.
arXiv Detail & Related papers (2021-05-27T01:16:00Z) - Dynamic Bicycle Dispatching of Dockless Public Bicycle-sharing Systems
using Multi-objective Reinforcement Learning [79.61517670541863]
How to use AI to provide efficient bicycle dispatching solutions based on dynamic bicycle rental demand is an essential issue for dockless PBS (DL-PBS).
We propose a dynamic bicycle dispatching algorithm based on multi-objective reinforcement learning (MORL-BD) to provide the optimal bicycle dispatching solution for DL-PBS.
arXiv Detail & Related papers (2021-01-19T03:09:51Z) - Mathematical simulation of package delivery optimization using a
combination of carriers [0.0]
The authors analyze the problem of optimizing the cost of long-distance package delivery and propose a solution that combines paths served by supplier fleets, worldwide carriers, and local carriers.
The experiment is based on data from United States companies that use a wide range of carriers for delivery services.
arXiv Detail & Related papers (2020-11-02T18:44:04Z) - A Multi-Agent System for Solving the Dynamic Capacitated Vehicle Routing
Problem with Stochastic Customers using Trajectory Data Mining [0.0]
E-commerce has created new challenges for logistics companies, one of which is being able to deliver products quickly and at low cost.
Our work presents a multi-agent system that uses trajectory data mining techniques to extract territorial patterns and use them in the dynamic creation of last-mile routes.
arXiv Detail & Related papers (2020-09-26T21:36:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.