Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through
Dynamic Shift Extensions: A Deep Reinforcement Learning Approach
- URL: http://arxiv.org/abs/2402.09961v1
- Date: Thu, 15 Feb 2024 14:15:51 GMT
- Title: Enhancing Courier Scheduling in Crowdsourced Last-Mile Delivery through
Dynamic Shift Extensions: A Deep Reinforcement Learning Approach
- Authors: Zead Saleh, Ahmad Al Hanbali, and Ahmad Baubaid
- Abstract summary: This study focuses on the problem of dynamically adjusting the offline schedule through shift extensions for committed couriers.
The objective is to maximize platform profit by determining the shift extensions of couriers and the assignments of requests to couriers.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Crowdsourced delivery platforms face complex scheduling challenges to match
couriers and customer orders. We consider two types of crowdsourced couriers,
namely, committed and occasional couriers, each with different compensation
schemes. Crowdsourced delivery platforms usually schedule committed courier
shifts based on predicted demand. Therefore, platforms may devise an offline
schedule for committed couriers before the planning period. However, due to the
unpredictability of demand, there are instances where it becomes necessary to
make online adjustments to the offline schedule. In this study, we focus on the
problem of dynamically adjusting the offline schedule through shift extensions
for committed couriers. This problem is modeled as a sequential decision
process. The objective is to maximize platform profit by determining the shift
extensions of couriers and the assignments of requests to couriers. To solve
the model, a Deep Q-Network (DQN) learning approach is developed. Comparing
this model with a baseline policy in which no extensions are allowed
demonstrates the benefits platforms can gain from allowing shift extensions:
higher reward, lower lost-order costs, and fewer lost requests.
Additionally, sensitivity analysis showed that the total extension compensation
increases nonlinearly with the arrival rate of requests and linearly with the
arrival rate of occasional couriers. Regarding compensation sensitivity, the
normal scenario exhibited the highest average number of shift extensions and,
consequently, the lowest average number of lost requests. These findings are
evidence that the DQN algorithm successfully learned these dynamics.
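The paper's code is not reproduced here, but the DQN component can be sketched. The snippet below is a minimal illustration, not the authors' implementation: the state vector, the discrete action set (extend an expiring shift by 0-3 periods), and the reward shape (per-period profit net of extension compensation and lost-request costs) are all assumptions made for the example.

```python
# Minimal DQN sketch for the shift-extension decision (illustrative only;
# the state, actions, and reward below are assumptions, not the paper's).
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a platform state vector to Q-values over extension actions."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical state: [open requests, active committed couriers,
# available occasional couriers, periods left in the expiring shift].
# Hypothetical actions: 0 = no extension, k = extend by k periods.
STATE_DIM, N_ACTIONS = 4, 4
q_net = QNetwork(STATE_DIM, N_ACTIONS)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def choose_action(state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy behavior policy, standard for DQN."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_update(s, a, r, s_next, done: bool, gamma: float = 0.99) -> None:
    """One temporal-difference step on a single transition; the replay
    buffer and target network of full DQN are omitted for brevity."""
    q_sa = q_net(s)[a]
    with torch.no_grad():
        bootstrap = 0.0 if done else gamma * float(q_net(s_next).max())
    # r would be the period profit: request revenue minus extension
    # compensation and lost-request penalties (our assumed reward shape).
    target = torch.tensor(r + bootstrap)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the paper's setting the sequential decisions also cover assigning requests to couriers, so a faithful implementation would enlarge the action space or decompose each step accordingly.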
Related papers
- Harvesting Efficient On-Demand Order Pooling from Skilled Couriers: Enhancing Graph Representation Learning for Refining Real-time Many-to-One Assignments [11.0829498096027]
On-demand food delivery (OFD) services offer delivery fulfillment within dozens of minutes after an order is placed.
In OFD, pooling multiple orders for simultaneous delivery during real-time order assignment is a pivotal source of efficiency.
The complexity and real-time nature of order assignment make extensive calculations impractical, significantly limiting the potential for order consolidation.
A skilled-courier (SC) delivery network (SCDN) is constructed, based on an enhanced attributed heterogeneous network embedding approach tailored for OFD.
arXiv Detail & Related papers (2024-06-20T18:03:27Z)
- Learning with Posterior Sampling for Revenue Management under Time-varying Demand [36.22276574805786]
We discuss the revenue management problem of maximizing revenue by pricing items or services.
One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries.
arXiv Detail & Related papers (2024-05-08T09:28:26Z)
- Self-regulating Prompts: Foundational Model Adaptation without Forgetting [112.66832145320434]
We introduce a self-regularization framework for prompting called PromptSRC.
PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations.
arXiv Detail & Related papers (2023-07-13T17:59:35Z)
- Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand [1.8505047763172104]
We derive a learning framework to generate routing/pickup policies for a fleet of autonomous vehicles tasked with servicing requests that appear on a city map.
We focus on policies that give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests.
We propose a mechanism for switching away from the originally trained offline approximation when the current demand falls outside its original validity region.
arXiv Detail & Related papers (2022-11-28T01:11:11Z)
- A Universal Error Measure for Input Predictions Applied to Online Graph Problems [57.58926849872494]
We introduce a novel measure for quantifying the error in input predictions.
The measure captures errors due to absent predicted requests as well as unpredicted actual requests.
arXiv Detail & Related papers (2022-05-25T15:24:03Z)
- Approaching sales forecasting using recurrent neural networks and transformers [57.43518732385863]
We develop three alternatives to tackle the problem of forecasting customer sales at the day/store/item level using deep learning techniques.
Our empirical results show how good performance can be achieved by using a simple sequence to sequence architecture with minimal data preprocessing effort.
The proposed solution achieves an RMSLE of around 0.54, which is competitive with other, more problem-specific solutions proposed in the Kaggle competition.
arXiv Detail & Related papers (2022-04-16T12:03:52Z)
- Learning a Discrete Set of Optimal Allocation Rules in a Queueing System with Unknown Service Rate [1.4094389874355762]
We study admission control for a system with unknown arrival and service rates.
In our model, at every job arrival, a dispatcher decides to assign the job to an available server or block it.
Our goal is to design a dispatching policy that maximizes the long-term average reward for the dispatcher.
arXiv Detail & Related papers (2022-02-04T22:39:03Z)
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples.
arXiv Detail & Related papers (2021-07-01T16:26:54Z)
- Offline Reinforcement Learning as Anti-Exploration [49.72457136766916]
We take inspiration from the literature on bonus-based exploration to design a new offline RL agent.
The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it as online exploration methods do (see the sketch after this list).
We show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
arXiv Detail & Related papers (2021-06-11T14:41:30Z)
- Causally-motivated Shortcut Removal Using Auxiliary Labels [63.686580185674195]
A key challenge in learning such risk-invariant predictors is shortcut learning.
We propose a flexible, causally-motivated approach to address this challenge.
We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors.
arXiv Detail & Related papers (2021-05-13T16:58:45Z)
- Reinforcement Learning for Freight Booking Control Problems [5.08128537391027]
Booking control problems are sequential decision-making problems in revenue management.
We train a supervised learning model to predict the objective of an operational problem.
We then deploy the model within reinforcement learning algorithms to compute control policies.
arXiv Detail & Related papers (2021-01-29T22:11:59Z)
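On the anti-exploration entry above: the reward shaping it describes is compact enough to sketch. Online exploration methods add a novelty bonus to the reward; the offline agent instead subtracts it, penalizing state-action pairs unlike the dataset. The RND-style predictor/target pair and the weight alpha below are assumptions for illustration, not the paper's exact networks.

```python
# Sketch of anti-exploration reward shaping (illustrative assumptions:
# an RND-style bonus and weight alpha; not the paper's exact setup).
import torch
import torch.nn as nn

target_net = nn.Linear(8, 16)   # fixed random embedding network
predictor = nn.Linear(8, 16)    # trained to imitate target_net on the dataset

def novelty_bonus(state_action: torch.Tensor) -> torch.Tensor:
    """Prediction error: large for state-actions unlike the offline data."""
    with torch.no_grad():
        embedding = target_net(state_action)
    return ((predictor(state_action) - embedding) ** 2).mean(dim=-1)

def shaped_reward(reward: torch.Tensor, state_action: torch.Tensor,
                  alpha: float = 1.0) -> torch.Tensor:
    # Exploration would ADD the bonus; anti-exploration SUBTRACTS it,
    # steering the learned policy away from out-of-distribution actions.
    return reward - alpha * novelty_bonus(state_action).detach()
```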