Joint Matching and Pricing for Crowd-shipping with In-store Customers
- URL: http://arxiv.org/abs/2507.01749v1
- Date: Wed, 02 Jul 2025 14:27:32 GMT
- Title: Joint Matching and Pricing for Crowd-shipping with In-store Customers
- Authors: Arash Dehghan, Mucahit Cevik, Merve Bodur, Bissan Ghaddar
- Abstract summary: This paper examines the use of in-store customers as delivery couriers in a centralized crowd-shipping system. We propose a Markov Decision Process (MDP) model that captures key uncertainties, including the arrival of orders and crowd-shippers. We show that the integrated NeurADP + DDQN policy achieves notable improvements in delivery cost efficiency.
- Score: 2.7950888004779064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper examines the use of in-store customers as delivery couriers in a centralized crowd-shipping system, targeting the growing need for efficient last-mile delivery in urban areas. We consider a brick-and-mortar retail setting where shoppers are offered compensation to deliver time-sensitive online orders. To manage this process, we propose a Markov Decision Process (MDP) model that captures key uncertainties, including the stochastic arrival of orders and crowd-shippers, and the probabilistic acceptance of delivery offers. Our solution approach integrates Neural Approximate Dynamic Programming (NeurADP) for adaptive order-to-shopper assignment with a Deep Double Q-Network (DDQN) for dynamic pricing. This joint optimization strategy enables multi-drop routing and accounts for offer acceptance uncertainty, aligning more closely with real-world operations. Experimental results demonstrate that the integrated NeurADP + DDQN policy achieves notable improvements in delivery cost efficiency, with up to 6.7% savings over NeurADP with fixed pricing and approximately 18% over myopic baselines. We also show that allowing flexible delivery delays and enabling multi-destination routing further reduces operational costs by 8% and 17%, respectively. These findings underscore the advantages of dynamic, forward-looking policies in crowd-shipping systems and offer practical guidance for urban logistics operators.
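To make the decision pipeline described in the abstract concrete, the following is a minimal, hypothetical Python sketch of the joint matching-and-pricing loop: orders and in-store shoppers arrive stochastically, a pricing agent picks a compensation level, offers are accepted probabilistically, and accepted matches earn the saved delivery cost minus the compensation paid. The price grid, acceptance model, greedy nearest-order matcher (standing in for NeurADP), and tabular Q-learning update (standing in for the DDQN) are all illustrative assumptions, not the authors' implementation.

```python
"""Illustrative sketch only: a toy version of the joint matching-and-pricing loop.
All constants, the acceptance model, the greedy matcher (NeurADP stand-in), and
the tabular Q-learning update (DDQN stand-in) are assumptions for illustration."""
import random
import numpy as np

PRICE_LEVELS = [3.0, 5.0, 7.0]      # assumed discrete compensation offers
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

# Q-values indexed by (open orders, available shoppers, price index)
Q = np.zeros((11, 11, len(PRICE_LEVELS)))

def acceptance_prob(price, distance):
    """Assumed logistic acceptance model: higher pay and shorter detours help."""
    return 1.0 / (1.0 + np.exp(-(0.8 * price - 0.5 * distance - 1.0)))

def match_greedy(orders, shoppers):
    """Stand-in for NeurADP assignment: pair each shopper with the nearest open order."""
    pairs, free = [], list(orders)
    for s in shoppers:
        if not free:
            break
        o = min(free, key=lambda o: abs(o - s))
        free.remove(o)
        pairs.append((o, s))
    return pairs

def choose_price(state):
    """Epsilon-greedy price selection (stand-in for the DDQN pricing policy)."""
    if random.random() < EPS:
        return random.randrange(len(PRICE_LEVELS))
    return int(np.argmax(Q[state]))

def simulate_epoch(T=200):
    orders, shoppers = [], []
    for _ in range(T):
        # stochastic arrivals of online orders and in-store shoppers (1-D locations)
        orders += [random.uniform(0, 10) for _ in range(np.random.poisson(1.0))]
        shoppers += [random.uniform(0, 10) for _ in range(np.random.poisson(1.2))]
        state = (min(len(orders), 10), min(len(shoppers), 10))
        a = choose_price(state)
        price = PRICE_LEVELS[a]
        reward = 0.0
        for o, s in match_greedy(orders, shoppers):
            if random.random() < acceptance_prob(price, abs(o - s)):  # offer accepted
                reward += 10.0 - price   # assumed saved outsourcing cost minus compensation
                orders.remove(o)
                shoppers.remove(s)
        next_state = (min(len(orders), 10), min(len(shoppers), 10))
        # one-step Q-learning update (the paper uses a Double DQN instead)
        Q[state][a] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state][a])
        orders, shoppers = orders[-10:], shoppers[-10:]  # expire stale entries (simplification)

if __name__ == "__main__":
    simulate_epoch()
    print("Learned price index per (orders, shoppers) state:")
    print(np.argmax(Q, axis=2))
```

The point of the sketch is the coupling highlighted in the abstract: the chosen price changes acceptance probabilities, which changes which assignments succeed, which in turn feeds back into the value estimates used for future pricing decisions.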
Related papers
- Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z) - Process Reinforcement through Implicit Rewards [95.7442934212076]
Dense process rewards have proven a more effective alternative to the sparse outcome-level rewards in the inference-time scaling of large language models (LLMs). Dense rewards also offer an appealing choice for the reinforcement learning (RL) of LLMs, since their fine-grained rewards have the potential to address some inherent issues of outcome rewards. However, their adoption has been limited, primarily due to the challenges of training process reward models (PRMs) online, where collecting high-quality process labels is prohibitively expensive. We propose PRIME, which enables online PRM updates using only policy rollouts and outcome labels, through implicit process rewards.
arXiv Detail & Related papers (2025-02-03T15:43:48Z) - Procurement Auctions via Approximately Optimal Submodular Optimization [53.93943270902349]
We study procurement auctions, where an auctioneer seeks to acquire services from strategic sellers with private costs.
Our goal is to design computationally efficient auctions that maximize the difference between the quality of the acquired services and the total cost of the sellers.
arXiv Detail & Related papers (2024-11-20T18:06:55Z) - Dynamic Demand Management for Parcel Lockers [0.0]
We develop a solution framework that orchestrates algorithmic techniques rooted in Sequential Decision Analytics and Reinforcement Learning.
Our innovative approach to combine these techniques enables us to address the strong interrelations between the two decision types.
Our computational study shows that our method outperforms a myopic benchmark by 13.7% and an industry-inspired policy by 12.6%.
arXiv Detail & Related papers (2024-09-08T11:38:48Z) - A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraint violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - Learning Dynamic Selection and Pricing of Out-of-Home Deliveries [1.2289361708127877]
We propose Dynamic Selection and Pricing of OOH (DSPO), an algorithmic pipeline that uses a novel spatial-temporal state encoding as input to a convolutional neural network.
Our extensive numerical study, guided by real-world data, reveals that DSPO can save 19.9 percentage points in costs compared to a situation without OOH locations.
We provide comprehensive insights into the complex interplay between OOH delivery dynamics and customer behavior influenced by pricing strategies.
arXiv Detail & Related papers (2023-11-23T12:55:10Z) - Price-Discrimination Game for Distributed Resource Management in Federated Learning [3.724337025141794]
In vanilla federated learning (FL) such as FedAvg, the parameter server (PS) and multiple distributed clients can form a typical buyer's market.
This paper proposes to differentiate the pricing for services provided by different clients rather than simply providing the same service pricing for different clients.
arXiv Detail & Related papers (2023-08-26T10:09:46Z) - Playing hide and seek: tackling in-store picking operations while improving customer experience [0.0]
We formalize a new problem, the Dynamic In-store Picker Routing Problem (diPRP).
In this problem, a picker tries to pick online orders while minimizing customer encounters.
Our work suggests that retailers should be able to scale the in-store picking of online orders without jeopardizing the experience of offline customers.
arXiv Detail & Related papers (2023-01-05T16:35:17Z) - No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
arXiv Detail & Related papers (2022-10-23T08:45:39Z) - PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation [89.0074567748505]
We propose a new metric to accurately predict prompt transferability, and a novel PoT approach (namely PANDA).
1) Our proposed metric works well to predict the prompt transferability; 2) our PANDA consistently outperforms the vanilla PoT approach by a 2.3% average score (up to 24.1%) across all tasks and model sizes; 3) with our PANDA approach, prompt-tuning can achieve competitive and even better performance than model-tuning across various PLM scales.
arXiv Detail & Related papers (2022-08-22T09:14:14Z) - A Deep Reinforcement Learning Approach for Constrained Online Logistics Route Assignment [4.367543599338385]
Properly assigning a candidate logistics route to each shipping parcel is crucial for the logistics industry.
This online route-assignment problem can be viewed as a constrained online decision-making problem.
We develop a model-free DRL approach named PPO-RA, in which Proximal Policy Optimization (PPO) is improved with dedicated techniques to address the challenges of route assignment (RA).
arXiv Detail & Related papers (2021-09-08T07:27:39Z) - Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)
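For context on the federated learning setting in the entry above, here is a minimal, hypothetical sketch of one differentially private FedAvg round: each client returns a model delta, and the server clips the deltas, averages them, and adds Gaussian noise. The clipping bound, noise scale, and toy least-squares objective are assumptions for illustration; the cited paper's delay-aware optimization over wireless channels is not reproduced here.

```python
"""Illustrative sketch only: a generic DP-FedAvg round; all constants and the
toy objective are assumptions, not the cited paper's method."""
import numpy as np

CLIP, SIGMA = 1.0, 0.8   # assumed L2 clipping bound and Gaussian noise multiplier

def local_update(global_w, data, lr=0.1, steps=5):
    """Each client runs a few gradient steps on a toy least-squares objective."""
    w = global_w.copy()
    X, y = data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w - global_w                      # return the model delta

def dp_aggregate(global_w, deltas):
    """Server clips each delta, averages, and adds Gaussian noise (DP step)."""
    clipped = [d * min(1.0, CLIP / (np.linalg.norm(d) + 1e-12)) for d in deltas]
    noise = np.random.normal(0.0, SIGMA * CLIP / len(clipped), size=global_w.shape)
    return global_w + np.mean(clipped, axis=0) + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, clients = 5, 8
    w = np.zeros(dim)
    data = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(clients)]
    for _ in range(10):                      # a few communication rounds
        deltas = [local_update(w, d) for d in data]
        w = dp_aggregate(w, deltas)
    print("Global model after 10 DP-FedAvg rounds:", np.round(w, 3))
```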
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.