Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.09507v1
- Date: Tue, 16 Jun 2020 20:40:41 GMT
- Title: Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- Authors: Bram Cals, Yingqian Zhang, Remco Dijkman, Claudy van Dorst
- Abstract summary: We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders.
In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
- Score: 2.4565068569913384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In e-commerce markets, on-time delivery is of great importance to customer
satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL)
approach for deciding how and when orders should be batched and picked in a
warehouse to minimize the number of tardy orders. In particular, the technique
facilitates making decisions on whether an order should be picked individually
(pick-by-order) or picked in a batch with other orders (pick-by-batch), and if
so with which other orders. We approach the problem by formulating it as a
semi-Markov decision process and develop a vector-based state representation
that includes the characteristics of the warehouse system. This allows us to
create a deep reinforcement learning solution that learns a strategy by
interacting with the environment and solve the problem with a proximal policy
optimization algorithm. We evaluate the performance of the proposed DRL
approach by comparing it with several batching and sequencing heuristics in
different problem settings. The results show that the DRL approach is able to
develop a strategy that produces consistent, good solutions and performs better
than the proposed heuristics.
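The semi-MDP view and the PPO objective mentioned in the abstract can be illustrated with a small, purely hypothetical sketch: the state features, the linear softmax policy, and all numbers below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical vector state at a semi-MDP decision point -----------------
# When a picker becomes available, the agent observes features of the order at
# the head of the queue, of candidate batch partners, and of the warehouse
# (all feature names here are illustrative, not the paper's representation).
def make_state(time_to_due, n_pick_lines, candidate_due_times, pickers_free, queue_len):
    return np.concatenate([
        [time_to_due, n_pick_lines, pickers_free, queue_len],
        candidate_due_times,            # due-date slack of each candidate partner
    ]).astype(np.float32)

state = make_state(time_to_due=12.0, n_pick_lines=3,
                   candidate_due_times=np.array([8.0, 25.0, 40.0]),
                   pickers_free=2, queue_len=7)

# --- A tiny linear softmax policy over batching actions ----------------------
# Action 0: pick-by-order; action k > 0: batch with candidate k.
n_actions = 4
theta = rng.normal(scale=0.1, size=(n_actions, state.size))

def policy_probs(theta, s):
    logits = theta @ s
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

probs = policy_probs(theta, state)
action = rng.choice(n_actions, p=probs)

# --- PPO's clipped surrogate objective for that (state, action) sample -------
old_prob = probs[action]                              # behavior-policy probability
theta_new = theta + 0.01 * rng.normal(size=theta.shape)   # pretend gradient step
new_prob = policy_probs(theta_new, state)[action]
advantage = 1.3            # e.g. fewer tardy orders than a baseline estimate
eps = 0.2
ratio = new_prob / old_prob
clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
ppo_objective = min(ratio * advantage, clipped * advantage)
print(action, ppo_objective)
```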
Related papers
- Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding [0.3867363075280544]
The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently.
We propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning.
Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances.
arXiv Detail & Related papers (2025-02-14T15:42:30Z)
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
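A toy, hedged illustration of that decoding idea: previously sampled sequences are excluded so only unseen alternatives are considered. Uniform random permutations stand in for the learned policy, and the scoring function is made up.

```python
import random

random.seed(0)

def score(seq):
    # Placeholder objective: in a real NCO setting this would be tour length,
    # makespan, etc.; here we simply prefer sequences close to sorted order.
    return -sum(abs(i - x) for i, x in enumerate(seq))

def sample_unseen(n_items, seen, max_tries=1000):
    """Sample a permutation not drawn before, forcing unseen alternatives."""
    for _ in range(max_tries):
        seq = tuple(random.sample(range(n_items), n_items))
        if seq not in seen:
            seen.add(seq)
            return seq
    return None

seen = set()
best = None
for _ in range(50):                       # self-improvement rounds
    seq = sample_unseen(5, seen)
    if seq is None:
        break
    if best is None or score(seq) > score(best):
        best = seq                        # best-so-far becomes the training target
print(best, score(best))
```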
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
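A minimal sketch of step (i), fitting a quantal (logit) response model by maximum likelihood; the utilities, the rationality parameter, and the grid-search estimator are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Follower utilities for 3 actions under some fixed leader policy (made up).
u = np.array([1.0, 0.4, -0.2])
true_eta = 2.0                                  # rationality parameter

def quantal_probs(eta, u):
    z = np.exp(eta * (u - u.max()))
    return z / z.sum()

# Observe follower actions drawn from the true quantal response.
obs = rng.choice(len(u), size=500, p=quantal_probs(true_eta, u))

# Maximum likelihood estimate of eta via a simple grid search
# (a stand-in for the estimation step described in the paper).
grid = np.linspace(0.1, 5.0, 200)
loglik = [np.log(quantal_probs(eta, u)[obs]).sum() for eta in grid]
eta_hat = grid[int(np.argmax(loglik))]
print(eta_hat)   # should be close to 2.0
```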
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- A Machine Learning Approach to Two-Stage Adaptive Robust Optimization [6.943816076962257]
We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization problems.
We encode the optimal here-and-now decisions, their associated worst-case scenarios, and the optimal wait-and-see decisions as strategies.
We then train a machine learning model that predicts high-quality strategies for all three components on new instances.
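A heavily simplified sketch of this predict-then-optimize idea: a 1-nearest-neighbor lookup stands in for whatever model the paper actually trains, and the strategy labels are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Offline phase (hypothetical): for sampled problem parameters theta, an exact
# method would compute the optimal here-and-now decision, the associated
# worst-case scenario, and the wait-and-see policy; together they define a
# "strategy" label. Here the labels are synthetic placeholders.
train_theta = rng.uniform(size=(200, 4))                # instance parameters
train_strategy = (train_theta[:, 0] > 0.5).astype(int)  # toy 2-strategy label

# Online phase: predict a strategy for a new instance with a 1-nearest-neighbor
# "model"; only the (cheap) remaining recourse step would then need solving.
def predict_strategy(theta_new):
    d = np.linalg.norm(train_theta - theta_new, axis=1)
    return train_strategy[int(np.argmin(d))]

print(predict_strategy(rng.uniform(size=4)))
```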
arXiv Detail & Related papers (2023-07-23T19:23:06Z)
- Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
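A rough sketch of an ALNS loop in which a (stubbed) learned controller chooses the destroy operator, destroy size, and acceptance temperature; everything below is an illustrative assumption, not DR-ALNS itself.

```python
import math
import random

random.seed(3)

# Toy solution: a visiting order over 8 nodes (stand-in for an orienteering tour).
solution = list(range(8))

def cost(sol):
    return sum(abs(a - b) for a, b in zip(sol, sol[1:]))   # placeholder objective

destroy_ops = [
    lambda s, k: (s[k:], s[:k]),          # remove a prefix of size k
    lambda s, k: (s[:-k], s[-k:]),        # remove a suffix of size k
]

def repair(partial, removed):
    return partial + sorted(removed)      # naive greedy re-insertion

def drl_controller(state):
    # Stand-in for the trained policy: it would map search-state features to
    # (operator index, destroy size, acceptance temperature).
    return random.randrange(len(destroy_ops)), random.randint(1, 3), 1.0

best = current = solution
for it in range(100):
    op, k, temp = drl_controller((it, cost(current)))
    partial, removed = destroy_ops[op](current, k)
    candidate = repair(partial, removed)
    # Simulated-annealing-style acceptance, with temperature set by the policy.
    delta = cost(candidate) - cost(current)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        current = candidate
    if cost(current) < cost(best):
        best = current
print(best, cost(best))
```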
arXiv Detail & Related papers (2022-11-01T21:33:46Z)
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
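A minimal sketch of the Online Gradient Descent ingredient on a newsvendor-style stand-in for the inventory decision; the costs, step size, and demand model are assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each round we choose an order-up-to level q, demand d arrives, and we pay
# holding cost h for leftovers and backlog cost b for unmet demand.
h, b = 1.0, 3.0
q, eta = 5.0, 0.5                      # initial decision and OGD step size

for t in range(1, 201):
    d = rng.poisson(8)                 # demand distribution unknown to the learner
    # Subgradient of the per-round cost h*max(q-d,0) + b*max(d-q,0) w.r.t. q.
    g = h if q > d else -b
    q = min(max(q - (eta / t**0.5) * g, 0.0), 30.0)   # project onto [0, 30]

print(round(q, 2))   # drifts toward the b/(b+h) quantile of the demand
```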
arXiv Detail & Related papers (2022-10-23T08:45:39Z)
- Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints.
Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function.
We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find it considerably outperforms existing methods by as much as 14.7% on average.
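A toy sketch of a PARL-style actor step: brute-force enumeration of feasible integer order quantities stands in for the integer program, and a hand-coded function stands in for the neural value estimate (illustration only).

```python
# State-dependent constraint: we cannot order beyond remaining capacity.
capacity, inventory = 10, 3
h, price = 0.5, 2.0

def value_estimate(next_inventory):
    # Stand-in for the NN value approximation V(s').
    return 1.5 * min(next_inventory, 6) - 0.2 * next_inventory

def immediate_reward(inventory, order, expected_demand=4):
    sold = min(inventory + order, expected_demand)
    return price * sold - h * (inventory + order - sold)

feasible = range(0, capacity - inventory + 1)
best_order = max(feasible,
                 key=lambda a: immediate_reward(inventory, a)
                               + value_estimate(max(inventory + a - 4, 0)))
print(best_order)
```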
arXiv Detail & Related papers (2021-12-04T01:40:34Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is modeling the task hierarchy by options and training the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework, guided by an oracle teacher with perfect information, can better steer the learning of the common policy towards practically optimal execution.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.