Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.09507v1
- Date: Tue, 16 Jun 2020 20:40:41 GMT
- Title: Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- Authors: Bram Cals, Yingqian Zhang, Remco Dijkman, Claudy van Dorst
- Abstract summary: We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders.
In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In e-commerce markets, on-time delivery is of great importance to customer
satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL)
approach for deciding how and when orders should be batched and picked in a
warehouse to minimize the number of tardy orders. In particular, the technique
facilitates making decisions on whether an order should be picked individually
(pick-by-order) or picked in a batch with other orders (pick-by-batch), and if
so with which other orders. We approach the problem by formulating it as a
semi-Markov decision process and develop a vector-based state representation
that includes the characteristics of the warehouse system. This allows us to
create a deep reinforcement learning solution that learns a strategy by
interacting with the environment and solves the problem with a proximal policy
optimization algorithm. We evaluate the performance of the proposed DRL
approach by comparing it with several batching and sequencing heuristics in
different problem settings. The results show that the DRL approach is able to
develop a strategy that consistently produces good solutions and performs better
than the proposed heuristics.
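As a concrete illustration of the formulation, the sketch below encodes an arriving order and the open batches as a fixed-size vector and draws a discrete action (pick-by-order vs. joining a specific batch) from a softmax policy. The features, dictionary keys, and the linear policy are illustrative assumptions, not the paper's implementation; the trained network and the PPO update itself are omitted.

```python
# Hedged sketch of the semi-MDP decision structure; all features and the
# linear policy are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def state_vector(order, batches, now):
    # Fixed-size encoding: the arriving order's urgency and size, plus each
    # open batch's remaining slack and capacity fill.
    feats = [order["due"] - now, order["n_lines"]]
    for b in batches:
        feats += [b["slack"], b["fill"]]
    return np.array(feats)

def act(theta, s):
    # Softmax policy over a discrete action set:
    #   action 0      -> pick-by-order (pick the order individually)
    #   action i >= 1 -> pick-by-batch, adding the order to open batch i
    # PPO's clipped-surrogate update of theta is omitted here.
    logits = theta @ s
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p

batches = [{"slack": 12.0, "fill": 0.4}, {"slack": 3.0, "fill": 0.7}]
s = state_vector({"due": 30.0, "n_lines": 5}, batches, now=0.0)
theta = rng.normal(size=(1 + len(batches), s.size))  # one logit row per action
action, probs = act(theta, s)
print("chosen action:", action, "probabilities:", probs.round(3))
```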
Related papers
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem; a toy sketch of the sequence-masking idea follows this entry.
arXiv Detail & Related papers (2024-07-24T12:06:09Z) - Actions Speak What You Want: Provably Sample-Efficient Reinforcement
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem; a toy quantal-response computation follows this entry.
arXiv Detail & Related papers (2023-07-26T10:24:17Z) - A Machine Learning Approach to Two-Stage Adaptive Robust Optimization [6.943816076962257]
- A Machine Learning Approach to Two-Stage Adaptive Robust Optimization
We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization problems.
We encode the optimal here-and-now decisions, the worst-case scenarios associated with them, and the optimal wait-and-see decisions as strategies, and we train a machine learning model that predicts high-quality strategies for new instances; a schematic of this predict-a-strategy idea follows this entry.
arXiv Detail & Related papers (2023-07-23T19:23:06Z) - Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
- Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning
We introduce a Deep Reinforcement Learning-based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches; a skeleton of the policy-controlled ALNS loop follows this entry.
arXiv Detail & Related papers (2022-11-01T21:33:46Z) - No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem; a toy projected-OGD update follows this entry.
arXiv Detail & Related papers (2022-10-23T08:45:39Z) - Math Programming based Reinforcement Learning for Multi-Echelon
- Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management
Reinforcement learning has led to considerable breakthroughs in diverse areas such as robotics, games and many others.
But the application of RL to complex real-world decision-making problems remains limited.
Such problems have characteristics that make them considerably harder to solve for existing RL methods, which rely on enumeration techniques to solve per-step action problems.
We show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty.
We find that PARL outperforms a commonly used base-stock policy by 44.7% and the best-performing RL method by up to 12.1% on average; a toy demand-discretization step follows this entry.
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - A Case Study on Optimization of Warehouses [2.2101681534594237]
- A Case Study on Optimization of Warehouses
In warehouses, order picking is known to be the most labor-intensive and costly task, and the employees performing it account for a large part of warehouse performance.
In this work, we aim to optimize the storage assignment and the order picking problem within a mezzanine warehouse with regard to their reciprocal influence.
arXiv Detail & Related papers (2021-11-23T07:22:57Z) - Modularity in Reinforcement Learning via Algorithmic Independence in
Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z) - Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method for learning skills over long horizons.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL consistently outperforms its counterparts across a variety of tasks; a schematic option-structured policy follows this entry.
arXiv Detail & Related papers (2021-06-10T06:42:05Z) - Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
- Universal Trading for Order Execution with Oracle Policy Distillation
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution via an oracle teacher with perfect information; a sketch of the distillation term follows this entry.
arXiv Detail & Related papers (2021-01-28T05:52:18Z) - Combining Deep Learning and Optimization for Security-Constrained
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex, large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.