Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2006.09507v1
- Date: Tue, 16 Jun 2020 20:40:41 GMT
- Title: Solving the Order Batching and Sequencing Problem using Deep
Reinforcement Learning
- Authors: Bram Cals, Yingqian Zhang, Remco Dijkman, Claudy van Dorst
- Abstract summary: We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders.
In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
- Score: 2.4565068569913384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In e-commerce markets, on-time delivery is of great importance to customer
satisfaction. In this paper, we present a Deep Reinforcement Learning (DRL)
approach for deciding how and when orders should be batched and picked in a
warehouse to minimize the number of tardy orders. In particular, the technique
facilitates making decisions on whether an order should be picked individually
(pick-by-order) or picked in a batch with other orders (pick-by-batch), and if
so with which other orders. We approach the problem by formulating it as a
semi-Markov decision process and develop a vector-based state representation
that includes the characteristics of the warehouse system. This allows us to
create a deep reinforcement learning solution that learns a strategy by
interacting with the environment and solve the problem with a proximal policy
optimization algorithm. We evaluate the performance of the proposed DRL
approach by comparing it with several batching and sequencing heuristics in
different problem settings. The results show that the DRL approach is able to
develop a strategy that produces consistent, good solutions and performs better
than the proposed heuristics.
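The semi-MDP view and the PPO objective mentioned in the abstract can be illustrated with a small, purely hypothetical sketch: the state features, the linear softmax policy, and all numbers below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical vector state at a semi-MDP decision point -----------------
# When a picker becomes available, the agent observes features of the order at
# the head of the queue, of candidate batch partners, and of the warehouse
# (all feature names here are illustrative, not the paper's representation).
def make_state(time_to_due, n_pick_lines, candidate_due_times, pickers_free, queue_len):
    return np.concatenate([
        [time_to_due, n_pick_lines, pickers_free, queue_len],
        candidate_due_times,            # due-date slack of each candidate partner
    ]).astype(np.float32)

state = make_state(time_to_due=12.0, n_pick_lines=3,
                   candidate_due_times=np.array([8.0, 25.0, 40.0]),
                   pickers_free=2, queue_len=7)

# --- A tiny linear softmax policy over batching actions ----------------------
# Action 0: pick-by-order; action k > 0: batch with candidate k.
n_actions = 4
theta = rng.normal(scale=0.1, size=(n_actions, state.size))

def policy_probs(theta, s):
    logits = theta @ s
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

probs = policy_probs(theta, state)
action = rng.choice(n_actions, p=probs)

# --- PPO's clipped surrogate objective for that (state, action) sample -------
old_prob = probs[action]                              # behavior-policy probability
theta_new = theta + 0.01 * rng.normal(size=theta.shape)   # pretend gradient step
new_prob = policy_probs(theta_new, state)[action]
advantage = 1.3            # e.g. fewer tardy orders than a baseline estimate
eps = 0.2
ratio = new_prob / old_prob
clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
ppo_objective = min(ratio * advantage, clipped * advantage)
print(action, ppo_objective)
```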
Related papers
- Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding [0.3867363075280544]
The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently.
We propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning.
Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances.
arXiv Detail & Related papers (2025-02-14T15:42:30Z)
- Take a Step and Reconsider: Sequence Decoding for Self-Improved Neural Combinatorial Optimization [1.1510009152620668]
We present a simple and problem-independent sequence decoding method for self-improved learning.
By modifying the policy to ignore previously sampled sequences, we force it to consider only unseen alternatives.
Our method outperforms previous NCO approaches on the Job Shop Scheduling Problem.
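A toy, hedged illustration of that decoding idea: previously sampled sequences are excluded so only unseen alternatives are considered. Uniform random permutations stand in for the learned policy, and the scoring function is made up.

```python
import random

random.seed(0)

def score(seq):
    # Placeholder objective: in a real NCO setting this would be tour length,
    # makespan, etc.; here we simply prefer sequences close to sorted order.
    return -sum(abs(i - x) for i, x in enumerate(seq))

def sample_unseen(n_items, seen, max_tries=1000):
    """Sample a permutation not drawn before, forcing unseen alternatives."""
    for _ in range(max_tries):
        seq = tuple(random.sample(range(n_items), n_items))
        if seq not in seen:
            seen.add(seq)
            return seq
    return None

seen = set()
best = None
for _ in range(50):                       # self-improvement rounds
    seq = sample_unseen(5, seen)
    if seq is None:
        break
    if best is None or score(seq) > score(best):
        best = seq                        # best-so-far becomes the training target
print(best, score(best))
```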
arXiv Detail & Related papers (2024-07-24T12:06:09Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
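A minimal sketch of step (i), fitting a quantal (logit) response model by maximum likelihood; the utilities, the rationality parameter, and the grid-search estimator are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Follower utilities for 3 actions under some fixed leader policy (made up).
u = np.array([1.0, 0.4, -0.2])
true_eta = 2.0                                  # rationality parameter

def quantal_probs(eta, u):
    z = np.exp(eta * (u - u.max()))
    return z / z.sum()

# Observe follower actions drawn from the true quantal response.
obs = rng.choice(len(u), size=500, p=quantal_probs(true_eta, u))

# Maximum likelihood estimate of eta via a simple grid search
# (a stand-in for the estimation step described in the paper).
grid = np.linspace(0.1, 5.0, 200)
loglik = [np.log(quantal_probs(eta, u)[obs]).sum() for eta in grid]
eta_hat = grid[int(np.argmax(loglik))]
print(eta_hat)   # should be close to 2.0
```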
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- A Machine Learning Approach to Two-Stage Adaptive Robust Optimization [6.943816076962257]
We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization problems.
We encode the optimal here-and-now decisions, their associated worst-case scenarios, and the optimal wait-and-see decisions as strategies.
We then train a machine learning model that predicts high-quality strategies for all three components on new instances.
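A heavily simplified sketch of this predict-then-optimize idea: a 1-nearest-neighbor lookup stands in for whatever model the paper actually trains, and the strategy labels are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Offline phase (hypothetical): for sampled problem parameters theta, an exact
# method would compute the optimal here-and-now decision, the associated
# worst-case scenario, and the wait-and-see policy; together they define a
# "strategy" label. Here the labels are synthetic placeholders.
train_theta = rng.uniform(size=(200, 4))                # instance parameters
train_strategy = (train_theta[:, 0] > 0.5).astype(int)  # toy 2-strategy label

# Online phase: predict a strategy for a new instance with a 1-nearest-neighbor
# "model"; only the (cheap) remaining recourse step would then need solving.
def predict_strategy(theta_new):
    d = np.linalg.norm(train_theta - theta_new, axis=1)
    return train_strategy[int(np.argmin(d))]

print(predict_strategy(rng.uniform(size=4)))
```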
arXiv Detail & Related papers (2023-07-23T19:23:06Z)
- Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning [4.374837991804085]
We introduce a Deep Reinforcement Learning based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search.
We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition.
The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches.
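A rough sketch of an ALNS loop in which a (stubbed) learned controller chooses the destroy operator, destroy size, and acceptance temperature; everything below is an illustrative assumption, not DR-ALNS itself.

```python
import math
import random

random.seed(3)

# Toy solution: a visiting order over 8 nodes (stand-in for an orienteering tour).
solution = list(range(8))

def cost(sol):
    return sum(abs(a - b) for a, b in zip(sol, sol[1:]))   # placeholder objective

destroy_ops = [
    lambda s, k: (s[k:], s[:k]),          # remove a prefix of size k
    lambda s, k: (s[:-k], s[-k:]),        # remove a suffix of size k
]

def repair(partial, removed):
    return partial + sorted(removed)      # naive greedy re-insertion

def drl_controller(state):
    # Stand-in for the trained policy: it would map search-state features to
    # (operator index, destroy size, acceptance temperature).
    return random.randrange(len(destroy_ops)), random.randint(1, 3), 1.0

best = current = solution
for it in range(100):
    op, k, temp = drl_controller((it, cost(current)))
    partial, removed = destroy_ops[op](current, k)
    candidate = repair(partial, removed)
    # Simulated-annealing-style acceptance, with temperature set by the policy.
    delta = cost(candidate) - cost(current)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        current = candidate
    if cost(current) < cost(best):
        best = current
print(best, cost(best))
```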
arXiv Detail & Related papers (2022-11-01T21:33:46Z)
- No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution [48.27759561064771]
We consider the two-echelon supply chain model introduced in [Cachon and Zipkin, 1999] under two different settings.
We design algorithms that achieve favorable guarantees for both regret and convergence to the optimal inventory decision in both settings.
Our algorithms are based on Online Gradient Descent and Online Newton Step, together with several new ingredients specifically designed for our problem.
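A minimal sketch of the Online Gradient Descent ingredient on a newsvendor-style stand-in for the inventory decision; the costs, step size, and demand model are assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each round we choose an order-up-to level q, demand d arrives, and we pay
# holding cost h for leftovers and backlog cost b for unmet demand.
h, b = 1.0, 3.0
q, eta = 5.0, 0.5                      # initial decision and OGD step size

for t in range(1, 201):
    d = rng.poisson(8)                 # demand distribution unknown to the learner
    # Subgradient of the per-round cost h*max(q-d,0) + b*max(d-q,0) w.r.t. q.
    g = h if q > d else -b
    q = min(max(q - (eta / t**0.5) * g, 0.0), 30.0)   # project onto [0, 30]

print(round(q, 2))   # drifts toward the b/(b+h) quantile of the demand
```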
arXiv Detail & Related papers (2022-10-23T08:45:39Z)
- Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints.
Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function.
We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment heuristics and find it considerably outperforms existing methods by as much as 14.7% on average.
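A toy sketch of a PARL-style actor step: brute-force enumeration of feasible integer order quantities stands in for the integer program, and a hand-coded function stands in for the neural value estimate (illustration only).

```python
# State-dependent constraint: we cannot order beyond remaining capacity.
capacity, inventory = 10, 3
h, price = 0.5, 2.0

def value_estimate(next_inventory):
    # Stand-in for the NN value approximation V(s').
    return 1.5 * min(next_inventory, 6) - 0.2 * next_inventory

def immediate_reward(inventory, order, expected_demand=4):
    sold = min(inventory + order, expected_demand)
    return price * sold - h * (inventory + order - sold)

feasible = range(0, capacity - inventory + 1)
best_order = max(feasible,
                 key=lambda a: immediate_reward(inventory, a)
                               + value_estimate(max(inventory + a - 4, 0)))
print(best_order)
```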
arXiv Detail & Related papers (2021-12-04T01:40:34Z)
- Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment [79.5678820246642]
We show that certain action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process.
arXiv Detail & Related papers (2021-06-28T21:29:13Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is modeling the task hierarchy by options and training the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework, guided by an oracle teacher with perfect information, can better steer the learning of the common policy towards practically optimal execution.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.