Related papers: Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations

Related papers

Demonstration of effective UCB-based routing in skill-based queues on real-world data [0.4077787659104315]
This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and a service systems.<n>By means of a case study using a real-world set data, we investigate the practical implementation of a recently developed reinforcement learning for optimal customer routing.
arXiv Detail & Related papers (2025-06-25T15:36:43Z)
Solving the Pod Repositioning Problem with Deep Reinforced Adaptive Large Neighborhood Search [0.3867363075280544]
The Pod Repositioning Problem (PRP) in Robotic Mobile Fulfillment Systems involves selecting optimal storage locations for pods returning from pick stations.<n>This work presents an improved solution method that integrates Large Neighborhood Search (ALNS) with Deep Reinforcement Learning (DRL)<n>A DRL agent dynamically selects destroy and repair operators and adjusts key parameters such as destruction degree and acceptance thresholds during the search.
arXiv Detail & Related papers (2025-06-03T11:07:41Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance. We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents. We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
Integrated trucks assignment and scheduling problem with mixed service mode docks: A Q-learning based adaptive large neighborhood search algorithm [1.4693397031205022]
Mixed service mode docks enhance efficiency by flexibly handling both loading and unloading trucks in warehouses. This paper proposes a new model integrating dock mode decision, truck assignment, and scheduling. We introduce a Q-learning-based adaptive large neighborhood search algorithm to address the integrated problem.
arXiv Detail & Related papers (2024-12-12T09:17:35Z)
Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization [6.713974813995327]
We present MEMENTO, an approach that leverages memory to improve the adaptation of neural solvers at time. We successfully train all RL auto-regressive solvers on large instances, and show that MEMENTO can scale and is data-efficient. Overall, MEMENTO enables to push the state-of-the-art on 11 out of 12 evaluated tasks.
arXiv Detail & Related papers (2024-06-24T08:18:19Z)
Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance. Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
Fractional Deep Reinforcement Learning for Age-Minimal Mobile Edge Computing [11.403989519949173]
This work focuses on the timeliness of computational-intensive updates, measured by Age-ofInformation (AoI) We study how to jointly optimize the task updating and offloading policies for AoI with fractional form. Experimental results show that our proposed algorithms reduce the average AoI by up to 57.6% compared with several non-fractional benchmarks.
arXiv Detail & Related papers (2023-12-16T11:13:40Z)
Neural Approximate Dynamic Programming for the Ultra-fast Order Dispatching Problem [1.519321208145928]
We focus on the ultrafast Order Dispatch Problem (ODP), which involves matching and dispatching orders to couriers within a centralized warehouse setting. We introduce important extensions to ultra-fast ODP such as order policies and explicit courier assignments to provide a more realistic representation of operations and improve delivery efficiency. We test our proposed approach using four distinct realistic datasets tailored for ODP and compare the performance of NeurADP against myopic and DRL baselines.
arXiv Detail & Related papers (2023-11-21T20:23:58Z)
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
offline reinforcement learning (RL) paradigm provides recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. In this paper, we propose an adaptive scheme for action quantization. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
Towards General and Efficient Online Tuning for Spark [55.30868031221838]
We present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent.
arXiv Detail & Related papers (2023-09-05T02:16:45Z)
Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information. Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using preference ranking function to find the optimization direction. Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment. We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent. We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL [12.135280422000635]
We introduce a new efficient hierarchical approach for optimizing both continuous and categorical variables. We show that explicitly modelling dependence between data augmentation and other hyper parameters improves generalization.
arXiv Detail & Related papers (2021-06-30T08:15:59Z)
AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection [34.77250498401055]
This paper introduces an adaptive model-free deep reinforcement approach that can recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework.
arXiv Detail & Related papers (2021-04-01T02:14:01Z)
Fast Approximate Solutions using Reinforcement Learning for Dynamic Capacitated Vehicle Routing with Time Windows [3.5232085374661284]
This paper develops an inherently parallelised, fast, approximate learning-based solution to the generic class of Capacitated Vehicle Routing with Time Windows and Dynamic Routing (CVRP-TWDR) Considering vehicles in a fleet as decentralised agents, we postulate that using reinforcement learning (RL) based adaptation is a key enabler for real-time route formation in a dynamic environment.
arXiv Detail & Related papers (2021-02-24T06:30:16Z)
Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents. We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks. This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems. Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs. This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
Solving the Order Batching and Sequencing Problem using Deep Reinforcement Learning [2.4565068569913384]
We present a Deep Reinforcement Learning (DRL) approach for deciding how and when orders should be batched and picked in a warehouse to minimize the number of tardy orders. In particular, the technique facilitates making decisions on whether an order should be picked individually (pick-by-order) or picked in a batch with other orders (pick-by-batch) and if so with which other orders.
arXiv Detail & Related papers (2020-06-16T20:40:41Z)
Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches. When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy. We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
arXiv Detail & Related papers (2020-04-04T14:16:27Z)
Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained policy optimization (CPPO) We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.