Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems
- URL: http://arxiv.org/abs/2601.11189v1
- Date: Fri, 16 Jan 2026 11:03:47 GMT
- Title: Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems
- Authors: Sofiene Lassoued, Asrat Gobachew, Stefan Lier, Andreas Schwung,
- Abstract summary: This paper proposes a policy-based deep reinforcement learning framework for solving the Job Shop Scheduling Problem.<n>We extend the hyper-heuristic framework with two key mechanisms.<n>We show that the proposed approach outperforms traditional deterministics, metaheuristics, and recent neural network-based scheduling methods.
- Score: 3.0098885383612104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a policy-based deep reinforcement learning hyper-heuristic framework for solving the Job Shop Scheduling Problem. The hyper-heuristic agent learns to switch scheduling rules based on the system state dynamically. We extend the hyper-heuristic framework with two key mechanisms. First, action prefiltering restricts decision-making to feasible low-level actions, enabling low-level heuristics to be evaluated independently of environmental constraints and providing an unbiased assessment. Second, a commitment mechanism regulates the frequency of heuristic switching. We investigate the impact of different commitment strategies, from step-wise switching to full-episode commitment, on both training behavior and makespan. Additionally, we compare two action selection strategies at the policy level: deterministic greedy selection and stochastic sampling. Computational experiments on standard JSSP benchmarks demonstrate that the proposed approach outperforms traditional heuristics, metaheuristics, and recent neural network-based scheduling methods
Related papers
- Variational Approach for Job Shop Scheduling [2.256375838037721]
This paper proposes a novel Variational Graph-to-Scheduler (VG2S) framework for solving the Job Shop Scheduling Problem (JSSP)<n>The proposed method exhibits superior zero-shot generalization compared with state-of-the-art DRL baselines and traditional dispatching rules.
arXiv Detail & Related papers (2026-01-30T23:55:18Z) - Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures [3.2880869992413246]
We present a novel framework for solving Dynamic Job Shop Scheduling Problems under uncertainty.<n>Our approach follows a model-based paradigm, using Coloured Timed Petri Nets to represent the scheduling environment.<n>We conduct experiments on dynamic JSSP benchmarks, demonstrating that our method consistently outperforms traditional minimization and rule-based approaches in terms of makespan.
arXiv Detail & Related papers (2026-01-14T08:53:46Z) - Learning Deterministic Policies with Policy Gradients in Constrained Markov Decision Processes [59.27926064817273]
We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under domination assumptions.<n>We empirically validate both the action-based (C-PGAE) and parameter-based (C-PGPE) variants of C-PG on constrained control tasks.
arXiv Detail & Related papers (2025-06-06T10:29:05Z) - Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals [12.894271401094615]
A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals.<n>We propose an approach that trains a conditional diffusion model regularized by a Gaussian Process (GP) prior to generate a complex variety of subgoals.<n>Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and GP's predictive mean.
arXiv Detail & Related papers (2025-05-27T20:38:44Z) - Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.<n>This work is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z) - Bidirectional Task-Motion Planning Based on Hierarchical Reinforcement Learning for Strategic Confrontation [12.278121909070485]
In swarm robotics, confrontation scenarios, including strategic confrontations, require efficient decision-making.<n>Traditional task and motion planning methods separate decision-making into two layers, but their unidirectional structure fails to capture the interdependence between these layers.<n>Here, we propose a novel bidirectional approach based on hierarchical reinforcement learning, enabling dynamic interaction between the layers.
arXiv Detail & Related papers (2025-04-22T13:22:58Z) - Direct Preference Optimization for Primitive-Enabled Hierarchical Reinforcement Learning [75.9729413703531]
DIPPER is a novel HRL framework that formulates hierarchical policy learning as a bi-level optimization problem.<n>We show that DIPPER achieves up to 40% improvement over state-of-the-art baselines in sparse reward scenarios.
arXiv Detail & Related papers (2024-11-01T04:58:40Z) - Hierarchical Decision Making Based on Structural Information Principles [19.82391136775341]
We propose a novel Structural Information principles-based framework, namely SIDM, for hierarchical Decision Making.<n>We present an abstraction mechanism that processes historical state-action trajectories to construct abstract representations of states and actions.<n>We develop a skill-based learning method for single-agent scenarios and a role-based collaboration method for multi-agent scenarios, both of which can flexibly integrate various underlying algorithms for enhanced performance.
arXiv Detail & Related papers (2024-04-15T13:02:00Z) - Safe Multi-agent Learning via Trapping Regions [89.24858306636816]
We apply the concept of trapping regions, known from qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning.
We propose a binary partitioning algorithm for verification that candidate sets form trapping regions in systems with known learning dynamics, and a sampling algorithm for scenarios where learning dynamics are not known.
arXiv Detail & Related papers (2023-02-27T14:47:52Z) - Learning Dynamics and Generalization in Reinforcement Learning [59.530058000689884]
We show theoretically that temporal difference learning encourages agents to fit non-smooth components of the value function early in training.
We show that neural networks trained using temporal difference algorithms on dense reward tasks exhibit weaker generalization between states than randomly networks and gradient networks trained with policy methods.
arXiv Detail & Related papers (2022-06-05T08:49:16Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Hybrid intelligence for dynamic job-shop scheduling with deep
reinforcement learning and attention mechanism [28.28095225164155]
We formulate the DJSP as a Markov decision process (MDP) to be tackled by reinforcement learning (RL)
We propose a flexible hybrid framework that takes disjunctive graphs as states and a set of general dispatching rules as the action space with minimum prior domain knowledge.
We present Gymjsp, a public benchmark based on the well-known OR-Library, to provide a standardized off-the-shelf facility for RL and DJSP research communities.
arXiv Detail & Related papers (2022-01-03T09:38:13Z) - Evolutionary Stochastic Policy Distillation [139.54121001226451]
We propose a new method called Evolutionary Policy Distillation (ESPD) to solve GCRS tasks.
ESPD enables a target policy to learn from a series of its variants through the technique of policy distillation (PD)
The experiments based on the MuJoCo control suite show the high learning efficiency of the proposed method.
arXiv Detail & Related papers (2020-04-27T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.