Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
- URL: http://arxiv.org/abs/2409.10589v3
- Date: Sun, 13 Apr 2025 14:52:43 GMT
- Title: Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling
- Authors: Jesse van Remmerden, Zaharah Bukhsh, Yingqian Zhang
- Abstract summary: The Job Shop Scheduling Problem (JSSP) is a complex optimization problem. Online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP. We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD).
- Score: 0.9831489366502301
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Job Shop Scheduling Problem (JSSP) is a complex combinatorial optimization problem. While online Reinforcement Learning (RL) has shown promise by quickly finding acceptable solutions for JSSP, it faces key limitations: it requires extensive training interactions from scratch, leading to sample inefficiency; it cannot leverage existing high-quality solutions; and it often yields suboptimal results compared to traditional methods like Constraint Programming (CP). We introduce Offline Reinforcement Learning for Learning to Dispatch (Offline-LD), which addresses these limitations by learning from previously generated solutions. Our approach is motivated by scenarios where historical scheduling data and expert solutions are available, although our current evaluation focuses on benchmark problems. Offline-LD adapts two CQL-based Q-learning methods (mQRDQN and discrete mSAC) for maskable action spaces, introduces a novel entropy bonus modification for discrete SAC, and exploits reward normalization through preprocessing. Our experiments demonstrate that Offline-LD outperforms online RL on both generated and benchmark instances. Notably, by introducing noise into the expert dataset, we achieve similar or better results than those obtained from the expert dataset, suggesting that a more diverse training set is preferable because it contains counterfactual information.
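The interaction between a CQL-style conservative penalty and a maskable action space is the core mechanic here, and it can be sketched compactly. The PyTorch fragment below is an illustrative assumption rather than the authors' implementation: the tensor shapes, the boolean `mask` convention, and the single `alpha` weight are all choices made for the example.

```python
import torch
import torch.nn.functional as F

def masked_cql_loss(q_values, actions, td_targets, mask, alpha=1.0):
    """Conservative Q-learning loss restricted to feasible actions.

    q_values:   (batch, n_actions) Q(s, .) from the network
    actions:    (batch,) actions taken in the dataset
    td_targets: (batch,) bootstrapped Bellman targets
    mask:       (batch, n_actions) bool, True where an action is dispatchable
    """
    # Infeasible actions get -inf so they drop out of the log-sum-exp.
    masked_q = q_values.masked_fill(~mask, float("-inf"))
    # The CQL term pushes down Q on all feasible actions ...
    logsumexp = torch.logsumexp(masked_q, dim=1)
    # ... while pushing up Q on the dataset (expert) actions.
    data_q = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    conservative_penalty = (logsumexp - data_q).mean()
    bellman_error = F.mse_loss(data_q, td_targets)
    return bellman_error + alpha * conservative_penalty
```

Masking before the log-sum-exp is the point of the exercise: without it, the conservative term would also spend capacity pushing down Q-values of operations that can never be dispatched from the current state.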
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Offline reinforcement learning for job-shop scheduling problems [1.3927943269211593]
This paper introduces a novel offline RL method designed for optimization problems with complex constraints.
Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions.
We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks.
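The "balance" between expected rewards and imitation of expert solutions reads as a weighted two-term objective. A minimal sketch, assuming a discrete policy head over candidate operations and a detached critic; the function name and the weight `lam` are illustrative, not from the paper:

```python
import torch
import torch.nn.functional as F

def rl_plus_imitation_loss(policy_logits, expert_actions, q_values, lam=0.5):
    """Weighted sum of a value-maximisation term and an imitation term.

    policy_logits:  (batch, n_actions) scores over candidate operations
    expert_actions: (batch,) actions chosen in the expert schedule
    q_values:       (batch, n_actions) critic estimates
    """
    probs = policy_logits.softmax(dim=1)
    # Encourage actions the critic rates highly ...
    rl_term = -(probs * q_values.detach()).sum(dim=1).mean()
    # ... while staying close to the expert dispatching decisions.
    bc_term = F.cross_entropy(policy_logits, expert_actions)
    return rl_term + lam * bc_term
```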
arXiv Detail & Related papers (2024-10-21T07:33:42Z)
- Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
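The paper builds equivariance into the network layers themselves; a cruder way to see the same symmetry constraint is group averaging, shown below for a Q-function made invariant to planar rotations via the cyclic group C8 (a discretisation of $SO(2)$). This toy construction only illustrates the symmetry idea and is not the paper's architecture:

```python
import math
import torch
import torch.nn as nn

class C8InvariantQ(nn.Module):
    """Toy Q-network made invariant to planar rotations by averaging
    over the cyclic group C8, a discretisation of SO(2)."""

    def __init__(self, state_dim=2, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, xy):
        outs = []
        for k in range(8):
            theta = 2 * math.pi * k / 8
            rot = torch.tensor(
                [[math.cos(theta), -math.sin(theta)],
                 [math.sin(theta),  math.cos(theta)]], dtype=xy.dtype)
            outs.append(self.net(xy @ rot.T))
        # Averaging over all group elements yields a C8-invariant function.
        return torch.stack(outs).mean(dim=0)
```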
arXiv Detail & Related papers (2024-06-20T03:02:49Z)
- ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories [27.5648276335047]
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL).
We propose a novel approach that leverages offline data to learn a generative diffusion model, coined the Adaptive Trajectory Diffuser (ATraDiff).
ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
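Contrastive Predictive Coding typically reduces to an InfoNCE objective. A generic sketch with in-batch negatives follows, under the assumption that matching rows of `context` and `future` encode segments of the same episode; how the paper wires this into policy training is not reproduced here:

```python
import torch
import torch.nn.functional as F

def info_nce(context, future, temperature=0.1):
    """InfoNCE objective used in Contrastive Predictive Coding.

    context: (batch, dim) encoded transitions from an episode
    future:  (batch, dim) encodings that should match the same episode
    Positive pairs share a row index; all other rows act as negatives.
    """
    context = F.normalize(context, dim=1)
    future = F.normalize(future, dim=1)
    logits = context @ future.T / temperature       # (batch, batch)
    labels = torch.arange(context.size(0))          # diagonal = positives
    return F.cross_entropy(logits, labels)
```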
arXiv Detail & Related papers (2024-05-23T02:41:36Z)
- Leveraging Constraint Programming in a Deep Learning Approach for Dynamically Solving the Flexible Job-Shop Scheduling Problem [1.3927943269211593]
This paper aims to integrate constraint programming (CP) within a deep learning (DL) based methodology, leveraging the benefits of both.
We introduce a method that involves training a DL model using optimal solutions generated by CP, ensuring the model learns from high-quality data.
Our hybrid approach has been extensively tested on three public FJSSP benchmarks, demonstrating superior performance over five state-of-the-art DRL approaches.
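The training recipe, learning from CP-generated optimal solutions, amounts to supervised imitation. A minimal sketch under stated assumptions: `cp_solve` and `replay_schedule` are hypothetical stand-ins for a CP solver and a routine that replays an optimal schedule as (state, action) pairs; neither name comes from the paper.

```python
import torch
import torch.nn as nn

def build_dataset(instances, cp_solve, replay_schedule):
    """Turn CP-optimal schedules into supervised (state, action) pairs."""
    pairs = []
    for inst in instances:
        schedule = cp_solve(inst)              # optimal solution from CP
        pairs.extend(replay_schedule(inst, schedule))
    return pairs

def imitate(model, pairs, epochs=10, lr=1e-3):
    """Cross-entropy imitation of the CP dispatching decisions."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for state, action in pairs:            # state: 1-D tensor, action: scalar
            opt.zero_grad()
            loss = loss_fn(model(state.unsqueeze(0)), action.view(1))
            loss.backward()
            opt.step()
```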
arXiv Detail & Related papers (2024-03-14T10:16:57Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
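The paper's quantiser is adaptive and learned; a simpler k-means codebook, sketched below, illustrates the underlying idea of mapping continuous actions to a finite set so that discrete-action offline RL machinery can be applied. Everything here is a stand-in, not the proposed scheme:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_action_codebook(actions, n_codes=64, seed=0):
    """Fit a discrete codebook over continuous dataset actions.

    actions: (N, action_dim) array drawn from the offline dataset.
    Plain k-means is used here only to illustrate the discretisation.
    """
    km = KMeans(n_clusters=n_codes, n_init=10, random_state=seed).fit(actions)
    codebook = km.cluster_centers_                 # (n_codes, action_dim)

    def encode(a):
        return km.predict(np.atleast_2d(a))[0]     # continuous -> code id

    def decode(idx):
        return codebook[idx]                       # code id -> continuous

    return encode, decode
```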
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale [27.02990488317357]
Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs?
We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses both the offline dataset and information about the expert's behavioral policy that generated it.
Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm that can be seen as a combination of the RLSVI algorithm for online RL, and imitation learning.
arXiv Detail & Related papers (2023-03-20T18:16:25Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
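One of the simple design choices in this vein is symmetric sampling: drawing each update batch half from online and half from offline data, rather than pretraining on the offline set. A sketch follows, with buffer handling simplified to plain lists:

```python
import random

def symmetric_batch(online_buffer, offline_buffer, batch_size=256):
    """Draw half of each update batch from online data, half from offline.

    Buffers are plain lists of transitions; early in training the online
    buffer may be small, so offline data fills the remainder.
    """
    half = batch_size // 2
    batch = random.sample(online_buffer, min(half, len(online_buffer)))
    need = batch_size - len(batch)
    batch += random.sample(offline_buffer, min(need, len(offline_buffer)))
    random.shuffle(batch)
    return batch
```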
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
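A return-based rebalance can indeed be a few lines: resample episodes with probabilities that grow with return, keeping every episode's probability nonzero so the distribution's support is unchanged. The softmax weighting below is an assumption, not necessarily the paper's exact scheme:

```python
import numpy as np

def return_weighted_indices(episode_returns, n_samples, temperature=1.0):
    """Resample episode indices with probability increasing in return.

    Every episode keeps nonzero probability, so the dataset's support is
    unchanged; high-return episodes are simply seen more often.
    """
    r = np.asarray(episode_returns, dtype=np.float64)
    w = np.exp((r - r.max()) / max(temperature, 1e-8))
    p = w / w.sum()
    return np.random.choice(len(r), size=n_samples, p=p)
```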
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
- Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations [5.760034336327491]
We study the problem of offline Imitation Learning (IL), where an agent aims to learn an optimal expert behavior policy without additional online environment interactions.
We introduce an additional discriminator to distinguish expert and non-expert data.
Our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
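Weighting imitation by a discriminator can be sketched as a per-sample weighted behaviour-cloning loss; the sigmoid weighting and the names below are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def discriminator_weighted_bc(policy_logits, actions, disc_logits):
    """Behaviour cloning where each sample is weighted by how 'expert-like'
    a discriminator judges it (sigmoid of its logit).

    policy_logits: (batch, n_actions), actions: (batch,), disc_logits: (batch,)
    """
    weights = torch.sigmoid(disc_logits).detach()
    per_sample = F.cross_entropy(policy_logits, actions, reduction="none")
    return (weights * per_sample).mean()
```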
arXiv Detail & Related papers (2022-07-20T17:29:04Z)
- Hybrid Learning for Orchestrating Deep Learning Inference in Multi-user Edge-cloud Networks [3.7630209350186807]
Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency trade-offs.
Our Deep Learning inference orchestration strategy employs reinforcement learning to find the optimal orchestration policy.
We demonstrate the efficacy of our hybrid learning (HL) strategy through experimental comparison with state-of-the-art RL-based inference orchestration.
arXiv Detail & Related papers (2022-02-21T21:50:50Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive.
These results also probe the limits of existing RvS methods, which are comparatively weak on random data.
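The "RvS" recipe is deliberately plain supervised learning. A sketch of an outcome-conditioned two-layer policy trained by maximum likelihood follows; the dimensions and the discrete action head are assumptions for the example:

```python
import torch
import torch.nn as nn

class RvSPolicy(nn.Module):
    """Two-layer feedforward policy conditioned on state and an outcome
    (a goal or a target return), trained by plain maximum likelihood."""

    def __init__(self, state_dim, cond_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, condition):
        return self.net(torch.cat([state, condition], dim=-1))

def rvs_loss(policy, state, condition, action):
    # Supervised learning only: maximise likelihood of dataset actions.
    return nn.functional.cross_entropy(policy(state, condition), action)
```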
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution by allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
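In the soft Q-learning view, the bootstrapped target replaces the hard max with a temperature-scaled log-sum-exp over next tokens. A sketch of that target, with the shapes and the `done` handling as assumptions:

```python
import torch

def soft_q_target(reward, next_q, alpha=1.0, gamma=1.0, done=None):
    """Soft Bellman target: r + gamma * alpha * logsumexp(Q(s',.) / alpha).

    next_q: (batch, vocab) Q-values over next tokens; in text generation
    the 'action' is the next token. `done` is a (batch,) float mask.
    """
    soft_v = alpha * torch.logsumexp(next_q / alpha, dim=1)
    if done is not None:
        soft_v = soft_v * (1.0 - done)   # no bootstrap past end-of-sequence
    return reward + gamma * soft_v
```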
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.