Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
- URL: http://arxiv.org/abs/2406.02294v1
- Date: Tue, 4 Jun 2024 13:16:08 GMT
- Title: Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling
- Authors: Arthur Müller, Felix Grumbach, Matthias Sabatelli
- Abstract summary: This work explores the effects of varying product batch sizes on Reinforcement Learning based scheduling solutions.
It provides the practitioner with the ability to make an informed decision regarding the selection of an appropriate batch size.
The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems.
- Score: 2.2530496464901106
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages, linked by a central buffer. The RL agent was trained to sequence equally-sized product batches to minimize setup efforts and idle times. However, the substantial impact caused by varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, exploring both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This provides the practitioner with the ability to make an informed decision regarding the selection of an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies to enable training with small batch sizes. The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems.
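The abstract's trade-off (smaller product batches raise sample complexity; larger ones reduce the agent's flexibility) suggests training on a schedule of batch sizes. Below is a minimal, hypothetical sketch of one such curriculum in Python: the `make_env`/`make_agent` interfaces and the `product_batch_size` parameter are illustrative stand-ins, and the large-to-small schedule is an assumption, not the authors' two strategies.

```python
# Hypothetical sketch: curriculum over product batch sizes for an RL
# scheduling agent. All interfaces are stand-ins, not the paper's code.

def batch_size_curriculum(start_size: int, end_size: int, n_stages: int):
    """Yield product-batch sizes interpolated from start_size to end_size."""
    for i in range(n_stages):
        frac = i / max(1, n_stages - 1)
        yield round(start_size + frac * (end_size - start_size))

def train_with_curriculum(make_env, make_agent, start_size=64, end_size=8,
                          n_stages=4, episodes_per_stage=1000):
    agent = make_agent()
    for size in batch_size_curriculum(start_size, end_size, n_stages):
        # Smaller product batches mean more sequencing decisions per episode,
        # hence longer action sequences and higher sample complexity.
        env = make_env(product_batch_size=size)
        for _ in range(episodes_per_stage):
            obs, done = env.reset(), False
            while not done:
                action = agent.act(obs)
                obs, reward, done, info = env.step(action)
                agent.learn(obs, reward, done)
    return agent
```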
Related papers
- Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration [13.053013407015628]
This paper addresses the problem of learning optimal control policies for systems with uncertain dynamics.
We propose an accelerated RL algorithm that can learn control policies significantly faster than competitive approaches.
arXiv Detail & Related papers (2024-10-16T00:53:41Z)
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to the sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSR
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Curriculum Learning in Job Shop Scheduling using Reinforcement Learning [0.3867363075280544]
Deep Reinforcement Learning (DRL) dynamically adjusts an agent's planning strategy in response to difficult instances.
We further improve DRL as an underlying method by actively incorporating the variability of difficulty within the same problem size into the design of the learning process.
arXiv Detail & Related papers (2023-05-17T13:15:27Z)
- Flexible Job Shop Scheduling via Dual Attention Network Based Reinforcement Learning [73.19312285906891]
In the flexible job shop scheduling problem (FJSP), operations can be processed on multiple machines, leading to intricate relationships between operations and machines.
Recent works have employed deep reinforcement learning (DRL) to learn priority dispatching rules (PDRs) for solving FJSP.
This paper presents a novel end-to-end learning framework that weds the merits of self-attention models for deep feature extraction and DRL for scalable decision-making.
arXiv Detail & Related papers (2023-05-09T01:35:48Z)
- Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size [58.762959061522736]
We show that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude.
We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time (a sketch of this scaling rule appears after this list).
arXiv Detail & Related papers (2022-11-20T21:48:25Z)
- Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule [0.2741266294612775]
Previous results have shown that a two time-scale update rule (TTUR) using different learning rates is useful for training generative adversarial networks (GANs) in theory and in practice.
This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates (a minimal TTUR setup is sketched after this list).
arXiv Detail & Related papers (2022-01-28T08:52:01Z)
- Exact and Metaheuristic Approaches for the Production Leveling Problem [5.510992382274775]
This paper introduces a new problem in the field of production planning which we call the Production Leveling Problem.
The task is to assign orders to production periods such that the load in each period and on each production resource is balanced, capacity limits are not exceeded and the orders' priorities are taken into account.
A formal model of the problem is proposed and NP-hardness is shown by reduction from Bin Packing.
arXiv Detail & Related papers (2020-06-15T20:04:59Z)
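The Q-Ensemble entry above pairs larger mini-batches with learning-rate adjustments. A minimal sketch of the common linear scaling heuristic in PyTorch; the base batch size, base learning rate, and placeholder network are assumptions, not values from that paper.

```python
import torch

BASE_BATCH_SIZE = 256  # batch size at which BASE_LR was tuned (assumed)
BASE_LR = 3e-4         # reference learning rate (assumed)

def scaled_lr(batch_size: int) -> float:
    """Linear scaling rule: grow the learning rate with the mini-batch."""
    return BASE_LR * batch_size / BASE_BATCH_SIZE

# Usage: a 4x larger mini-batch gets a 4x larger learning rate.
model = torch.nn.Linear(17, 1)  # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=scaled_lr(1024))
```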
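The TTUR entry uses two different constant learning rates for the two players of a GAN. A minimal illustration of such a setup; the networks and the specific rates below are placeholders, not the paper's configuration.

```python
import torch
from torch import nn

# Placeholder generator and discriminator for flattened 28x28 images.
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
discriminator = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))

# Two time-scale update rule (TTUR): separate constant learning rates,
# with the discriminator commonly updated on the faster time scale.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4)
```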