Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
- URL: http://arxiv.org/abs/2403.14110v1
- Date: Thu, 21 Mar 2024 03:42:39 GMT
- Title: Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
- Authors: Kyuwon Choi, Cheolkyun Rho, Taeyoun Kim, Daewoo Choi,
- Abstract summary: This paper presents a novel reinforcement learning approach called HAAMRL (Heuristic ensemble-based Action Masking Reinforcement Learning)
The proposed approach exhibits superior performance and capability generalization, indicating superior effectiveness in optimizing complex manufacturing processes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a novel reinforcement learning (RL) approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) for optimizing the color batching re-sequencing problem in automobile painting processes. The existing heuristic algorithms have limitations in adequately reflecting real-world constraints and accurately predicting logistics performance. Our methodology incorporates several key techniques including a tailored Markov Decision Process (MDP) formulation, reward setting including Potential-Based Reward Shaping, action masking using heuristic algorithms (HAAM-RL), and an ensemble inference method that combines multiple RL models. The RL agent is trained and evaluated using FlexSim, a commercial 3D simulation software, integrated with our RL MLOps platform BakingSoDA. Experimental results across 30 scenarios demonstrate that HAAM-RL with an ensemble inference method achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes. The study also discusses future research directions, including alternative state representations, incorporating model-based RL methods, and integrating additional real-world constraints.
Related papers
- Model-Free Robust Reinforcement Learning with Sample Complexity Analysis [16.477827600825428]
This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo technique.
We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence.
Remarkably, our algorithms represent the first model-free DR-RL approach featuring finite sample complexity.
arXiv Detail & Related papers (2024-06-24T19:35:26Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX.
textttMEX integrates estimation and planning components while balancing exploration exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical
Multi-Step Approach for Policy Training [4.982806898121435]
We propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method.
This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration.
The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.
arXiv Detail & Related papers (2022-09-29T00:42:44Z) - Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL)
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
arXiv Detail & Related papers (2022-07-29T17:29:08Z) - An Actor-Critic Method for Simulation-Based Optimization [6.261751912603047]
We focus on a simulation-based optimization problem of choosing the best design from the feasible space.
We formulate the sampling process as a policy searching problem and give a solution from the perspective of Reinforcement Learning (RL)
Some experiments are designed to validate the effectiveness of proposed algorithms.
arXiv Detail & Related papers (2021-10-31T09:04:23Z) - Progressive extension of reinforcement learning action dimension for
asymmetric assembly tasks [7.4642148614421995]
In this paper, a progressive extension of action dimension (PEAD) mechanism is proposed to optimize the convergence of RL algorithms.
The results demonstrate the PEAD method will enhance the data-efficiency and time-efficiency of RL algorithms as well as increase the stable reward.
arXiv Detail & Related papers (2021-04-06T11:48:54Z) - A Two-stage Framework and Reinforcement Learning-based Optimization
Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together.
The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.