Deep reinforcement learning applied to an assembly sequence planning
problem with user preferences
- URL: http://arxiv.org/abs/2304.06567v1
- Date: Thu, 13 Apr 2023 14:25:15 GMT
- Title: Deep reinforcement learning applied to an assembly sequence planning
problem with user preferences
- Authors: Miguel Neves, Pedro Neto
- Abstract summary: We propose an approach to the implementation of DRL methods in assembly sequence planning problems.
The proposed approach introduces parametric actions into the RL environment to improve training time and sample efficiency.
The results support the potential for the application of deep reinforcement learning in assembly sequence planning problems with human interaction.
- Score: 1.0558951653323283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (DRL) has demonstrated its potential in solving
complex manufacturing decision-making problems, especially in a context where
the system learns over time with actual operation in the absence of training
data. One interesting and challenging application for such methods is the
assembly sequence planning (ASP) problem. In this paper, we propose an approach
to the implementation of DRL methods in ASP. The proposed approach introduces
parametric actions into the RL environment to improve training time and sample
efficiency, and it uses two different reward signals: (1) the user's preferences
and (2) the total assembly time. The user-preference signal captures the
difficulties and non-ergonomic aspects of the assembly faced by the human,
while the total assembly time signal drives the optimization of the assembly.
Three of the most powerful deep RL methods, Advantage Actor-Critic (A2C), Deep
Q-Learning (DQN), and Rainbow, were studied in two different scenarios: a
stochastic and a deterministic one. Finally, the performance of the DRL
algorithms was compared to that of tabular Q-Learning. After 10,000 episodes,
the system achieved near-optimal behaviour with tabular Q-Learning, A2C, and
Rainbow. However, for more complex scenarios, tabular Q-Learning is expected to
underperform compared to the other two algorithms. The results support the
potential for the application of deep
reinforcement learning in assembly sequence planning problems with human
interaction.
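As a rough illustration of the setup the abstract describes (masked or "parametric" actions plus rewards driven by user preference and assembly time), the sketch below shows one possible toy environment. It is not the authors' implementation: the part set, precedence constraints, durations, ergonomic penalties, and reward weights are all invented, and the two reward signals are folded into a single weighted scalar purely for illustration, whereas the paper treats them as separate signals.

```python
import numpy as np

# Toy data: precedence constraints, durations and ergonomic penalties are
# invented for illustration; they are not the values used in the paper.
N_PARTS = 5
PRECEDENCE = {2: {0, 1}, 4: {3}}                      # part -> required predecessors
ASSEMBLY_TIME = np.array([4.0, 3.0, 6.0, 2.0, 5.0])   # hypothetical seconds per part
USER_PENALTY = np.array([0.0, 0.2, 1.0, 0.0, 0.5])    # hypothetical ergonomics penalty


class AssemblyEnv:
    """Minimal assembly-sequence environment with masked ("parametric") actions."""

    def __init__(self, time_weight=0.1, preference_weight=1.0, stochastic=False):
        self.time_weight = time_weight
        self.preference_weight = preference_weight
        self.stochastic = stochastic
        self.reset()

    def reset(self):
        self.assembled = np.zeros(N_PARTS, dtype=bool)
        return self.assembled.copy()

    def action_mask(self):
        # Only parts that are not yet placed and whose predecessors are placed
        # may be selected; everything else is masked out of the action space.
        mask = ~self.assembled
        for part, preds in PRECEDENCE.items():
            if not all(self.assembled[p] for p in preds):
                mask[part] = False
        return mask

    def step(self, part):
        assert self.action_mask()[part], "masked (invalid) action"
        self.assembled[part] = True
        duration = ASSEMBLY_TIME[part]
        if self.stochastic:                           # optional noisy durations
            duration *= np.random.uniform(0.8, 1.2)
        # Two signals folded into one scalar reward: assembly time and the
        # user-preference (ergonomics) penalty of the chosen operation.
        reward = -self.time_weight * duration - self.preference_weight * USER_PENALTY[part]
        done = bool(self.assembled.all())
        return self.assembled.copy(), reward, done
```

A tabular Q-Learning loop, the baseline the abstract compares against, can then be run on this toy environment; the hyperparameters and episode count below are arbitrary:

```python
# Tabular Q-Learning baseline over the (small) set of assembled-part states.
env = AssemblyEnv()
Q = {}
alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    state, done = tuple(env.reset()), False
    while not done:
        valid = np.flatnonzero(env.action_mask())
        q = Q.setdefault(state, np.zeros(N_PARTS))
        action = np.random.choice(valid) if np.random.rand() < eps else valid[np.argmax(q[valid])]
        next_obs, reward, done = env.step(action)
        next_state = tuple(next_obs)
        next_q = Q.setdefault(next_state, np.zeros(N_PARTS))
        best_next = 0.0 if done else next_q[env.action_mask()].max()
        q[action] += alpha * (reward + gamma * best_next - q[action])
        state = next_state
```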
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a hidden finite-state automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z)
- Accelerating Exact Combinatorial Optimization via RL-based Initialization -- A Case Study in Scheduling [1.3053649021965603]
This research aims to develop an innovative approach that employs machine learning (ML) for addressing optimization problems.
We introduce a novel two-phase RL-to-ILP scheduling framework, which includes three steps: 1) the RL solver acting as a coarse-grain scheduler, 2) solution relaxation, and 3) exact solving via ILP.
Our framework achieves the same scheduling performance as exact scheduling methods while delivering up to 128× speed improvements.
arXiv Detail & Related papers (2023-08-19T15:52:43Z)
- A study on a Q-Learning algorithm application to a manufacturing assembly problem [0.8937905773981699]
This study focuses on the implementation of a reinforcement learning algorithm in the assembly problem of a given object.
A model-free Q-Learning algorithm is applied, learning a matrix of Q-values (Q-table) from successive interactions with the environment.
The optimisation approach achieved very promising results, learning the optimal assembly sequence 98.3% of the time.
arXiv Detail & Related papers (2023-04-17T15:38:34Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- A Heuristically Assisted Deep Reinforcement Learning Approach for Network Slice Placement [0.7885276250519428]
We introduce a hybrid placement solution based on Deep Reinforcement Learning (DRL) and a dedicated optimization based on the Power of Two Choices principle.
The proposed Heuristically-Assisted DRL (HA-DRL) accelerates the learning process and improves resource usage compared with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-14T10:04:17Z)
- A Two-stage Framework and Reinforcement Learning-based Optimization Algorithms for Complex Scheduling Problems [54.61091936472494]
We develop a two-stage framework in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined.
The scheduling problem is solved in two stages: a finite Markov decision process (MDP) and then a mixed-integer programming process.
Results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems.
arXiv Detail & Related papers (2021-03-10T03:16:12Z)
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
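The two SUNRISE ingredients summarised just above, uncertainty-weighted Bellman backups and upper-confidence-bound action selection over a Q-ensemble, can be sketched in a few lines. This is a rough NumPy illustration based only on that summary: the array shapes, the uncertainty proxy, and the temperature and lam values are assumptions, and the actual SUNRISE implementation operates on learned Q-networks rather than plain arrays.

```python
import numpy as np

def weighted_bellman_targets(next_q_ensemble, rewards, dones, gamma=0.99, temperature=10.0):
    """Uncertainty-weighted Bellman targets in the spirit of SUNRISE.

    next_q_ensemble: hypothetical array of shape (n_members, batch, n_actions)
    holding each ensemble member's Q-values at the next states.
    rewards, dones: float arrays of shape (batch,), dones as 0/1 flags.
    Returns per-member targets and per-sample weights that scale the TD loss.
    """
    next_value = next_q_ensemble.max(axis=-1)                 # greedy value, (n_members, batch)
    uncertainty = next_q_ensemble.std(axis=0).mean(axis=-1)   # ensemble disagreement, (batch,)
    # sigmoid(-uncertainty * T) + 0.5 keeps weights in (0.5, 1.0]: targets the
    # ensemble disagrees on contribute less to the Bellman backup.
    weights = 1.0 / (1.0 + np.exp(uncertainty * temperature)) + 0.5
    targets = rewards + gamma * (1.0 - dones) * next_value
    return targets, weights

def ucb_action(q_ensemble_at_state, lam=1.0):
    """Explore by picking the action with the highest mean + lam * std Q-value."""
    mean = q_ensemble_at_state.mean(axis=0)                   # (n_actions,)
    std = q_ensemble_at_state.std(axis=0)
    return int(np.argmax(mean + lam * std))
```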