Strategic Data Augmentation with CTGAN for Smart Manufacturing:
Enhancing Machine Learning Predictions of Paper Breaks in Pulp-and-Paper
Production
- URL: http://arxiv.org/abs/2311.09333v1
- Date: Wed, 15 Nov 2023 19:47:15 GMT
- Title: Strategic Data Augmentation with CTGAN for Smart Manufacturing:
Enhancing Machine Learning Predictions of Paper Breaks in Pulp-and-Paper
Production
- Authors: Hamed Khosravi, Sarah Farhadpour, Manikanta Grandhi, Ahmed Shoyeb
Raihan, Srinjoy Das, Imtiaz Ahmed
- Abstract summary: A significant challenge for predictive maintenance in the pulp-and-paper industry is the infrequency of paper breaks during the production process.
In this article, operational data is analyzed from a paper manufacturing machine in which paper breaks are relatively rare but have a high economic impact.
With the help of Conditional Generative Adversarial Networks (CTGAN) and Synthetic Minority Oversampling Technique (SMOTE), we implement a novel data augmentation framework.
- Score: 3.2381236440149257
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A significant challenge for predictive maintenance in the pulp-and-paper
industry is the infrequency of paper breaks during the production process. In
this article, operational data is analyzed from a paper manufacturing machine
in which paper breaks are relatively rare but have a high economic impact.
Utilizing a dataset comprising 18,398 instances derived from a quality
assurance protocol, we address the scarcity of break events (124 cases) that
pose a challenge for machine learning predictive models. With the help of
Conditional Generative Adversarial Networks (CTGAN) and Synthetic Minority
Oversampling Technique (SMOTE), we implement a novel data augmentation
framework. This method not only ensures that the synthetic data mirrors the
distribution of the real operational data but also seeks to enhance the
performance metrics of predictive modeling. Before and after the data augmentation, we evaluate
of predictive modeling. Before and after the data augmentation, we evaluate
three different machine learning algorithms: Decision Trees (DT), Random Forest
(RF), and Logistic Regression (LR). Utilizing the CTGAN-enhanced dataset, our
study achieved significant improvements in predictive maintenance performance
metrics. The efficacy of CTGAN in addressing data scarcity was evident, with
the models' detection of machine breaks (Class 1) improving by over 30% for
Decision Trees, 20% for Random Forest, and nearly 90% for Logistic Regression.
With this methodological advancement, this study contributes to industrial
quality control and maintenance scheduling by addressing rare event prediction
in manufacturing processes.
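The minority-oversampling half of the framework can be illustrated with a short, pure-Python sketch of SMOTE-style interpolation: new break-class points are generated on the line segments between a minority sample and one of its nearest minority neighbors. This is an illustration of the general technique, not the paper's code; the function name and the toy break-event points are assumptions, and the actual study uses library implementations of SMOTE and CTGAN on the real operational data.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority-class samples by interpolating between
    a random minority sample and one of its k nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class (squared Euclidean)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Toy illustration: break events (124 cases) are far rarer than normal
# records in the 18,398-instance dataset, so we synthesise extra break-class
# points from a handful of (made-up) sensor readings.
breaks = [(0.9, 1.2), (1.1, 0.8), (1.0, 1.0), (0.8, 1.1)]
new_points = smote_like_oversample(breaks, n_new=10)
```

Because each synthetic point is a convex combination of two real minority points, it always lies inside the minority class's bounding region, which is what lets the augmented data "mirror the distribution" of the real break events.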
Related papers
- PUMA: margin-based data pruning [51.12154122266251]
We focus on data pruning, where some training samples are removed based on their distance to the model's classification boundary (i.e., the margin).
We propose PUMA, a new data pruning strategy that computes the margin using DeepFool.
We show that PUMA can be used on top of the current state-of-the-art methodology in robustness, and it is able to significantly improve the model performance unlike the existing data pruning strategies.
arXiv Detail & Related papers (2024-05-10T08:02:20Z)
- Comprehensive Study Of Predictive Maintenance In Industries Using Classification Models And LSTM Model [0.0]
The study aims to delve into various machine learning classification techniques, including Support Vector Machine (SVM), Random Forest, Logistic Regression, and Convolutional Neural Network LSTM-Based, for predicting and analyzing machine performance.
The primary objective of the study is to assess these algorithms' performance in predicting and analyzing machine performance, considering factors such as accuracy, precision, recall, and F1 score.
arXiv Detail & Related papers (2024-03-15T12:47:45Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- EsaCL: Efficient Continual Learning of Sparse Models [10.227171407348326]
A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks.
We propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power.
arXiv Detail & Related papers (2024-01-11T04:59:44Z)
- TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.
This paper introduces an integrated framework that leverages the capabilities of the Transformer model-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z)
- Advanced Deep Regression Models for Forecasting Time Series Oil Production [3.1383147856975633]
This research aims to develop advanced data-driven regression models using sequential convolutions and long short-term memory (LSTM) units.
It reveals that the LSTM-based sequence learning model can predict oil production better than the 1-D convolutional neural network (CNN) with mean absolute error (MAE) and R2 score of 111.16 and 0.98, respectively.
arXiv Detail & Related papers (2023-08-30T15:54:06Z)
- A Scalable and Efficient Iterative Method for Copying Machine Learning Classifiers [0.802904964931021]
This paper introduces a novel sequential approach that significantly reduces the amount of computational resources needed to train or maintain a copy of a machine learning model.
The effectiveness of the sequential approach is demonstrated through experiments with synthetic and real-world datasets, showing significant reductions in time and resources, while maintaining or improving accuracy.
arXiv Detail & Related papers (2023-02-06T10:07:41Z)
- Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
Real-world industries use Machine Learning to build models on user data.
Mandates to remove user data require effort in terms of both data deletion and model retraining.
Continuously removing data and retraining the model does not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z)
- Robust Trajectory Prediction against Adversarial Attacks [84.10405251683713]
Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving systems.
These methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions.
In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks.
arXiv Detail & Related papers (2022-07-29T22:35:05Z)
- Masked Self-Supervision for Remaining Useful Lifetime Prediction in Machine Tools [3.175781028910441]
Prediction of Remaining Useful Lifetime (RUL) in the modern manufacturing workplace is essential in Industry 4.0.
With the availability of deep learning approaches, the great potential and prospect of utilizing these for RUL prediction have resulted in several models.
This work seeks to build a deep learning model for RUL prediction by utilizing unlabeled data.
arXiv Detail & Related papers (2022-07-04T06:08:01Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
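The ATC recipe in the last entry is concrete enough to sketch in a few lines of pure Python: pick the threshold on held-out labeled source data so that the fraction of examples above it matches source accuracy, then report the fraction of unlabeled target examples above that threshold as the accuracy estimate. This is a simplified illustration, not the authors' code; the function names and toy confidence values are assumptions, and the original method thresholds a confidence-derived score (e.g. maximum softmax probability or negative entropy).

```python
def learn_threshold(source_conf, source_correct):
    """Choose threshold t so the fraction of labeled source examples with
    confidence above t matches the model's source accuracy (ATC idea)."""
    acc = sum(source_correct) / len(source_correct)
    for t in sorted(source_conf):
        frac_above = sum(c > t for c in source_conf) / len(source_conf)
        if frac_above <= acc:
            return t
    return max(source_conf)

def estimate_target_accuracy(target_conf, threshold):
    """ATC estimate: fraction of unlabeled target examples whose
    confidence exceeds the learned threshold."""
    return sum(c > threshold for c in target_conf) / len(target_conf)

# Toy example: the model is 75% accurate on labeled source data,
# so the threshold is set where 75% of source confidences lie above it.
t = learn_threshold([0.9, 0.8, 0.7, 0.6], [1, 1, 1, 0])
est = estimate_target_accuracy([0.9, 0.5, 0.7, 0.65], t)
```

No target labels are needed at estimation time, which is what makes the method practical for distribution-shifted deployments.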