Evaluating the Role of Data Enrichment Approaches Towards Rare Event Analysis in Manufacturing
- URL: http://arxiv.org/abs/2407.01644v1
- Date: Mon, 1 Jul 2024 00:05:56 GMT
- Title: Evaluating the Role of Data Enrichment Approaches Towards Rare Event Analysis in Manufacturing
- Authors: Chathurangi Shyalika, Ruwan Wickramarachchi, Fadi El Kalach, Ramy Harik, Amit Sheth
- Abstract summary: Rare events are occurrences that take place with a significantly lower frequency than more common regular events.
In manufacturing, predicting such events is particularly important, as they lead to unplanned downtime, shortened equipment lifespan, and high energy consumption.
This paper evaluates the role of data enrichment techniques combined with supervised machine-learning techniques for rare event detection and prediction.
- Score: 1.3980986259786223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rare events are occurrences that take place with a significantly lower frequency than more common regular events. In manufacturing, predicting such events is particularly important, as they lead to unplanned downtime, shortened equipment lifespan, and high energy consumption. The occurrence of events is considered frequently-rare if observed in more than 10% of all instances, moderately-rare if it is 5-10%, very-rare if it is 1-5%, and extremely-rare if less than 1%. The rarity of events is inversely correlated with the maturity of a manufacturing industry. Typically, the rarity of events causes the multivariate data generated within a manufacturing process to be highly imbalanced, which leads to bias in predictive models. This paper evaluates the role of data enrichment techniques combined with supervised machine-learning techniques for rare event detection and prediction. To address the data scarcity, we use time series data augmentation and sampling methods to amplify the dataset with more multivariate features and data points while preserving the underlying time series patterns in the combined alterations. Imputation techniques are used to handle null values in datasets. Considering 15 learning models ranging from statistical learning to machine learning to deep learning methods, the best-performing model for the selected datasets is obtained and the efficacy of data enrichment is evaluated. Based on this evaluation, our results show that the enrichment procedure improves the F1 measure of supervised prediction models by up to 48% in rare failure event detection and prediction. We also conduct empirical and ablation experiments on the datasets to derive dataset-specific novel insights. Finally, we investigate the interpretability aspect of models for rare event prediction, considering multiple methods.
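The rarity bands in the abstract can be read as a simple threshold rule on the share of instances in which an event occurs. The following is a minimal sketch of that rule, not code from the paper: the function name `rarity_category` is hypothetical, and because the abstract does not say which band owns the shared edges (exactly 1%, 5%, or 10%), the edge handling here is an assumption.

```python
def rarity_category(event_count: int, total_count: int) -> str:
    """Map an event's observed share to the rarity bands named in the abstract.

    Assumed edge handling (not specified in the abstract):
    exactly 1% -> very-rare, exactly 5% and exactly 10% -> moderately-rare.
    """
    if total_count <= 0 or not 0 <= event_count <= total_count:
        raise ValueError("need 0 <= event_count <= total_count and total_count > 0")

    # Compare event_count / total_count against each threshold using integer
    # arithmetic, so boundary cases are not affected by float rounding.
    scaled = event_count * 100
    if scaled < 1 * total_count:
        return "extremely-rare"   # less than 1%
    if scaled < 5 * total_count:
        return "very-rare"        # 1-5%
    if scaled <= 10 * total_count:
        return "moderately-rare"  # 5-10%
    return "frequently-rare"      # more than 10%


if __name__ == "__main__":
    # Illustrative counts only; these are not figures from the paper.
    for events, total in [(5, 1000), (30, 1000), (80, 1000), (150, 1000)]:
        print(f"{events}/{total}: {rarity_category(events, total)}")
```

A rule like this would typically be applied to the label column of a dataset before choosing how aggressively to augment or resample it.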
Related papers
- ReFine: Boosting Time Series Prediction of Extreme Events by Reweighting and Fine-tuning [0.0]
Extreme events are of great importance since they represent impactful occurrences.
Accurately predicting these extreme events is challenging due to their rarity and irregularity.
We propose two strategies, reweighting and fine-tuning, to tackle the challenge.
arXiv Detail & Related papers (2024-09-21T19:29:29Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- A Comprehensive Survey on Rare Event Prediction [1.6385815610837167]
Rare event prediction involves identifying and forecasting events with a low probability using machine learning (ML) and data analysis.
This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events.
arXiv Detail & Related papers (2023-09-20T14:36:57Z)
- DeepVol: Volatility Forecasting from High-Frequency Data with Dilated Causal Convolutions [53.37679435230207]
We propose DeepVol, a model based on Dilated Causal Convolutions that uses high-frequency data to forecast day-ahead volatility.
Our empirical results suggest that the proposed deep learning-based approach effectively learns global features from high-frequency data.
arXiv Detail & Related papers (2022-09-23T16:13:47Z)
- Impact of Pretraining Term Frequencies on Few-Shot Reasoning [51.990349528930125]
We investigate how well pretrained language models reason with terms that are less frequent in the pretraining data.
We measure the strength of this correlation for a number of GPT-based language models on various numerical deduction tasks.
Although LMs exhibit strong performance at few-shot numerical reasoning tasks, our results raise the question of how much models actually generalize beyond pretraining data.
arXiv Detail & Related papers (2022-02-15T05:43:54Z)
- Monte Carlo EM for Deep Time Series Anomaly Detection [6.312089019297173]
Time series data are often corrupted by outliers or other kinds of anomalies.
Recent approaches to anomaly detection and forecasting assume that the proportion of anomalies in the training data is small enough to ignore.
We present a technique for augmenting existing time series models so that they explicitly account for anomalies in the training data.
arXiv Detail & Related papers (2021-12-29T07:52:36Z)
- Outlier Detection as Instance Selection Method for Feature Selection in Time Series Classification [0.0]
Filter instances provided to feature selection methods for rare instances.
For some data sets, the resulting increase in performance was only a few percent.
For other datasets, we were able to achieve increases in performance of up to 16 percent.
arXiv Detail & Related papers (2021-11-16T14:44:33Z)
- Towards Synthetic Multivariate Time Series Generation for Flare Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z)
- Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
- A Multi-Channel Neural Graphical Event Model with Negative Evidence [76.51278722190607]
Event datasets are sequences of events of various types occurring irregularly over the timeline.
We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions.
arXiv Detail & Related papers (2020-02-21T23:10:50Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.