Big Machinery Data Preprocessing Methodology for Data-Driven Models in
Prognostics and Health Management
- URL: http://arxiv.org/abs/2110.04256v1
- Date: Fri, 8 Oct 2021 17:10:12 GMT
- Title: Big Machinery Data Preprocessing Methodology for Data-Driven Models in
Prognostics and Health Management
- Authors: Sergio Cofre-Martel, Enrique Lopez Droguett, Mohammad Modarres
- Abstract summary: This paper presents a comprehensive, step-by-step pipeline for the preprocessing of monitoring data from complex systems.
The importance of expert knowledge is discussed in the context of data selection and label generation.
Two case studies are presented for validation, with the end goal of creating clean data sets with healthy and unhealthy labels.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sensor monitoring networks and advances in big data analytics have guided the
reliability engineering landscape to a new era of big machinery data. Low-cost
sensors, along with the evolution of the internet of things and industry 4.0,
have resulted in rich databases that can be analyzed through prognostics and
health management (PHM) frameworks. Several data-driven models (DDMs) have
been proposed and applied for diagnostics and prognostics purposes in complex
systems. However, many of these models are developed using simulated or
experimental data sets, and there is still a knowledge gap for applications in
real operating systems. Furthermore, little attention has been given to the
required data preprocessing steps compared to the training processes of these
DDMs. To date, research works do not follow a formal and consistent data
preprocessing guideline for PHM applications. This paper presents a
comprehensive, step-by-step pipeline for the preprocessing of monitoring data
from complex systems aimed at DDMs. The importance of expert knowledge is
discussed in the context of data selection and label generation. Two case
studies are presented for validation, with the end goal of creating clean data
sets with healthy and unhealthy labels that are then used to train machinery
health state classifiers.
Related papers
- Extracting Training Data from Unconditional Diffusion Models [76.85077961718875]
Diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI).
We aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization.
Based on the theoretical analysis, we propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models.
arXiv Detail & Related papers (2024-06-18T16:20:12Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- Design & Implementation of Automatic Machine Condition Monitoring and
Maintenance System in Limited Resource Situations [0.0]
In the era of the fourth industrial revolution, it is essential to automate fault detection and diagnosis of machinery.
Some machine health monitoring systems are used globally, but they are expensive and need trained personnel to operate and analyse.
In developing countries, predictive maintenance and an occupational health and safety culture are unavailable due to inadequate infrastructure, lack of skilled manpower, financial crisis, and other factors.
arXiv Detail & Related papers (2024-01-22T08:06:04Z)
- Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS challenges.
arXiv Detail & Related papers (2023-07-21T15:04:00Z)
- Optimizing the AI Development Process by Providing the Best Support
Environment [0.756282840161499]
The main stages of machine learning are problem understanding, data management, model building, model deployment, and maintenance.
The framework was built in Python to perform data augmentation using deep learning advancements.
arXiv Detail & Related papers (2023-04-29T00:44:50Z)
- SensorSCAN: Self-Supervised Learning and Deep Clustering for Fault
Diagnosis in Chemical Processes [2.398451252047814]
We propose SensorSCAN, a novel method for unsupervised fault detection and diagnosis.
We demonstrate our model's performance on two publicly available datasets of the Tennessee Eastman Process with various faults.
Our method is suitable for real-world applications where the number of faults is not known in advance.
arXiv Detail & Related papers (2022-08-17T10:24:37Z)
- Deep Reinforcement Learning Assisted Federated Learning Algorithm for
Data Management of IIoT [82.33080550378068]
The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies applications of federated learning (FL) technology to manage IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic
Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
- Sampling for Deep Learning Model Diagnosis (Technical Report) [5.8057675678464555]
The black-box nature of deep neural networks is a barrier to adoption in applications such as medical diagnosis.
We develop a novel data sampling technique that produces approximate but accurate results for these model debug queries.
We evaluate our techniques on one standard computer vision and one scientific data set and demonstrate that our sampling technique outperforms a variety of state-of-the-art alternatives in terms of query accuracy.
arXiv Detail & Related papers (2020-02-22T19:24:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.