Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems
- URL: http://arxiv.org/abs/2307.11637v2
- Date: Tue, 9 Jul 2024 09:34:26 GMT
- Title: Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems
- Authors: Milapji Singh Gill, Tom Westermann, Marvin Schieseck, Alexander Fay
- Abstract summary: Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS.
- Score: 45.05372822216111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the age of Industry 4.0 and Cyber-Physical Production Systems (CPPSs) vast amounts of potentially valuable data are being generated. Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected. The knowledge obtained can in turn be used to improve tasks like diagnostics or maintenance planning. However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data. The application of domain-specific ontologies has demonstrated its advantageousness in a wide variety of Industry 4.0 application scenarios regarding the aforementioned challenges. However, workflows and artifacts from ontology design for CPPSs have not yet been systematically integrated into the CRISP-DM. Accordingly, this contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into the CPPS. The result is exemplarily applied to an anomaly detection use case.
Related papers
- Integrating Ontology Design with the CRISP-DM in the context of Cyber-Physical Systems Maintenance [41.85920785319125]
The proposed method is divided into three phases.
In phase one, ontology requirements are systematically specified, defining the relevant knowledge scope.
In phase two, CPS life cycle data is contextualized using domain-specific ontological artifacts.
This formalized domain knowledge is then utilized in the Cross-Industry Standard Process for Data Mining (CRISP-DM) to efficiently extract new insights from the data.
arXiv Detail & Related papers (2024-07-09T15:06:47Z)
- AI Competitions and Benchmarks: Dataset Development [42.164845505628506]
This chapter provides a comprehensive overview of established methodological tools, enriched by our practical experience.
We develop the tasks involved in dataset development and offer insights into their effective management.
Then, we provide more details about the implementation process which includes data collection, transformation, and quality evaluation.
arXiv Detail & Related papers (2024-04-15T12:01:42Z)
- Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle [20.30730316993658]
Data-driven solutions are emerging as powerful tools to address multifarious industrial tasks.
Although data-centric R&D has been pivotal in harnessing these solutions, it often comes with significant costs in terms of human, computational, and time resources.
This paper delves into the potential of large language models (LLMs) to expedite the evolution cycle of data-centric R&D.
arXiv Detail & Related papers (2023-10-17T13:18:02Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Time-Series Pattern Recognition in Smart Manufacturing Systems: A Literature Review and Ontology [3.5097082077065003]
This paper provides a structured perspective of the current state of time-series pattern recognition in manufacturing.
It aims to provide practical and actionable guidelines for application and recommendations for advancing time-series analytics.
arXiv Detail & Related papers (2023-01-29T17:18:59Z)
- Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- Big Machinery Data Preprocessing Methodology for Data-Driven Models in Prognostics and Health Management [0.0]
This paper presents a comprehensive, step-by-step pipeline for the preprocessing of monitoring data from complex systems.
The importance of expert knowledge is discussed in the context of data selection and label generation.
Two case studies are presented for validation, with the end goal of creating clean data sets with healthy and unhealthy labels.
arXiv Detail & Related papers (2021-10-08T17:10:12Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation have begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.