Related papers: Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing

Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing

URL: http://arxiv.org/abs/2007.14791v1
Date: Tue, 28 Jul 2020 14:21:14 GMT
Title: Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing
Authors: Shailesh Tripathi, David Muhr, Brunner Manuel, Frank Emmert-Streib, Herbert Jodlbauer, and Matthias Dehmer
Abstract summary: Cross-industry standard process for data mining (CRISP-DM) designed to address data- and model-related issues. This paper reviews several extensions of CRISP-DM models and various data-robustness-- and model-robustness-related problems in machine learning.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The implementation of robust, stable, and user-centered data analytics and machine learning models is confronted by numerous challenges in production and manufacturing. Therefore, a systematic approach is required to develop, evaluate, and deploy such models. The data-driven knowledge discovery framework provides an orderly partition of the data-mining processes to ensure the practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data-- and model-development--related issues. These issues should be carefully addressed by allowing a flexible, customized, and industry-specific knowledge discovery framework; in our case, this takes the form of the cross-industry standard process for data mining (CRISP-DM). This framework is designed to ensure active cooperation between different phases to adequately address data- and model-related issues. In this paper, we review several extensions of CRISP-DM models and various data-robustness-- and model-robustness--related problems in machine learning, which currently lacks proper cooperation between data experts and business experts because of the limitations of data-driven knowledge discovery models.

Related papers

Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing.<n>Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest.<n>This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
AdvKT: An Adversarial Multi-Step Training Framework for Knowledge Tracing [64.79967583649407]
Knowledge Tracing (KT) monitors students' knowledge states and simulates their responses to question sequences. Existing KT models typically follow a single-step training paradigm, which leads to significant error accumulation. We propose a novel Adversarial Multi-Step Training Framework for Knowledge Tracing (AdvKT) which focuses on the multi-step KT task.
arXiv Detail & Related papers (2025-04-07T03:31:57Z)
Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs) Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z)
Mitigating Attrition: Data-Driven Approach Using Machine Learning and Data Engineering [0.0]
This paper presents a novel data-driven approach to mitigating employee attrition using machine learning and data engineering techniques. The proposed framework integrates data from various human resources systems and leverages advanced feature engineering to capture a comprehensive set of factors influencing attrition.
arXiv Detail & Related papers (2025-02-25T05:29:45Z)
iNNspector: Visual, Interactive Deep Model Debugging [8.997568393450768]
We propose a conceptual framework structuring the data space of deep learning experiments. Our framework captures design dimensions and proposes mechanisms to make this data explorable and tractable. We present the iNNspector system, which enables tracking of deep learning experiments and provides interactive visualizations of the data.
arXiv Detail & Related papers (2024-07-25T12:48:41Z)
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a novel sandbox suite tailored for integrated data-model co-development. This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models. We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities. It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry [1.1060425537315088]
Digital Twins (DTs) are virtual replicas of physical manufacturing systems that combine sensor data with sophisticated data-based or physics-based models, or a combination thereof, to tackle a variety of industrial-relevant tasks like process monitoring, predictive control or decision support. The backbone of a DT, i.e. the concrete modelling methodologies and architectural frameworks supporting these models, are complex, diverse and evolve fast, necessitating a thorough understanding of the latest state-of-the-art methods and trends to stay on top of a highly competitive market.
arXiv Detail & Related papers (2024-07-02T14:05:10Z)
AI Competitions and Benchmarks: Dataset Development [42.164845505628506]
This chapter provides a comprehensive overview of established methodological tools, enriched by our practical experience. We develop the tasks involved in dataset development and offer insights into their effective management. Then, we provide more details about the implementation process which includes data collection, transformation, and quality evaluation.
arXiv Detail & Related papers (2024-04-15T12:01:42Z)
Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients. We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey [37.84106999449108]
This work provides an overview of existing techniques and methods that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
arXiv Detail & Related papers (2022-05-10T07:25:32Z)
Data and its (dis)contents: A survey of dataset development and use in machine learning research [11.042648980854487]
We survey the many concerns raised about the way we collect and use data in machine learning. We advocate that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.
arXiv Detail & Related papers (2020-12-09T22:13:13Z)
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data. Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology [53.063411515511056]
We propose a process model for the development of machine learning applications. The first phase combines business and data understanding as data availability oftentimes affects the feasibility of the project. The sixth phase covers state-of-the-art approaches for monitoring and maintenance of a machine learning applications.
arXiv Detail & Related papers (2020-03-11T08:25:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.