Ensuring the Robustness and Reliability of Data-Driven Knowledge
Discovery Models in Production and Manufacturing
- URL: http://arxiv.org/abs/2007.14791v1
- Date: Tue, 28 Jul 2020 14:21:14 GMT
- Title: Ensuring the Robustness and Reliability of Data-Driven Knowledge
Discovery Models in Production and Manufacturing
- Authors: Shailesh Tripathi, David Muhr, Brunner Manuel, Frank Emmert-Streib,
Herbert Jodlbauer, and Matthias Dehmer
- Abstract summary: Cross-industry standard process for data mining (CRISP-DM) designed to address data- and model-related issues.
This paper reviews several extensions of CRISP-DM models and various data-robustness-- and model-robustness-related problems in machine learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The implementation of robust, stable, and user-centered data analytics and
machine learning models is confronted by numerous challenges in production and
manufacturing. Therefore, a systematic approach is required to develop,
evaluate, and deploy such models. The data-driven knowledge discovery framework
provides an orderly partition of the data-mining processes to ensure the
practical implementation of data analytics and machine learning models.
However, the practical application of robust industry-specific data-driven
knowledge discovery models faces multiple data-- and model-development--related
issues. These issues should be carefully addressed by allowing a flexible,
customized, and industry-specific knowledge discovery framework; in our case,
this takes the form of the cross-industry standard process for data mining
(CRISP-DM). This framework is designed to ensure active cooperation between
different phases to adequately address data- and model-related issues. In this
paper, we review several extensions of CRISP-DM models and various
data-robustness-- and model-robustness--related problems in machine learning,
which currently lacks proper cooperation between data experts and business
experts because of the limitations of data-driven knowledge discovery models.
Related papers
- iNNspector: Visual, Interactive Deep Model Debugging [8.997568393450768]
We propose a conceptual framework structuring the data space of deep learning experiments.
Our framework captures design dimensions and proposes mechanisms to make this data explorable and tractable.
We present the iNNspector system, which enables tracking of deep learning experiments and provides interactive visualizations of the data.
arXiv Detail & Related papers (2024-07-25T12:48:41Z) - Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a novel sandbox suite tailored for integrated data-model co-development.
This sandbox provides a comprehensive experimental platform, enabling rapid iteration and insight-driven refinement of both data and models.
We also uncover fruitful insights gleaned from exhaustive benchmarks, shedding light on the critical interplay between data quality, diversity, and model behavior.
arXiv Detail & Related papers (2024-07-16T14:40:07Z) - HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z) - Learning Paradigms and Modelling Methodologies for Digital Twins in Process Industry [1.1060425537315088]
Digital Twins (DTs) are virtual replicas of physical manufacturing systems that combine sensor data with sophisticated data-based or physics-based models, or a combination thereof, to tackle a variety of industrial-relevant tasks like process monitoring, predictive control or decision support.
The backbone of a DT, i.e. the concrete modelling methodologies and architectural frameworks supporting these models, are complex, diverse and evolve fast, necessitating a thorough understanding of the latest state-of-the-art methods and trends to stay on top of a highly competitive market.
arXiv Detail & Related papers (2024-07-02T14:05:10Z) - AI Competitions and Benchmarks: Dataset Development [42.164845505628506]
This chapter provides a comprehensive overview of established methodological tools, enriched by our practical experience.
We develop the tasks involved in dataset development and offer insights into their effective management.
Then, we provide more details about the implementation process which includes data collection, transformation, and quality evaluation.
arXiv Detail & Related papers (2024-04-15T12:01:42Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - Knowledge Augmented Machine Learning with Applications in Autonomous
Driving: A Survey [37.84106999449108]
This work provides an overview of existing techniques and methods that combine data-driven models with existing knowledge.
The identified approaches are structured according to the categories knowledge integration, extraction and conformity.
In particular, we address the application of the presented methods in the field of autonomous driving.
arXiv Detail & Related papers (2022-05-10T07:25:32Z) - Data and its (dis)contents: A survey of dataset development and use in
machine learning research [11.042648980854487]
We survey the many concerns raised about the way we collect and use data in machine learning.
We advocate that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.
arXiv Detail & Related papers (2020-12-09T22:13:13Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal
Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z) - Towards CRISP-ML(Q): A Machine Learning Process Model with Quality
Assurance Methodology [53.063411515511056]
We propose a process model for the development of machine learning applications.
The first phase combines business and data understanding as data availability oftentimes affects the feasibility of the project.
The sixth phase covers state-of-the-art approaches for monitoring and maintenance of a machine learning applications.
arXiv Detail & Related papers (2020-03-11T08:25:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.