Understanding and Preparing Data of Industrial Processes for Machine
Learning Applications
- URL: http://arxiv.org/abs/2109.03469v1
- Date: Wed, 8 Sep 2021 07:39:11 GMT
- Title: Understanding and Preparing Data of Industrial Processes for Machine
Learning Applications
- Authors: Philipp Fleck, Manfred Kügel, Michael Kommenda
- Abstract summary: This paper addresses the challenge of missing values due to sensor unavailability at different production units of nonlinear production lines.
In cases where only a small proportion of the data is missing, those missing values can often be imputed.
This paper presents a technique that allows all of the available data to be used without removing large numbers of observations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industrial applications of machine learning face unique challenges due to the
nature of raw industry data. Preprocessing and preparing raw industrial data
for machine learning applications is a demanding task that often takes more
time and work than the actual modeling process itself and poses additional
challenges. This paper addresses one of those challenges, specifically, the
challenge of missing values due to sensor unavailability at different
production units of nonlinear production lines. In cases where only a small
proportion of the data is missing, those missing values can often be imputed.
In cases of large proportions of missing data, imputing is often not feasible,
and removing observations containing missing values is often the only option.
This paper presents a technique that allows all of the available data to be
used without removing large numbers of observations where data is only
partially available. We not only discuss the principal idea of the presented
method but also show different possible implementations that can be
applied depending on the data at hand. Finally, we demonstrate the application
of the presented method with data from a steel production plant.
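The trade-off the abstract describes (impute when only a small proportion is missing, but avoid wholesale row removal when much is missing) can be illustrated with a toy sketch. The column names and data below are hypothetical, and the paper's actual method is not reproduced here; this only shows the data loss that naive row removal causes when one unit's sensor is unavailable for an extended period.

```python
import numpy as np
import pandas as pd

# Toy production-line data: each unit contributes sensor columns that may be
# unavailable, so missingness is structural rather than random.
# (Column names are hypothetical, not taken from the paper.)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temp_unit_a": rng.normal(900, 25, 10),
    "speed_unit_b": rng.normal(1.2, 0.1, 10),
})
df.loc[5:, "speed_unit_b"] = np.nan  # sensor offline for half the run

# With a small proportion missing, simple imputation is often reasonable:
imputed = df.fillna(df.mean(numeric_only=True))

# With a large proportion missing, dropping incomplete rows discards half
# the observations -- the loss the paper's method is designed to avoid:
complete_only = df.dropna()
print(len(df), len(complete_only))  # → 10 5
```

Mean imputation is used here purely as a baseline; with half a column missing it distorts the distribution, which is exactly why neither imputation nor row removal is satisfactory in this regime.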
Related papers
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- Controllable Image Synthesis of Industrial Data Using Stable Diffusion [2.021800129069459]
We propose a new approach for reusing general-purpose pre-trained generative models on industrial data.
First, we let the model learn the new concept, entailing the novel data distribution.
Then, we force it to learn to condition the generative process, producing industrial images that satisfy well-defined topological characteristics.
arXiv Detail & Related papers (2024-01-06T08:09:24Z)
- How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective [1.443696537295348]
The authors focus on interpreting the general term "small data" and its role in engineering and industrial applications.
Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism is introduced.
Five critical challenges of machine learning with small data in industrial applications are presented.
arXiv Detail & Related papers (2023-11-13T07:39:13Z)
- Solving Data Quality Problems with Desbordante: a Demo [35.75243108496634]
Desbordante is an open-source data profiler that aims to close this gap.
It is built with emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations.
In this demonstration, we show several scenarios that allow end users to solve different data quality problems.
arXiv Detail & Related papers (2023-07-27T15:26:26Z)
- Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
Real-world industries use machine learning to build models on user data.
Privacy mandates require effort both in terms of data removal and model retraining.
Continuous removal of data and repeated model retraining do not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, while becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which attempts to spare users from building machine learning models from scratch, in the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z)
- PROMISSING: Pruning Missing Values in Neural Networks [0.0]
We propose a simple and intuitive yet effective method for pruning missing values (PROMISSING) during learning and inference steps in neural networks.
Our experiments show that PROMISSING results in similar prediction performance compared to various imputation techniques.
arXiv Detail & Related papers (2022-06-03T15:37:27Z)
- SSSE: Efficiently Erasing Samples from Trained Machine Learning Models [103.43466657962242]
We propose an efficient and effective algorithm, SSSE, for samples erasure.
In certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
arXiv Detail & Related papers (2021-07-08T14:17:24Z)
- MAIN: Multihead-Attention Imputation Networks [4.427447378048202]
We propose a novel mechanism based on multi-head attention which can be applied effortlessly in any model.
Our method inductively models patterns of missingness in the input data in order to increase the performance of the downstream task.
arXiv Detail & Related papers (2021-02-10T13:50:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.