Related papers: Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics

URL: http://arxiv.org/abs/2211.06440v2
Date: Wed, 5 Apr 2023 23:56:44 GMT
Title: Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics
Authors: Lim C. Siang, Shams Elnawawi, Lee D. Rippon, Daniel L. O'Connor and R. Bhushan Gopaluni
Abstract summary: Data pre-processing has out-sized influence on the success of real-world artificial intelligence applications. We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors.
Score: 0.8399688944263843
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A significant portion of the effort involved in advanced process control, process analytics, and machine learning involves acquiring and preparing data. Literature often emphasizes increasingly complex modelling techniques with incremental performance improvements. However, when industrial case studies are published they often lack important details on data acquisition and preparation. Although data pre-processing is unfairly maligned as trivial and technically uninteresting, in practice it has an out-sized influence on the success of real-world artificial intelligence applications. This work describes best practices for acquiring and preparing operating data to pursue data-driven modelling and control opportunities in industrial processes. We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors that provide valuable process insights.

Related papers

Advancements in synthetic data extraction for industrial injection molding [0.0]
We investigate the feasibility of incorporating synthetic data into the training process of the injection molding process. Our results suggest that the inclusion of synthetic data improves the model's ability to handle different scenarios.
arXiv Detail & Related papers (2025-11-11T11:19:54Z)
More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning [47.13636836547429]
We conduct a comprehensive analysis of open-source datasets and data synthesis techniques for mathematical reasoning. Our findings highlight that structuring data in more interpretable formats or distilling from stronger models often outweighs simply scaling up data volume.
arXiv Detail & Related papers (2025-10-08T16:07:26Z)
Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins [53.70191138561039]
We propose to deploy a digital twin of the production line by encoding its operational logic in a data-driven approach. We adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines.
arXiv Detail & Related papers (2024-05-20T09:28:23Z)
AI Competitions and Benchmarks: Dataset Development [42.164845505628506]
This chapter provides a comprehensive overview of established methodological tools, enriched by our practical experience. We develop the tasks involved in dataset development and offer insights into their effective management. Then, we provide more details about the implementation process which includes data collection, transformation, and quality evaluation.
arXiv Detail & Related papers (2024-04-15T12:01:42Z)
Leveraging Data Augmentation for Process Information Extraction [0.0]
We investigate the application of data augmentation for natural language text data. Data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text.
arXiv Detail & Related papers (2024-04-11T06:32:03Z)
The Artificial Neural Twin -- Process Optimization and Continual Learning in Distributed Process Chains [3.79770624632814]
We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks. Our approach introduces differentiable data fusion to estimate the state of distributed process steps. By treating the interconnected process steps as a quasi neural-network, we can backpropagate loss gradients for process optimization or model fine-tuning to process parameters.
arXiv Detail & Related papers (2024-03-27T08:34:39Z)
Machine learning for industrial sensing and control: A survey and practical perspective [7.678648424345052]
We identify key statistical and machine learning techniques that have seen practical success in the process industries. Soft sensing contains a wealth of industrial applications of statistical and machine learning methods. We consider two distinct flavors for data-driven optimization and control: hybrid modeling in conjunction with mathematical programming techniques and reinforcement learning.
arXiv Detail & Related papers (2024-01-24T22:27:04Z)
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [90.87691246153612]
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models trained on vast amounts of data holds immense promise for practical applications. With the increasing demands on computational capacity, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated.
arXiv Detail & Related papers (2023-04-07T11:13:23Z)
Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space. This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z)
Process-BERT: A Framework for Representation Learning on Educational Process Data [68.8204255655161]
We propose a framework for learning representations of educational process data. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data. We apply our framework to the 2019 Nation's Report Card Data Mining Competition dataset.
arXiv Detail & Related papers (2022-04-28T16:07:28Z)
Generating Privacy-Preserving Process Data with Deep Generative Models [7.3268099910347715]
We introduce an adversarial generative network for process data generation (ProcessGAN). We evaluate ProcessGAN and traditional models on six real-world datasets. We conclude that ProcessGAN can generate a large amount of sharable synthetic process data indistinguishable from authentic data.
arXiv Detail & Related papers (2022-03-15T14:29:54Z)
Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time. The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
AI-based Modeling and Data-driven Evaluation for Smart Manufacturing Processes [56.65379135797867]
We propose a dynamic algorithm for gaining useful insights about semiconductor manufacturing processes. We elaborate on the utilization of a Genetic Algorithm and Neural Network to propose an intelligent feature selection algorithm.
arXiv Detail & Related papers (2020-08-29T14:57:53Z)
