Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics
- URL: http://arxiv.org/abs/2211.06440v2
- Date: Wed, 5 Apr 2023 23:56:44 GMT
- Title: Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics
- Authors: Lim C. Siang, Shams Elnawawi, Lee D. Rippon, Daniel L. O'Connor and R. Bhushan Gopaluni
- Abstract summary: Data pre-processing has out-sized influence on the success of real-world artificial intelligence applications.
We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A significant portion of the effort involved in advanced process control, process analytics, and machine learning involves acquiring and preparing data. Literature often emphasizes increasingly complex modelling techniques with incremental performance improvements. However, when industrial case studies are published they often lack important details on data acquisition and preparation. Although data pre-processing is unfairly maligned as trivial and technically uninteresting, in practice it has an out-sized influence on the success of real-world artificial intelligence applications. This work describes best practices for acquiring and preparing operating data to pursue data-driven modelling and control opportunities in industrial processes. We present practical considerations for pre-processing industrial time series data to inform the efficient development of reliable soft sensors that provide valuable process insights.
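The abstract does not spell out specific pre-processing steps, so the following is only a hedged illustration of a few chores that commonly arise with industrial time series: placing irregularly timestamped readings onto a uniform time grid, discarding physically implausible values, and flagging frozen (flatlined) sensor readings. It is a minimal plain-Python sketch, not the authors' procedure; the function names (`resample_hold`, `mask_out_of_range`, `flag_flatline`) and the thresholds `max_hold`, `lo`, `hi`, and `min_run` are hypothetical choices made for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (timestamp in seconds, value), assumed sorted by time.
Sample = Tuple[float, float]


def resample_hold(samples: Sequence[Sample], grid: Sequence[float],
                  max_hold: float) -> List[Optional[float]]:
    """Map irregular samples onto a time grid by carrying the last value forward.

    A grid point gets None if no sample precedes it, or if the latest sample
    is more than max_hold seconds old (treated as a data gap, not filled).
    """
    times = [t for t, _ in samples]
    out: List[Optional[float]] = []
    for g in grid:
        i = bisect_right(times, g) - 1  # index of the last sample at or before g
        if i < 0 or g - times[i] > max_hold:
            out.append(None)
        else:
            out.append(samples[i][1])
    return out


def mask_out_of_range(values: Sequence[Optional[float]], lo: float,
                      hi: float) -> List[Optional[float]]:
    """Replace values outside [lo, hi] (e.g. instrument range limits) with None."""
    return [v if v is not None and lo <= v <= hi else None for v in values]


def flag_flatline(values: Sequence[Optional[float]], min_run: int) -> List[bool]:
    """Flag runs of at least min_run identical consecutive non-missing values,
    a simple heuristic for a frozen or stuck sensor."""
    flags = [False] * len(values)
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the data, at a gap, or on a value change.
        if i == len(values) or values[i] is None or values[i] != values[start]:
            if values[start] is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


if __name__ == "__main__":
    raw = [(0.0, 10.0), (5.0, 10.5), (30.0, 11.0), (35.0, 999.0)]
    grid = [0.0, 10.0, 20.0, 30.0, 40.0]
    held = resample_hold(raw, grid, max_hold=12.0)
    print(mask_out_of_range(held, 0.0, 100.0))  # [10.0, 10.5, None, 11.0, None]
    print(flag_flatline([2.0, 2.0, 2.0, 3.0], min_run=3))  # [True, True, True, False]
```

Carrying the last value forward is only one resampling choice; linear interpolation between recorded points is another. The `max_hold` limit is there so that long outages show up as missing data instead of being silently filled with a stale reading.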
Related papers
- Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins (2024-05-20)
We propose to deploy a digital twin of the production line by encoding its operational logic in a data-driven approach.
We adopt a quality prediction model for production processes based on self-attention-enabled temporal convolutional neural networks.
Our operational experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines.
- AI Competitions and Benchmarks: Dataset Development (2024-04-15)
This chapter provides a comprehensive overview of established methodological tools, enriched by our practical experience.
We develop the tasks involved in dataset development and offer insights into their effective management.
Then, we provide more details about the implementation process, which includes data collection, transformation, and quality evaluation.
- Leveraging Data Augmentation for Process Information Extraction (2024-04-11)
We investigate the application of data augmentation for natural language text data.
Data augmentation is an important component in enabling machine learning methods for the task of business process model generation from natural language text.
- The Artificial Neural Twin -- Process Optimization and Continual Learning in Distributed Process Chains (2024-03-27)
We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks.
Our approach introduces differentiable data fusion to estimate the state of distributed process steps.
By treating the interconnected process steps as a quasi neural network, we can backpropagate loss gradients to process parameters for process optimization or model fine-tuning.
- Machine learning for industrial sensing and control: A survey and practical perspective (2024-01-24)
We identify key statistical and machine learning techniques that have seen practical success in the process industries.
Soft sensing contains a wealth of industrial applications of statistical and machine learning methods.
We consider two distinct flavors for data-driven optimization and control: hybrid modeling in conjunction with mathematical programming techniques, and reinforcement learning.
- On Efficient Training of Large-Scale Deep Learning Models: A Literature Review (2023-04-07)
The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech.
The use of large-scale models trained on vast amounts of data holds immense promise for practical applications.
With the increasing demands on computational capacity, a comprehensive summary of acceleration techniques for training deep learning models is still much anticipated.
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes (2022-09-21)
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
- Process-BERT: A Framework for Representation Learning on Educational Process Data (2022-04-28)
We propose a framework for learning representations of educational process data.
Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data.
We apply our framework to the 2019 Nation's Report Card data mining competition dataset.
- Generating Privacy-Preserving Process Data with Deep Generative Models (2022-03-15)
We introduce a generative adversarial network for process data generation (ProcessGAN).
We evaluate ProcessGAN and traditional models on six real-world datasets.
We conclude that ProcessGAN can generate a large amount of sharable synthetic process data indistinguishable from authentic data.
- Automated Machine Learning Techniques for Data Streams (2021-06-14)
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
- AI-based Modeling and Data-driven Evaluation for Smart Manufacturing Processes (2020-08-29)
We propose a dynamic algorithm for gaining useful insights about semiconductor manufacturing processes.
We elaborate on the utilization of a Genetic Algorithm and Neural Network to propose an intelligent feature selection algorithm.
This list is automatically generated from the titles and abstracts of the papers in this site.