Related papers: End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment

URL: http://arxiv.org/abs/2512.19723v1
Date: Tue, 16 Dec 2025 20:11:23 GMT
Title: End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment
Authors: Firas Bayram, Bestoun S. Ahmed, Erik Hallin
Abstract summary: This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. The key innovation lies in its operational efficiency, enabling real-time, quality-driven ML decision-making with minimal computational overhead.
Score: 2.24303609250571
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While existing approaches treat data quality assessment and ML systems as isolated processes, our framework addresses the critical gap between theoretical methods and practical implementation by combining dynamic drift detection, adaptive data quality metrics, and MLOps into a cohesive, lightweight system. The key innovation lies in its operational efficiency, enabling real-time, quality-driven ML decision-making with minimal computational overhead. We validate the framework in a steel manufacturing company's Electroslag Remelting (ESR) vacuum pumping process, demonstrating a 12% improvement in model performance (R² = 94%) and a fourfold reduction in prediction latency. By exploring the impact of data quality acceptability thresholds, we provide actionable insights into balancing data quality standards and predictive performance in industrial applications. This framework represents a significant advancement in MLOps, offering a robust solution for time-sensitive, data-driven decision-making in dynamic industrial environments.

Related papers

Smart Manufacturing: MLOps-Enabled Event-Driven Architecture for Enhanced Control in Steel Production [2.087827281461409]
We explore a Digital Twin-Based Approach for Smart Manufacturing to improve Sustainability, Efficiency, and Cost-Effectiveness for a steel production plant. Our system is based on a micro-service edge-compute platform that ingests real-time sensor data from the process into a digital twin over a converged network infrastructure. Key to our approach is a Deep Reinforcement learning-based agent used in our machine learning operation (MLOps) driven system to autonomously correlate the system state with its digital twin to identify correction actions that aim to optimize power settings for the plant.
arXiv Detail & Related papers (2025-11-19T05:29:43Z)
Out of Distribution Detection for Efficient Continual Learning in Quality Prediction for Arc Welding [10.828273858204431]
Modern manufacturing relies heavily on fusion welding processes, including gas metal arc welding (GMAW). Current models exhibit critical limitations when confronted with the inherent distribution shifts that occur in dynamic manufacturing environments. We extend the VQ-VAE Transformer architecture by leveraging its autoregressive loss as a reliable out-of-distribution (OOD) detection mechanism.
arXiv Detail & Related papers (2025-08-22T23:09:21Z)
Data-Driven Differential Evolution in Tire Industry Extrusion: Leveraging Surrogate Models [0.0]
This study proposes a surrogate-based, data-driven methodology for optimizing complex real-world manufacturing systems. Machine learning models are employed to approximate system behavior and construct surrogate models, which are integrated into a tailored metaheuristic approach. Results show that the surrogate-based optimization approach outperforms historical best configurations.
arXiv Detail & Related papers (2025-07-15T10:52:45Z)
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset. We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6. Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z)
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning [71.2981957820888]
We propose a novel Star-Agents framework, which automates the enhancement of data quality across datasets. The framework initially generates diverse instruction data with multiple LLM agents through a bespoke sampling method. The generated data undergo a rigorous evaluation using a dual-model method that assesses both difficulty and quality.
arXiv Detail & Related papers (2024-11-21T02:30:53Z)
Adaptive Data Quality Scoring Operations Framework using Drift-Aware Mechanism for Industrial Applications [0.0]
We introduce a novel framework to address the challenges posed by dynamic quality dimensions in industrial data streams. The framework integrates a dynamic change detector mechanism that actively monitors and adapts to changes in data quality. The experimental results reveal high predictive performance and efficient processing time, highlighting its effectiveness in practical quality-driven AI applications.
arXiv Detail & Related papers (2024-08-13T08:32:06Z)
Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins [53.70191138561039]
We propose to deploy a digital twin of the production line by encoding its operational logic in a data-driven approach. We adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines.
arXiv Detail & Related papers (2024-05-20T09:28:23Z)
Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings [6.611985866622974]
This paper presents a finetuning approach to adapt standard object detection models to downstream tasks. Case study and evaluation of the energy demands of the developed models are presented. Finally, this paper introduces a novel way to quantify this trade-off through a customized Efficiency Factor metric.
arXiv Detail & Related papers (2024-03-11T11:41:30Z)
MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations. Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights. We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact [62.997667081978825]
This study seeks to address the demands of high-performance machine learning models with environmental sustainability. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised configurations, albeit with a commensurate increase in resource consumption.
arXiv Detail & Related papers (2023-07-01T15:18:00Z)
Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR'16 [0.29998889086656577]
We show that relatively minor modifications on a benchmark dataset cause significantly more impact on model performance than the specific ML technique considered. We also show that the measured model performance is uncertain, as a result of labelling inaccuracies.
arXiv Detail & Related papers (2023-05-31T12:03:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.