From Limited Annotated Raw Material Data to Quality Production Data: A
Case Study in the Milk Industry (Technical Report)
- URL: http://arxiv.org/abs/2204.12302v1
- Date: Tue, 26 Apr 2022 13:31:37 GMT
- Authors: Roee Shraga, Gil Katz, Yael Badian, Nitay Calderon, Avigdor Gal
- Abstract summary: We propose a design methodology, using active learning to enhance learning capabilities, for building a model of production outcome using a constrained amount of raw material training data.
The proposed methodology is demonstrated using an actual application in the milk industry, where milk is gathered from small milk farms and brought to a dairy production plant to be processed into cottage cheese.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Industry 4.0 offers opportunities to combine multiple sensor data sources
using IoT technologies for better utilization of raw material in production
lines. The common belief that data is readily available (the big data
phenomenon) is often challenged by the need to acquire quality data
effectively under severe constraints. In this paper we propose a design
methodology, using active learning to enhance learning capabilities, for
building a model of production outcome using a constrained amount of raw
material training data. The proposed methodology extends existing active
learning methods to effectively solve regression-based learning problems and
may serve settings where data acquisition requires excessive resources in the
physical world. We further suggest a set of qualitative measures to analyze
learners' performance. The proposed methodology is demonstrated using an actual
application in the milk industry, where milk is gathered from multiple small
milk farms and brought to a dairy production plant to be processed into cottage
cheese.
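As a general illustration of active learning applied to a regression target (not the authors' method), the sketch below uses query-by-committee. A committee of linear models is trained on bootstrap resamples of the labeled data, and the loop repeatedly requests a label for the unlabeled pool point where the committee's predictions vary most. The function names (`fit_line`, `committee`, `disagreement`, `active_learning`), the 1-D linear model, and the committee size are all hypothetical choices. The `oracle` argument stands in for a costly physical measurement, such as a lab test on a raw-material sample.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my  # degenerate resample: fall back to a constant model
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def committee(xs, ys, size, rng):
    """Train `size` linear models on bootstrap resamples of the labeled set."""
    n = len(xs)
    models = []
    for _ in range(size):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def disagreement(models, x):
    """Variance of committee predictions at x (the query-by-committee score)."""
    return statistics.pvariance([a * x + b for a, b in models])


def active_learning(pool, oracle, seed_idx, budget, size=10, seed=0):
    """Pool-based active learning loop for a regression target (hypothetical).

    pool: unlabeled inputs; oracle: returns the (costly) label of an input;
    seed_idx: indices labeled up front; budget: extra labels we may acquire.
    Returns a dict mapping pool index -> acquired label.
    """
    rng = random.Random(seed)
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(budget):
        xs = [pool[i] for i in labeled]  # same dict order as the values below
        ys = list(labeled.values())
        models = committee(xs, ys, size, rng)
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        # Spend the next label on the point the committee disagrees on most.
        best = max(candidates, key=lambda i: disagreement(models, pool[i]))
        labeled[best] = oracle(pool[best])
    return labeled


pool = [i / 10 for i in range(101)]  # candidate inputs 0.0 .. 10.0
labeled = active_learning(pool, oracle=lambda x: 2 * x + 1,
                          seed_idx=[40, 50, 60], budget=5)
print(len(labeled))  # 3 seed labels + 5 queried labels
```

Committee disagreement is only one of several possible query criteria for regression. The budget parameter makes explicit the constraint the abstract emphasizes: every call to `oracle` is assumed to be expensive in the physical world.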
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv (2024-11-12T18:57:59Z)
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv (2024-10-22T06:43:28Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv (2024-03-23T09:26:15Z)
- Machine learning for industrial sensing and control: A survey and practical perspective
We identify key statistical and machine learning techniques that have seen practical success in the process industries.
Soft sensing contains a wealth of industrial applications of statistical and machine learning methods.
We consider two distinct flavors for data-driven optimization and control: hybrid modeling in conjunction with mathematical programming techniques and reinforcement learning.
arXiv (2024-01-24T22:27:04Z)
- Benchmarking Automated Machine Learning Methods for Price Forecasting Applications
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning part and a non-machine learning part.
In a case study on the industrial use case of price forecasting, we show that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv (2023-04-28T10:27:38Z)
- Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models
Large language models (LLMs) are transforming the way humans interact with text.
We demonstrate a simple and efficient method for extracting materials data from full-text research papers.
This approach requires minimal to no coding or prior knowledge about the extracted property.
It offers high recall and nearly perfect precision in the resulting database.
arXiv (2023-02-09T19:56:37Z)
- Utilizing Domain Knowledge: Robust Machine Learning for Building Energy Prediction with Small, Inconsistent Datasets
The large amount of data that machine learning (ML) applications demand is currently a bottleneck.
We propose a method to combine prior knowledge with data-driven methods to significantly reduce their data dependency.
CBML, the resulting knowledge-encoded data-driven method, is examined in the context of energy-efficient building engineering.
arXiv (2023-01-23T08:56:11Z)
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv (2022-09-21T08:14:34Z)
- Understanding and Preparing Data of Industrial Processes for Machine Learning Applications
This paper addresses the challenge of missing values due to sensor unavailability at different production units of nonlinear production lines.
In cases where only a small proportion of the data is missing, those missing values can often be imputed.
This paper presents a technique that allows all of the available data to be used without removing large numbers of observations.
arXiv (2021-09-08T07:39:11Z)
- Graph-based Reinforcement Learning for Active Learning in Real Time: An Application in Modeling River Networks
We develop a real-time active learning method that uses the spatial and temporal contextual information to select representative query samples in a reinforcement learning framework.
We demonstrate the effectiveness of the proposed method by predicting streamflow and water temperature in the Delaware River Basin given a limited budget for collecting labeled data.
arXiv (2020-10-27T02:19:40Z)
- Bayesian active learning for production, a systematic study and a reusable library
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv (2020-06-17T14:51:11Z)