Data science on industrial data -- Today's challenges in brown field
applications
- URL: http://arxiv.org/abs/2006.05757v1
- Date: Wed, 10 Jun 2020 10:05:16 GMT
- Title: Data science on industrial data -- Today's challenges in brown field
applications
- Authors: Tilman Klaeger, Sebastian Gottschall, Lukas Oehm
- Abstract summary: This paper shows the state of the art and what to expect when working with stock machines in the field.
A major focus of the paper is data collection, which can be more cumbersome than most people might expect.
Data quality for machine learning applications becomes a challenge once one leaves the laboratory.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much research is done on data analytics and machine learning. In industrial
processes, large amounts of data are available, and many researchers are trying
to work with this data. In practice, one finds many pitfalls
restraining the application of modern technologies, especially in brown field
applications. With this paper we want to show the state of the art and what to
expect when working with stock machines in the field. A major focus of this
paper is on data collection, which can be more cumbersome than most people might
expect. Data quality for machine learning applications also becomes a challenge once
one leaves the laboratory. In this area one has to expect a lack of semantic
description of the data as well as very little ground truth being available for
training and verification of machine learning models. A final challenge is IT
security and passing data through firewalls.
Related papers
- IPAD: Industrial Process Anomaly Detection Dataset [71.39058003212614]
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames.
We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios.
This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv Detail & Related papers (2024-04-23T13:38:01Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data [1.8692054990918079]
We review three 'cheap' techniques that have developed in recent years: weak supervision, transfer learning and prompt engineering.
For the latter, we review the particular case of zero-shot prompting of large language models.
We show good performance for all techniques, and in particular we demonstrate how prompting of large language models can achieve high accuracy at very low cost.
arXiv Detail & Related papers (2024-01-22T19:00:11Z)
- How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective [1.443696537295348]
The authors focus on interpreting the general term "small data" and its role in engineering and industrial applications.
Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism is introduced.
Five critical challenges of machine learning with small data in industrial applications are presented.
arXiv Detail & Related papers (2023-11-13T07:39:13Z)
- Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [49.724842920942024]
Industries such as finance, meteorology, and energy generate vast amounts of data daily.
We propose Data-Copilot, a data analysis agent that autonomously performs querying, processing, and visualization of massive data tailored to diverse human requests.
arXiv Detail & Related papers (2023-06-12T16:12:56Z)
- KGLiDS: A Platform for Semantic Abstraction, Linking, and Automation of Data Science [4.120803087965204]
This paper presents a scalable platform, KGLiDS, that employs machine learning and knowledge graph technologies to abstract and capture the semantics of data science artifacts and their connections.
Based on this information, KGLiDS enables various downstream applications, such as data discovery and pipeline automation.
arXiv Detail & Related papers (2023-03-03T20:31:04Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- Maximizing information from chemical engineering data sets: Applications to machine learning [61.442473332320176]
We identify four characteristics of data arising in chemical engineering applications that make applying classical artificial intelligence approaches difficult.
For each characteristic, we discuss applications where it arises and show how current chemical engineering research is extending the fields of data science and machine learning to address these challenges.
arXiv Detail & Related papers (2022-01-25T01:25:45Z)
- Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective [16.480530590466472]
Data-centric AI practices are now becoming mainstream.
Many datasets in the real world are small, dirty, biased, and even poisoned.
For data quality, we study data validation and data cleaning techniques.
arXiv Detail & Related papers (2021-12-13T03:57:36Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.