Auto-encoder based Model for High-dimensional Imbalanced Industrial Data
- URL: http://arxiv.org/abs/2108.02083v2
- Date: Thu, 5 Aug 2021 21:47:05 GMT
- Title: Auto-encoder based Model for High-dimensional Imbalanced Industrial Data
- Authors: Chao Zhang, Sthitie Bom
- Abstract summary: We introduce a variance weighted multi-headed auto-encoder classification model that fits well into the high-dimensional and highly imbalanced data.
The model also simultaneously predicts multiple outputs by exploiting output-supervised representation learning and multi-task weighting.
- Score: 6.339700878842761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the proliferation of IoT devices, distributed control systems are
now capturing and processing data from more sensors at higher frequency than ever before.
These new data, due to their volume and novelty, cannot be effectively consumed
without the help of data-driven techniques. Deep learning is emerging as a
promising technique to analyze these data, particularly in soft sensor
modeling. The strong representational capabilities of complex data and the
flexibility it offers from an architectural perspective make it a topic of
active applied research in industrial settings. However, the successful
applications of deep learning in soft sensing are still not widely integrated
in factory control systems, because most of the research on soft sensing does not
have access to large-scale industrial data, which are varied, noisy and
incomplete. The results published in most research papers are therefore not
easily reproduced when applied to the variety of data in industrial settings.
Here we provide manufacturing data sets that are much larger and more complex
than public open soft sensor data. Moreover, the data sets are from Seagate
factories on active service with only necessary anonymization, so that they
reflect the complex and noisy nature of real-world data. We introduce a
variance weighted multi-headed auto-encoder classification model that fits well
into the high-dimensional and highly imbalanced data. Besides the use of
weighting or sampling methods to handle the highly imbalanced data, the model
also simultaneously predicts multiple outputs by exploiting output-supervised
representation learning and multi-task weighting.
Related papers
- Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring [0.0]
This paper presents a novel approach to multimodal sensor data fusion in manufacturing processes.
We leverage contrastive learning techniques to correlate different data modalities without the need for labeled data.
Our approach facilitates downstream tasks such as process control, anomaly detection, and quality assurance.
arXiv Detail & Related papers (2024-10-29T21:52:04Z)
- Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z)
- A deep latent variable model for semi-supervised multi-unit soft sensing in industrial processes [0.0]
We introduce a deep latent variable model for semi-supervised multi-unit soft sensing.
This hierarchical, generative model is able to jointly model different units, as well as learn from both labeled and unlabeled data.
We show that by combining semi-supervised and multi-task learning, the proposed model achieves superior results.
arXiv Detail & Related papers (2024-07-18T09:13:22Z)
- IPAD: Industrial Process Anomaly Detection Dataset [71.39058003212614]
Video anomaly detection (VAD) is a challenging task aiming to recognize anomalies in video frames.
We propose a new dataset, IPAD, specifically designed for VAD in industrial scenarios.
This dataset covers 16 different industrial devices and contains over 6 hours of both synthetic and real-world video footage.
arXiv Detail & Related papers (2024-04-23T13:38:01Z)
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that most GANs amplify the bias present in the training data when generating synthetic data, but that by removing these bias-inducing samples, GANs essentially focus more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
- Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Soft Sensing Transformer: Hundreds of Sensors are Worth a Single Word [4.829772176792801]
We demonstrate the challenges and effectiveness of modeling industrial big data by a Soft Sensing Transformer model.
We observe that sensor readings are structurally similar to sentences, and we process the multi-dimensional sensor readings in a time series in a manner similar to sentences in natural language.
The results show that the transformer model outperforms the benchmark models in the soft sensing field, which are based on auto-encoder and long short-term memory (LSTM) models.
arXiv Detail & Related papers (2021-11-10T22:31:32Z)
- Anomaly Detection Based on Selection and Weighting in Latent Space [73.01328671569759]
We propose a novel selection-and-weighting-based anomaly detection framework called SWAD.
Experiments on both benchmark and real-world datasets have shown the effectiveness and superiority of SWAD.
arXiv Detail & Related papers (2021-03-08T10:56:38Z)
- Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.