Sequential IoT Data Augmentation using Generative Adversarial Networks
- URL: http://arxiv.org/abs/2101.05003v1
- Date: Wed, 13 Jan 2021 11:08:07 GMT
- Title: Sequential IoT Data Augmentation using Generative Adversarial Networks
- Authors: Maximilian Ernst Tschuchnig and Cornelia Ferner and Stefan Wegenkittl
- Abstract summary: Sequential data in industrial applications can be used to train and evaluate machine learning models.
Since gathering representative amounts of data is difficult and time consuming, there is an incentive to generate it from a small ground truth.
This paper investigates the possibility of using GANs in order to augment sequential Internet of Things (IoT) data.
- Score: 5.8010446129208155
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sequential data in industrial applications can be used to train and evaluate
machine learning models (e.g. classifiers). Since gathering representative
amounts of data is difficult and time consuming, there is an incentive to
generate it from a small ground truth. Data augmentation is a common way to
generate more data through a priori knowledge; one specific technique, the
generative adversarial network (GAN), enables data generation from noise.
This paper investigates the possibility of using GANs to
augment sequential Internet of Things (IoT) data, with an example
implementation that generates household energy consumption data with and
without swimming pools. The data generated by the example implementation appear
subjectively similar to the original data. In addition to this subjective
evaluation, the paper introduces a quantitative evaluation technique for GANs
that applies when labels are provided. The positive results of the evaluation support
the initial assumption that generating sequential data from a small ground
truth is possible. This means that the tedious acquisition of sequential data
can be shortened. In the future, the results of this paper may serve as a
tool in machine learning for tackling the small-data challenge.
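The label-based quantitative evaluation could, for instance, look like the following sketch. This is our illustration, not the paper's exact procedure: a simple classifier (here a nearest-centroid classifier, an assumption for illustration) is trained on the real labeled sequences, and we then check whether generated sequences are assigned the label they were generated for.

```python
# Hedged sketch of a label-based GAN evaluation (not the paper's exact method):
# if a classifier trained on real labeled data also assigns the intended label
# to generated sequences, the generator likely preserved the
# class-discriminative structure of the ground truth.

def nearest_centroid_fit(sequences, labels):
    """Compute the mean sequence (centroid) per class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(sequences, labels) if l == label]
        centroids[label] = [sum(vals) / len(members) for vals in zip(*members)]
    return centroids

def nearest_centroid_predict(centroids, sequence):
    """Assign the label of the closest centroid (squared L2 distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(sequence, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def label_accuracy(centroids, generated, intended_labels):
    """Fraction of generated sequences classified as their intended label."""
    hits = sum(
        nearest_centroid_predict(centroids, seq) == lab
        for seq, lab in zip(generated, intended_labels)
    )
    return hits / len(generated)

# Toy stand-in for household load profiles: "pool" homes draw more power.
real = [[1, 1, 1], [1, 2, 1], [5, 6, 5], [6, 5, 6]]
real_labels = ["no_pool", "no_pool", "pool", "pool"]
generated = [[1, 1, 2], [5, 5, 6]]
gen_labels = ["no_pool", "pool"]

model = nearest_centroid_fit(real, real_labels)
print(label_accuracy(model, generated, gen_labels))  # 1.0
```

A score near 1.0 supports the generator; a real evaluation would of course use a stronger classifier and held-out data.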
Related papers
- An information theoretic limit to data amplification [0.0]
Generative Adversarial Networks (GANs) have been trained using Monte Carlo simulated input and then used to generate data for the same problem.
Training a GAN on N events can yield generated events with a gain factor, G, greater than one.
It is shown that a gain greater than one is possible whilst keeping the information content of the data unchanged.
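As we read this summary (a hedged illustration, not the paper's derivation), the gain factor is simply the ratio of generated to training events, while the achievable statistical precision stays bounded by the real sample size. The numbers below are assumptions for illustration:

```python
import math

# Toy numbers (assumptions for illustration, not from the paper).
n_train = 1_000       # N real training events
n_generated = 5_000   # events produced by the trained GAN

gain = n_generated / n_train  # G > 1: more events than we trained on

# The generated events carry no new information, so the uncertainty of a
# mean-style estimate remains floored by the real sample size (~1/sqrt(N)),
# not by the nominal generated sample size.
floor_real = 1 / math.sqrt(n_train)
floor_naive = 1 / math.sqrt(n_generated)
print(f"G = {gain}, real floor {floor_real:.4f} > naive {floor_naive:.4f}")
```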
arXiv Detail & Related papers (2024-12-23T23:27:51Z)
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Deep Generative Modeling-based Data Augmentation with Demonstration using the BFBT Benchmark Void Fraction Datasets [3.341975883864341]
This paper explores the applications of deep generative models (DGMs) that have been widely used for image data generation to scientific data augmentation.
Once trained, DGMs can be used to generate synthetic data that are similar to the training data and significantly expand the dataset size.
arXiv Detail & Related papers (2023-08-19T22:19:41Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that while generating synthetic data, most GANs amplify bias present in the training data; by removing these bias-inducing samples, GANs can be made to focus on the truly informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments [9.023847175654602]
The Generative Adversarial Network (GAN) is an effective method to produce samples from large-scale data distributions.
GANs provide an appropriate way to learn deep representations without widespread use of labeled training data.
In GANs, the generative model is estimated via a competitive process where the generator and discriminator networks are trained simultaneously.
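The competitive process described above can be sketched with a deliberately tiny, self-contained example. This is our illustration on 1-D toy data, not any paper's implementation: a linear generator and a logistic discriminator take alternating gradient steps, and the generator's output drifts toward the real data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clip to avoid overflow in exp for extreme discriminator logits.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60, 60)))

# Real 1-D samples from N(4, 1), standing in for sensor readings.
def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)

# Generator g(z) = w_g * z + b_g with noise z ~ N(0, 1);
# discriminator D(x) = sigmoid(w_d * x + b_d).
w_g, b_g = 1.0, 0.0
w_d, b_d = 0.1, 0.0
lr, steps, batch = 0.05, 500, 64

for _ in range(steps):
    # Discriminator ascent step: maximize log D(real) + log(1 - D(fake)).
    real = sample_real(batch)
    fake = w_g * rng.normal(size=batch) + b_g
    d_real = sigmoid(w_d * real + b_d)
    d_fake = sigmoid(w_d * fake + b_d)
    w_d += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    b_d += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator step (non-saturating loss): maximize log D(fake).
    z = rng.normal(size=batch)
    fake = w_g * z + b_g
    d_fake = sigmoid(w_d * fake + b_d)
    w_g += lr * np.mean((1 - d_fake) * w_d * z)
    b_g += lr * np.mean((1 - d_fake) * w_d)

# The generator's offset b_g should have drifted toward the real mean (4).
print(round(b_g, 2))
```

Real GANs replace the two linear models with deep networks and backpropagation, but the alternating update structure is the same.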
arXiv Detail & Related papers (2020-05-27T05:56:53Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the created dataset manageable, we apply a dataset distillation strategy to compress it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Data-Free Network Quantization With Adversarial Knowledge Distillation [39.92282726292386]
In this paper, we consider data-free network quantization with synthetic data.
The synthetic data are generated by a generator, while no real data are used to train the generator or to perform quantization.
We show the gain of producing diverse adversarial samples by using multiple generators and multiple students.
arXiv Detail & Related papers (2020-05-08T16:24:55Z)
- Generative Low-bitwidth Data Free Quantization [44.613912463011545]
We propose Generative Low-bitwidth Data Free Quantization (GDFQ) to remove the data dependence burden.
With the help of generated data, we can quantize a model by learning knowledge from the pre-trained model.
Our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method.
arXiv Detail & Related papers (2020-03-07T16:38:34Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, which may be an imbalanced subset of the original training dataset or a related-domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.