Related papers: Variational Autoencoder Generative Adversarial Network for Synthetic Data Generation in Smart Home

URL: http://arxiv.org/abs/2201.07387v1
Date: Wed, 19 Jan 2022 02:30:25 GMT
Title: Variational Autoencoder Generative Adversarial Network for Synthetic Data Generation in Smart Home
Authors: Mina Razghandi, Hao Zhou, Melike Erol-Kantarci, and Damla Turgut
Abstract summary: We propose a Variational AutoEncoder Generative Adversarial Network (VAE-GAN) as a smart grid data generative model. VAE-GAN is capable of learning various types of data distributions and generating plausible samples from the same distribution. Experiments indicate that the proposed synthetic data generative model outperforms the vanilla GAN network.
Score: 15.995891934245334
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Data is the fuel of data science and machine learning techniques for smart grid applications, similar to many other fields. However, the availability of data can be an issue due to privacy concerns, data size, data quality, and so on. To this end, in this paper, we propose a Variational AutoEncoder Generative Adversarial Network (VAE-GAN) as a smart grid data generative model which is capable of learning various types of data distributions and generating plausible samples from the same distribution without performing any prior analysis on the data before the training phase. We compared the Kullback-Leibler (KL) divergence, maximum mean discrepancy (MMD), and Wasserstein distance between the synthetic data (electrical load and PV production) distribution generated by the proposed model, vanilla GAN network, and the real data distribution, to evaluate the performance of our model. Furthermore, we used five key statistical parameters to describe the smart grid data distribution and compared them between synthetic data generated by both models and real data. Experiments indicate that the proposed synthetic data generative model outperforms the vanilla GAN network. The distribution of VAE-GAN synthetic data is the most comparable to that of real data.
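Illustration: the abstract names three distances used to compare synthetic and real data distributions (KL divergence, MMD, and Wasserstein distance). The following is a minimal sketch of how such metrics might be computed for one-dimensional samples. It is not the authors' evaluation code. The function names, the histogram binning with add-one smoothing, the RBF kernel and its bandwidth, and the Gaussian toy data are all assumptions made for illustration.

```python
import math
import random


def histogram(sample, lo, hi, bins):
    """Smoothed bin probabilities for `sample` on a shared [lo, hi] grid."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    for x in sample:
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    # Add-one smoothing (an assumption) keeps every bin positive for the log.
    total = len(sample) + bins
    return [(c + 1) / total for c in counts]


def kl_divergence(real, synth, bins=20):
    """Histogram estimate of KL(real || synth)."""
    lo = min(min(real), min(synth))
    hi = max(max(real), max(synth))
    p = histogram(real, lo, hi, bins)
    q = histogram(synth, lo, hi, bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mmd_squared(real, synth, bandwidth=1.0):
    """Biased estimate of squared MMD with an RBF kernel (assumed kernel)."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    def mean_k(xs, ys):
        return sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))

    return mean_k(real, real) + mean_k(synth, synth) - 2 * mean_k(real, synth)


def wasserstein_1d(real, synth):
    """1-Wasserstein distance between two equal-size 1-D samples."""
    if len(real) != len(synth):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(synth))
    return sum(abs(a - b) for a, b in pairs) / len(real)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins for real and synthetic load values (hypothetical data).
    real = [rng.gauss(0.0, 1.0) for _ in range(200)]
    close = [rng.gauss(0.1, 1.0) for _ in range(200)]
    far = [rng.gauss(3.0, 1.0) for _ in range(200)]
    for name, synth in (("close", close), ("far", far)):
        print(name,
              kl_divergence(real, synth),
              mmd_squared(real, synth),
              wasserstein_1d(real, synth))
```

For all three metrics, smaller values indicate that the synthetic sample lies closer to the real one. The histogram-based KL estimate is sensitive to the bin count, and the MMD value depends on the kernel bandwidth.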

Related papers

Scaling Laws of Synthetic Data for Language Models [132.67350443447611]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
arXiv Detail & Related papers (2025-03-25T11:07:12Z)
Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server. We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN [1.5749416770494706]
Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices. We propose FLIGAN, a novel approach to address the issue of data incompleteness in FL. Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process.
arXiv Detail & Related papers (2024-03-25T16:49:38Z)
Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
Generative Modeling for Tabular Data via Penalized Optimal Transport Network [2.0319002824093015]
Wasserstein generative adversarial network (WGAN) is a notable improvement in generative modeling. We propose POTNet, a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss.
arXiv Detail & Related papers (2024-02-16T05:27:05Z)
Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data [3.555830838738963]
Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. We identify the most effective synthetic data generation techniques for training and evaluating machine learning models.
arXiv Detail & Related papers (2023-10-30T03:37:16Z)
Private Synthetic Data Meets Ensemble Learning [15.425653946755025]
When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop. We introduce a new ensemble strategy for training downstream models, with the goal of enhancing their performance when used on real data.
arXiv Detail & Related papers (2023-10-15T04:24:42Z)
On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z)
Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task. We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
Targeted Analysis of High-Risk States Using an Oriented Variational Autoencoder [3.494548275937873]
Variational autoencoder (VAE) neural networks can be trained to generate power system states. The coordinates of the latent space codes of VAEs have been shown to correlate with conceptual features of the data. In this paper, an oriented variational autoencoder (OVAE) is proposed to constrain the link between latent space code and generated data.
arXiv Detail & Related papers (2023-03-20T19:34:21Z)
Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets. FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs). FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
A Bayesian Generative Adversarial Network (GAN) to Generate Synthetic Time-Series Data, Application in Combined Sewer Flow Prediction [3.3139597764446607]
In machine learning, generative models are a class of methods capable of learning data distribution to generate artificial data. In this study, we developed a GAN model to generate synthetic time series to balance our limited recorded time series data. The aim is to predict the flow using precipitation data and examine the impact of data augmentation using synthetic data in model performance.
arXiv Detail & Related papers (2023-01-31T16:12:26Z)
A Generative Approach for Production-Aware Industrial Network Traffic Modeling [70.46446906513677]
We investigate the network traffic data generated from a laser cutting machine deployed in a Trumpf factory in Germany. We analyze the traffic statistics, capture the dependencies between the internal states of the machine, and model the network traffic as a production state dependent process. We compare the performance of various generative models including variational autoencoder (VAE), conditional variational autoencoder (CVAE), and generative adversarial network (GAN).
arXiv Detail & Related papers (2022-11-11T09:46:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.