Evaluating Privacy-Utility Tradeoffs in Synthetic Smart Grid Data
- URL: http://arxiv.org/abs/2506.11026v1
- Date: Tue, 20 May 2025 10:46:29 GMT
- Title: Evaluating Privacy-Utility Tradeoffs in Synthetic Smart Grid Data
- Authors: Andre Catarino, Rui Melo, Rui Abreu, Luis Cruz
- Abstract summary: We conduct a comparative evaluation of four synthetic data generation methods. We assess classification utility, distribution fidelity, and privacy leakage. These findings highlight the potential of structured generative models for developing privacy-preserving, data-driven energy systems.
- Score: 9.927400227483428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread adoption of dynamic Time-of-Use (dToU) electricity tariffs requires accurately identifying households that would benefit from such pricing structures. However, the use of real consumption data poses serious privacy concerns, motivating the adoption of synthetic alternatives. In this study, we conduct a comparative evaluation of four synthetic data generation methods (Wasserstein-GP Generative Adversarial Networks (WGAN), Conditional Tabular GAN (CTGAN), Diffusion Models, and Gaussian noise augmentation) under different synthetic regimes. We assess classification utility, distribution fidelity, and privacy leakage. Our results show that architectural design plays a key role: diffusion models achieve the highest utility (macro-F1 up to 88.2%), while CTGAN provides the strongest resistance to reconstruction attacks. These findings highlight the potential of structured generative models for developing privacy-preserving, data-driven energy systems.
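The abstract names Gaussian noise augmentation as one of the four generators and reports classification utility as macro-F1. The Python sketch below illustrates both under stated assumptions: `gaussian_noise_augment` and `macro_f1` are hypothetical helper names, and scaling the noise to a fraction `sigma` of each feature's standard deviation is an assumption, not the paper's configuration. Macro-F1 is the unweighted mean of per-class F1 scores, so a minority class (for example, households that would benefit from dToU pricing) counts as much as the majority class.

```python
import numpy as np


def gaussian_noise_augment(X, y, n_copies=1, sigma=0.05, rng=None):
    """Return n_copies jittered copies of (X, y).

    Noise is drawn per feature with std = sigma * (feature std of X);
    this scaling rule is an assumption, not the paper's configuration.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scale = sigma * X.std(axis=0, keepdims=True)  # shape (1, n_features)
    copies = [X + rng.standard_normal(X.shape) * scale for _ in range(n_copies)]
    # Labels are unchanged by the noise, so repeat them once per copy
    return np.concatenate(copies, axis=0), np.tile(y, n_copies)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the union of observed labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


# Usage with toy data
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([0, 1, 0])
X_syn, y_syn = gaussian_noise_augment(X, y, n_copies=2, sigma=0.1, rng=0)
print(X_syn.shape, y_syn.tolist())  # (6, 2) [0, 1, 0, 0, 1, 0]
print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

In a train-on-synthetic, test-on-real setup (a common protocol, assumed here rather than taken from the paper), a classifier fit on `X_syn, y_syn` would be scored with `macro_f1` on held-out real data.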
Related papers
- SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers [3.9018723423306003]
We propose a novel tokenization strategy tailored for structured EHR data.
We benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.
Date: 2024-11-20T16:11:20Z
- Little Giants: Synthesizing High-Quality Embedding Data at Scale [71.352883755806]
We introduce SPEED, a framework that aligns open-source small models to efficiently generate large-scale embedding data.
SPEED uses less than 1/10 of the GPT API calls while outperforming the state-of-the-art embedding model E5_mistral when both are trained solely on their synthetic data.
Date: 2024-10-24T10:47:30Z
- Synthetic Data Generation for Residential Load Patterns via Recurrent GAN and Ensemble Method [12.161170324762645]
We develop the Ensemble Recurrent Generative Adversarial Network (ERGAN) framework to generate high-fidelity synthetic residential load data.
ERGAN can capture diverse load patterns across households, enhancing the realism and diversity of the generated synthetic data.
Date: 2024-10-20T12:33:38Z
- Creating synthetic energy meter data using conditional diffusion and building metadata [0.0]
The study proposes a conditional diffusion model for generating high-quality synthetic energy data using relevant metadata.
Using a dataset comprising 1,828 power meters from various buildings and countries, this model is compared with traditional methods.
Results demonstrate the proposed diffusion model's superior performance, with a 36% reduction in Fréchet Inception Distance (FID) and a 13% decrease in Kullback-Leibler (KL) divergence.
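The two fidelity metrics above can be illustrated with a short Python sketch. FID proper compares Gaussians fitted to Inception-v3 image embeddings; here the same Fréchet formula, d^2 = ||mu_r - mu_s||^2 + Tr(Sigma_r + Sigma_s - 2 (Sigma_r Sigma_s)^(1/2)), is applied directly to feature columns, and KL divergence is estimated from histograms on shared bins. The feature choice, the binning and smoothing parameters (`bins`, `eps`), and the function names `frechet_distance` and `histogram_kl` are assumptions for illustration, not the cited paper's setup.

```python
import numpy as np


def frechet_distance(real, synth):
    """Fréchet distance (the quantity FID reports) between Gaussians fitted to two (n_samples, n_features) arrays."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    mu_r, mu_s = real.mean(axis=0), synth.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(synth, rowvar=False))
    # Tr((cov_r cov_s)^(1/2)) equals Tr((S cov_s S)^(1/2)) with S = cov_r^(1/2);
    # the latter matrix is symmetric, so eigvalsh can be used.
    w, v = np.linalg.eigh(cov_r)
    sqrt_r = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    eig = np.linalg.eigvalsh(sqrt_r @ cov_s @ sqrt_r)
    tr_sqrt = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)


def histogram_kl(real, synth, bins=20, eps=1e-10):
    """KL(P_real || Q_synth) for one feature, estimated on shared histogram bins."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    edges = np.histogram_bin_edges(np.concatenate([real, synth]), bins=bins)
    # eps keeps empty bins from producing log(0) or division by zero
    p = np.histogram(real, bins=edges)[0] + eps
    q = np.histogram(synth, bins=edges)[0] + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


# Usage: compare a "real" sample with a slightly shifted "synthetic" one
rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 2))
synth = rng.normal(loc=0.5, size=(1000, 2))
print(frechet_distance(real, synth), histogram_kl(real[:, 0], synth[:, 0]))
```

Lower is better for both; a relative reduction such as the reported 36% would be computed between two generators' scores against the same real data.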
Date: 2024-03-31T01:58:38Z
- MargCTGAN: A "Marginally" Better CTGAN for the Low Sample Regime [63.851085173614]
MargCTGAN adds feature matching of de-correlated marginals, which results in a consistent improvement in downstream utility as well as statistical properties of the synthetic data.
Date: 2023-07-16T10:28:49Z
- CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis [0.4999814847776097]
Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data.
The validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed.
Date: 2023-07-01T16:52:18Z
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
Date: 2023-05-16T07:30:29Z
- Synthesizing Mixed-type Electronic Health Records using Diffusion Models [10.973115905786129]
Synthetic data generation is a promising solution to mitigate privacy concerns when sharing sensitive patient information.
Recent studies have shown that diffusion models offer several advantages over GANs, such as more realistic synthetic data and more stable training, across data modalities including image, text, and sound.
Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics, except for privacy, which confirms the trade-off between privacy and utility.
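Diffusion models such as TabDDPM build on the DDPM forward process, which corrupts a clean sample x_0 in closed form as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps ~ N(0, I), where alpha_bar_t is the cumulative product of (1 - beta_t). The Python sketch below covers only continuous features; TabDDPM handles categorical columns with a separate multinomial diffusion, which is omitted. The linear beta schedule (1e-4 to 0.02 over T = 1000 steps) follows the original DDPM setup and is an assumption here, and `linear_alpha_bar` and `q_sample` are hypothetical helper names.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng=None):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).

    Returns both x_t and the noise used, since a denoiser is typically
    trained to predict that noise. Timesteps are 0-indexed here.
    """
    rng = np.random.default_rng(rng)
    x0 = np.asarray(x0, dtype=float)
    noise = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise, noise


# Usage: corrupt a toy batch of standardized numeric features at three timesteps
alpha_bar = linear_alpha_bar()
x0 = np.random.default_rng(1).standard_normal((4, 3))
for t in (0, 499, 999):
    x_t, eps = q_sample(x0, t, alpha_bar, rng=t)
    # The signal coefficient shrinks toward 0, so x_t approaches pure noise
    print(t, round(float(np.sqrt(alpha_bar[t])), 4))
```

A denoising network (not shown) would then be trained to predict `noise` from `(x_t, t)` with a mean-squared-error loss, and sampling runs the learned reverse process from pure noise.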
Date: 2023-02-28T15:42:30Z
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
Date: 2022-03-03T05:58:49Z
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data [125.7135706352493]
Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
Date: 2021-11-12T18:13:45Z
- Discriminator Contrastive Divergence: Semi-Amortized Generative Modeling by Exploring Energy of the Discriminator [85.68825725223873]
Generative Adversarial Networks (GANs) have shown great promise in modeling high-dimensional data.
We introduce the Discriminator Contrastive Divergence, which is motivated by a property of the WGAN discriminator.
We demonstrate significantly improved generation on both synthetic data and several real-world image generation benchmarks.
Date: 2020-04-05T01:50:16Z
This list is automatically generated from the titles and abstracts of the papers in this site.