GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks
- URL: http://arxiv.org/abs/2511.09672v1
- Date: Fri, 14 Nov 2025 01:03:31 GMT
- Title: GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks
- Authors: Samuel Maddock, Shripad Gade, Graham Cormode, Will Bullock
- Abstract summary: We introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results.
- Score: 9.432150710329607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads.
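The abstract describes the adaptive "select-measure-generate" loop that GEM+ inherits from AIM. The toy sketch below illustrates that control flow under simplifying assumptions (binary columns, a small fixed candidate set, plain Gaussian noise); the function names are hypothetical, the selection step is not itself privatised as it would be in AIM, and the "generate" step that GEM+ performs with a generator network is only stubbed.

```python
import numpy as np

rng = np.random.default_rng(0)

def marginal_of(rows, cols, sizes):
    """Exact count histogram over a subset of columns."""
    idx = np.ravel_multi_index(rows[:, cols].T, [sizes[c] for c in cols])
    return np.bincount(idx, minlength=int(np.prod([sizes[c] for c in cols])))

def measure_marginal(rows, cols, sizes, sigma):
    """Noisy marginal via the Gaussian mechanism (illustrative sigma)."""
    counts = marginal_of(rows, cols, sizes)
    return counts + rng.normal(0.0, sigma, size=counts.shape)

# Toy categorical dataset: four binary columns, 1000 rows.
sizes = [2, 2, 2, 2]
data = rng.integers(0, 2, size=(1000, 4))
synth = rng.integers(0, 2, size=(1000, 4))  # stands in for generator samples

candidates = [(0, 1), (1, 2), (2, 3), (0, 3)]
measured = {}
for _ in range(3):  # T adaptive rounds
    # SELECT: pick the marginal where the current synthetic data errs most
    # (AIM does this selection privately; omitted here for brevity).
    errors = {c: np.abs(marginal_of(data, list(c), sizes)
                        - marginal_of(synth, list(c), sizes)).sum()
              for c in candidates if c not in measured}
    best = max(errors, key=errors.get)
    # MEASURE: privatise the chosen marginal with Gaussian noise.
    measured[best] = measure_marginal(data, list(best), sizes, sigma=5.0)
    # GENERATE: refit the model to all noisy measurements so far. AIM retrains
    # a graphical model from scratch; GEM+ instead updates a generator network
    # incrementally, which is what makes it scale to many columns. (Stubbed.)
```

The key scalability point is the last step: because the generator network is updated rather than rebuilt when a new marginal is measured, the per-round cost does not blow up as the set of measured marginals grows.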
Related papers
- MeshGraphNet-Transformer: Scalable Mesh-based Learned Simulation for Solid Mechanics [0.0]
We present MeshGraphNet-Transformer (MGN-T), a novel architecture that combines the global modeling capabilities of Transformers with the geometric inductive bias of MeshGraphNets. MGN-T overcomes a key limitation of standard MGN: the inefficient long-range information propagation caused by iterative message passing on large, high-resolution meshes. We demonstrate that MGN-T successfully handles industrial-scale meshes for impact dynamics, a setting in which standard MGN fails due to message-passing under-reaching.
arXiv Detail & Related papers (2026-01-30T17:02:47Z) - UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation [70.2215233759276]
UtilGen is a novel utility-centric data augmentation framework for computer vision tasks. We show that UtilGen consistently produces superior datasets, with an average accuracy improvement of 3.87% over the previous SOTA. Further analysis of data influence and distribution reveals that UtilGen produces more impactful and task-relevant synthetic data.
arXiv Detail & Related papers (2025-10-28T10:17:11Z) - Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z) - Comparison of derivative-free and gradient-based minimization for multi-objective compositional design of shape memory alloys [0.0]
We use machine learning models as surrogates and apply numerical methods to search for alloy compositions. This study demonstrates a practical approach to exploring new SMAs by combining physics-informed data, machine learning models, and optimization algorithms.
arXiv Detail & Related papers (2025-08-19T03:43:35Z) - Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data. We introduce a novel principled method for DP synthetic image embedding generation. Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z) - Instruction-Guided Autoregressive Neural Network Parameter Generation [49.800239140036496]
We propose IGPG, an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. By autoregressively generating tokens of neural network weights, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Experiments on multiple datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework.
arXiv Detail & Related papers (2025-04-02T05:50:19Z) - AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition [14.525986333650417]
Synthetic data generation offers a promising alternative to external datasets or pre-trained models. AugGen strategically samples from a class-conditional generative model trained exclusively on the target FR dataset. Our findings demonstrate that carefully integrated synthetic data can both mitigate privacy constraints and substantially enhance recognition performance.
arXiv Detail & Related papers (2025-03-14T16:10:21Z) - Leveraging Large Language Models to Address Data Scarcity in Machine Learning: Applications in Graphene Synthesis [0.0]
Machine learning in materials science faces challenges due to limited experimental data. We propose strategies that utilize large language models (LLMs) to enhance machine learning performance.
arXiv Detail & Related papers (2025-03-06T16:04:01Z) - Generative adversarial networks for data-scarce spectral applications [0.0]
We report on an application of GANs in the domain of synthetic spectral data generation.
We show that CWGANs can act as a surrogate model with improved performance in the low-data regime.
arXiv Detail & Related papers (2023-07-14T16:27:24Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization [72.4394510913927]
Deep learning methods are state-of-the-art for spectral image (SI) computational tasks.
GANs enable diverse augmentation by learning and sampling from the data distribution.
GAN-based SI generation is challenging because the high-dimensional nature of this kind of data hinders the convergence of GAN training, yielding suboptimal generation.
We propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN.
arXiv Detail & Related papers (2023-04-29T00:25:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.