GTV: Generating Tabular Data via Vertical Federated Learning
- URL: http://arxiv.org/abs/2302.01706v1
- Date: Fri, 3 Feb 2023 13:04:12 GMT
- Title: GTV: Generating Tabular Data via Vertical Federated Learning
- Authors: Zilong Zhao, Han Wu, Aad Van Moorsel and Lydia Y. Chen
- Abstract summary: We propose GTV, a VFL framework for Generative Adversarial Networks (GANs).
GTV proposes a unique distributed training architecture for the generator and discriminator to access training data in a privacy-preserving manner.
Results show that GTV can consistently generate high-fidelity synthetic data of comparable quality to that generated by a centralized GAN algorithm.
- Score: 20.683314367860532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) have achieved state-of-the-art results
in tabular data synthesis, under the presumption of directly accessible training
data. Vertical Federated Learning (VFL) is a paradigm for distributed training
of machine learning models across clients that each possess unique
features pertaining to the same individuals; tabular data learning is
its primary use case. However, it is unknown whether tabular GANs can be learned in
VFL. The demand for secure data transfer among clients and the GAN during training and
data synthesis poses an extra challenge. The conditional vector for tabular GANs is
a valuable tool to control specific features of generated data, but it contains
sensitive information from real data, putting privacy guarantees at risk. In this
paper, we propose GTV, a VFL framework for tabular GANs, whose key components
are the generator, the discriminator and the conditional vector. GTV proposes a unique
distributed training architecture for the generator and discriminator to access
training data in a privacy-preserving manner. To accommodate the conditional vector
in training without privacy leakage, GTV designs a mechanism,
training-with-shuffling, to ensure that no party can reconstruct training data
using the conditional vector. We evaluate the effectiveness of GTV in terms of
synthetic data quality and overall training scalability. Results show that GTV
can consistently generate high-fidelity synthetic tabular data of comparable
quality to that generated by a centralized GAN algorithm. The difference in
machine learning utility can be as low as 2.7%, even under extremely
imbalanced data distributions across clients and with different numbers of clients.
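The vertical setting described in the abstract can be made concrete with a small sketch. The code below is not GTV's architecture and does not reproduce its training-with-shuffling mechanism, whose details are not given here. It only illustrates, under stated assumptions, three ideas the abstract relies on: each client holds different columns for the same individuals, a row permutation shared by all clients keeps their partitions aligned, and a conditional vector can be a one-hot encoding of a chosen category. The table contents, client names, and function names are all hypothetical.

```python
import random

# Hypothetical toy table: one row per individual, one key per feature.
ROWS = [
    {"age": 34, "income": 52000, "owns_home": "yes"},
    {"age": 27, "income": 41000, "owns_home": "no"},
    {"age": 45, "income": 88000, "owns_home": "yes"},
    {"age": 51, "income": 63000, "owns_home": "no"},
]

# Hypothetical assignment of columns to clients (vertical partitioning).
CLIENT_COLUMNS = {
    "client_a": ["age"],
    "client_b": ["income", "owns_home"],
}


def vertical_split(rows, client_columns):
    """Give each client only its own columns, for the same individuals in the same order."""
    return {
        client: [{col: row[col] for col in cols} for row in rows]
        for client, cols in client_columns.items()
    }


def shuffle_aligned(partitions, seed):
    """Apply one shared row permutation to every client's partition.

    Assumption: all clients agree on the seed, so row i still refers to
    the same individual at every client after shuffling.
    """
    n_rows = len(next(iter(partitions.values())))
    perm = list(range(n_rows))
    random.Random(seed).shuffle(perm)
    return {
        client: [part[i] for i in perm]
        for client, part in partitions.items()
    }


def one_hot_condition(column_values, chosen):
    """Build a one-hot vector over the sorted categories of one discrete column."""
    categories = sorted(set(column_values))
    return [1 if category == chosen else 0 for category in categories]


if __name__ == "__main__":
    parts = vertical_split(ROWS, CLIENT_COLUMNS)
    shuffled = shuffle_aligned(parts, seed=0)
    for client, part in shuffled.items():
        print(client, part)
    print(one_hot_condition([r["owns_home"] for r in ROWS], "yes"))
```

The one-hot vector also shows why the abstract treats the conditional vector as sensitive: it is derived from real column values, so whoever sees it learns something about the underlying data.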
Related papers
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
- FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN [1.5749416770494706]
Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices.
We propose FLIGAN, a novel approach to address the issue of data incompleteness in FL.
Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process.
arXiv Detail & Related papers (2024-03-25T16:49:38Z)
- Taming Gradient Variance in Federated Learning with Networked Control Variates [5.424502283356168]
Federated learning, a decentralized approach to machine learning, faces significant challenges such as extensive communication overheads.
We introduce a novel Networked Control Variates (FedNCV) framework for Federated Learning.
arXiv Detail & Related papers (2023-10-26T07:32:52Z)
- PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning [55.930403371398114]
We propose a novel generative adversarial network (GAN) sharing and aggregation strategy for personalized federated learning (PFL).
PFL-GAN addresses client heterogeneity in different scenarios. More specifically, we first learn the similarity among clients and then develop a weighted collaborative data aggregation.
Empirical results from rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
arXiv Detail & Related papers (2023-08-23T22:38:35Z)
- Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets.
FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs).
FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN accelerates training time per epoch up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z)
- FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy.
We introduce FedH2L, which is agnostic to the model architecture and robust to different data distributions across participants.
In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner.
arXiv Detail & Related papers (2021-01-27T10:10:18Z)
- Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning [151.47900584193025]
We propose an asynchronous federated SGD (AFSGD-VP) algorithm and its SVRG and SAGA variants for vertically partitioned data.
To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for vertically partitioned data.
arXiv Detail & Related papers (2020-08-14T08:08:15Z)
- Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space.
Our method can be easily plugged into existing GAN models, with little computational overhead in training.
arXiv Detail & Related papers (2020-04-05T04:06:50Z)