GTV: Generating Tabular Data via Vertical Federated Learning
- URL: http://arxiv.org/abs/2302.01706v1
- Date: Fri, 3 Feb 2023 13:04:12 GMT
- Title: GTV: Generating Tabular Data via Vertical Federated Learning
- Authors: Zilong Zhao, Han Wu, Aad Van Moorsel and Lydia Y. Chen
- Abstract summary: We propose GTV, a VFL framework for Generative Adversarial Networks (GANs).
GTV proposes a unique distributed training architecture for the generator and discriminator to access training data in a privacy-preserving manner.
Results show that GTV can consistently generate high-fidelity synthetic data of quality comparable to that generated by a centralized GAN algorithm.
- Score: 20.683314367860532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) have achieved state-of-the-art results
in tabular data synthesis, under the presumption of directly accessible training
data. Vertical Federated Learning (VFL) is a paradigm that allows a machine
learning model to be trained distributedly across clients possessing unique
features pertaining to the same individuals, with tabular data learning as the
primary use case. However, it is unknown whether tabular GANs can be learned in
VFL. The demand for secure data transfer among clients and the GAN during
training and data synthesis poses an extra challenge. The conditional vector for
tabular GANs is a valuable tool to control specific features of the generated
data, but it contains sensitive information from the real data, risking privacy
guarantees. In this paper, we propose GTV, a VFL framework for tabular GANs,
whose key components are the generator, the discriminator, and the conditional
vector. GTV proposes a unique distributed training architecture for the
generator and discriminator to access training data in a privacy-preserving
manner. To accommodate the conditional vector in training without privacy
leakage, GTV designs a training-with-shuffling mechanism to ensure that no
party can reconstruct training data from the conditional vector. We evaluate
the effectiveness of GTV in terms of synthetic data quality and overall
training scalability. Results show that GTV can consistently generate
high-fidelity synthetic tabular data of quality comparable to that generated by
a centralized GAN algorithm. The difference in machine learning utility can be
as low as 2.7%, even under extremely imbalanced data distributions across
clients and varying numbers of clients.
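The abstract names two mechanisms: a generator/discriminator split across VFL parties, and a shuffled conditional vector. The following PyTorch snippet is a minimal sketch of one plausible reading of them; the layer sizes, the server-side trunk with per-client heads, and all names are assumptions for illustration, not GTV's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: in VFL, each client owns a disjoint subset of columns
# for the same rows. The generator is split into a shared trunk and one local
# head per client, so each party only ever decodes its own columns.
torch.manual_seed(0)

LATENT_DIM, TRUNK_DIM = 32, 64
CLIENT_COLS = {"client_a": 5, "client_b": 7}   # output columns per client (assumed)
CLIENT_COND = {"client_a": 3, "client_b": 4}   # one-hot categories per client (assumed)

trunk = nn.Sequential(
    nn.Linear(LATENT_DIM + sum(CLIENT_COND.values()), TRUNK_DIM),
    nn.ReLU(),
)
heads = {name: nn.Linear(TRUNK_DIM, n) for name, n in CLIENT_COLS.items()}

def sample_cond(batch: int, n_cat: int) -> torch.Tensor:
    """One-hot conditional sub-vector built locally from a client's discrete column."""
    idx = torch.randint(0, n_cat, (batch,))
    return nn.functional.one_hot(idx, n_cat).float()

batch = 8
# Each client contributes its slice of the conditional vector ...
cond = torch.cat([sample_cond(batch, n) for n in CLIENT_COND.values()], dim=1)
# ... and a shared row permutation is applied before the assembled vector is
# used, so no single party can align conditional entries with its own raw
# rows (the "training-with-shuffling" idea).
cond = cond[torch.randperm(batch)]

z = torch.randn(batch, LATENT_DIM)
h = trunk(torch.cat([z, cond], dim=1))                        # shared trunk
fake_cols = {name: head(h) for name, head in heads.items()}   # per-client heads
print({k: tuple(v.shape) for k, v in fake_cols.items()})
```

In a real deployment the discriminator would be partitioned analogously and the permutation coordinated among parties so gradients still route correctly; the sketch covers only the forward assembly.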
Related papers
- TabVFL: Improving Latent Representation in Vertical Federated Learning [6.602969765752305]
TabVFL is a distributed framework designed to improve latent representation learning using the joint features of participants.
arXiv Detail & Related papers (2024-04-27T19:40:35Z)
- Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
- PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, to facilitate further comparisons within the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing [7.573516684862637]
Federated learning (FL) strives to enable collaborative training of machine learning models without centrally collecting clients' private data.
This paper proposes a Dropout-Resilient Secure Federated Learning (DReS-FL) framework based on Lagrange coded computing.
We show that DReS-FL is resilient to client dropouts and provides privacy protection for the local datasets.
arXiv Detail & Related papers (2022-10-06T05:04:38Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Stochastic Coded Federated Learning with Convergence and Privacy Guarantees [8.2189389638822]
Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework.
This paper proposes a coded federated learning framework, namely stochastic coded federated learning (SCFL), to mitigate the straggler issue.
We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning.
arXiv Detail & Related papers (2022-01-25T04:43:29Z)
- Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition.
There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules.
We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
- Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first federated learning framework for tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN speeds up training per epoch by up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
However, the created dataset is quite large, making training costly; to tackle this, we propose applying a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)