Related papers: GTV: Generating Tabular Data via Vertical Federated Learning

GTV: Generating Tabular Data via Vertical Federated Learning

URL: http://arxiv.org/abs/2302.01706v1
Date: Fri, 3 Feb 2023 13:04:12 GMT
Title: GTV: Generating Tabular Data via Vertical Federated Learning
Authors: Zilong Zhao, Han Wu, Aad Van Moorsel and Lydia Y. Chen
Abstract summary: We propose GTV, a VFL framework for Generative Adversarial Networks (GANs) GTV proposes an unique distributed training architecture for generator and discriminator to access training data in a privacy-preserving manner. Results show that GTV can consistently generate high-fidelity synthetic data of comparable quality to that generated by centralized GAN algorithm.
Score: 20.683314367860532
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative Adversarial Networks (GANs) have achieved state-of-the-art results in tabular data synthesis, under the presumption of direct accessible training data. Vertical Federated Learning (VFL) is a paradigm which allows to distributedly train machine learning model with clients possessing unique features pertaining to the same individuals, where the tabular data learning is the primary use case. However, it is unknown if tabular GANs can be learned in VFL. Demand for secure data transfer among clients and GAN during training and data synthesizing poses extra challenge. Conditional vector for tabular GANs is a valuable tool to control specific features of generated data. But it contains sensitive information from real data - risking privacy guarantees. In this paper, we propose GTV, a VFL framework for tabular GANs, whose key components are generator, discriminator and the conditional vector. GTV proposes an unique distributed training architecture for generator and discriminator to access training data in a privacy-preserving manner. To accommodate conditional vector into training without privacy leakage, GTV designs a mechanism training-with-shuffling to ensure that no party can reconstruct training data with conditional vector. We evaluate the effectiveness of GTV in terms of synthetic data quality, and overall training scalability. Results show that GTV can consistently generate high-fidelity synthetic tabular data of comparable quality to that generated by centralized GAN algorithm. The difference on machine learning utility can be as low as to 2.7%, even under extremely imbalanced data distributions across clients and different number of clients.

Related papers

TabVFL: Improving Latent Representation in Vertical Federated Learning [6.602969765752305]
TabVFL is a distributed framework designed to improve latent representation learning using the joint features of participants. In this paper, we propose TabVFL, a distributed framework designed to improve latent representation learning using the joint features of participants.
arXiv Detail & Related papers (2024-04-27T19:40:35Z)
An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
FLIGAN: Enhancing Federated Learning with Incomplete Data using GAN [1.5749416770494706]
Federated Learning (FL) provides a privacy-preserving mechanism for distributed training of machine learning models on networked devices. We propose FLIGAN, a novel approach to address the issue of data incompleteness in FL. Our methodology adheres to FL's privacy requirements by generating synthetic data in a federated manner without sharing the actual data in the process.
arXiv Detail & Related papers (2024-03-25T16:49:38Z)
Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way. We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content. We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z)
Taming Gradient Variance in Federated Learning with Networked Control Variates [5.424502283356168]
Federated learning, a decentralized approach to machine learning, faces significant challenges such as extensive communication overheads. We introduce a novel Networked Control Variates (FedNCV) framework for Federated Learning.
arXiv Detail & Related papers (2023-10-26T07:32:52Z)
PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning [55.930403371398114]
We propose a novel generative adversarial network (GAN) sharing and aggregation strategy for personalized learning (PFL) PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
arXiv Detail & Related papers (2023-08-23T22:38:35Z)
PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation. This work proposes a novel FL framework that requires only partial GAN model sharing. Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z)
Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
Distributed Traffic Synthesis and Classification in Edge Networks: A Federated Self-supervised Learning Approach [83.2160310392168]
This paper proposes FS-GAN to support automatic traffic analysis and synthesis over a large number of heterogeneous datasets. FS-GAN is composed of multiple distributed Generative Adversarial Networks (GANs) FS-GAN can classify data of unknown types of service and create synthetic samples that capture the traffic distribution of the unknown types.
arXiv Detail & Related papers (2023-02-01T03:23:11Z)
Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device) In FL, each data holder trains a model locally and releases it to a central server for aggregation. In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation). In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
DReS-FL: Dropout-Resilient Secure Federated Learning for Non-IID Clients via Secret Data Sharing [7.573516684862637]
Federated learning (FL) strives to enable collaborative training of machine learning models without centrally collecting clients' private data. This paper proposes a Dropout-Resilient Secure Federated Learning framework based on Lagrange computing. We show that DReS-FL is resilient to client dropouts and provides privacy protection for the local datasets.
arXiv Detail & Related papers (2022-10-06T05:04:38Z)
Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that not only the issue of data heterogeneity in current setups is not necessarily a problem but also in fact it can be beneficial for the FL participants. Our observations are intuitive. Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
Stochastic Coded Federated Learning with Convergence and Privacy Guarantees [8.2189389638822]
Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework. This paper proposes a coded federated learning framework, namely coded federated learning (SCFL) to mitigate the straggler issue. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning.
arXiv Detail & Related papers (2022-01-25T04:43:29Z)
Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data. The Shapley value (SV) is a provably fair contribution valuation metric originated from cooperative game theory. We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones [57.468730437381076]
We present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition. There are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules. We propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule.
arXiv Detail & Related papers (2022-01-03T16:49:33Z)
Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs. To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features. Results show that Fed-TGAN accelerates training time per epoch up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z)
FedH2L: Federated Learning with Model and Statistical Heterogeneity [75.61234545520611]
Federated learning (FL) enables distributed participants to collectively learn a strong global model without sacrificing their individual data privacy. We introduce FedH2L, which is agnostic to both the model architecture and robust to different data distributions across participants. In contrast to approaches sharing parameters or gradients, FedH2L relies on mutual distillation, exchanging only posteriors on a shared seed set between participants in a decentralized manner.
arXiv Detail & Related papers (2021-01-27T10:10:18Z)
Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning [151.47900584193025]
We propose an asynchronous federated SGD (AFSGD-VP) algorithm and its SVRG and SAGA variants on the vertically partitioned data. To the best of our knowledge, AFSGD-VP and its SVRG and SAGA variants are the first asynchronous federated learning algorithms for vertically partitioned data.
arXiv Detail & Related papers (2020-08-14T08:08:15Z)
Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training. We experimentally verify that the new dataset can significantly improve the ability of the learned FER model. To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
Feature Quantization Improves GAN Training [126.02828112121874]
Feature Quantization (FQ) for the discriminator embeds both true and fake data samples into a shared discrete space. Our method can be easily plugged into existing GAN models, with little computational overhead in training.
arXiv Detail & Related papers (2020-04-05T04:06:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.