Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- URL: http://arxiv.org/abs/2211.13116v1
- Date: Tue, 22 Nov 2022 02:17:15 GMT
- Title: Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- Authors: Shaoming Duan, Chuanyi Liu, Peiyi Han, Tianyu He, Yifeng Xu, Qiyuan Deng
- Abstract summary: Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL)
Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead.
We propose Fed-TDA, which synthesizes tabular data for augmentation using only simple statistics (e.g., per-column distributions and the global covariance).
- Score: 7.5178093283247165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-independent and identically distributed (non-IID) data is a key challenge
in federated learning (FL), which usually hampers the optimization convergence
and the performance of FL. Existing data augmentation methods based on
federated generative models or raw data sharing strategies for solving the
non-IID problem still suffer from low performance, privacy protection concerns,
and high communication overhead in decentralized tabular data. To tackle these
challenges, we propose a federated tabular data augmentation method, named
Fed-TDA. The core idea of Fed-TDA is to synthesize tabular data for data
augmentation using some simple statistics (e.g., distributions of each column
and global covariance). Specifically, we propose a multimodal distribution
transformation and an inverse cumulative distribution mapping to synthesize
continuous and discrete columns, respectively, in tabular data from noise
according to the pre-learned statistics. Furthermore, we theoretically analyze
that our Fed-TDA not only preserves data privacy but also maintains the
distribution of the original data and the correlation between columns. Through
extensive experiments on five real-world tabular datasets, we demonstrate the
superiority of Fed-TDA over the state-of-the-art in test performance and
communication efficiency.
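The synthesis step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: all statistics (the mixture parameters, category frequencies, and global covariance) are made-up placeholders. Correlated Gaussian noise is drawn using the global covariance, mapped to uniforms via the standard-normal CDF, and then pushed through inverse CDFs — a Gaussian-mixture CDF for a continuous column and a cumulative category table for a discrete one.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Pre-learned "simple statistics" (placeholder values, assumed for illustration):
mix_weights = np.array([0.4, 0.6])        # Gaussian-mixture weights (continuous column)
mix_means   = np.array([-2.0, 3.0])       # mixture component means
mix_stds    = np.array([0.5, 1.0])        # mixture component std devs
cat_probs   = np.array([0.2, 0.5, 0.3])   # category frequencies (discrete column)
global_cov  = np.array([[1.0, 0.6],       # global covariance between the two columns
                        [0.6, 1.0]])

# Standard-normal CDF, vectorized over arrays.
norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))

def synthesize(n):
    # 1) Draw correlated standard-normal noise using the global covariance.
    z = rng.multivariate_normal(np.zeros(2), global_cov, size=n)
    # 2) Map the noise to uniforms via the standard-normal CDF.
    u = norm_cdf(z)
    # 3) Continuous column: invert the Gaussian-mixture CDF numerically on a grid.
    grid = np.linspace(-6.0, 8.0, 4001)
    mix_cdf = (mix_weights * norm_cdf((grid[:, None] - mix_means) / mix_stds)).sum(axis=1)
    cont = np.interp(u[:, 0], mix_cdf, grid)
    # 4) Discrete column: inverse cumulative distribution mapping over categories.
    disc = np.searchsorted(np.cumsum(cat_probs), u[:, 1])
    return cont, disc

cont, disc = synthesize(1000)
```

Because both columns are monotone transforms of the same correlated noise, the synthetic columns inherit the dependence encoded in the global covariance — which is why sharing only these statistics can suffice for augmentation.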
Related papers
- Federated t-SNE and UMAP for Distributed Data Visualization [20.58663155344881]
Big data is often distributed across multiple data centers and subject to security and privacy concerns.
This work proposes Fed-tSNE and Fed-UMAP, which provide high-dimensional data visualization without exchanging data across clients or sending data to the central server.
arXiv Detail & Related papers (2024-12-18T04:33:11Z) - Generative AI-Powered Plugin for Robust Federated Learning in Heterogeneous IoT Networks [3.536605202672355]
Federated learning enables edge devices to collaboratively train a global model while maintaining data privacy by keeping data localized.
We propose a novel plugin for federated optimization techniques that brings non-IID data distributions closer to IID through generative-AI-enhanced data augmentation and a balanced sampling strategy.
arXiv Detail & Related papers (2024-10-31T11:13:47Z) - TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pairwise column correlation estimation.
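To make the "pairwise column correlation estimation" metric concrete, a common way to score it is to compare the Pearson correlation matrices of the real and synthetic tables; the exact protocol in the paper may differ, and the function name below is my own.

```python
import numpy as np

def corr_gap(real, synth):
    """Mean absolute difference between the upper-triangular pairwise
    Pearson correlations of a real and a synthetic table
    (rows = samples, columns = features). Lower is better."""
    cr = np.corrcoef(real, rowvar=False)
    cs = np.corrcoef(synth, rowvar=False)
    iu = np.triu_indices_from(cr, k=1)
    return float(np.abs(cr[iu] - cs[iu]).mean())

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
synth = rng.normal(size=(500, 4))
```

Identical tables yield a gap of exactly zero; independent tables yield a small positive gap driven by sampling noise.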
arXiv Detail & Related papers (2024-10-27T22:58:47Z) - Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
arXiv Detail & Related papers (2024-02-20T18:53:53Z) - FedTabDiff: Federated Learning of Diffusion Probabilistic Models for
Synthetic Mixed-Type Tabular Data Generation [5.824064631226058]
We introduce Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity mixed-type tabular data without centralized access to the original datasets.
FedTabDiff realizes a decentralized learning scheme that permits multiple entities to collaboratively train a generative model while respecting data privacy and locality.
Experimental evaluations on real-world financial and medical datasets attest to the framework's capability to produce synthetic data that maintains high fidelity, utility, privacy, and coverage.
arXiv Detail & Related papers (2024-01-11T21:17:50Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - A Simple Data Augmentation for Feature Distribution Skewed Federated Learning [47.27053883247425]
Federated Learning (FL) facilitates collaborative learning among multiple clients in a distributed manner.
FL's performance degrades with non-Independent and Identically Distributed (non-IID) data.
We propose FedRDN, which randomly injects the statistical information of the local distribution from the entire federation into the client's data.
Our FedRDN is a plug-and-play component, which can be seamlessly integrated into the data augmentation flow with only a few lines of code.
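The injection idea can be sketched roughly as follows; the statistics format and function name are illustrative assumptions, not FedRDN's actual API. Each local sample is standardized with its own client's statistics and re-expressed under the mean/std of a client drawn at random from the federation, so local data absorbs the feature-distribution skew of other clients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-client feature statistics shared across the federation
# (shapes and values are illustrative assumptions).
client_stats = [
    (np.array([0.0, 1.0]),  np.array([1.0, 0.5])),   # (mean, std) of client 0
    (np.array([2.0, -1.0]), np.array([0.8, 1.2])),   # client 1
    (np.array([-1.0, 0.0]), np.array([1.5, 0.9])),   # client 2
]

def augment(x_local, local_mean, local_std):
    """Standardize each sample with local statistics, then rescale it
    with the statistics of a randomly sampled client."""
    out = np.empty_like(x_local, dtype=float)
    for i, x in enumerate(x_local):
        mean, std = client_stats[rng.integers(len(client_stats))]
        out[i] = (x - local_mean) / local_std * std + mean
    return out

x = rng.normal(loc=[0.0, 1.0], scale=[1.0, 0.5], size=(8, 2))
x_aug = augment(x, np.array([0.0, 1.0]), np.array([1.0, 0.5]))
```

As the summary notes, this kind of transform slots into an existing augmentation pipeline in a few lines, since it touches only the input features.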
arXiv Detail & Related papers (2023-06-14T05:46:52Z) - FedWon: Triumphing Multi-domain Federated Learning Without Normalization [50.49210227068574]
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients.
However, FL encounters challenges due to non-independent and identically distributed (non-i.i.d.) data.
We propose a novel method called Federated learning Without normalization (FedWon) to address the multi-domain problem in FL.
arXiv Detail & Related papers (2023-06-09T13:18:50Z) - Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN speeds up training per epoch by up to 200%.
arXiv Detail & Related papers (2021-08-18T01:47:36Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned
Data [93.76907759950608]
We propose FDSKL, a federated doubly stochastic kernel learning algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.