Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- URL: http://arxiv.org/abs/2211.13116v1
- Date: Tue, 22 Nov 2022 02:17:15 GMT
- Title: Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- Authors: Shaoming Duan, Chuanyi Liu, Peiyi Han, Tianyu He, Yifeng Xu, Qiyuan
Deng
- Abstract summary: Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL).
Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead.
We propose Fed-TDA, which synthesizes data for data augmentation using some simple statistics.
- Score: 7.5178093283247165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-independent and identically distributed (non-IID) data is a key challenge
in federated learning (FL), which usually hampers the optimization convergence
and the performance of FL. Existing data augmentation methods based on
federated generative models or raw data sharing strategies for solving the
non-IID problem still suffer from low performance, privacy protection concerns,
and high communication overhead in decentralized tabular data. To tackle these
challenges, we propose a federated tabular data augmentation method, named
Fed-TDA. The core idea of Fed-TDA is to synthesize tabular data for data
augmentation using some simple statistics (e.g., distributions of each column
and global covariance). Specifically, we propose a multimodal distribution
transformation and an inverse cumulative distribution mapping to synthesize,
respectively, the continuous and discrete columns of tabular data from noise
according to the pre-learned statistics. Furthermore, we theoretically show
that our Fed-TDA not only preserves data privacy but also maintains the
distribution of the original data and the correlation between columns. Through
extensive experiments on five real-world tabular datasets, we demonstrate the
superiority of Fed-TDA over the state-of-the-art in test performance and
communication efficiency.
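The core synthesis step described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' code: correlated standard-normal noise is drawn with the pre-learned global covariance, discrete columns are recovered by inverse cumulative distribution mapping, and continuous columns are mapped back through their column statistics (here a single Gaussian per column, whereas the paper uses a multimodal distribution transformation). The function name and stats formats are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def synthesize(n, cov, cont_stats, disc_stats, seed=0):
    """Fed-TDA-style synthesis sketch (hypothetical helper, not the authors'
    implementation): draw correlated noise with the global covariance, then
    map each coordinate back through its column's pre-learned statistics."""
    rng = np.random.default_rng(seed)
    d = cov.shape[0]
    # Correlated standard-normal noise preserving inter-column correlation.
    z = rng.multivariate_normal(np.zeros(d), cov, size=n)
    u = norm.cdf(z)  # uniform scores in (0, 1)
    cols = {}
    for j in range(d):
        if j in disc_stats:                      # discrete column
            cats, probs = disc_stats[j]
            cdf = np.cumsum(probs)
            idx = np.searchsorted(cdf, u[:, j])  # inverse cumulative mapping
            cols[f"col{j}"] = np.asarray(cats)[idx]
        else:                                    # continuous column
            mu, sigma = cont_stats[j]            # single-Gaussian simplification
            cols[f"col{j}"] = mu + sigma * z[:, j]
    return pd.DataFrame(cols)
```

Because only per-column statistics and the covariance matrix are needed, clients never exchange raw rows, which is what keeps the communication overhead low.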
Related papers
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
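The "summary statistics as a proxy of the raw data" idea mentioned above can be illustrated with a minimal sketch (hypothetical function names, not from either paper): each client ships only its row count, feature sums, and sum of outer products, and the server reconstructs the exact pooled mean and covariance without ever seeing raw rows.

```python
import numpy as np

def client_stats(X):
    """Per-client sufficient statistics; only these aggregates leave the client."""
    return len(X), X.sum(axis=0), X.T @ X

def global_covariance(stats):
    """Server-side merge: exact pooled mean and covariance from aggregates."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)   # total feature sums
    s2 = sum(s[2] for s in stats)   # total sum of outer products
    mu = s1 / n
    cov = s2 / n - np.outer(mu, mu)  # E[xx^T] - E[x]E[x]^T
    return mu, cov
```

The merge is exact because both the mean and the second moment are additive across clients.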
arXiv Detail & Related papers (2024-02-20T18:53:53Z) - FedTabDiff: Federated Learning of Diffusion Probabilistic Models for
Synthetic Mixed-Type Tabular Data Generation [5.824064631226058]
We introduce Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity mixed-type tabular data without centralized access to the original datasets.
FedTabDiff realizes a decentralized learning scheme that permits multiple entities to collaboratively train a generative model while respecting data privacy and locality.
Experimental evaluations on real-world financial and medical datasets attest to the framework's capability to produce synthetic data that maintains high fidelity, utility, privacy, and coverage.
arXiv Detail & Related papers (2024-01-11T21:17:50Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - A Simple Data Augmentation for Feature Distribution Skewed Federated
Learning [12.636154758643757]
Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection.
In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications.
We propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL.
arXiv Detail & Related papers (2023-06-14T05:46:52Z) - FedWon: Triumphing Multi-domain Federated Learning Without Normalization [50.49210227068574]
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients.
However, FL encounters challenges due to non-independent and identically distributed (non-i.i.d.) data.
We propose a novel method called Federated learning Without normalizations (FedWon) to address the multi-domain problem in FL.
arXiv Detail & Related papers (2023-06-09T13:18:50Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to
System Heterogeneity [4.2059108111562935]
Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices.
In this paper, we introduce a new FedADMM-based FL protocol.
We show that FedADMM consistently outperforms all baseline methods in terms of communication efficiency.
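The ADMM-style federated update behind such protocols can be sketched for a least-squares objective. This is a generic consensus-ADMM sketch under standard assumptions, not the exact FedADMM protocol of the paper: each client solves a local proximal step, the server averages, and per-client dual variables absorb the disagreement.

```python
import numpy as np

def fed_admm_least_squares(clients, rho=1.0, rounds=300):
    """Consensus-ADMM sketch for federated least squares (illustrative).
    Each client k holds (A_k, b_k) and a dual u_k; the server only sees
    the vectors w_k + u_k, never the local data."""
    d = clients[0][0].shape[1]
    z = np.zeros(d)                        # global model
    u = [np.zeros(d) for _ in clients]     # per-client dual variables
    for _ in range(rounds):
        ws = []
        for k, (A, b) in enumerate(clients):
            # Local step: argmin_w 0.5||Aw - b||^2 + (rho/2)||w - z + u_k||^2
            w = np.linalg.solve(A.T @ A + rho * np.eye(d),
                                A.T @ b + rho * (z - u[k]))
            ws.append(w)
        z = np.mean([w + uk for w, uk in zip(ws, u)], axis=0)  # server average
        for k, w in enumerate(ws):
            u[k] = u[k] + w - z                                # dual ascent
    return z
```

The dual variables let clients with heterogeneous data pull the consensus iterate toward the true stacked-system solution, which is one source of the robustness to system and data heterogeneity claimed above.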
arXiv Detail & Related papers (2022-04-07T15:58:33Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN accelerates training by up to 200% per epoch.
arXiv Detail & Related papers (2021-08-18T01:47:36Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned
Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z) - FedFMC: Sequential Efficient Federated Learning on Non-iid Data [0.0]
FedFMC (Fork-Merge-Consolidate) is a method that forks devices into updating different global models, then merges and consolidates the separate models into one.
We show that FedFMC substantially improves upon earlier approaches to non-iid data in the federated learning context without using a globally shared subset of data or increasing communication costs.
arXiv Detail & Related papers (2020-06-19T02:36:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.