Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- URL: http://arxiv.org/abs/2211.13116v1
- Date: Tue, 22 Nov 2022 02:17:15 GMT
- Title: Fed-TDA: Federated Tabular Data Augmentation on Non-IID Data
- Authors: Shaoming Duan, Chuanyi Liu, Peiyi Han, Tianyu He, Yifeng Xu, Qiyuan
Deng
- Abstract summary: Non-independent and identically distributed (non-IID) data is a key challenge in federated learning (FL).
Existing data augmentation methods based on federated generative models or raw data sharing strategies for solving the non-IID problem still suffer from low performance, privacy protection concerns, and high communication overhead.
We propose Fed-TDA, which synthesizes data for data augmentation using some simple statistics.
- Score: 7.5178093283247165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-independent and identically distributed (non-IID) data is a key challenge
in federated learning (FL), which usually hampers the optimization convergence
and the performance of FL. Existing data augmentation methods based on
federated generative models or raw data sharing strategies for solving the
non-IID problem still suffer from low performance, privacy protection concerns,
and high communication overhead in decentralized tabular data. To tackle these
challenges, we propose a federated tabular data augmentation method, named
Fed-TDA. The core idea of Fed-TDA is to synthesize tabular data for data
augmentation using some simple statistics (e.g., distributions of each column
and global covariance). Specifically, we propose a multimodal distribution
transformation and an inverse cumulative distribution mapping to synthesize,
respectively, the continuous and discrete columns of tabular data from noise
according to the pre-learned statistics. Furthermore, we theoretically show
that our Fed-TDA not only preserves data privacy but also maintains the
distribution of the original data and the correlation between columns. Through
extensive experiments on five real-world tabular datasets, we demonstrate the
superiority of Fed-TDA over the state-of-the-art in test performance and
communication efficiency.
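The core synthesis step described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' code: correlated standard-normal noise is drawn with the pre-learned global covariance, discrete columns are recovered by inverse cumulative distribution mapping, and continuous columns are mapped back through their column statistics (here a single Gaussian per column, whereas the paper uses a multimodal distribution transformation). The function name and stats formats are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def synthesize(n, cov, cont_stats, disc_stats, seed=0):
    """Fed-TDA-style synthesis sketch (hypothetical helper, not the authors'
    implementation): draw correlated noise with the global covariance, then
    map each coordinate back through its column's pre-learned statistics."""
    rng = np.random.default_rng(seed)
    d = cov.shape[0]
    # Correlated standard-normal noise preserving inter-column correlation.
    z = rng.multivariate_normal(np.zeros(d), cov, size=n)
    u = norm.cdf(z)  # uniform scores in (0, 1)
    cols = {}
    for j in range(d):
        if j in disc_stats:                      # discrete column
            cats, probs = disc_stats[j]
            cdf = np.cumsum(probs)
            idx = np.searchsorted(cdf, u[:, j])  # inverse cumulative mapping
            cols[f"col{j}"] = np.asarray(cats)[idx]
        else:                                    # continuous column
            mu, sigma = cont_stats[j]            # single-Gaussian simplification
            cols[f"col{j}"] = mu + sigma * z[:, j]
    return pd.DataFrame(cols)
```

Because only per-column statistics and the covariance matrix are needed, clients never exchange raw rows, which is what keeps the communication overhead low.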
Related papers
- Federated Causal Discovery from Heterogeneous Data [70.31070224690399]
We propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data.
These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy.
We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method.
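The "summary statistics as a proxy of the raw data" idea mentioned above can be illustrated with a minimal sketch (hypothetical function names, not from either paper): each client ships only its row count, feature sums, and sum of outer products, and the server reconstructs the exact pooled mean and covariance without ever seeing raw rows.

```python
import numpy as np

def client_stats(X):
    """Per-client sufficient statistics; only these aggregates leave the client."""
    return len(X), X.sum(axis=0), X.T @ X

def global_covariance(stats):
    """Server-side merge: exact pooled mean and covariance from aggregates."""
    n = sum(s[0] for s in stats)
    s1 = sum(s[1] for s in stats)   # total feature sums
    s2 = sum(s[2] for s in stats)   # total sum of outer products
    mu = s1 / n
    cov = s2 / n - np.outer(mu, mu)  # E[xx^T] - E[x]E[x]^T
    return mu, cov
```

The merge is exact because both the mean and the second moment are additive across clients.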
arXiv Detail & Related papers (2024-02-20T18:53:53Z) - FedTabDiff: Federated Learning of Diffusion Probabilistic Models for
Synthetic Mixed-Type Tabular Data Generation [5.824064631226058]
We introduce Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity mixed-type tabular data without centralized access to the original datasets.
FedTabDiff realizes a decentralized learning scheme that permits multiple entities to collaboratively train a generative model while respecting data privacy and locality.
Experimental evaluations on real-world financial and medical datasets attest to the framework's capability to produce synthetic data that maintains high fidelity, utility, privacy, and coverage.
arXiv Detail & Related papers (2024-01-11T21:17:50Z) - Federated Learning Empowered by Generative Content [55.576885852501775]
Federated learning (FL) enables leveraging distributed private data for model training in a privacy-preserving way.
We propose a novel FL framework termed FedGC, designed to mitigate data heterogeneity issues by diversifying private data with generative content.
We conduct a systematic empirical study on FedGC, covering diverse baselines, datasets, scenarios, and modalities.
arXiv Detail & Related papers (2023-12-10T07:38:56Z) - A Simple Data Augmentation for Feature Distribution Skewed Federated
Learning [12.636154758643757]
Federated learning (FL) facilitates collaborative learning among multiple clients in a distributed manner, while ensuring privacy protection.
In this paper, we focus on the feature distribution skewed FL scenario, which is widespread in real-world applications.
We propose FedRDN, a simple yet remarkably effective data augmentation method for feature distribution skewed FL.
arXiv Detail & Related papers (2023-06-14T05:46:52Z) - FedWon: Triumphing Multi-domain Federated Learning Without Normalization [50.49210227068574]
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients.
However, FL encounters challenges due to non-independent and identically distributed (non-i.i.d.) data.
We propose a novel method called Federated learning Without normalizations (FedWon) to address the multi-domain problem in FL.
arXiv Detail & Related papers (2023-06-09T13:18:50Z) - Rethinking Data Heterogeneity in Federated Learning: Introducing a New
Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z) - FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to
System Heterogeneity [4.2059108111562935]
Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices.
In this paper, we introduce a new FedADMM-based FL protocol.
We show that FedADMM consistently outperforms all baseline methods in terms of communication efficiency.
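The ADMM-style federated update behind such protocols can be sketched for a least-squares objective. This is a generic consensus-ADMM sketch under standard assumptions, not the exact FedADMM protocol of the paper: each client solves a local proximal step, the server averages, and per-client dual variables absorb the disagreement.

```python
import numpy as np

def fed_admm_least_squares(clients, rho=1.0, rounds=300):
    """Consensus-ADMM sketch for federated least squares (illustrative).
    Each client k holds (A_k, b_k) and a dual u_k; the server only sees
    the vectors w_k + u_k, never the local data."""
    d = clients[0][0].shape[1]
    z = np.zeros(d)                        # global model
    u = [np.zeros(d) for _ in clients]     # per-client dual variables
    for _ in range(rounds):
        ws = []
        for k, (A, b) in enumerate(clients):
            # Local step: argmin_w 0.5||Aw - b||^2 + (rho/2)||w - z + u_k||^2
            w = np.linalg.solve(A.T @ A + rho * np.eye(d),
                                A.T @ b + rho * (z - u[k]))
            ws.append(w)
        z = np.mean([w + uk for w, uk in zip(ws, u)], axis=0)  # server average
        for k, w in enumerate(ws):
            u[k] = u[k] + w - z                                # dual ascent
    return z
```

The dual variables let clients with heterogeneous data pull the consensus iterate toward the true stacked-system solution, which is one source of the robustness to system and data heterogeneity claimed above.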
arXiv Detail & Related papers (2022-04-07T15:58:33Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - Fed-TGAN: Federated Learning Framework for Synthesizing Tabular Data [8.014848609114154]
We propose Fed-TGAN, the first Federated learning framework for Tabular GANs.
To effectively learn a complex GAN on non-identical participants, Fed-TGAN designs two novel features.
Results show that Fed-TGAN accelerates training by up to 200% per epoch.
arXiv Detail & Related papers (2021-08-18T01:47:36Z) - Federated Doubly Stochastic Kernel Learning for Vertically Partitioned
Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels.
arXiv Detail & Related papers (2020-08-14T05:46:56Z) - FedFMC: Sequential Efficient Federated Learning on Non-iid Data [0.0]
FedFMC (Fork-Merge-Consolidate) is a method that forks devices into updating different global models, then merges and consolidates the separate models into one.
We show that FedFMC substantially improves upon earlier approaches to non-iid data in the federated learning context without using a globally shared subset of data or increasing communication costs.
arXiv Detail & Related papers (2020-06-19T02:36:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.