XOR Mixup: Privacy-Preserving Data Augmentation for One-Shot Federated
Learning
- URL: http://arxiv.org/abs/2006.05148v1
- Date: Tue, 9 Jun 2020 09:43:41 GMT
- Title: XOR Mixup: Privacy-Preserving Data Augmentation for One-Shot Federated
Learning
- Authors: MyungJae Shin, Chihoon Hwang, Joongheon Kim, Jihong Park, Mehdi Bennis
and Seong-Lyun Kim
- Abstract summary: We develop a privacy-preserving XOR-based mixup data augmentation technique, coined XorMixup.
The core idea is to collect other devices' encoded data samples that are decoded only using each device's own data samples.
XorMixFL achieves up to 17.6% higher accuracy than Vanilla FL under a non-IID MNIST dataset.
- Score: 49.130350799077114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: User-generated data distributions are often imbalanced across devices and
labels, hampering the performance of federated learning (FL). To remedy this
non-independent and identically distributed (non-IID) data problem, in this
work we develop a privacy-preserving XOR-based mixup data augmentation
technique, coined XorMixup, and thereby propose a novel one-shot FL framework,
termed XorMixFL. The core idea is to collect other devices' encoded data
samples that are decoded only using each device's own data samples. The
decoding yields synthetic-but-realistic samples, which are collected until an
IID dataset is obtained and then used for model training. Both the encoding
and decoding procedures rely on bit-wise XOR operations that intentionally
distort raw samples, thereby preserving data privacy. Simulation results
corroborate that XorMixFL achieves up to 17.6% higher accuracy than Vanilla FL
under a non-IID MNIST dataset.
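As a rough illustration of the encode/decode idea in the abstract, the sketch below applies bit-wise XOR between uint8 image arrays. It is a minimal sketch under stated assumptions: the function names, the NumPy implementation, and the random key samples are illustrative, not the paper's exact procedure. The key point carried over from the abstract is that the receiver decodes with one of its own samples, so the reconstruction is deliberately approximate.

```python
import numpy as np

def xor_encode(raw_sample: np.ndarray, key_sample: np.ndarray) -> np.ndarray:
    # Bit-wise XOR with a locally held key sample distorts the raw sample
    # before it ever leaves the device.
    return np.bitwise_xor(raw_sample, key_sample)

def xor_decode(encoded_sample: np.ndarray, own_key_sample: np.ndarray) -> np.ndarray:
    # The receiver XORs with one of its OWN samples; since this key differs
    # from the sender's, the output is a distorted, synthetic-but-realistic sample.
    return np.bitwise_xor(encoded_sample, own_key_sample)

# Toy example with 28x28 uint8 images (MNIST-like); all samples are random here.
rng = np.random.default_rng(0)
raw          = rng.integers(0, 256, (28, 28), dtype=np.uint8)  # sender's raw sample
sender_key   = rng.integers(0, 256, (28, 28), dtype=np.uint8)  # sender's local key sample
encoded      = xor_encode(raw, sender_key)                      # the only data shared
receiver_key = rng.integers(0, 256, (28, 28), dtype=np.uint8)   # receiver's own key sample
decoded      = xor_decode(encoded, receiver_key)                # approximate reconstruction
```

If the receiver held the sender's exact key sample, XOR decoding would be lossless; the privacy argument rests on the raw key sample never leaving the sender.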
Related papers
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user.
We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Federated Learning with GAN-based Data Synthesis for Non-IID Clients [8.304185807036783]
Federated learning (FL) has recently emerged as a popular privacy-preserving collaborative learning paradigm.
We propose a novel framework, named Synthetic Data Aided Federated Learning (SDA-FL), to resolve this non-IID challenge by sharing synthetic data.
arXiv Detail & Related papers (2022-06-11T11:43:25Z)
- A Data Cartography based MixUp for Pre-trained Language Models [47.90235939359225]
MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels.
We propose TDMixUp, a novel MixUp strategy that leverages Training Dynamics and allows more informative samples to be combined for generating new data samples.
We empirically validate that our method not only achieves competitive performance using a smaller subset of the training data compared with strong baselines, but also yields lower expected calibration error on the pre-trained language model, BERT, on both in-domain and out-of-domain settings in a wide range of NLP tasks.
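For context, the vanilla MixUp operation that TDMixUp builds on can be sketched as follows. This shows only the standard interpolation of Zhang et al. (2018), not the paper's Training-Dynamics-based sample selection, and alpha=0.2 is an arbitrary illustrative choice.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    # Draw a mixing ratio from Beta(alpha, alpha) and form convex combinations
    # of both the inputs and their one-hot labels.
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy usage: mix two flattened samples with one-hot labels over 10 classes.
x_a, x_b = np.random.rand(784), np.random.rand(784)
y_a, y_b = np.eye(10)[3], np.eye(10)[7]
x_mixed, y_mixed = mixup(x_a, y_a, x_b, y_b)
```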
arXiv Detail & Related papers (2022-05-06T17:59:19Z)
- Federated Mixture of Experts [94.25278695272874]
FedMix is a framework that allows us to train an ensemble of specialized models.
We show that users with similar data characteristics select the same members and therefore share statistical strength.
arXiv Detail & Related papers (2021-07-14T14:15:24Z)
- FedMix: Approximation of Mixup under Mean Augmented Federated Learning [60.503258658382]
Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device.
Current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases.
We propose a new augmentation algorithm, named FedMix, which is inspired by a phenomenal yet simple data augmentation method, Mixup.
arXiv Detail & Related papers (2021-07-01T06:14:51Z)
- Data-Free Network Quantization With Adversarial Knowledge Distillation [39.92282726292386]
In this paper, we consider data-free network quantization with synthetic data.
The synthetic data are generated from a generator, while no data are used in training the generator and in quantization.
We show the gain of producing diverse adversarial samples by using multiple generators and multiple students.
arXiv Detail & Related papers (2020-05-08T16:24:55Z)
- Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data [31.85853956347045]
On-device machine learning (ML) enables the training process to exploit a massive amount of user-generated private data samples.
We propose federated distillation (FD), a distributed model training algorithm whose communication payload size is much smaller than that of a benchmark scheme, federated learning (FL).
We show FD with FAug yields around 26x less communication overhead while achieving 95-98% test accuracy compared to FL.
arXiv Detail & Related papers (2018-11-28T10:16:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.