DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning
- URL: http://arxiv.org/abs/2311.01295v1
- Date: Thu, 2 Nov 2023 15:12:12 GMT
- Title: DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning
- Authors: Wenxuan Bao, Francesco Pittaluga, Vijay Kumar B G, Vincent Bindschaedler
- Abstract summary: We propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning.
Our first technique, DP-Mix_Self, achieves SoTA classification performance across a range of datasets and settings by performing mixup on self-augmented data.
Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process.
- Score: 10.971246386083884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation techniques, such as simple image transformations and
combinations, are highly effective at improving the generalization of computer
vision models, especially when training data is limited. However, such
techniques are fundamentally incompatible with differentially private learning
approaches, due to the latter's built-in assumption that each training image's
contribution to the learned model is bounded. In this paper, we investigate why
naive applications of multi-sample data augmentation techniques, such as mixup,
fail to achieve good performance and propose two novel data augmentation
techniques specifically designed for the constraints of differentially private
learning. Our first technique, DP-Mix_Self, achieves SoTA classification
performance across a range of datasets and settings by performing mixup on
self-augmented data. Our second technique, DP-Mix_Diff, further improves
performance by incorporating synthetic data from a pre-trained diffusion model
into the mixup process. We open-source the code at
https://github.com/wenxuan-Bao/DP-Mix.
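To make the mechanism concrete, here is a minimal sketch of the DP-Mix_Self idea under one reading of the abstract: because each mixed sample is built only from augmentations of a single training image, every example's contribution to the gradient stays bounded and DP-SGD's per-example clipping remains valid. The function names, the Dirichlet weighting, and the augmentation choices are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hedged sketch of DP-Mix_Self: mixup among self-augmentations of ONE image,
# so the mixed sample still depends on a single training example and
# per-example gradient clipping in DP-SGD remains sound.
import torch
import torchvision.transforms as T

augment = T.Compose([T.RandomResizedCrop(32), T.RandomHorizontalFlip()])

def mix_self(image: torch.Tensor, label: torch.Tensor, k: int = 4, alpha: float = 1.0):
    """Return one mixed view built solely from augmentations of `image`."""
    views = torch.stack([augment(image) for _ in range(k)])  # k self-augmentations
    weights = torch.distributions.Dirichlet(torch.full((k,), alpha)).sample()
    mixed = (weights.view(k, 1, 1, 1) * views).sum(dim=0)    # convex combination
    # The label is unchanged: all views derive from the same source image.
    return mixed, label
```

On the same reading, DP-Mix_Diff would additionally mix in samples from a pre-trained diffusion model; since that synthetic data is not drawn from the private training set, it does not enlarge any single example's contribution.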
Related papers
- Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE).
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
From the inter-class view, we use expert models to perform distribution calibration.
arXiv Detail & Related papers (2024-06-03T07:22:17Z)
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the predictability of model performance with respect to the mixture proportions of training data, using explicit function forms.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
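As a hypothetical illustration of this recipe (the exponential law, the two-domain setup, and every number below are assumptions for exposition, not the paper's exact formulation): fit a cheap parametric law per validation domain from small-scale runs, then pick the mixture that minimizes the predicted average loss.

```python
# Toy "mixing law" workflow: fit loss-vs-mixture curves from cheap runs,
# then choose the mixture that minimizes the predicted loss.
import numpy as np
from scipy.optimize import curve_fit

def law(r, c, k, t):
    return c + k * np.exp(t * r)  # assumed loss as a function of domain-1 proportion r

r_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])     # mixtures tried at small scale
loss_d1 = np.array([3.6, 3.2, 2.9, 2.75, 2.7])  # domain-1 val loss (falls as r grows)
loss_d2 = np.array([2.7, 2.8, 2.95, 3.2, 3.6])  # domain-2 val loss (rises as r grows)

p1, _ = curve_fit(law, r_obs, loss_d1, p0=[2.5, 1.5, -2.0], maxfev=20000)
p2, _ = curve_fit(law, r_obs, loss_d2, p0=[2.5, 0.2, 2.0], maxfev=20000)

grid = np.linspace(0.0, 1.0, 101)
avg = 0.5 * (law(grid, *p1) + law(grid, *p2))
print(f"predicted-optimal proportion of domain 1: {grid[np.argmin(avg)]:.2f}")
```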
arXiv Detail & Related papers (2024-03-25T17:14:00Z)
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining [13.823621924706348]
Differential Privacy (DP) image data synthesis allows organizations to share and utilize synthetic images without privacy concerns.
Previous methods combine advanced generative models with pre-training on a public dataset to produce high-quality DP image data.
This paper proposes a novel DP image synthesis method, termed PRIVIMAGE, which meticulously selects pre-training data.
arXiv Detail & Related papers (2023-10-19T14:04:53Z)
- Diffusion Model for Dense Matching [34.13580888014]
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term.
We propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms.
Our experimental results demonstrate significant performance improvements of our method over existing approaches.
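Read as a standard MAP estimation problem (an assumption about notation; the paper may formalize it differently), the two-term objective for a correspondence field F between source and target images is:

```latex
% Illustrative MAP decomposition; F denotes the dense correspondence field.
F^{*} = \arg\max_{F}\, \log p(F \mid I_{\mathrm{src}}, I_{\mathrm{tgt}})
      = \arg\max_{F}\, \underbrace{\log p(I_{\mathrm{src}}, I_{\mathrm{tgt}} \mid F)}_{\text{data term}}
                     + \underbrace{\log p(F)}_{\text{prior term}}
```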
arXiv Detail & Related papers (2023-05-30T14:58:24Z)
- Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges.
Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models.
"phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models, as sketched below.
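A hedged sketch of what such a two-step hidden-space interpolation could look like; the Dirichlet/Beta sampling and the constraint that the original example dominates are assumptions for illustration, not DoubleMix's exact procedure.

```python
# Step 1: mix the perturbed views of an example with each other.
# Step 2: interpolate the result back toward the original hidden state.
import torch

def double_mix(h_orig: torch.Tensor, h_perturbed: torch.Tensor, alpha: float = 1.0):
    """h_orig: (d,) hidden state; h_perturbed: (k, d) states of k perturbed views."""
    k = h_perturbed.size(0)
    w = torch.distributions.Dirichlet(torch.full((k,), alpha)).sample()
    h_mix = (w.unsqueeze(1) * h_perturbed).sum(dim=0)  # step 1: mix perturbed views
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.maximum(lam, 1 - lam)                  # keep the original dominant
    return lam * h_orig + (1 - lam) * h_mix            # step 2: mix toward original
```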
arXiv Detail & Related papers (2022-09-12T15:01:04Z)
- Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is a widely used data augmentation technique that linearly interpolates input examples and the corresponding labels (a generic sketch follows this entry).
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup into a transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
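For reference, the basic operation these mixup entries build on can be sketched as follows (standard mixup over a pair of examples; a generic illustration, not any single paper's variant):

```python
# Standard mixup: convex combination of two inputs and their one-hot labels.
import torch

def mixup(x1, y1, x2, y2, alpha: float = 0.2):
    """x1, x2: input tensors; y1, y2: one-hot label tensors."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```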