A Comprehensive Study on Dataset Distillation: Performance, Privacy,
Robustness and Fairness
- URL: http://arxiv.org/abs/2305.03355v3
- Date: Sat, 27 May 2023 11:04:02 GMT
- Title: A Comprehensive Study on Dataset Distillation: Performance, Privacy,
Robustness and Fairness
- Authors: Zongxiong Chen, Jiahui Geng, Derui Zhu, Herbert Woisetschlaeger, Qing
Li, Sonja Schimmler, Ruben Mayer, Chunming Rong
- Abstract summary: We conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods.
We successfully use membership inference attacks to show that privacy risks still remain.
This work offers a large-scale benchmarking framework for dataset distillation evaluation.
- Score: 8.432686179800543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The aim of dataset distillation is to encode the rich features of an original
dataset into a tiny dataset. It is a promising approach to accelerate neural
network training and related studies. Different approaches have been proposed
to improve the informativeness and generalization performance of distilled
images. However, no work has comprehensively analyzed this technique from a
security perspective and there is a lack of systematic understanding of
potential risks. In this work, we conduct extensive experiments to evaluate
current state-of-the-art dataset distillation methods. We successfully use
membership inference attacks to show that privacy risks remain. Our work also
demonstrates that dataset distillation can degrade model robustness to varying
degrees and amplify model unfairness across classes at prediction time. This
work offers a large-scale benchmarking framework for dataset distillation
evaluation.
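To make the privacy claim above concrete, the sketch below shows a minimal loss-threshold membership inference attack against a model trained on distilled data. The model and the two loaders (original training members vs. held-out non-members) are placeholders, and this simple threshold sweep is only one possible attack; it does not reproduce the authors' actual evaluation pipeline, which may use stronger attacks.

```python
# Minimal sketch: loss-threshold membership inference against a model trained on
# a distilled dataset. The model and loaders are placeholders; the paper's actual
# evaluation may use stronger attacks (e.g. shadow models or learned classifiers).
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_sample_loss(model, loader, device="cpu"):
    """Cross-entropy loss of each example; members tend to have lower loss."""
    model.eval()
    losses = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        losses.append(F.cross_entropy(model(x), y, reduction="none").cpu())
    return torch.cat(losses)

def mia_balanced_accuracy(model, member_loader, nonmember_loader, device="cpu"):
    """Sweep a loss threshold and return the best balanced member/non-member accuracy."""
    member = per_sample_loss(model, member_loader, device)        # original training data
    nonmember = per_sample_loss(model, nonmember_loader, device)  # held-out data
    thresholds = torch.quantile(torch.cat([member, nonmember]),
                                torch.linspace(0.01, 0.99, 99))
    best = 0.0
    for t in thresholds:
        acc = 0.5 * ((member <= t).float().mean() + (nonmember > t).float().mean())
        best = max(best, acc.item())
    return best  # ~0.5 means little leakage; well above 0.5 means membership is inferable
```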
Related papers
- Behaviour Distillation [10.437472004180883]
We formalize behaviour distillation, a setting that aims to discover and condense information required for training an expert policy into a synthetic dataset.
We then introduce Hallucinating datasets with Evolution Strategies (HaDES), a method for behaviour distillation that can discover datasets of just four state-action pairs.
We show that these datasets generalize out of distribution to training policies with a wide range of architectures.
We also demonstrate application to a downstream task, namely training multi-task agents in a zero-shot fashion.
arXiv Detail & Related papers (2024-06-21T10:45:43Z)
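As a rough illustration of the Behaviour Distillation entry above, the sketch below trains a policy by behavioural cloning on a tiny synthetic set of state-action pairs. The evolutionary outer loop that HaDES uses to discover those pairs is omitted, and all dimensions and hyperparameters are illustrative assumptions.

```python
# Sketch of the inner loop of behaviour distillation: fit a policy to a tiny
# synthetic dataset of state-action pairs by behavioural cloning. The outer
# search that actually discovers good pairs (HaDES uses evolution strategies)
# is omitted; dimensions and hyperparameters are illustrative.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2                    # illustrative, not from the paper

# Four synthetic state-action pairs; in HaDES these are learned, not random.
synthetic_states = torch.randn(4, STATE_DIM)
synthetic_actions = torch.randn(4, ACTION_DIM)

def train_policy(states, actions, steps=200, lr=1e-2):
    """Behavioural cloning on the synthetic pairs; returns the trained policy."""
    policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                           nn.Linear(64, ACTION_DIM))
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(states), actions)
        loss.backward()
        opt.step()
    return policy

policy = train_policy(synthetic_states, synthetic_actions)
# In the full method the synthetic pairs are evolved so that policies trained on
# them score well in the environment, across a range of architectures.
```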
- Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
The main goal is to further improve the classification accuracy of prototype-based soft-labels distillation.
Experimental studies assess the method's ability to distill the data as well as its potential to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z)
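For the prototype-based soft-labels entry above, the sketch below shows one generic way to build class prototypes with soft labels: class means in feature space, each labelled with the class mixture of its nearest real neighbours. This is an assumed, simplified construction, not necessarily the cited paper's exact recipe.

```python
# Generic sketch of prototype-based soft-label distillation: one prototype per
# class (the class mean in feature space), each labelled with the class mixture
# of its nearest real neighbours. A simplified, assumed construction, not
# necessarily the cited paper's exact recipe.
import numpy as np

def distill_prototypes(X, y, n_classes, k=20):
    """X: (n, d) features, y: (n,) integer labels -> (prototypes, soft_labels)."""
    prototypes = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    soft_labels = np.zeros((n_classes, n_classes))
    for c in range(n_classes):
        dist = np.linalg.norm(X - prototypes[c], axis=1)
        nearest = y[np.argsort(dist)[:k]]          # labels of the k nearest real points
        counts = np.bincount(nearest, minlength=n_classes)
        soft_labels[c] = counts / counts.sum()     # soft label = local class mixture
    return prototypes, soft_labels

X = np.random.randn(300, 16)                       # toy features
y = np.random.randint(0, 3, size=300)              # toy labels for 3 classes
protos, soft = distill_prototypes(X, y, n_classes=3)
print(protos.shape, soft.shape)                    # (3, 16) (3, 3)
```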
- Towards Adversarially Robust Dataset Distillation by Curvature Regularization [11.463315774971857]
We study how to embed adversarial robustness in distilled datasets, so that models trained on them maintain high accuracy while acquiring better adversarial robustness.
We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training.
arXiv Detail & Related papers (2024-03-15T06:31:03Z)
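For the curvature-regularization entry above, the sketch below adds a finite-difference curvature proxy (in the spirit of CURE-style regularizers) to an existing distillation objective. The cited paper's exact regularizer, and where it enters the bi-level distillation loop, may differ; `base_distill_loss` is assumed to be any matching objective already computed elsewhere.

```python
# Hedged sketch: a finite-difference curvature penalty (in the spirit of CURE-style
# regularizers) added to a generic distillation objective. The cited paper's exact
# formulation, and where it enters the bi-level distillation loop, may differ.
import torch
import torch.nn.functional as F

def curvature_penalty(model, syn_x, syn_y, h=1e-2):
    """Proxy for loss-surface curvature at the synthetic images:
    || grad_x L(x + h*z) - grad_x L(x) ||^2 for a random unit direction z."""
    assert syn_x.requires_grad, "pass the synthetic images being optimized"
    z = torch.randn_like(syn_x)
    z = z / (z.flatten(1).norm(dim=1).view(-1, *([1] * (syn_x.dim() - 1))) + 1e-12)
    g0 = torch.autograd.grad(F.cross_entropy(model(syn_x), syn_y),
                             syn_x, create_graph=True)[0]
    g1 = torch.autograd.grad(F.cross_entropy(model(syn_x + h * z), syn_y),
                             syn_x, create_graph=True)[0]
    return ((g1 - g0).flatten(1) ** 2).sum(dim=1).mean()

def robust_distillation_loss(base_distill_loss, model, syn_x, syn_y, lam=0.1):
    """Add the curvature term to whatever matching objective is already in use
    (`base_distill_loss` is assumed to be a scalar tensor computed elsewhere)."""
    return base_distill_loss + lam * curvature_penalty(model, syn_x, syn_y)
```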
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective [65.70799289211868]
We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation.
We show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation.
arXiv Detail & Related papers (2023-11-28T09:53:05Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
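The causal-effect criterion in the bi-level data pruning entry above is specific to that paper; as a loose stand-in, the sketch below prunes the real dataset with a simple representativeness proxy (distance to the class mean in feature space). It only illustrates the general idea of selecting a contributing subset before distillation, not the paper's actual scoring rule.

```python
# Loose illustration of pruning real data before distillation with a per-sample
# "representativeness" proxy. The cited paper scores samples by their causal
# effects on the distillation dynamics; this class-mean heuristic is only an
# assumed stand-in for illustration.
import numpy as np

def prune_by_representativeness(features, labels, keep_frac=0.5):
    """Keep, per class, the samples closest to the class mean in feature space."""
    keep_idx = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        center = features[idx].mean(axis=0)
        dist = np.linalg.norm(features[idx] - center, axis=1)
        k = max(1, int(len(idx) * keep_frac))
        keep_idx.extend(idx[np.argsort(dist)[:k]])   # most central samples first
    return np.sort(np.array(keep_idx))

feats = np.random.randn(1000, 32)                    # toy features
labels = np.random.randint(0, 10, size=1000)         # toy labels for 10 classes
subset = prune_by_representativeness(feats, labels, keep_frac=0.3)
print(len(subset), "of", len(labels), "samples kept for distillation")
```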
- Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset [63.20765930558542]
Active Learning has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets.
We evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection.
We also study the effect of applying data augmentation within Bayesian-AL-based dataset distillation.
arXiv Detail & Related papers (2023-02-21T13:56:47Z)
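For the BALD heuristic mentioned in the entry above, the sketch below computes the standard BALD acquisition score with Monte-Carlo dropout: the mutual information between predictions and model parameters. The model and batch names are placeholders; applying this to Semantic-KITTI would additionally need a LiDAR segmentation backbone, which is not shown.

```python
# Sketch of the BALD acquisition score with Monte-Carlo dropout: mutual information
# between predictions and model parameters, I = H[E p] - E[H p]. The cited work uses
# such scores to select a core subset of Semantic-KITTI; this is a generic illustration.
import torch

def bald_scores(model, x, n_samples=20):
    """Higher score = individual stochastic passes are confident but disagree."""
    model.train()   # keep dropout active for MC sampling (assumes an MC-dropout model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_p = probs.mean(dim=0)                                    # (batch, classes)
    entropy_of_mean = -(mean_p * (mean_p + 1e-12).log()).sum(-1)
    mean_of_entropy = -(probs * (probs + 1e-12).log()).sum(-1).mean(0)
    return entropy_of_mean - mean_of_entropy                      # BALD = mutual information

# Usage (placeholder names): pick the top-k pool samples to add to the distilled subset.
# scores = bald_scores(dropout_net, pool_batch)
# selected = scores.topk(k=64).indices
```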
- A Comprehensive Survey of Dataset Distillation [73.15482472726555]
Deep learning technology has developed at an unprecedented pace over the last decade, and it has become challenging to handle the unlimited growth of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
- Backdoor Attacks Against Dataset Distillation [24.39067295054253]
This study performs the first backdoor attack against models trained on data produced by dataset distillation in the image domain.
We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING.
Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases.
arXiv Detail & Related papers (2023-01-03T16:58:34Z)
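In the spirit of NAIVEATTACK from the entry above, which that paper describes as inserting a trigger into the original data before distillation is run, the sketch below stamps a small patch on a fraction of the images, flips their labels to a target class, and measures the attack success rate. DOORPING, which optimizes the trigger during distillation, is not shown; patch size, location, and poison rate are illustrative assumptions.

```python
# Minimal sketch in the spirit of NAIVEATTACK: poison a fraction of the ORIGINAL
# dataset with a fixed trigger patch and a target label before distillation runs.
# DOORPING (trigger optimized during distillation) is not shown. All constants are
# illustrative assumptions.
import torch

def poison_dataset(images, labels, target_class=0, poison_rate=0.05, patch=3):
    """images: (n, c, h, w) in [0, 1]; stamps a white patch in the bottom-right corner."""
    images, labels = images.clone(), labels.clone()
    n = images.size(0)
    idx = torch.randperm(n)[: int(n * poison_rate)]
    images[idx, :, -patch:, -patch:] = 1.0      # trigger: small white square
    labels[idx] = target_class                  # flip poisoned labels to the target
    return images, labels

def attack_success_rate(model, clean_images, target_class=0, patch=3):
    """ASR: fraction of triggered test images classified as the target class."""
    triggered = clean_images.clone()
    triggered[:, :, -patch:, -patch:] = 1.0
    with torch.no_grad():
        preds = model(triggered).argmax(dim=1)
    return (preds == target_class).float().mean().item()
```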
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the resulting dataset can significantly improve the ability of the learned FER model.
To make training on the enlarged dataset tractable, we apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.