Is Adversarial Training with Compressed Datasets Effective?
- URL: http://arxiv.org/abs/2402.05675v1
- Date: Thu, 8 Feb 2024 13:53:11 GMT
- Title: Is Adversarial Training with Compressed Datasets Effective?
- Authors: Tong Chen, Raghavendra Selvan
- Abstract summary: We examine the adversarial robustness of models trained on compressed datasets.
We propose a novel robustness-aware dataset compression method based on finding the Minimal Finite Covering (MFC) of the dataset.
- Score: 4.8576927426880125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dataset Condensation (DC) refers to the recent class of dataset compression
methods that generate a smaller, synthetic dataset from a larger dataset. This
synthetic dataset retains the essential information of the original dataset,
enabling models trained on it to achieve performance levels comparable to those
trained on the full dataset. Most current DC methods are mainly concerned with
achieving high test performance under a limited data budget, and have not directly
addressed the question of adversarial robustness. In this work, we investigate
the adversarial robustness of models trained on compressed
datasets. We show that the compressed datasets obtained from DC methods are not
effective in transferring adversarial robustness to models. As a solution to
improve dataset compression efficiency and adversarial robustness
simultaneously, we propose a novel robustness-aware dataset compression method
based on finding the Minimal Finite Covering (MFC) of the dataset. The proposed
method is (1) obtained by a one-time computation and is applicable to any model,
(2) more effective than DC methods when adversarial training is applied over the MFC,
(3) provably robust by minimizing the generalized adversarial loss.
Additionally, empirical evaluation on three datasets shows that the proposed
method achieves a better trade-off between robustness and performance than
DC methods such as distribution matching.
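To make the covering idea concrete, here is a minimal, hedged sketch in Python: a greedy epsilon-covering heuristic followed by standard PGD adversarial training (minimizing over model weights the maximum loss within an epsilon-ball) on the selected subset. This is illustrative only and is not the paper's MFC construction; all function names, the L2/L-infinity radius choices, and the hyperparameters are assumptions.

```python
# Illustrative sketch only (not the authors' MFC algorithm): a greedy covering
# of the dataset followed by PGD adversarial training on the covering subset.
import torch
import torch.nn as nn


def greedy_epsilon_cover(X: torch.Tensor, eps: float) -> list[int]:
    """Pick indices so that every point in X lies within eps (L2) of a chosen point.

    A greedy heuristic; it produces *a* finite covering, not a minimal one.
    """
    chosen: list[int] = []
    uncovered = torch.ones(len(X), dtype=torch.bool)
    flat = X.flatten(1)                              # (N, D) view for distance computation
    while uncovered.any():
        idx = int(torch.nonzero(uncovered)[0])       # first still-uncovered point
        chosen.append(idx)
        dists = torch.cdist(flat[idx:idx + 1], flat).squeeze(0)
        uncovered &= dists > eps                     # points within eps are now covered
    return chosen


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-infinity PGD: approximately maximize the loss inside an eps-ball around x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv


def adversarial_train_on_subset(model, X, y, subset, epochs=5, lr=1e-3):
    """Min-max training, min_theta max_{||delta|| <= eps} loss, on the covering subset only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    Xs, ys = X[subset], y[subset]
    for _ in range(epochs):
        x_adv = pgd_attack(model, Xs, ys)
        opt.zero_grad()
        nn.functional.cross_entropy(model(x_adv), ys).backward()
        opt.step()


# Hypothetical usage: cover a dataset with radius 0.5 and adversarially train on it.
# subset = greedy_epsilon_cover(train_images, eps=0.5)
# adversarial_train_on_subset(model, train_images, train_labels, subset)
```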
Related papers
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which negatively impacts training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Group Distributionally Robust Dataset Distillation with Risk Minimization [18.07189444450016]
We introduce an algorithm that combines clustering with the minimization of a risk measure on the loss to conduct DD.
We demonstrate its effective generalization and robustness across subgroups through numerical experiments.
arXiv Detail & Related papers (2024-02-07T09:03:04Z)
- M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy [26.227927019615446]
Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs.
Dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset.
We present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy; a generic distribution-matching sketch is given after this list.
arXiv Detail & Related papers (2023-12-26T07:45:32Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization towards a flat trajectory, the weights trained on synthetic data are robust to perturbations from accumulated errors.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Dataset Condensation with Contrastive Signals [41.195453119305746]
Gradient matching-based dataset condensation (DC) methods can achieve state-of-the-art performance when applied to data-efficient learning tasks.
In this study, we prove that the existing DC methods can perform worse than the random selection method when task-irrelevant information forms a significant part of the training dataset.
We propose dataset condensation with Contrastive signals (DCC) by modifying the loss function to enable the DC methods to effectively capture the differences between classes.
arXiv Detail & Related papers (2022-02-07T03:05:32Z)
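The distribution-matching line of work cited above (e.g. the DM and M3D entries) shares a common recipe: optimize a small synthetic set so that its feature statistics match those of the real data under some embedding. Below is a minimal, hedged sketch of that recipe using a simple RBF-kernel MMD term; the embedding network, kernel, and loss form are illustrative assumptions and not a reproduction of any specific paper.

```python
# Illustrative sketch only: generic distribution matching for dataset condensation.
import torch
import torch.nn as nn


def rbf_mmd2(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased estimate of squared MMD between feature sets a and b under an RBF kernel."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()


def condense_by_matching(real_loader, n_syn, img_shape, embed_dim=128, steps=1000, lr=0.1):
    """Learn n_syn synthetic images whose embeddings match the real feature distribution."""
    syn = torch.randn(n_syn, *img_shape, requires_grad=True)    # learnable synthetic set
    opt = torch.optim.SGD([syn], lr=lr)
    embed = nn.Sequential(                                      # random (untrained) feature extractor
        nn.Flatten(),
        nn.LazyLinear(embed_dim),
        nn.ReLU(),
        nn.Linear(embed_dim, embed_dim),
    )
    for _ in range(steps):
        real, _ = next(iter(real_loader))                       # assumes (images, labels) batches
        loss = rbf_mmd2(embed(real), embed(syn))                # match real vs. synthetic embeddings
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn.detach()
```

Real methods typically add per-class matching, many randomly re-initialized embedding networks, and data augmentation; this sketch keeps only the core matching step, and img_shape must match the shape of the real images.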