M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
- URL: http://arxiv.org/abs/2312.15927v3
- Date: Sun, 25 Feb 2024 15:49:05 GMT
- Title: M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
- Authors: Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, Shiming Ge
- Abstract summary: Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs.
To address these costs, dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset.
We present a novel distribution-matching (DM) based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy.
- Score: 26.227927019615446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training state-of-the-art (SOTA) deep models often requires extensive data,
resulting in substantial training and storage costs. To address these
challenges, dataset condensation has been developed to learn a small synthetic
set that preserves essential information from the original large-scale dataset.
Currently, optimization-oriented methods are the primary approach for achieving SOTA results in dataset condensation. However, the bi-level
optimization process hinders the practical application of such methods to
realistic and larger datasets. To enhance condensation efficiency, previous
works proposed Distribution-Matching (DM) as an alternative, which
significantly reduces the condensation cost. Nonetheless, current DM-based
methods still lag behind SOTA optimization-oriented
methods. In this paper, we argue that existing DM-based methods overlook the
higher-order alignment of the distributions, which may lead to sub-optimal
matching results. Inspired by this, we present a novel DM-based method named
M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy between
feature representations of the synthetic and real images. By embedding their
distributions in a reproducing kernel Hilbert space, we align all orders of
moments of the distributions of real and synthetic images, resulting in a more
generalized condensed set. Notably, our method even surpasses the SOTA
optimization-oriented method IDC on the high-resolution ImageNet dataset.
Extensive analysis is conducted to verify the effectiveness of the proposed
method. Source codes are available at https://github.com/Hansong-Zhang/M3D.
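The contrast drawn above, plain distribution matching that aligns only the first moment versus MMD that aligns all orders of moments through an RKHS embedding, can be made concrete with a small sketch. The following is a minimal, hedged illustration in PyTorch and not the authors' released implementation (see the repository linked above for that); the feature sizes, Gaussian-kernel bandwidth, and function names are assumptions.

```python
# Minimal sketch (not the authors' released code): plain distribution-matching (DM)
# loss versus an empirical MMD^2 with a Gaussian kernel between real and synthetic
# feature batches. Feature sizes, bandwidth, and names are illustrative assumptions.
import torch

def dm_loss(real_feats: torch.Tensor, syn_feats: torch.Tensor) -> torch.Tensor:
    """Classic DM objective: match only the first moment (the feature mean)."""
    return (real_feats.mean(dim=0) - syn_feats.mean(dim=0)).pow(2).sum()

def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float) -> torch.Tensor:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) for every pair of rows."""
    return torch.exp(-torch.cdist(x, y, p=2).pow(2) / (2.0 * sigma ** 2))

def mmd2_loss(real_feats: torch.Tensor, syn_feats: torch.Tensor,
              sigma: float = 1.0) -> torch.Tensor:
    """Biased empirical MMD^2: aligns all orders of moments of the two distributions."""
    k_rr = gaussian_kernel(real_feats, real_feats, sigma).mean()
    k_ss = gaussian_kernel(syn_feats, syn_feats, sigma).mean()
    k_rs = gaussian_kernel(real_feats, syn_feats, sigma).mean()
    return k_rr + k_ss - 2.0 * k_rs

# Toy usage: random stand-ins for encoder features of a real batch and a learnable
# synthetic set; in condensation, gradients would flow back to the synthetic images.
real = torch.randn(64, 128)
syn = torch.randn(10, 128, requires_grad=True)
mmd2_loss(real, syn).backward()          # gradients reach the synthetic features
print(dm_loss(real, syn.detach()).item())
```

With a characteristic kernel such as the Gaussian, a zero MMD implies the two feature distributions coincide; this is the sense in which all moments are aligned, rather than only the mean matched by the plain DM loss.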
Related papers
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE).
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
From the inter-class view, we use expert models to perform distribution calibration.
arXiv Detail & Related papers (2024-06-03T07:22:17Z)
- Is Adversarial Training with Compressed Datasets Effective? [4.8576927426880125]
We study the adversarial robustness of models trained on compressed datasets.
We propose a novel robustness-aware dataset compression method based on finding the Minimal Finite Covering (MFC) of the dataset.
arXiv Detail & Related papers (2024-02-08T13:53:11Z)
- Dataset Distillation via the Wasserstein Metric [35.32856617593164]
We introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in dataset distillation.
Our method achieves new state-of-the-art performance across a range of high-resolution datasets.
arXiv Detail & Related papers (2023-11-30T13:15:28Z)
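The entry above swaps the kernel-based discrepancy for an optimal-transport distance. As a rough sketch under stated assumptions (uniform sample weights, entropic regularization, illustrative parameter values; this is not the cited paper's implementation), a Sinkhorn approximation of a Wasserstein-style cost between real and synthetic feature batches could look as follows.

```python
# Rough sketch (an assumption, not the cited paper's code): entropy-regularized
# Sinkhorn approximation of a Wasserstein-style OT cost between two feature batches.
import math
import torch

def sinkhorn_cost(x: torch.Tensor, y: torch.Tensor,
                  eps: float = 1.0, n_iters: int = 100) -> torch.Tensor:
    """Entropic OT cost between uniform empirical distributions over the rows of x and y."""
    n, m = x.shape[0], y.shape[0]
    cost = torch.cdist(x, y, p=2).pow(2)      # pairwise squared Euclidean costs
    log_a = x.new_full((n,), -math.log(n))    # log of uniform source weights
    log_b = y.new_full((m,), -math.log(m))    # log of uniform target weights
    f = x.new_zeros(n)                        # dual potentials
    g = y.new_zeros(m)
    for _ in range(n_iters):                  # log-domain Sinkhorn updates
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a[:, None], dim=0)
    # recover the transport plan and the transport cost it induces
    plan = torch.exp((f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :])
    return (plan * cost).sum()

# Toy usage: real vs. learnable synthetic features (stand-ins for encoder outputs).
real = torch.randn(64, 128)
syn = torch.randn(10, 128, requires_grad=True)
sinkhorn_cost(real, syn).backward()           # gradients reach the synthetic features
```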
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while requiring far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [7.908244841289913]
The cost of training state-of-the-art deep hashing retrieval models has been increasing.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching feature embeddings between the synthetic and real sets.
arXiv Detail & Related papers (2023-05-29T13:23:55Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization towards a flat trajectory, weights trained on synthetic data are robust against accumulated error perturbations.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
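The CAFE entry above describes aligning real and synthetic features across multiple scales. The sketch below is a loose illustration of one such layer-wise alignment term and is not the CAFE implementation, which combines feature alignment with additional components; the toy encoder, pooling choice, and names here are assumptions.

```python
# Loose sketch (an assumption, not the CAFE code): a layer-wise feature alignment
# loss that sums the mean-feature mismatch over the stages of a small encoder.
import torch
import torch.nn as nn

class MultiStageEncoder(nn.Module):
    """Tiny ConvNet returning the globally pooled features of every stage."""
    def __init__(self, channels=(3, 32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU())
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x.mean(dim=(2, 3)))   # global average pool per stage
        return feats

def multi_scale_alignment(encoder, real_imgs, syn_imgs):
    """Sum over stages of squared differences between real and synthetic mean features."""
    return sum(
        (r.mean(0) - s.mean(0)).pow(2).sum()
        for r, s in zip(encoder(real_imgs), encoder(syn_imgs))
    )

# Toy usage with random images standing in for a real batch and a learnable synthetic set.
encoder = MultiStageEncoder()
real = torch.randn(32, 3, 32, 32)
syn = torch.randn(10, 3, 32, 32, requires_grad=True)
multi_scale_alignment(encoder, real, syn).backward()
```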
This list is automatically generated from the titles and abstracts of the papers on this site.