DREAM+: Efficient Dataset Distillation by Bidirectional Representative
Matching
- URL: http://arxiv.org/abs/2310.15052v1
- Date: Mon, 23 Oct 2023 15:55:30 GMT
- Title: DREAM+: Efficient Dataset Distillation by Bidirectional Representative
Matching
- Authors: Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei
Jiang and Yang You
- Abstract summary: We propose a novel dataset matching strategy called DREAM+, which selects representative original images for bidirectional matching.
DREAM+ significantly reduces the number of distillation iterations by more than 15 times without affecting performance.
Given sufficient training time, DREAM+ can further improve the performance and achieve state-of-the-art results.
- Score: 40.18223537419178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation plays a crucial role in creating compact datasets
with training performance similar to that of the original large-scale ones. This is
essential for addressing the challenges of data storage and training costs.
Prevalent methods facilitate knowledge transfer by matching the gradients,
embedding distributions, or training trajectories of synthetic images with
those of the sampled original images. Although there are various matching
objectives, currently the strategy for selecting original images is limited to
naive random sampling. We argue that random sampling overlooks the evenness of
the selected sample distribution, which may result in noisy or biased matching
targets. Besides, the sample diversity is also not constrained by random
sampling. Additionally, current methods predominantly focus on
single-dimensional matching, where information is not fully utilized. To
address these challenges, we propose a novel matching strategy called Dataset
Distillation by Bidirectional REpresentAtive Matching (DREAM+), which selects
representative original images for bidirectional matching. DREAM+ is applicable
to a variety of mainstream dataset distillation frameworks and significantly
reduces the number of distillation iterations by more than 15 times without
affecting performance. Given sufficient training time, DREAM+ can further
improve the performance and achieve state-of-the-art results. We have released
the code at github.com/NUS-HPC-AI-Lab/DREAM+.
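The core idea of representative matching, replacing naive random sampling with samples chosen to evenly cover each class's distribution, can be illustrated with a clustering-based selection step. The sketch below is an illustration of that idea under simplifying assumptions (plain k-means over feature embeddings, one representative per sub-cluster), not the authors' released implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def select_representatives(features: np.ndarray, n_clusters: int,
                           n_iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster feature embeddings with a simple k-means and return, for each
    sub-cluster, the index of the real sample nearest to the cluster center.
    These representatives serve as evenly distributed matching targets,
    in place of randomly sampled (possibly noisy or biased) images."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen samples.
    centers = features[rng.choice(len(features), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        # Assign each sample to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update each center to its cluster mean (skip empty clusters).
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    # Pick the real sample closest to each final center as its representative.
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    return np.unique(dists.argmin(axis=0))
```

For example, on features forming two well-separated groups, `select_representatives(features, 2)` returns one index from each group, so each matching target sits near the mode of a sub-distribution rather than at an arbitrary random point.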
Related papers
- Improved Distribution Matching Distillation for Fast Image Synthesis [54.72356560597428]
We introduce DMD2, a set of techniques that lift this limitation and improve DMD training.
First, we eliminate the regression loss and the need for expensive dataset construction.
Second, we integrate a GAN loss into the distillation procedure, discriminating between generated samples and real images.
arXiv Detail & Related papers (2024-05-23T17:59:49Z)
- Deep Generative Sampling in the Dual Divergence Space: A Data-efficient & Interpretative Approach for Generative AI [29.13807697733638]
We build on the remarkable achievements in generative sampling of natural images.
We propose an innovative, potentially overly ambitious challenge: generating samples that resemble the original images.
The statistical challenge lies in the small sample size, sometimes consisting of a few hundred subjects.
arXiv Detail & Related papers (2024-04-10T22:35:06Z)
- Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models.
In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
- DEff-GAN: Diverse Attribute Transfer for Few-Shot Image Synthesis [0.38073142980733]
We extend the single-image GAN method to model multiple images for sample synthesis.
Our Data-Efficient GAN (DEff-GAN) generates excellent results when similarities and correspondences can be drawn between the input images or classes.
arXiv Detail & Related papers (2023-02-28T12:43:52Z)
- DREAM: Efficient Dataset Distillation by Representative Matching [38.92087223000823]
We propose a novel matching strategy named Dataset distillation by REpresentAtive Matching (DREAM).
DREAM is able to be easily plugged into popular dataset distillation frameworks and reduce the distilling iterations by more than 8 times without performance drop.
arXiv Detail & Related papers (2023-02-28T08:48:45Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation [57.38418881020046]
Recent data augmentation (DA) techniques consistently pursue diversity in augmented training samples.
An augmentation strategy that has a high diversity usually introduces out-of-distribution (OOD) augmented samples.
We propose ReSmooth, a framework that firstly detects OOD samples in augmented samples and then leverages them.
arXiv Detail & Related papers (2022-05-25T09:29:27Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
- Sample selection for efficient image annotation [14.695979686066066]
Supervised object detection has been proven successful on many benchmark datasets, achieving human-level performance.
We propose an efficient image selection approach that samples the most informative images from the unlabeled dataset.
Our method can reduce the manual annotation workload by up to 80% compared to a fully manual labeling setting, and performs better than random sampling.
arXiv Detail & Related papers (2021-05-10T21:25:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.