DREAM: Efficient Dataset Distillation by Representative Matching
- URL: http://arxiv.org/abs/2302.14416v3
- Date: Wed, 30 Aug 2023 14:22:32 GMT
- Title: DREAM: Efficient Dataset Distillation by Representative Matching
- Authors: Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Wei Jiang and Yang You
- Abstract summary: We propose a novel matching strategy named Dataset distillation by REpresentAtive Matching (DREAM).
DREAM can be easily plugged into popular dataset distillation frameworks and reduces the distillation iterations by more than 8 times without a performance drop.
- Score: 38.92087223000823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation aims to synthesize small datasets with little
information loss from original large-scale ones for reducing storage and
training costs. Recent state-of-the-art methods mainly constrain the sample
synthesis process by matching synthetic images and the original ones regarding
gradients, embedding distributions, or training trajectories. Although there
are various matching objectives, currently the strategy for selecting original
images is limited to naive random sampling.
We argue that random sampling overlooks the evenness of the selected sample
distribution, which may result in noisy or biased matching targets.
Besides, the sample diversity is also not constrained by random sampling.
These factors together lead to optimization instability in the distilling
process and degrade the training efficiency. Accordingly, we propose a novel
matching strategy named \textbf{D}ataset distillation by
\textbf{RE}present\textbf{A}tive \textbf{M}atching (DREAM), where only
representative original images are selected for matching. DREAM can be easily
plugged into popular dataset distillation frameworks and reduces the
distillation iterations by more than 8 times without a performance drop. Given
sufficient training time, DREAM further provides significant improvements and
achieves state-of-the-art performance.
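For illustration only, the sketch below shows one way the representative-matching idea could look in practice: the real images of each class are clustered in feature space, and the samples nearest to the cluster centers are used as matching targets instead of random draws. The clustering step, the feature source, and the function names are assumptions made for this example, not details taken from the abstract.

```python
# Illustrative sketch (assumption): pick representative real samples per class
# by k-means clustering over precomputed features, rather than random sampling.
import numpy as np
from sklearn.cluster import KMeans


def select_representatives(features, labels, per_class=10, seed=0):
    """Return indices of representative samples, `per_class` for each class."""
    selected = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        feats = features[idx]
        k = min(per_class, len(idx))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(feats)
        # Keep the sample closest to each cluster center as a representative.
        for center in km.cluster_centers_:
            nearest = int(np.argmin(np.linalg.norm(feats - center, axis=1)))
            selected.append(idx[nearest])
    return np.asarray(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 64))      # stand-in for encoder features
    labels = rng.integers(0, 10, size=1000)  # 10 classes
    reps = select_representatives(feats, labels, per_class=5)
    print(reps.shape)  # (50,): 5 representatives per class
```

In a matching-based distillation framework, the returned indices would stand in wherever the original pipeline draws a random mini-batch of real images.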
Related papers
- DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching [40.18223537419178]
We propose a novel dataset matching strategy called DREAM+, which selects representative original images for bidirectional matching.
DREAM+ significantly reduces the number of distillation iterations by more than 15 times without affecting performance.
Given sufficient training time, DREAM+ can further improve the performance and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-10-23T15:55:30Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that the gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to the randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z) - ReMix: Towards Image-to-Image Translation with Limited Data [154.71724970593036]
We propose a data augmentation method (ReMix) to tackle this issue.
We interpolate training samples at the feature level and propose a novel content loss based on the perceptual relations among samples (see the illustrative sketch after this list of related papers).
The proposed approach effectively reduces the ambiguity of generation and renders content-preserving results.
arXiv Detail & Related papers (2021-03-31T06:24:10Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Instance Selection for GANs [25.196177369030146]
Advances in Generative Adversarial Networks (GANs) have led to their widespread adoption for generating high-quality synthetic imagery.
GANs often produce unrealistic samples which fall outside of the data manifold.
We propose a novel approach to improve sample quality: altering the training dataset via instance selection before model training has taken place.
arXiv Detail & Related papers (2020-07-30T06:33:51Z)
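As referenced in the ReMix entry above, the following is a minimal sketch of feature-level sample interpolation. The encoder, the Beta-distributed mixing coefficient, and the function name are illustrative assumptions rather than details from that paper.

```python
# Illustrative sketch: mix two batches of samples in an encoder's feature space.
# The encoder, mixing distribution, and layer choice are assumptions for the example.
import torch
import torch.nn as nn


def mix_at_feature_level(encoder: nn.Module, x_a: torch.Tensor,
                         x_b: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    """Interpolate two sample batches after encoding them into features."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    f_a, f_b = encoder(x_a), encoder(x_b)
    return lam * f_a + (1.0 - lam) * f_b


if __name__ == "__main__":
    enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
    a = torch.randn(8, 3, 32, 32)
    b = torch.randn(8, 3, 32, 32)
    mixed = mix_at_feature_level(enc, a, b)
    print(mixed.shape)  # torch.Size([8, 128])
```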