Dataset Distillation via Factorization
- URL: http://arxiv.org/abs/2210.16774v1
- Date: Sun, 30 Oct 2022 08:36:19 GMT
- Title: Dataset Distillation via Factorization
- Authors: Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang
- Abstract summary: We introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing dataset distillation (DD) baseline.
HaBa explores decomposing a dataset into two components: data Hallucination networks and Bases.
Our method can yield significant improvement on downstream classification tasks compared with previous state-of-the-art methods, while reducing the total number of compressed parameters by up to 65%.
- Score: 58.8114016318593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study dataset distillation (DD) from a novel
perspective and introduce a \emph{dataset factorization} approach, termed
\emph{HaBa}, which is a plug-and-play strategy portable to any existing DD
baseline. Unlike conventional DD approaches that aim to produce distilled and
representative samples, \emph{HaBa} explores decomposing a dataset into two
components: data \emph{Ha}llucination networks and \emph{Ba}ses, where the
latter is fed into the former to reconstruct image samples. The flexible
combinations between bases and hallucination networks, therefore, equip the
distilled data with an exponential informativeness gain, which largely increases
the representation capability of distilled datasets. To further increase
the data efficiency of the compression results, we introduce a pair of
adversarial contrastive constraints on the resultant hallucination networks and
bases, which increase the diversity of generated images and inject more
discriminant information into the factorization. Extensive comparisons and
experiments demonstrate that our method can yield significant improvement on
downstream classification tasks compared with previous state-of-the-art methods, while
reducing the total number of compressed parameters by up to 65\%. Moreover,
datasets distilled by our approach also achieve \textasciitilde10\% higher
accuracy than baseline methods in cross-architecture generalization. Our code
is available \href{https://github.com/Huage001/DatasetFactorization}{here}.
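As a rough illustration of the factorization idea, the sketch below pairs a small set of learnable bases with a few hallucinator networks so that every (base, hallucinator) combination reconstructs a synthetic image. The network sizes and tensor shapes are illustrative assumptions, not the authors' implementation (the repository linked above holds the official code).
```python
# Minimal sketch of the hallucinator + bases factorization; sizes and the
# downstream distillation objective are illustrative assumptions, not the
# authors' implementation (see the linked repository for that).
import torch
import torch.nn as nn

class Hallucinator(nn.Module):
    """Tiny conv net that maps a base image into a synthetic training image."""
    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, base: torch.Tensor) -> torch.Tensor:
        return self.net(base)

num_bases, num_hallucinators = 10, 5
bases = nn.Parameter(torch.randn(num_bases, 3, 32, 32))   # learnable bases
hallucinators = nn.ModuleList([Hallucinator() for _ in range(num_hallucinators)])

# Every (base, hallucinator) pair reconstructs a distinct sample, so
# num_bases * num_hallucinators images come from a much smaller set of
# stored parameters -- the source of the combinatorial informativeness gain.
synthetic = torch.stack([h(bases) for h in hallucinators], dim=1).reshape(-1, 3, 32, 32)
print(synthetic.shape)  # torch.Size([50, 3, 32, 32])
```
During distillation, the bases and hallucinators would be optimized jointly against the chosen DD objective, together with the adversarial contrastive constraints described in the abstract.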
Related papers
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
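A hedged sketch of the general idea of parameterizing synthetic images through an intermediate GAN layer, rather than only the input latent, is given below; the toy generator split, layer sizes, and optimization target are hypothetical stand-ins, not H-GLaD's actual models.
```python
# Hedged sketch of parameterizing synthetic images through intermediate GAN
# layers instead of only the input latent; the generator split and sizes are
# hypothetical stand-ins, not H-GLaD's models.
import torch
import torch.nn as nn

g_early = nn.Sequential(nn.Linear(64, 256), nn.ReLU())          # latent z -> intermediate feature
g_late = nn.Sequential(nn.Linear(256, 3 * 32 * 32), nn.Tanh())  # intermediate feature -> image

z = torch.randn(10, 64)                    # per-class latents
h = nn.Parameter(g_early(z).detach())      # hierarchical latent to be optimized by the DD objective
images = g_late(h).reshape(-1, 3, 32, 32)  # synthetic images fed to the distillation loss
print(images.shape)                        # torch.Size([10, 3, 32, 32])
```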
arXiv Detail & Related papers (2024-06-09T09:15:54Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
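The snippet below sketches one way a distribution-aware criterion could select diffusion-generated samples, by keeping candidates whose features stay close to a class prototype of the real data. The feature extractor and this post-hoc filtering rule are placeholders; DistDiff's own mechanism may differ.
```python
# Illustrative sketch of a distribution-aware selection step for generated
# data; the features and the filtering rule are placeholders.
import torch

def select_in_distribution(real_feats: torch.Tensor,
                           gen_feats: torch.Tensor,
                           keep: int) -> torch.Tensor:
    prototype = real_feats.mean(dim=0, keepdim=True)         # class-level statistic of real data
    dists = torch.cdist(gen_feats, prototype).squeeze(1)     # distance of each generated sample
    return torch.topk(dists, k=keep, largest=False).indices  # indices of the closest samples

real_feats = torch.randn(100, 512)   # placeholder features of real images
gen_feats = torch.randn(400, 512)    # placeholder features of diffusion-generated images
kept = select_in_distribution(real_feats, gen_feats, keep=50)
```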
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z) - Dataset Distillation Meets Provable Subset Selection [14.158845925610438]
Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important points and removing redundant ones in the data.
We further merge the idea of data subset selection with dataset distillation by training the distilled set on "important" sampled points during the training procedure instead of randomly sampling the next batch.
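A minimal sketch of importance-based initialization for the distilled set is shown below; the centroid-distance score is a simple stand-in for the paper's provable sampling scheme.
```python
# Minimal sketch of importance-based initialization for a distilled set:
# score real points, sample high-importance ones, and use them as the seed.
# The scoring rule is a crude stand-in, not the paper's provable sampler.
import torch

def init_distilled(features: torch.Tensor, images: torch.Tensor, budget: int) -> torch.Tensor:
    centroid = features.mean(dim=0, keepdim=True)
    scores = (features - centroid).norm(dim=1)      # crude "importance" proxy
    probs = scores / scores.sum()
    idx = torch.multinomial(probs, num_samples=budget, replacement=False)
    return images[idx].clone()                      # starting point for distillation

features = torch.randn(1000, 512)       # placeholder embeddings of the real data
images = torch.randn(1000, 3, 32, 32)   # corresponding real images
distilled_init = init_distilled(features, images, budget=100)
```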
arXiv Detail & Related papers (2023-07-16T15:58:19Z) - Towards Efficient Deep Hashing Retrieval: Condensing Your Data via
Feature-Embedding Matching [7.908244841289913]
The cost of training state-of-the-art deep hashing retrieval models has increased.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching feature embeddings between the synthetic set and the real set.
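A minimal sketch of matching feature embeddings between a synthetic set and a real set, assuming a generic backbone and a mean-embedding MSE objective rather than the framework's exact matching loss:
```python
# Minimal sketch of feature-embedding matching between synthetic and real
# data; the backbone and mean-embedding MSE objective are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def embedding_match_loss(backbone: nn.Module,
                         real_batch: torch.Tensor,
                         synthetic_batch: torch.Tensor) -> torch.Tensor:
    real_emb = backbone(real_batch).mean(dim=0)       # summary embedding of real data
    syn_emb = backbone(synthetic_batch).mean(dim=0)   # summary embedding of synthetic data
    return F.mse_loss(syn_emb, real_emb)              # pull the two statistics together

# Toy usage with a placeholder embedding network.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
real = torch.randn(64, 3, 32, 32)
synthetic = torch.randn(10, 3, 32, 32, requires_grad=True)
loss = embedding_match_loss(backbone, real, synthetic)
loss.backward()  # gradients flow into the synthetic images being condensed
```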
arXiv Detail & Related papers (2023-05-29T13:23:55Z) - Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and the original data to carry out a two-step interpolation in the hidden space of neural models.
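A hedged sketch of such a two-step hidden-space interpolation is given below; the mixing weights are illustrative and label handling is omitted.
```python
# Hedged sketch of a two-step hidden-space interpolation in the spirit of
# DoubleMix; mixing weights are illustrative, label handling is omitted.
import torch

def double_mix(hidden_orig: torch.Tensor, hidden_perturbed: torch.Tensor,
               keep_original: float = 0.8) -> torch.Tensor:
    # Step 1: mix the perturbed views of the same example with random weights.
    weights = torch.softmax(torch.rand(hidden_perturbed.size(0)), dim=0)
    mixed_perturbed = (weights[:, None] * hidden_perturbed).sum(dim=0)
    # Step 2: interpolate the mixed perturbation with the original hidden state,
    # keeping the original sample dominant.
    return keep_original * hidden_orig + (1 - keep_original) * mixed_perturbed

hidden_orig = torch.randn(128)          # hidden state of the original example
hidden_perturbed = torch.randn(4, 128)  # hidden states of its perturbed versions
augmented = double_mix(hidden_orig, hidden_perturbed)
```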
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - Feature transforms for image data augmentation [74.12025519234153]
In image classification, many augmentation approaches utilize simple image manipulation algorithms.
In this work, we build ensembles on the data level by adding images generated by combining fourteen augmentation approaches.
Pretrained ResNet50 networks are finetuned on training sets that include images derived from each augmentation method.
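A rough sketch of this ensemble-by-augmentation recipe follows, with two example transforms standing in for the fourteen augmentation approaches and the fine-tuning loop itself omitted.
```python
# Rough sketch of the ensemble-by-augmentation recipe: one pretrained ResNet50
# per augmentation pipeline, predictions averaged at inference. The two
# transforms below stand in for the fourteen approaches used in the paper.
import torch
from torchvision import models, transforms

augmentations = {
    "flip": transforms.RandomHorizontalFlip(p=1.0),
    "jitter": transforms.ColorJitter(brightness=0.4, contrast=0.4),
}

def make_model(num_classes: int) -> torch.nn.Module:
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # ImageNet-pretrained
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)     # new classification head
    return model  # each copy would be fine-tuned on its own augmented training set

ensemble = {name: make_model(num_classes=10) for name in augmentations}

def ensemble_predict(x: torch.Tensor) -> torch.Tensor:
    # Average softmax outputs over the per-augmentation models (data-level ensemble).
    probs = [torch.softmax(m(x), dim=1) for m in ensemble.values()]
    return torch.stack(probs).mean(dim=0)
```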
arXiv Detail & Related papers (2022-01-24T14:12:29Z)