Dataset Condensation with Latent Space Knowledge Factorization and
Sharing
- URL: http://arxiv.org/abs/2208.10494v1
- Date: Sun, 21 Aug 2022 18:14:08 GMT
- Title: Dataset Condensation with Latent Space Knowledge Factorization and
Sharing
- Authors: Hae Beom Lee, Dong Bok Lee, Sung Ju Hwang
- Abstract summary: We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
- Score: 73.31614936678571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel approach for systematically and
efficiently solving the dataset condensation problem by exploiting the
regularity in a given dataset. Instead of condensing the dataset directly in
the original input space, we assume a generative process for the dataset: a set
of learnable codes defined in a compact latent space, followed by a set of tiny
decoders that map them differently to the original input space. By combining
different codes and decoders interchangeably, we can dramatically increase the
number of synthetic examples with essentially the same parameter count, because
the latent space is much lower-dimensional and we can use as many decoders as
necessary to capture the different styles represented in the dataset at
negligible cost. Such knowledge factorization allows efficient sharing of
information between synthetic examples in a systematic way, providing a far
better trade-off between compression ratio and quality of the generated
examples. We experimentally show that our method sets new state-of-the-art
records by significant margins on various benchmark datasets such as SVHN,
CIFAR10, CIFAR100, and TinyImageNet.
Related papers
- Style Quantization for Data-Efficient GAN Training [18.40243591024141]
Under limited-data settings, GANs often struggle to navigate and effectively exploit the input latent space.
We propose SQ-GAN, a novel approach that enhances consistency regularization.
Experiments demonstrate significant improvements in both discriminator robustness and generation quality.
arXiv Detail & Related papers (2025-03-31T16:28:44Z)
- Efficient Dataset Distillation through Low-Rank Space Sampling [34.29086540681496]
This paper proposes a dataset distillation method based on Matching Training Trajectories with Low-rank Space Sampling.
The synthetic data is represented by basis vectors and shared dimension mappers from these subspaces.
The proposed method is tested on CIFAR-10, CIFAR-100, and SVHN datasets, and outperforms the baseline methods by an average of 9.9%.
arXiv Detail & Related papers (2025-03-11T02:59:17Z)
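A minimal sketch of the low-rank idea summarized above, with assumed names and sizes (not the paper's code): each synthetic image is a set of coefficients over shared basis vectors, and a shared mapper projects the result back to pixel space.

```python
# Illustrative low-rank parameterization of synthetic data (assumed names/sizes).
import torch
import torch.nn as nn

rank, latent_dim, img_dim, n_images = 16, 64, 3 * 32 * 32, 100

basis = nn.Parameter(torch.randn(rank, latent_dim))     # shared basis vectors of the low-rank subspace
coeffs = nn.Parameter(torch.randn(n_images, rank))      # per-image coefficients in that subspace
mapper = nn.Linear(latent_dim, img_dim)                 # shared dimension mapper back to pixel space

synthetic = mapper(coeffs @ basis).view(n_images, 3, 32, 32)
print(synthetic.shape)   # 100 images from far fewer parameters than 100 stored images
```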
- Heavy Labels Out! Dataset Distillation with Label Space Lightening [69.67681224137561]
HeLlO aims at effective image-to-label projectors, with which synthetic labels can be directly generated online from synthetic images.
We demonstrate that with only about 0.003% of the original storage required for a complete set of soft labels, we achieve comparable performance to current state-of-the-art dataset distillation methods on large-scale datasets.
arXiv Detail & Related papers (2024-08-15T15:08:58Z)
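A hedged sketch of the online soft-label idea summarized above: instead of storing soft labels, a lightweight image-to-label projector is kept and queried during training. The architecture and names below are placeholders, not the HeLlO implementation.

```python
# Illustrative online soft-label generation via a lightweight projector
# (architecture and names are assumptions, not the HeLlO implementation).
import torch
import torch.nn as nn

num_classes = 100
projector = nn.Sequential(                 # tiny image-to-label projector stored instead of labels
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, num_classes),
)

synthetic_images = torch.randn(50, 3, 32, 32)
soft_labels = projector(synthetic_images).softmax(dim=-1)   # generated online during training
print(soft_labels.shape)   # (50, 100): no full soft-label set needs to be stored
```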
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
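One plausible reading of a class-relevant feature distance is comparing per-class mean features of real and synthetic batches; the sketch below illustrates that interpretation only, and the paper defines its own metric.

```python
# Sketch of a class-relevant feature distance (an interpretation for illustration;
# not the metric defined in the H-GLaD paper).
import torch

def class_feature_distance(real_feats, real_y, syn_feats, syn_y, num_classes):
    """Sum of squared distances between per-class mean features."""
    dist = torch.zeros(())
    for c in range(num_classes):
        if (real_y == c).any() and (syn_y == c).any():
            r = real_feats[real_y == c].mean(dim=0)
            s = syn_feats[syn_y == c].mean(dim=0)
            dist = dist + ((r - s) ** 2).sum()
    return dist

real_feats, real_y = torch.randn(64, 128), torch.randint(0, 10, (64,))
syn_feats, syn_y = torch.randn(20, 128), torch.randint(0, 10, (20,))
print(class_feature_distance(real_feats, real_y, syn_feats, syn_y, num_classes=10))
```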
- Koopcon: A new approach towards smarter and less complex learning [13.053285552524052]
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning.
This paper introduces an innovative Autoencoder-based dataset condensation model backed by Koopman operator theory.
Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data.
arXiv Detail & Related papers (2024-05-22T17:47:14Z)
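As a rough illustration of an autoencoder backed by Koopman operator theory, the generic skeleton below encodes data, advances the latent state with a learned linear (Koopman) operator, and decodes; it is not the Koopcon architecture.

```python
# Generic Koopman-autoencoder skeleton (illustrative; not the Koopcon model).
import torch
import torch.nn as nn

latent = 32
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, latent))
koopman = nn.Linear(latent, latent, bias=False)    # learned linear (Koopman) operator
decoder = nn.Linear(latent, 3 * 32 * 32)

x = torch.randn(8, 3, 32, 32)
z = encoder(x)                                     # encode into the latent space
x_pred = decoder(koopman(z)).view(8, 3, 32, 32)    # advance linearly in latent space, then decode
print(x_pred.shape)
```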
- One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
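Textual inversion optimizes a new token embedding while the generative model stays frozen. The sketch below abstracts the frozen text-to-image model into a stand-in linear module so it stays self-contained; all names and the toy objective are assumptions, not the D3M pipeline.

```python
# Abstracted textual-inversion loop (illustrative). The frozen text-to-image model
# is replaced by a stand-in module; only the new token embedding is optimized.
import torch
import torch.nn as nn

emb_dim = 768
new_token_embedding = nn.Parameter(torch.randn(emb_dim))    # one learnable "prompt" per category

frozen_generator = nn.Linear(emb_dim, 3 * 32 * 32)          # stand-in for a frozen text-to-image model
for p in frozen_generator.parameters():
    p.requires_grad_(False)

real_images = torch.randn(16, 3 * 32 * 32)                  # images from one category
opt = torch.optim.Adam([new_token_embedding], lr=1e-2)
for _ in range(100):
    generated = frozen_generator(new_token_embedding)       # only the token embedding gets gradients
    loss = ((generated - real_images.mean(dim=0)) ** 2).mean()   # toy objective, not the diffusion loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```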
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
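Distribution matching typically optimizes the synthetic set so its feature statistics match those of real batches under a feature extractor. Below is a minimal sketch of that general objective (matching mean embeddings); the sizes, extractor, and optimizer are assumptions, not the paper's exact method.

```python
# Minimal distribution-matching objective: match mean feature embeddings of real
# and synthetic batches under a (random) feature extractor. Names/sizes assumed.
import torch
import torch.nn as nn

featurizer = nn.Sequential(                     # randomly initialized feature extractor
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 128),
)
syn_images = nn.Parameter(torch.randn(10, 3, 32, 32))   # the condensed set being optimized
real_batch = torch.randn(128, 3, 32, 32)

opt = torch.optim.SGD([syn_images], lr=1.0)
for _ in range(50):
    loss = ((featurizer(real_batch).mean(0) - featurizer(syn_images).mean(0)) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```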
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
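A minimal sketch of per-dimension latent quantization, snapping each latent dimension to the nearest value in a small codebook; this illustrates the general mechanism only, not the paper's exact formulation (which also requires a straight-through gradient for training).

```python
# Per-dimension latent quantization sketch (illustrative, not the paper's exact scheme).
import torch

def quantize(z, codebooks):
    """Snap each latent dimension to its nearest value in that dimension's codebook."""
    # z: (batch, dims); codebooks: (dims, n_values)
    dists = (z.unsqueeze(-1) - codebooks.unsqueeze(0)) ** 2       # (batch, dims, n_values)
    idx = dists.argmin(dim=-1)                                    # nearest code per dimension
    return torch.gather(codebooks.expand(z.size(0), -1, -1), 2, idx.unsqueeze(-1)).squeeze(-1)

z = torch.randn(4, 8)                                  # continuous encoder output
codebooks = torch.linspace(-2, 2, 10).repeat(8, 1)     # 10 candidate values per latent dimension
print(quantize(z, codebooks).shape)                    # (4, 8), entries drawn from the codebooks
```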
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- Neural Distributed Source Coding [59.630059301226474]
We present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions.
We evaluate our method on multiple datasets and show that it can handle complex correlations and achieves state-of-the-art PSNR.
arXiv Detail & Related papers (2021-06-05T04:50:43Z)
- Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations [0.7614628596146599]
We introduce an Encoded Prior Sliced Wasserstein AutoEncoder.
An additional prior-encoder network learns an embedding of the data manifold.
We show that the prior encodes the geometry underlying the data unlike conventional autoencoders.
arXiv Detail & Related papers (2020-10-02T14:58:54Z)
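For reference, the sliced Wasserstein distance compares two distributions by projecting samples onto random directions and comparing the sorted 1-D projections; a generic sketch follows, not the paper's full training objective.

```python
# Minimal sliced Wasserstein distance between two sets of latent codes
# (generic technique sketch; not the paper's full training objective).
import torch

def sliced_wasserstein(x, y, n_projections=50):
    """Average squared 1-D Wasserstein-2 distance over random projections."""
    dirs = torch.randn(x.size(1), n_projections)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)       # random unit directions
    px, _ = (x @ dirs).sort(dim=0)                     # sorted 1-D projections of x
    py, _ = (y @ dirs).sort(dim=0)                     # sorted 1-D projections of y
    return ((px - py) ** 2).mean()

x, y = torch.randn(256, 16), torch.randn(256, 16)      # two equal-sized sets of latent codes
print(sliced_wasserstein(x, y))
```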