Dataset Condensation with Latent Space Knowledge Factorization and
Sharing
- URL: http://arxiv.org/abs/2208.10494v1
- Date: Sun, 21 Aug 2022 18:14:08 GMT
- Title: Dataset Condensation with Latent Space Knowledge Factorization and
Sharing
- Authors: Hae Beom Lee, Dong Bok Lee, Sung Ju Hwang
- Abstract summary: We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
- Score: 73.31614936678571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce a novel approach for systematically and
efficiently solving the dataset condensation problem by exploiting the
regularity in a given dataset. Instead of condensing the dataset directly in
the original input space, we assume a generative process of the dataset with a
set of learnable codes defined in a compact latent space, followed by a set of
tiny decoders which map them differently to the original input space. By
combining different codes and decoders interchangeably, we can dramatically
increase the number of synthetic examples with essentially the same parameter
count, because the latent space is much lower-dimensional and we can use as
many decoders as necessary to capture the different styles represented in the
dataset at negligible cost. Such knowledge factorization allows efficient
sharing of information between synthetic examples in a systematic way,
providing a far better trade-off between compression ratio and quality of the
generated examples. We experimentally show that our method achieves new
state-of-the-art records by significant margins on various benchmark datasets
such as SVHN, CIFAR10, CIFAR100, and TinyImageNet.
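As a rough illustration of the factorization described in the abstract, the sketch below pairs a pool of learnable latent codes with several tiny decoders, so every code-decoder combination yields a distinct synthetic image. The latent dimensionality, decoder architecture, and 32x32 resolution are assumptions for illustration, not the paper's exact configuration.

```python
# A minimal sketch of the generative process described above: learnable latent
# codes are shared across several tiny decoders, so N codes and M decoders yield
# N * M synthetic images. The latent size, decoder architecture, and image
# resolution are illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """A small decoder mapping a compact latent code to a 32x32 RGB image."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128 * 4 * 4),
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),    # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

num_codes, num_decoders, latent_dim = 100, 5, 64
codes = nn.Parameter(torch.randn(num_codes, latent_dim))   # learnable latent codes
decoders = nn.ModuleList([TinyDecoder(latent_dim) for _ in range(num_decoders)])

# Every (code, decoder) pair produces a distinct synthetic example, so the
# number of examples grows multiplicatively while the stored parameters are
# only the codes plus a few tiny decoders.
synthetic = torch.stack([dec(codes) for dec in decoders], dim=1)
print(synthetic.shape)  # torch.Size([100, 5, 3, 32, 32])
```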
Related papers
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD)
This method systematically explores hierarchical layers within the generative adversarial networks (GANs)
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z) - Koopcon: A new approach towards smarter and less complex learning [13.053285552524052]
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning.
This paper introduces an innovative Autoencoder-based dataset condensation model backed by Koopman operator theory.
Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data.
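As a rough, generic illustration of a Koopman-style autoencoder (not the Koopcon model itself), the sketch below encodes observations into a latent space where a single linear operator advances them before decoding; all sizes, networks, and losses here are assumptions.

```python
# A generic Koopman-style autoencoder sketch: a linear operator K acts in the
# latent space of an autoencoder. Architecture and losses are illustrative
# assumptions and do not reproduce the Koopcon model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KoopmanAutoencoder(nn.Module):
    def __init__(self, in_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # linear latent operator

    def forward(self, x_t: torch.Tensor, x_next: torch.Tensor) -> torch.Tensor:
        z_t = self.encoder(x_t)
        z_pred = self.K(z_t)                                   # linear advance in latent space
        recon_loss = F.mse_loss(self.decoder(z_t), x_t)        # reconstruct current observation
        pred_loss = F.mse_loss(self.decoder(z_pred), x_next)   # predict the next observation
        return recon_loss + pred_loss

model = KoopmanAutoencoder()
x_t, x_next = torch.rand(16, 784), torch.rand(16, 784)
loss = model(x_t, x_next)
loss.backward()
```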
arXiv Detail & Related papers (2024-05-22T17:47:14Z) - One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
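As a rough sketch of the distribution-matching idea (not this paper's exact recipe), the snippet below optimizes synthetic images so that their mean features under an embedding network match those of real images; the embedding network, image size, and optimizer settings are assumptions.

```python
# Minimal distribution-matching sketch: synthetic images are optimized so their
# mean embedding matches that of a real batch. The embedding network and
# hyperparameters are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn

def distribution_matching_loss(embed: nn.Module,
                               real: torch.Tensor,
                               synthetic: torch.Tensor) -> torch.Tensor:
    """Squared distance between the mean embeddings of real and synthetic batches."""
    return ((embed(real).mean(dim=0) - embed(synthetic).mean(dim=0)) ** 2).sum()

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU())
synthetic = torch.randn(10, 3, 32, 32, requires_grad=True)   # learnable condensed set
real = torch.randn(256, 3, 32, 32)                           # batch of real images (one class)

opt = torch.optim.SGD([synthetic], lr=1.0)
opt.zero_grad()
loss = distribution_matching_loss(embed, real, synthetic)
loss.backward()
opt.step()
```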
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
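A rough sketch of per-dimension latent quantization with a straight-through gradient, as one way to impose an organized, discrete latent structure; the codebook size and tensor shapes are assumptions rather than the paper's exact formulation.

```python
# Sketch of per-dimension latent quantization: each latent dimension is snapped
# to the nearest entry of its own small learnable codebook, with a
# straight-through estimator for gradients. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LatentQuantizer(nn.Module):
    def __init__(self, latent_dim: int = 8, codebook_size: int = 10):
        super().__init__()
        # One scalar codebook per latent dimension.
        self.codebooks = nn.Parameter(torch.randn(latent_dim, codebook_size))

    def forward(self, z: torch.Tensor) -> torch.Tensor:   # z: (batch, latent_dim)
        # Squared distance from each latent value to every entry of its codebook.
        dists = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)) ** 2    # (B, D, K)
        idx = dists.argmin(dim=-1, keepdim=True)                        # (B, D, 1)
        books = self.codebooks.unsqueeze(0).expand(z.size(0), -1, -1)   # (B, D, K)
        quantized = torch.gather(books, 2, idx).squeeze(-1)             # (B, D)
        # Straight-through: gradients pass to z as if quantization were identity.
        return z + (quantized - z).detach()

quantizer = LatentQuantizer()
z = torch.randn(4, 8, requires_grad=True)
print(quantizer(z).shape)  # torch.Size([4, 8])
```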
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z) - Neural Distributed Source Coding [59.630059301226474]
We present a framework for lossy DSC that is agnostic to the correlation structure and can scale to high dimensions.
We evaluate our method on multiple datasets and show that it can handle complex correlations and achieves state-of-the-art PSNR.
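A minimal sketch of the lossy distributed source coding setup the paper addresses: the encoder compresses x without seeing the correlated side information y, and the decoder reconstructs x from the transmitted code together with y. All dimensions and networks here are assumptions, not the paper's model.

```python
# Minimal distributed source coding sketch: the encoder never sees the side
# information y, while the decoder uses both the transmitted code and y.
# Dimensions and networks are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(784, 64)          # compresses x alone
decoder = nn.Linear(64 + 784, 784)    # reconstructs x from the code and y

x = torch.rand(8, 784)                      # source to compress
y = x + 0.1 * torch.randn_like(x)           # correlated side information at the decoder
code = encoder(x)                           # low-dimensional transmitted message
x_hat = decoder(torch.cat([code, y], dim=1))
loss = F.mse_loss(x_hat, x)
loss.backward()
```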
arXiv Detail & Related papers (2021-06-05T04:50:43Z) - Encoded Prior Sliced Wasserstein AutoEncoder for learning latent manifold representations [0.7614628596146599]
We introduce an Encoded Prior Sliced Wasserstein AutoEncoder.
An additional prior-encoder network learns an embedding of the data manifold.
We show that the prior encodes the geometry underlying the data unlike conventional autoencoders.
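A rough sketch of the sliced Wasserstein distance such an autoencoder can use to match encoded data to a prior: both sample sets are projected onto random unit directions and the sorted one-dimensional projections are compared. The projection count and latent size below are illustrative choices, not the paper's settings.

```python
# Sliced Wasserstein distance sketch: project both sample sets onto random unit
# directions, sort the 1-D projections, and compare them. Projection count and
# latent size are illustrative assumptions.
import torch

def sliced_wasserstein(latents: torch.Tensor,
                       prior_samples: torch.Tensor,
                       num_projections: int = 50) -> torch.Tensor:
    dim = latents.size(1)
    directions = torch.randn(dim, num_projections)
    directions = directions / directions.norm(dim=0, keepdim=True)  # unit directions
    proj_x = (latents @ directions).sort(dim=0).values
    proj_y = (prior_samples @ directions).sort(dim=0).values
    return ((proj_x - proj_y) ** 2).mean()

z = torch.randn(128, 16)   # encoder outputs
p = torch.randn(128, 16)   # samples from the (learned) prior
print(sliced_wasserstein(z, p))
```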
arXiv Detail & Related papers (2020-10-02T14:58:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.