Dataset Condensation via Generative Model
- URL: http://arxiv.org/abs/2309.07698v1
- Date: Thu, 14 Sep 2023 13:17:02 GMT
- Title: Dataset Condensation via Generative Model
- Authors: David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang,
Song Bai, Mike Zheng Shou
- Abstract summary: We propose to condense large datasets into another format, a generative model.
Such a novel format allows for the condensation of large datasets because the size of the generative model remains relatively stable as the number of classes or image resolution increases.
An intra-class loss and an inter-class loss are proposed to model the relations among condensed samples.
- Score: 71.89427409059472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset condensation aims to condense a large dataset with many
training samples into a small set. Previous methods usually condense the
dataset into the pixel format. However, this suffers from slow optimization
and a large number of parameters to be optimized. As image resolution and the
number of classes increase, the number of learnable parameters grows
accordingly, prohibiting condensation methods from scaling up to large
datasets with diverse classes. Moreover, the relations among condensed
samples have been neglected, and hence the feature distribution of condensed
samples is often not diverse. To solve these problems, we propose to condense
the dataset into another format, a generative model. Such a novel format
allows for the condensation of large datasets because the size of the
generative model remains relatively stable as the number of classes or the
image resolution increases. Furthermore, an intra-class loss and an
inter-class loss are proposed to model the relations among condensed samples.
The intra-class loss aims to create more diverse samples for each class by
pushing each sample away from the others of the same class. Meanwhile, the
inter-class loss increases the discriminability of samples by widening the
gap between the centers of different classes. Extensive comparisons with
state-of-the-art methods and our ablation studies confirm the effectiveness
of our method and of its individual components. To the best of our knowledge,
we are the first to successfully conduct condensation on ImageNet-1k.
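The abstract describes the two relation losses only at a high level, so the PyTorch-style snippet below is a minimal sketch of one possible reading, not the authors' implementation; the use of cosine similarity, L2-normalized features, and mean class centers are assumptions.

```python
import torch
import torch.nn.functional as F

def relation_losses(features: torch.Tensor, labels: torch.Tensor):
    """Sketch of intra-class and inter-class relation losses.

    features: (N, D) embeddings of condensed samples (e.g. images produced by
              the generative model, passed through a feature extractor).
    labels:   (N,) class ids.
    """
    feats = F.normalize(features, dim=1)
    classes = labels.unique()

    # Intra-class loss: penalize similarity among samples of the same class,
    # i.e. push each sample away from the others to diversify the class.
    intra, n_cls = feats.new_tensor(0.0), 0
    for c in classes:
        f = feats[labels == c]
        if f.size(0) < 2:
            continue
        sim = f @ f.t()
        off_diag = ~torch.eye(f.size(0), dtype=torch.bool, device=f.device)
        intra = intra + sim[off_diag].mean()
        n_cls += 1
    intra = intra / max(n_cls, 1)

    # Inter-class loss: widen the gap between class centers by penalizing
    # similarity between every pair of centers.
    centers = F.normalize(
        torch.stack([feats[labels == c].mean(0) for c in classes]), dim=1)
    csim = centers @ centers.t()
    off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=centers.device)
    inter = csim[off_diag].mean()

    return intra, inter
```

Both terms would be minimized together with the main condensation objective; the paper may weight or formulate them differently.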
Related papers
- Towards Model-Agnostic Dataset Condensation by Heterogeneous Models [13.170099297210372]
We develop a novel method to produce universally applicable condensed images through cross-model interactions.
By balancing the contribution of each model and maintaining their semantic meaning closely, our approach overcomes the limitations associated with model-specific condensed images.
arXiv Detail & Related papers (2024-09-22T17:13:07Z)
- Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment [24.545341041444797]
Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods fail to capture both simultaneously.
In this work, instead of stacking up models, a more elegant data sampling method is explored, which compacts both local and global content into a regular input size.
Experiments show that our sampling method significantly improves current single-branch models and achieves performance competitive with multi-branch models without extra model complexity.
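The summary does not explain how local and global content are packed into one input, so the snippet below is only a hypothetical illustration of a "scaling and masking" style sampler: the whole frame is downscaled for global semantics, and a sparse mask keeps a few full-resolution patches for local detail, all within a fixed input size. The patch size, mask ratio, and composition scheme are invented for illustration.

```python
import torch
import torch.nn.functional as F

def scale_and_mask_sample(image: torch.Tensor, out: int = 224, patch: int = 32,
                          keep_ratio: float = 0.25) -> torch.Tensor:
    """Hypothetical fixed-size input mixing a downscaled global view with
    sparsely kept full-resolution local patches (illustration only).
    Assumes the source image is at least `out` pixels on each side."""
    c, h, w = image.shape
    # Global content: scale the whole image down to the target resolution.
    global_view = F.interpolate(image[None], size=(out, out),
                                mode="bilinear", align_corners=False)[0]
    # Local content: crop a random out x out window at native resolution.
    top = torch.randint(0, max(h - out, 1), (1,)).item()
    left = torch.randint(0, max(w - out, 1), (1,)).item()
    local_view = image[:, top:top + out, left:left + out]
    # Masking: keep only a fraction of full-resolution patches.
    grid = out // patch
    keep = torch.rand(grid, grid) < keep_ratio
    mask = keep.repeat_interleave(patch, 0).repeat_interleave(patch, 1)
    # Compose: kept patches carry local detail, the rest carries global context.
    return torch.where(mask, local_view, global_view)
```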
arXiv Detail & Related papers (2024-01-05T03:12:03Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
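The summary does not spell out this paper's specific improvements, so the snippet below only sketches the generic distribution-matching objective that this line of work builds on: matching per-class mean features of real and synthetic batches under an embedding network. The `embed` network and the mean-feature criterion are assumptions.

```python
import torch

def distribution_matching_loss(embed, real_images, syn_images,
                               real_labels, syn_labels):
    """Generic distribution-matching sketch: match per-class mean features of
    real and synthetic data under an embedding network `embed`."""
    loss = real_images.new_tensor(0.0)
    real_feat = embed(real_images)   # (N_r, D)
    syn_feat = embed(syn_images)     # (N_s, D)
    for c in syn_labels.unique():
        mu_real = real_feat[real_labels == c].mean(dim=0)
        mu_syn = syn_feat[syn_labels == c].mean(dim=0)
        loss = loss + ((mu_real - mu_syn) ** 2).sum()
    return loss
```

In practice such a loss is typically averaged over many randomly initialized embedders, and the synthetic images (or, in the main paper above, the generator producing them) are updated by backpropagating through it.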
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
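As an unofficial illustration of "estimating intra-class variations and generating adaptive synthetic samples", the sketch below perturbs each embedding with noise scaled by its class's feature standard deviation; the noise model, `num_new`, and `scale` are invented here, and the paper's neighbor-correction step is not reproduced.

```python
import torch

def adaptive_synthetic_samples(features, labels, num_new=2, scale=0.5):
    """Per-class adaptive augmentation sketch: estimate each class's feature
    spread and sample synthetic embeddings around the real ones accordingly."""
    new_feats, new_labels = [], []
    for c in labels.unique():
        f = features[labels == c]                    # (n_c, D)
        std = f.std(dim=0, unbiased=False) + 1e-6    # intra-class variation estimate
        for _ in range(num_new):
            noise = torch.randn_like(f) * std * scale  # class-adaptive noise
            new_feats.append(f + noise)
            new_labels.append(labels[labels == c])
    return torch.cat(new_feats), torch.cat(new_labels)
```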
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- Constrained Deep One-Class Feature Learning For Classifying Imbalanced Medical Images [4.211466076086617]
One-class classification has attracted increasing attention to address the data imbalance problem.
We propose a novel deep learning-based method to learn compact features.
Our method can learn more relevant features associated with the given class, making the majority and minority samples more distinguishable.
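The summary only states that compact features are learned for the given class; purely as an illustration (not this paper's actual constraint), feature compactness is often expressed as the distance of in-class features to a class center, as in the sketch below.

```python
import torch

def compactness_loss(features: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
    """One-class compactness sketch: mean squared distance of in-class
    features to a class center (e.g. the batch mean or a learned vector)."""
    return ((features - center) ** 2).sum(dim=1).mean()

# Example: use the (detached) batch mean of the target class as the center.
feats = torch.randn(16, 128)
loss = compactness_loss(feats, feats.mean(dim=0).detach())
```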
arXiv Detail & Related papers (2021-11-20T15:25:24Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
To better model the relationship among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing.
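The pixel-to-prototype contrastive loss is not detailed in the summary; a plausible InfoNCE-style reading, given here only as a sketch, treats each pixel embedding as a query whose positive is the prototype of its ground-truth class and whose negatives are the other class prototypes. The temperature and normalization are assumptions.

```python
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(pixel_embed, pixel_labels, prototypes, tau=0.1):
    """InfoNCE-style sketch: pixel_embed (P, D), pixel_labels (P,) holding class
    ids that index prototypes (C, D); each pixel is pulled toward its class
    prototype and pushed away from the others."""
    pixel_embed = F.normalize(pixel_embed, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = pixel_embed @ prototypes.t() / tau   # (P, C) similarities
    return F.cross_entropy(logits, pixel_labels)
```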
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
- Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses constrain the clustering results of mini-batch samples at both the sample and class levels.
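A minimal sketch of the two views described above (our reading, not the official DCDC code): the batch's softmax output matrix yields a class distribution per sample (rows) and a sample distribution per class (columns); rows of two augmented views are contrasted in the sample view and columns in the class view. The InfoNCE form and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def doubly_contrastive_loss(p1: torch.Tensor, p2: torch.Tensor, tau: float = 0.5):
    """p1, p2: (B, C) softmax class-probability matrices for two augmented
    views of the same mini-batch (sketch of the doubly contrastive idea)."""

    def info_nce(a, b):
        # a[i] and b[i] form a positive pair; all other rows act as negatives.
        a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
        logits = a @ b.t() / tau
        targets = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, targets)

    sample_view = info_nce(p1, p2)         # contrast per-sample class distributions
    class_view = info_nce(p1.t(), p2.t())  # contrast per-class sample distributions
    return sample_view + class_view
```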
arXiv Detail & Related papers (2021-03-09T15:15:32Z)