Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis
- URL: http://arxiv.org/abs/2602.19756v2
- Date: Fri, 27 Feb 2026 06:30:54 GMT
- Title: Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis
- Authors: Junhyeok Choi, Sangwoo Mo, Minwoo Chae
- Abstract summary: We propose a learning-free dataset distillation framework that eliminates the need for large-scale training and optimization. Our method uses CLIP to extract aligned image-text embeddings, obtains prototypes, and employs an unCLIP decoder to synthesize images.
- Score: 8.74674837306488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in multimodal learning have achieved remarkable success across diverse vision-language tasks. However, such progress heavily relies on large-scale image-text datasets, making training costly and inefficient. Prior efforts in dataset filtering and pruning attempt to mitigate this issue, but still require relatively large subsets to maintain performance and fail under very small subsets. Dataset distillation offers a promising alternative, yet existing multimodal dataset distillation methods require full-dataset training and joint optimization of image pixels and text features, making them architecture-dependent and limiting cross-architecture generalization. To overcome this, we propose a learning-free dataset distillation framework that eliminates the need for large-scale training and optimization while enhancing generalization across architectures. Our method uses CLIP to extract aligned image-text embeddings, obtains prototypes, and employs an unCLIP decoder to synthesize images, enabling efficient and scalable multimodal dataset distillation. Extensive experiments demonstrate that our approach consistently outperforms optimization-based dataset distillation and subset selection methods, achieving state-of-the-art cross-architecture generalization.
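The abstract sketches a three-stage pipeline: CLIP embedding, prototype construction, unCLIP decoding. Below is a minimal sketch of that shape in PyTorch; the prototype rule (k-means over joint embeddings) is a guess, and `unclip_decode` is a hypothetical placeholder, not the paper's exact method.

```python
# Minimal sketch of the pipeline described in the abstract. Assumptions:
# OpenAI's `clip` package and scikit-learn are installed; the k-means
# prototype rule is a guess; `unclip_decode` is a hypothetical stand-in
# for an unCLIP decoder that maps CLIP image embeddings back to pixels.
import torch
import clip
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def embed_pairs(images, texts):
    """images: preprocessed image batch; texts: list of caption strings."""
    img = model.encode_image(images.to(device)).float()
    txt = model.encode_text(clip.tokenize(texts).to(device)).float()
    # L2-normalize so image and text embeddings live on the same sphere.
    return img / img.norm(dim=-1, keepdim=True), txt / txt.norm(dim=-1, keepdim=True)

def make_prototypes(img_emb, txt_emb, n_proto):
    """Cluster joint image-text embeddings; keep centroids as prototypes."""
    joint = torch.cat([img_emb, txt_emb], dim=-1).cpu().numpy()
    centers = KMeans(n_clusters=n_proto, n_init=10).fit(joint).cluster_centers_
    centers = torch.from_numpy(centers).float()
    d = img_emb.shape[-1]
    return centers[:, :d], centers[:, d:]   # image half, text half

# img_protos, txt_protos = make_prototypes(img_emb, txt_emb, n_proto=100)
# synthetic_images = unclip_decode(img_protos)  # hypothetical decoder call
```

Under these assumptions, the distilled set would be the synthesized images paired with the text-prototype halves as captions; no model training is involved at any point, which is what makes the approach learning-free and architecture-agnostic.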
Related papers
- Foreground-Aware Dataset Distillation via Dynamic Patch Selection [56.565143366562495]
We propose a foreground-aware dataset distillation method that enhances patch selection in a content-adaptive manner. Experiments on multiple benchmarks show that the proposed method consistently improves distillation performance over existing approaches.
arXiv Detail & Related papers (2026-01-06T05:44:02Z) - Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation [109.13471554184554]
We reformulate dataset distillation as an Optimal Transport (OT) distance minimization problem. OT offers a geometrically faithful framework for distribution matching. Our method consistently outperforms state-of-the-art approaches in an efficient manner.
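As a concrete reference for the OT formulation, here is a generic entropic-regularized Sinkhorn distance in PyTorch between real and synthetic feature sets; this is an illustrative sketch, not the paper's implementation, and a log-domain version is preferred in practice for numerical stability.

```python
# Generic entropic-OT (Sinkhorn) distance between real features X and
# synthetic features Y, with uniform marginals and a squared-Euclidean
# cost. Illustrative only; log-domain Sinkhorn is preferred in practice.
import torch

def sinkhorn_distance(X, Y, eps=0.1, n_iters=100):
    C = torch.cdist(X, Y) ** 2            # pairwise cost matrix
    C = C / C.max()                       # normalize cost for stability
    a = torch.full((X.shape[0],), 1.0 / X.shape[0], device=X.device)
    b = torch.full((Y.shape[0],), 1.0 / Y.shape[0], device=Y.device)
    K = torch.exp(-C / eps)               # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):              # Sinkhorn fixed-point updates
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]       # transport plan
    return (P * C).sum()                  # regularized OT cost

# Synthetic data could then be refined by descending this loss, e.g.:
# loss = sinkhorn_distance(real_feats, synth_feats); loss.backward()
```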
arXiv Detail & Related papers (2025-11-29T04:04:05Z) - Efficient Multimodal Dataset Distillation via Generative Models [37.60051495186203]
We introduce EDGE, a generative distillation method for efficient multimodal dataset distillation. Specifically, we identify two key challenges of distilling multimodal datasets with generative models. We propose a novel generative model training workflow with a bi-directional contrastive loss and a diversity loss. Our method is evaluated on Flickr30K, COCO, and CC3M datasets, demonstrating superior performance and efficiency compared to existing approaches.
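The bi-directional contrastive loss is not spelled out in the summary; a CLIP-style symmetric InfoNCE objective is the standard form such a loss usually takes. A minimal sketch, assuming matched image-text pairs within each batch (the diversity loss is omitted):

```python
# Sketch of a bi-directional (symmetric) contrastive loss over matched
# image/text embeddings, in the CLIP/InfoNCE style. Assumed form only;
# EDGE's exact losses may differ.
import torch
import torch.nn.functional as F

def bidirectional_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                  # pairwise similarities
    targets = torch.arange(len(img), device=img.device)   # i-th image matches i-th text
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```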
arXiv Detail & Related papers (2025-09-18T22:36:57Z) - Large-Scale Data-Free Knowledge Distillation for ImageNet via Multi-Resolution Data Generation [53.95204595640208]
Data-Free Knowledge Distillation (DFKD) is an advanced technique that enables knowledge transfer from a teacher model to a student model without relying on original training data.
Previous approaches have generated synthetic images at high resolutions without leveraging information from real images.
MUSE generates images at lower resolutions while using Class Activation Maps (CAMs) to ensure that the generated images retain critical, class-specific features.
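CAMs can be computed directly from a CNN's final convolutional features and its linear classifier weights (Zhou et al., 2016). A minimal sketch of that computation, assuming the standard features, global-average-pool, then linear-classifier layout:

```python
# Minimal CAM: weight the final conv feature maps by the classifier
# weights of the target class. Assumes a features -> global-average-pool
# -> linear architecture (the standard CAM setting).
import torch
import torch.nn.functional as F

def class_activation_map(feat, fc_weight, class_idx):
    """feat: (B, C, H, W) final conv features; fc_weight: (num_classes, C)."""
    w = fc_weight[class_idx]                          # (C,) weights for the class
    cam = torch.einsum("bchw,c->bhw", feat, w)        # weighted channel sum
    cam = F.relu(cam)                                 # keep positive evidence only
    return cam / (cam.amax(dim=(-2, -1), keepdim=True) + 1e-8)  # scale to [0, 1]
```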
arXiv Detail & Related papers (2024-11-26T02:23:31Z) - D$^4$M: Dataset Distillation via Disentangled Diffusion Model [4.568710926635445]
We propose an efficient framework for dataset distillation via Disentangled Diffusion Model (D$^4$M).
Compared to architecture-dependent methods, D$^4$M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes.
D$^4$M demonstrates superior performance and robust generalization, surpassing the SOTA methods across most aspects.
arXiv Detail & Related papers (2024-07-21T12:16:20Z) - ATOM: Attention Mixer for Efficient Dataset Distillation [17.370852204228253]
We propose a module to efficiently distill large datasets using a mixture of channel-wise and spatial-wise attention. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets.
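Mixing channel and spatial attention follows a well-known pattern (SE-style channel gates plus CBAM-style spatial gates). The sketch below shows that general shape; it is illustrative and not ATOM's actual design:

```python
# Illustrative mixer of channel attention (SE-style) and spatial
# attention (CBAM-style). Not ATOM's actual design.
import torch
import torch.nn as nn

class AttentionMixer(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(   # squeeze-and-excitation gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(   # pooled 2-channel map -> mask
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel_gate(x)         # reweight channels
        pooled = torch.cat([x.mean(1, keepdim=True),          # channel-avg map
                            x.amax(1, keepdim=True)], dim=1)  # channel-max map
        return x * self.spatial_gate(pooled)  # reweight spatial locations

# y = AttentionMixer(64)(torch.randn(2, 64, 32, 32))  # shape-preserving
```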
arXiv Detail & Related papers (2024-05-02T15:15:01Z) - One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
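Textual inversion trains only a new pseudo-token embedding against a frozen text-to-image model. The toy loop below captures that optimization pattern; every module here is a small stand-in (a real setup would use a frozen diffusion U-Net and text encoder):

```python
# Schematic textual inversion: learn one new token embedding against a
# frozen generator. All modules are toy stand-ins for a real
# text-to-image framework; only the optimization pattern is the point.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Stand-in for a frozen conditional denoiser (e.g., a diffusion U-Net)."""
    def __init__(self, dim=768, ch=4):
        super().__init__()
        self.cond_proj = nn.Linear(dim, ch)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x, cond):
        x = x + self.cond_proj(cond).view(1, -1, 1, 1)  # inject the text condition
        return self.conv(x)

text_encoder = nn.Linear(768, 768).requires_grad_(False)  # frozen
denoiser = ToyDenoiser().requires_grad_(False)            # frozen
new_token = torch.randn(768, requires_grad=True)          # the ONLY trainable tensor
opt = torch.optim.AdamW([new_token], lr=5e-3)

for _ in range(200):                                # toy training loop
    latents = torch.randn(8, 4, 16, 16)             # stand-in image latents
    noise = torch.randn_like(latents)
    pred = denoiser(latents + noise, text_encoder(new_token))
    loss = F.mse_loss(pred, noise)                  # diffusion-style objective
    opt.zero_grad(); loss.backward(); opt.step()
```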
arXiv Detail & Related papers (2024-03-11T20:23:59Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
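Parameterizing the distilled set as latent vectors of a frozen generator, rather than as raw pixels, can be sketched as follows; the toy generator and moment-matching loss are generic stand-ins for the paper's deep generative prior and objective:

```python
# Sketch of distillation in a generator's latent space: optimize a few
# latents z so that G(z) matches real data under some loss; raw pixels
# are never optimized directly. G and the loss are generic stand-ins.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 3 * 32 * 32), nn.Tanh())  # toy generator
for p in G.parameters():
    p.requires_grad_(False)                                # prior stays frozen

z = torch.randn(10, 64, requires_grad=True)                # 10 distilled latents
opt = torch.optim.Adam([z], lr=0.01)

real_data = torch.randn(10, 3 * 32 * 32)                   # stand-in real batch
for _ in range(500):
    synth = G(z)                                           # decode latents
    loss = ((synth.mean(0) - real_data.mean(0)) ** 2).sum()  # toy moment matching
    opt.zero_grad(); loss.backward(); opt.step()

# images = G(z).view(10, 3, 32, 32)  # distilled images, decoded once at the end
```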
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - A Comprehensive Survey of Dataset Distillation [73.15482472726555]
Deep learning technology has developed unprecedentedly in the last decade, and it has become challenging to handle the unlimited growth of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z) - Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep-learning-based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)