Related papers: Dataset Distillation via Vision-Language Category Prototype

Dataset Distillation via Vision-Language Category Prototype

URL: http://arxiv.org/abs/2506.23580v1
Date: Mon, 30 Jun 2025 07:34:33 GMT
Title: Dataset Distillation via Vision-Language Category Prototype
Authors: Yawen Zou, Guang Li, Duo Su, Zi Wang, Jun Yu, Chao Zhang,
Abstract summary: We introduce vision-language methods to distill language information and collaboratively synthesize data with image prototypes.<n>This framework demonstrates broad applicability across datasets without pre-existing text descriptions.<n>The proposed approach generates logically coherent images containing target objects, achieving state-of-the-art validation performance and demonstrating robust generalization.
Score: 14.526547847730548
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dataset distillation (DD) condenses large datasets into compact yet informative substitutes, preserving performance comparable to the original dataset while reducing storage, transmission costs, and computational consumption. However, previous DD methods mainly focus on distilling information from images, often overlooking the semantic information inherent in the data. The disregard for context hinders the model's generalization ability, particularly in tasks involving complex datasets, which may result in illogical outputs or the omission of critical objects. In this study, we integrate vision-language methods into DD by introducing text prototypes to distill language information and collaboratively synthesize data with image prototypes, thereby enhancing dataset distillation performance. Notably, the text prototypes utilized in this study are derived from descriptive text information generated by an open-source large language model. This framework demonstrates broad applicability across datasets without pre-existing text descriptions, expanding the potential of dataset distillation beyond traditional image-based approaches. Compared to other methods, the proposed approach generates logically coherent images containing target objects, achieving state-of-the-art validation performance and demonstrating robust generalization. Source code and generated data are available in https://github.com/zou-yawen/Dataset-Distillation-via-Vision-Language-Category-Prototype/

Related papers

D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding [36.321156992727055]
D2AF is a robust annotation framework for visual grounding using only input images.<n>By implementing dual-driven annotation strategies, we effectively generate detailed region-text pairs.<n>Our findings demonstrate that increasing data volume enhances model performance.
arXiv Detail & Related papers (2025-05-30T09:04:47Z)
CONCORD: Concept-Informed Diffusion for Dataset Distillation [29.092857460373278]
We propose Concept-Informed Diffusion (CONCORD) for dataset distillation.<n>The proposed method significantly enhances both the controllability and interpretability of the distilled image generation.<n>We demonstrate the efficacy of CONCORD by achieving state-of-the-art performance on ImageNet-1K and its subsets.
arXiv Detail & Related papers (2025-05-23T20:39:23Z)
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification [49.41632476658246]
We discuss the extension of DFKD to Vision-Language Foundation Models without access to the billion-level image-text datasets. The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts. We propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles.
arXiv Detail & Related papers (2024-07-21T13:26:30Z)
One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models. Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS) We construct a large-scale complex scene dataset (textbfOVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
Exploring Multilingual Text Data Distillation [0.0]
We propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength, and cross-architecture generalization. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.
arXiv Detail & Related papers (2023-08-09T14:31:57Z)
Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR) It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images. We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities. The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations. In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets [56.018551958004814]
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources. Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision. We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
arXiv Detail & Related papers (2021-11-24T19:00:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.