Foreground-Aware Dataset Distillation via Dynamic Patch Selection
- URL: http://arxiv.org/abs/2601.02727v1
- Date: Tue, 06 Jan 2026 05:44:02 GMT
- Title: Foreground-Aware Dataset Distillation via Dynamic Patch Selection
- Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
- Abstract summary: We propose a foreground-aware dataset distillation method that enhances patch selection in a content-adaptive manner. Experiments on multiple benchmarks show that the proposed method consistently improves distillation performance over existing approaches.
- Score: 56.565143366562495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a foreground-aware dataset distillation method that enhances patch selection in a content-adaptive manner. With the rising computational cost of training large-scale deep models, dataset distillation has emerged as a promising approach for constructing compact synthetic datasets that retain the knowledge of their large original counterparts. However, traditional optimization-based methods often suffer from high computational overhead, memory constraints, and the generation of unrealistic, noise-like images with limited architectural generalization. Recent non-optimization methods alleviate some of these issues by constructing distilled data from real image patches, but the rigid patch selection strategies they use can still discard critical information about the main objects. To solve this problem, we first leverage Grounded SAM2 to identify foreground objects and compute per-image foreground occupancy, from which we derive a category-wise patch decision threshold. Guided by these thresholds, we design a dynamic patch selection strategy that, for each image, either selects the most informative patch from multiple candidates or directly resizes the full image when the foreground dominates. This dual-path mechanism preserves more key information about the main objects while reducing redundant background content. Extensive experiments on multiple benchmarks show that the proposed method consistently improves distillation performance over existing approaches, producing more informative and representative distilled datasets and enhancing robustness across different architectures and image compositions.
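The dual-path mechanism described in the abstract can be sketched as follows. This is a minimal illustration only: the foreground mask is assumed to come from a segmentation model such as Grounded SAM2, and the mean-based category threshold and non-overlapping candidate-patch grid are assumptions, not the paper's exact procedure.

```python
import numpy as np

def category_threshold(occupancies):
    # The paper derives a category-wise patch decision threshold from
    # per-image foreground occupancies; taking the mean is an assumption.
    return float(np.mean(occupancies))

def select_patch_or_resize(mask, patch_size, threshold):
    """Dual-path selection sketch.

    mask: binary foreground mask of shape (H, W), assumed precomputed
          by a segmentation model (e.g. Grounded SAM2).
    Returns ("resize", None) when the foreground dominates, otherwise
    ("patch", (top, left)) for the candidate patch covering the most
    foreground pixels.
    """
    h, w = mask.shape
    occupancy = mask.mean()
    if occupancy >= threshold:
        # Foreground dominates: keep the full image, just resized.
        return "resize", None
    best, best_cov = (0, 0), -1.0
    # Scan non-overlapping candidate patches (the stride is an assumption).
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            cov = mask[top:top + patch_size, left:left + patch_size].mean()
            if cov > best_cov:
                best, best_cov = (top, left), cov
    return "patch", best
```

For an image whose foreground occupancy falls below its category threshold, the function returns the patch with the highest foreground coverage among the candidates; otherwise it signals that the full image should simply be resized, which is how the dual path preserves dominant objects while trimming redundant background.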
Related papers
- Multimodal Dataset Distillation Made Simple by Prototype-Guided Data Synthesis [8.74674837306488]
We propose a learning-free dataset distillation framework that eliminates the need for large-scale training and optimization. Our method uses CLIP to extract aligned image-text embeddings, obtains prototypes, and employs an unCLIP decoder to synthesize images.
arXiv Detail & Related papers (2026-02-23T12:08:28Z) - A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning [0.12499537119440242]
A lightweight transformer architecture is proposed to reduce the dimensionality of the encoder layers and employ a distilled version of GPT-2 as the decoder. A knowledge distillation strategy is used to transfer knowledge from a more complex teacher model to improve the performance of the lightweight network. Experimental results demonstrate that the proposed approach significantly improves caption quality compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-11T06:24:02Z) - Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection [54.21851618853518]
We present a concise yet effective approach called Patch Generation-to-Selection to enhance CLIP's training efficiency. Our approach, CLIP-PGS, sets new state-of-the-art results in zero-shot classification and retrieval tasks.
arXiv Detail & Related papers (2025-03-21T12:10:38Z) - Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization [34.53986517177061]
We propose a novel framework that departs from existing diffusion-based distillation methods by leveraging diffusion models for selection rather than generation. Our method starts by predicting the noise generated by the diffusion model based on input images and text prompts, then calculates the corresponding loss for each pair. This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
arXiv Detail & Related papers (2024-12-13T08:34:46Z) - One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce D3M, a novel paradigm for dataset distillation that leverages recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z) - On the Effect of Image Resolution on Semantic Segmentation [27.115235051091663]
We show that a model capable of directly producing high-resolution segmentations can match the performance of more complex systems.
Our approach leverages a bottom-up information propagation technique across various scales.
We have rigorously tested our method using leading-edge semantic segmentation datasets.
arXiv Detail & Related papers (2024-02-08T04:21:30Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z) - Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation [83.31087402305306]
Robustness to trimaps and generalization to images from different domains are still under-explored.
We propose an image matting method which achieves higher robustness (RMat) via multilevel context assembling and strong data augmentation targeting matting.
arXiv Detail & Related papers (2022-01-18T11:45:17Z) - Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.