Dataset Condensation with Color Compensation
- URL: http://arxiv.org/abs/2508.01139v1
- Date: Sat, 02 Aug 2025 01:44:23 GMT
- Title: Dataset Condensation with Color Compensation
- Authors: Huyu Wu, Duo Su, Junjie Hou, Guang Li,
- Abstract summary: Existing methods struggle with two bottlenecks: image-level selection methods (Coreset Selection, Dataset Quantization) suffer from condensation inefficiency. We find that a critical problem in dataset condensation is the oversight of color's dual role as an information carrier and a basic semantic representation unit. We propose DC3: a dataset condensation framework with Color Compensation.
- Score: 1.8962690634270805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset condensation always faces a constitutive trade-off: balancing performance and fidelity under extreme compression. Existing methods struggle with two bottlenecks: image-level selection methods (Coreset Selection, Dataset Quantization) suffer from condensation inefficiency, while pixel-level optimization (Dataset Distillation) introduces semantic distortion due to over-parameterization. With empirical observations, we find that a critical problem in dataset condensation is the oversight of color's dual role as an information carrier and a basic semantic representation unit. We argue that improving the colorfulness of condensed images is beneficial for representation learning. Motivated by this, we propose DC3: a Dataset Condensation framework with Color Compensation. After a calibrated selection strategy, DC3 utilizes a latent diffusion model to enhance the color diversity of an image rather than creating a brand-new one. Extensive experiments demonstrate the superior performance and generalization of DC3, which outperforms SOTA methods across multiple benchmarks. To the best of our knowledge, beyond focusing on downstream tasks, DC3 is the first research to fine-tune pre-trained diffusion models with condensed datasets. The FID results prove that training networks with our high-quality datasets is feasible without model collapse or other degradation issues. Code and generated data will be released soon.
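The abstract argues that colorfulness matters for representation learning but does not name a specific metric. The Hasler-Susstrunk colorfulness measure is one standard way to quantify it; a minimal NumPy sketch (illustrative only, not necessarily what DC3 uses) might look like:

```python
import numpy as np

def colorfulness(img):
    """Hasler-Susstrunk colorfulness of an RGB image (H, W, 3), values in [0, 255]."""
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    rg = r - g                    # red-green opponent channel
    yb = 0.5 * (r + g) - b       # yellow-blue opponent channel
    std = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std + 0.3 * mean

# A pure gray image has no opponent-channel variation or offset.
gray = np.full((8, 8, 3), 128, dtype=np.uint8)
print(colorfulness(gray))  # -> 0.0
```

Higher scores indicate more vivid, diverse color content, which is the property DC3's color compensation aims to improve.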
Related papers
- DPBridge: Latent Diffusion Bridge for Dense Prediction [49.1574468325115]
We introduce DPBridge, the first latent diffusion bridge framework for dense prediction tasks. Our method consistently achieves superior performance, demonstrating its effectiveness and generalization capability under different scenarios.
arXiv Detail & Related papers (2024-12-29T15:50:34Z)
- A Bias-Free Training Paradigm for More General AI-generated Image Detection [15.421102443599773]
A well-designed forensic detector should detect generator-specific artifacts rather than reflect data biases. We propose B-Free, a bias-free training paradigm, where fake images are generated from real ones. We show significant improvements in both generalization and robustness over state-of-the-art detectors.
arXiv Detail & Related papers (2024-12-23T15:54:32Z)
- Reframing Image Difference Captioning with BLIP2IDC and Synthetic Augmentation [5.887986127737718]
We introduce BLIP2IDC, an adaptation of BLIP2 to the Image Difference Captioning (IDC) task at low computational cost. We show it outperforms two-stream approaches by a significant margin on real-world IDC datasets. We also propose to use synthetic augmentation to improve the performance of IDC models in an agnostic fashion.
arXiv Detail & Related papers (2024-12-20T14:32:56Z)
- Decomposed Distribution Matching in Dataset Condensation [16.40653529334528]
Recent research formulates DC as a distribution matching problem which circumvents the costly bi-level optimization. We present a simple yet effective method to match the style information between original and condensed data. We demonstrate the efficacy of our method through experiments on diverse datasets of varying size and resolution.
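The summary does not specify which style statistics are matched; a common choice in the style-transfer literature is channel-wise mean and standard deviation of feature maps. A minimal sketch under that assumption (the function names are hypothetical, not from the paper):

```python
import numpy as np

def style_stats(feat):
    """Channel-wise mean and std of a feature map with shape (C, H, W)."""
    flat = feat.reshape(feat.shape[0], -1)
    return flat.mean(axis=1), flat.std(axis=1)

def style_match_loss(feat_real, feat_cond):
    """Squared distance between the style statistics of two feature maps."""
    mu_r, sd_r = style_stats(feat_real)
    mu_c, sd_c = style_stats(feat_cond)
    return float(((mu_r - mu_c) ** 2).sum() + ((sd_r - sd_c) ** 2).sum())
```

Minimizing such a loss pushes the condensed data's feature statistics toward those of the original data, without the bi-level optimization loop.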
arXiv Detail & Related papers (2024-12-06T03:20:36Z)
- M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy [26.227927019615446]
Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs.
Dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset.
We present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy.
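The Maximum Mean Discrepancy named in the title is a well-defined quantity; a minimal NumPy sketch of a standard RBF-kernel MMD estimator (illustrative, not M3D's exact implementation) is:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel matrix between the rows of x (n, d) and y (m, d)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())
```

The estimate is zero when both samples coincide and grows as the two distributions separate, which is why it serves as a condensation objective: the synthetic set is optimized so its (feature) distribution stays close to the original dataset's.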
arXiv Detail & Related papers (2023-12-26T07:45:32Z)
- Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many rendered samples are polluted by artifacts or contain only minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Training on Thin Air: Improve Image Classification with Generated Data [28.96941414724037]
Diffusion Inversion is a simple yet effective method to generate diverse, high-quality training data for image classification.
Our approach captures the original data distribution and ensures data coverage by inverting images to the latent space of Stable Diffusion.
We identify three key components that allow our generated images to successfully supplant the original dataset.
arXiv Detail & Related papers (2023-05-24T16:33:02Z)
- Noise Self-Regression: A New Learning Paradigm to Enhance Low-Light Images Without Task-Related Data [86.68013790656762]
We propose Noise SElf-Regression (NoiSER) without access to any task-related data. NoiSER is highly competitive in enhancement quality, yet with a much smaller model size, and much lower training and inference cost.
arXiv Detail & Related papers (2022-11-09T06:18:18Z)
- Degrade is Upgrade: Learning Degradation for Low-light Image Enhancement [52.49231695707198]
We investigate the intrinsic degradation and relight the low-light image while refining the details and color in two steps.
Inspired by the color image formulation, we first estimate the degradation from low-light inputs to simulate the distortion of environment illumination color, and then refine the content to recover the loss of diffuse illumination color.
Our proposed method has surpassed the SOTA by 0.95 dB in PSNR on the LOL1000 dataset and by 3.18% in mAP on the ExDark dataset.
arXiv Detail & Related papers (2021-03-19T04:00:27Z)
- DeFlow: Learning Complex Image Degradations from Unpaired Data with Conditional Flows [145.83812019515818]
We propose DeFlow, a method for learning image degradations from unpaired data.
We model the degradation process in the latent space of a shared flow-decoder network.
We validate our DeFlow formulation on the task of joint image restoration and super-resolution.
arXiv Detail & Related papers (2021-01-14T18:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.