GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
- URL: http://arxiv.org/abs/2405.14736v1
- Date: Thu, 23 May 2024 16:02:30 GMT
- Title: GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
- Authors: Xinyi Shang, Peng Sun, Tao Lin
- Abstract summary: We introduce a novel perspective by emphasizing the full utilization of labels.
We introduce GIFT, which encompasses soft label refinement and a cosine similarity-based loss function.
GIFT consistently enhances the state-of-the-art dataset distillation methods without incurring additional computational costs.
- Score: 7.05277588099645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models. In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. We first conduct a comprehensive comparison of various loss functions for soft label utilization in dataset distillation, revealing that the model trained on the synthetic dataset exhibits high sensitivity to the choice of loss function for soft label utilization. This finding highlights the necessity of a universal loss function for training models on synthetic datasets. Building on these insights, we introduce an extremely simple yet surprisingly effective plug-and-play approach, GIFT, which encompasses soft label refinement and a cosine similarity-based loss function to efficiently leverage full label information. Extensive experiments demonstrate that GIFT consistently enhances state-of-the-art dataset distillation methods across datasets of various scales without incurring additional computational cost. For instance, on ImageNet-1K with IPC = 10, GIFT improves the SOTA method RDED by 3.9% and 1.8% on ConvNet and ResNet-18, respectively. Code: https://github.com/LINs-lab/GIFT.
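As a rough illustration of the two ingredients the abstract names, here is a minimal PyTorch sketch. The blending rule in `refine_soft_labels` and its weight `alpha` are assumptions for illustration, not the paper's exact refinement scheme; see the linked repository for the actual formulation.

```python
import torch.nn.functional as F

def refine_soft_labels(teacher_probs, hard_labels, alpha=0.5):
    """Blend teacher soft labels with the one-hot ground truth.

    The convex combination and the weight `alpha` are illustrative
    assumptions; see https://github.com/LINs-lab/GIFT for the
    paper's exact refinement scheme.
    """
    num_classes = teacher_probs.size(-1)
    one_hot = F.one_hot(hard_labels, num_classes).to(teacher_probs.dtype)
    return alpha * teacher_probs + (1.0 - alpha) * one_hot

def cosine_label_loss(student_logits, refined_labels):
    """Cosine similarity-based loss: push the student's predicted
    distribution toward the refined soft labels, insensitive to scale."""
    student_probs = F.softmax(student_logits, dim=-1)
    cos = F.cosine_similarity(student_probs, refined_labels, dim=-1)
    return (1.0 - cos).mean()
```

In a training loop, `cosine_label_loss(model(x), refine_soft_labels(teacher(x).softmax(-1), y))` would stand in for the usual cross-entropy or KL term when training on the distilled dataset.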
Related papers
- Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting [55.361337202198925]
Vision-language models, such as CLIP, have shown impressive generalization capacities when using appropriate text descriptions.
We propose a label-free prompt distribution learning and bias correction framework, dubbed **Frolic**, which boosts zero-shot performance without the need for labeled data.
arXiv Detail & Related papers (2024-10-25T04:00:45Z)
- DRUPI: Dataset Reduction Using Privileged Information [20.59889438709671]
Dataset reduction (DR) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks.
We introduce Dataset Reduction Using Privileged Information (DRUPI), which enriches DR by synthesizing privileged information alongside the reduced dataset.
Our findings reveal that effective feature labels must avoid being either overly discriminative or excessively diverse; a moderate level proves optimal for improving the reduced dataset's efficacy.
arXiv Detail & Related papers (2024-10-02T14:49:05Z)
- Heavy Labels Out! Dataset Distillation with Label Space Lightening [69.67681224137561]
HeLlO aims to build effective image-to-label projectors, with which synthetic labels can be generated online directly from synthetic images.
We demonstrate that with only about 0.003% of the original storage required for a complete set of soft labels, we achieve comparable performance to current state-of-the-art dataset distillation methods on large-scale datasets.
arXiv Detail & Related papers (2024-08-15T15:08:58Z)
- Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator [42.04363042234042]
Inter-class Feature Compensator (INFER) is an innovative distillation approach that transcends the class-specific data-label framework widely utilized in current dataset distillation methods.
INFER enriches inter-class interactions during the distillation, thereby enhancing the effectiveness and generalizability of the distilled data.
arXiv Detail & Related papers (2024-08-13T14:29:00Z)
- A Label is Worth a Thousand Images in Dataset Distillation [16.272675455429006]
Data *quality* is a crucial factor in the performance of machine learning models.
We show that the main factor explaining the performance of state-of-the-art distillation methods is not the techniques used to generate synthetic data but rather the use of soft labels.
arXiv Detail & Related papers (2024-06-15T03:30:29Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most, based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Enhancing Label Sharing Efficiency in Complementary-Label Learning with Label Augmentation [92.4959898591397]
We analyze the implicit sharing of complementary labels on nearby instances during training.
We propose a novel technique that enhances the sharing efficiency via complementary-label augmentation.
Our results confirm that complementary-label augmentation can systematically improve empirical performance over state-of-the-art CLL models.
arXiv Detail & Related papers (2023-05-15T04:43:14Z)
- Eliciting and Learning with Soft Labels from Every Annotator [31.10635260890126]
We focus on efficiently eliciting soft labels from individual annotators.
We demonstrate that learning with our labels achieves comparable model performance to prior approaches.
arXiv Detail & Related papers (2022-07-02T12:03:00Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- How to Leverage Unlabeled Data in Offline Reinforcement Learning [125.72601809192365]
Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition.
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
We find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-02-03T18:04:54Z)
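The zero-reward baseline from the last entry is simple to express. Below is a minimal sketch, assuming transitions are stored as dicts of NumPy arrays keyed by `obs`, `action`, `next_obs`, and `reward`; this layout and the function name are assumptions for illustration, not the paper's code.

```python
import numpy as np

def share_with_zero_rewards(labeled, unlabeled):
    """Merge unlabeled transitions into an offline RL dataset by
    assigning them zero reward -- the simple baseline the paper
    reports as an effective form of data sharing."""
    unlabeled = dict(unlabeled)  # shallow copy; avoid mutating the caller's dict
    n = len(unlabeled["obs"])
    unlabeled["reward"] = np.zeros(n, dtype=np.float32)  # assumed key layout
    return {key: np.concatenate([labeled[key], unlabeled[key]])
            for key in labeled}
```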
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.