Importance-Aware Adaptive Dataset Distillation
- URL: http://arxiv.org/abs/2401.15863v1
- Date: Mon, 29 Jan 2024 03:29:39 GMT
- Title: Importance-Aware Adaptive Dataset Distillation
- Authors: Guang Li, Ren Togo, Takahiro Ogawa, Miki Haseyama
- Abstract summary: The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
- Score: 53.79746115426363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Herein, we propose a novel dataset distillation method for constructing small
informative datasets that preserve the information of the large original
datasets. The development of deep learning models is enabled by the
availability of large-scale datasets. Despite unprecedented success,
large-scale datasets considerably increase the storage and transmission costs,
resulting in a cumbersome model training process. Moreover, using raw data for
training raises privacy and copyright concerns. To address these issues, a new
task named dataset distillation has been introduced, aiming to synthesize a
compact dataset that retains the essential information from the large original
dataset. State-of-the-art (SOTA) dataset distillation methods have been
proposed by matching gradients or network parameters obtained during training
on real and synthetic datasets. The contribution of different network
parameters to the distillation process varies, and uniformly treating them
leads to degraded distillation performance. Based on this observation, we
propose an importance-aware adaptive dataset distillation (IADD) method that
can improve distillation performance by automatically assigning importance
weights to different network parameters during distillation, thereby
synthesizing more robust distilled datasets. IADD demonstrates superior
performance over other SOTA dataset distillation methods based on parameter
matching on multiple benchmark datasets and outperforms them in terms of
cross-architecture generalization. In addition, the analysis of self-adaptive
weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness
of IADD is validated in a real-world medical application, namely COVID-19 detection.
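
The core idea above, assigning a learned importance weight to each parameter group when matching networks trained on real and synthetic data, can be sketched as follows. This is a minimal, illustrative PyTorch-style sketch and not the authors' official IADD implementation; the per-layer parameter lists and the learnable `importance_logits` tensor are assumed interfaces.

```python
# Minimal sketch of importance-weighted parameter matching. Illustrative only;
# the per-layer parameter lists and `importance_logits` are assumed interfaces,
# not the official IADD code.
import torch

def importance_weighted_matching_loss(student_params, teacher_params, importance_logits):
    """Weighted distance between parameters trained on synthetic vs. real data."""
    # Normalize the learnable logits so the per-layer importance weights sum to 1.
    weights = torch.softmax(importance_logits, dim=0)
    loss = torch.zeros((), device=importance_logits.device)
    for w, p_s, p_t in zip(weights, student_params, teacher_params):
        # Squared L2 distance for this layer, scaled by its importance weight.
        loss = loss + w * (p_s - p_t).pow(2).sum()
    return loss
```

In a full distillation loop, both the synthetic images and `importance_logits` would be updated by backpropagating through this loss.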
Related papers
- Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
The main goal is to further improve the classification accuracy of prototype-based soft-label distillation.
Experimental studies examine not only the method's ability to distill the data but also its potential to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z)
- AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories [18.266786462036553]
We propose an effective dataset distillation (DD) framework named AST, standing for Alignment with Smooth and high-quality expert Trajectories.
We conduct extensive experiments on datasets of different scales, sizes, and resolutions.
arXiv Detail & Related papers (2023-10-16T16:13:53Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [7.908244841289913]
The cost of training state-of-the-art deep hashing retrieval models has been increasing.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching the feature embeddings between the synthetic set and the real set.
arXiv Detail & Related papers (2023-05-29T13:23:55Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- A Comprehensive Survey of Dataset Distillation [73.15482472726555]
Deep learning technology has developed at an unprecedented pace in the last decade, and it has become challenging to handle the unlimited growth of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
- Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data (a minimal sketch of this trajectory-matching step appears after this list).
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
- LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4th) of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration.
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
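
As referenced in the trajectory-matching entry above, the following is a minimal, illustrative PyTorch-style sketch of that step: a student is trained for a few iterations on the distilled data starting from an expert checkpoint, and the distilled data is then optimized to shrink the normalized distance to a later expert checkpoint. It is a sketch under assumptions, not the paper's released code; the functional forward pass `model(x, params=...)`, the expert checkpoints, and the hyperparameters are placeholders.

```python
# Minimal sketch of one trajectory-matching step. Illustrative only; the
# functional forward pass `model(x, params=...)`, the expert checkpoints, and
# the hyperparameters are assumed, not taken from the paper's released code.
import torch
import torch.nn.functional as F

def trajectory_matching_loss(model, distilled_x, distilled_y,
                             expert_start, expert_target, n_steps, lr):
    """Train a student on the distilled data from an expert checkpoint, then
    measure how far it ends up from a later expert checkpoint."""
    # Start the student from the expert's earlier parameters.
    student = [p.detach().clone().requires_grad_(True) for p in expert_start]
    for _ in range(n_steps):
        logits = model(distilled_x, params=student)  # assumed functional forward pass
        grads = torch.autograd.grad(F.cross_entropy(logits, distilled_y),
                                    student, create_graph=True)
        student = [p - lr * g for p, g in zip(student, grads)]
    # Distance to the later expert checkpoint, normalized by how far the expert moved.
    num = sum((s - t).pow(2).sum() for s, t in zip(student, expert_target))
    den = sum((s0 - t).pow(2).sum() for s0, t in zip(expert_start, expert_target))
    return num / den
```

Backpropagating through this loss to `distilled_x` (and optionally `lr`) is what drives the distilled images toward reproducing the expert's training trajectory.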