ATOM: Attention Mixer for Efficient Dataset Distillation
- URL: http://arxiv.org/abs/2405.01373v1
- Date: Thu, 2 May 2024 15:15:01 GMT
- Title: ATOM: Attention Mixer for Efficient Dataset Distillation
- Authors: Samir Khaki, Ahmad Sajedi, Kai Wang, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
- Abstract summary: We propose a module to efficiently distill large datasets using a mixture of channel and spatial-wise attention.
By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets.
- Score: 17.370852204228253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in dataset distillation seek to minimize training expenses by generating a condensed synthetic dataset that encapsulates the information present in a larger real dataset. These approaches ultimately aim to attain test accuracy levels akin to those achieved by models trained on the entirety of the original dataset. Previous studies in feature and distribution matching have achieved significant results without incurring the costs of bi-level optimization in the distillation process. Despite their convincing efficiency, many of these methods suffer from marginal downstream performance improvements, limited distillation of contextual information, and subpar cross-architecture generalization. To address these challenges in dataset distillation, we propose the ATtentiOn Mixer (ATOM) module to efficiently distill large datasets using a mixture of channel and spatial-wise attention in the feature matching process. Spatial-wise attention helps guide the learning process based on consistent localization of classes in their respective images, allowing for distillation from a broader receptive field. Meanwhile, channel-wise attention captures the contextual information associated with the class itself, thus making the synthetic image more informative for training. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets, including CIFAR10/100 and TinyImagenet. Notably, our method significantly improves performance in scenarios with a low number of images per class, thereby enhancing its potential. Furthermore, we maintain the improvement in cross-architectures and applications such as neural architecture search.
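For readers who want a concrete picture of the attention-mixing feature-matching objective described in the abstract, the snippet below is a minimal PyTorch-style sketch under stated assumptions: the function names (spatial_attention, channel_attention, attention_matching_loss), the pooling exponent p, the mixing weight lam, and the normalization choices are illustrative placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def spatial_attention(feat, p=4):
    # feat: (B, C, H, W). Pool over channels so the map highlights *where*
    # the network responds, reflecting the broader receptive field mentioned above.
    attn = feat.pow(p).mean(dim=1)               # (B, H, W)
    return F.normalize(attn.flatten(1), dim=1)   # (B, H*W)

def channel_attention(feat, p=4):
    # Pool over the spatial dimensions so the vector summarizes *which*
    # channels (contextual cues) matter for the class.
    attn = feat.pow(p).mean(dim=(2, 3))          # (B, C)
    return F.normalize(attn, dim=1)

def attention_matching_loss(real_feats, syn_feats, lam=0.5):
    # real_feats / syn_feats: lists of intermediate feature maps extracted from
    # the same network for a real batch and a synthetic batch of one class.
    loss = 0.0
    for fr, fs in zip(real_feats, syn_feats):
        loss = loss + F.mse_loss(spatial_attention(fr).mean(0),
                                 spatial_attention(fs).mean(0))
        loss = loss + lam * F.mse_loss(channel_attention(fr).mean(0),
                                       channel_attention(fs).mean(0))
    return loss
```

In a distillation loop, a loss of this kind would typically be minimized with respect to the synthetic images while the feature extractor is held fixed or periodically re-sampled, as is common in feature-matching approaches.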
Related papers
- Curriculum Dataset Distillation [22.938976109450877]
We present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency.
This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex.
Our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K.
arXiv Detail & Related papers (2024-05-15T07:27:14Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets (see the sketch after this list).
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- DataDAM: Efficient Dataset Distillation with Attention Matching [15.300968899043498]
Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets.
Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset.
However, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data.
arXiv Detail & Related papers (2023-09-29T19:07:48Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when the optimization is regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
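The PDD entry above describes its progressive scheme concretely enough to sketch. The loop below is a minimal, hypothetical illustration of distilling subsets conditioned on a model trained on the cumulative union of earlier subsets; the callbacks distill_fn and train_fn are placeholders, not the authors' code.

```python
from typing import Callable, List, Optional

import torch
from torch import nn

def progressive_distillation(
    num_stages: int,
    distill_fn: Callable[[Optional[nn.Module]], torch.Tensor],  # hypothetical callback
    train_fn: Callable[[torch.Tensor], nn.Module],              # hypothetical callback
) -> torch.Tensor:
    # Sketch of a PDD-style loop: each stage distills a small synthetic subset
    # conditioned on the model trained so far, then the model is retrained on
    # the cumulative union of all subsets distilled up to that point.
    subsets: List[torch.Tensor] = []
    model: Optional[nn.Module] = None
    for _ in range(num_stages):
        subsets.append(distill_fn(model))      # new subset, conditioned on the current model
        model = train_fn(torch.cat(subsets))   # retrain on the cumulative union
    return torch.cat(subsets)
```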
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information (including all content) and is not responsible for any consequences of its use.