Related papers: DataDAM: Efficient Dataset Distillation with Attention Matching

URL: http://arxiv.org/abs/2310.00093v2
Date: Tue, 31 Oct 2023 16:23:34 GMT
Title: DataDAM: Efficient Dataset Distillation with Attention Matching
Authors: Ahmad Sajedi, Samir Khaki, Ehsan Amjadian, Lucy Z. Liu, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
Abstract summary: Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset. However, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data.
Score: 15.300968899043498
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets. Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset and ultimately achieves test accuracy equivalent to a model trained on the whole dataset. Unfortunately, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data, and they incur significant computational costs. Despite promising results, there still exists a significant performance gap between models trained on condensed synthetic sets and those trained on the whole dataset. In this paper, we address these challenges using efficient Dataset Distillation with Attention Matching (DataDAM), achieving state-of-the-art performance while reducing training costs. Specifically, we learn synthetic images by matching the spatial attention maps of real and synthetic data generated by different layers within a family of randomly initialized neural networks. Our method outperforms the prior methods on several datasets, including CIFAR10/100, TinyImageNet, ImageNet-1K, and subsets of ImageNet-1K across most of the settings, and achieves improvements of up to 6.5% and 4.1% on CIFAR100 and ImageNet-1K, respectively. We also show that our high-quality distilled images have practical benefits for downstream applications, such as continual learning and neural architecture search.
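
The attention-matching idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the formulas, so everything below is an assumption: the attention map is taken to be the channel-wise sum of absolute activations raised to a power p, flattened and L2-normalized, and the loss is the squared distance between the mean attention vectors of a real batch and a synthetic batch at a single layer. The function names (spatial_attention, mean_attention, attention_matching_loss) are hypothetical and are not taken from the authors' code.

```python
import math

def spatial_attention(feature_map, p=2):
    """Collapse a C x H x W feature map (nested lists) into a flattened,
    L2-normalized attention vector of length H*W.
    Assumed definition: sum over channels of |activation|**p."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    attention = [
        sum(abs(feature_map[c][i][j]) ** p for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(a * a for a in attention))
    # Leave an all-zero map unchanged to avoid dividing by zero.
    return [a / norm for a in attention] if norm > 0 else attention

def mean_attention(batch, p=2):
    """Average the normalized attention vectors over a batch of feature maps."""
    maps = [spatial_attention(f, p) for f in batch]
    n = len(maps)
    return [sum(m[k] for m in maps) / n for k in range(len(maps[0]))]

def attention_matching_loss(real_batch, synthetic_batch, p=2):
    """Squared L2 distance between mean real and mean synthetic attention.
    Assumed loss for one layer; a full method would sum over layers."""
    real = mean_attention(real_batch, p)
    synthetic = mean_attention(synthetic_batch, p)
    return sum((r - s) ** 2 for r, s in zip(real, synthetic))

# Toy single-channel 2x2 feature maps standing in for one layer's activations.
real_features = [[[[3.0, 0.0], [0.0, 4.0]]]]
synthetic_features = [[[[0.0, 3.0], [4.0, 0.0]]]]
print(attention_matching_loss(real_features, synthetic_features))
```

In an actual distillation loop the synthetic images would be updated by gradient descent on a loss of this kind, which requires an automatic-differentiation framework; the pure-Python version here only shows what quantity is being compared.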

Related papers

Stylized Structural Patterns for Improved Neural Network Pre-training [1.8641315013048299]
Deep learning models in computer vision require large datasets of real images, which are difficult to curate and pose privacy and legal concerns. Recent works suggest synthetic data as an alternative, yet models trained with it often underperform. We propose an improved neural fractal formulation through which we introduce a new class of synthetic data. Second, we propose reverse stylization, a technique that transfers visual features from a small, license-free set of real images onto synthetic datasets.
arXiv Detail & Related papers (2025-06-24T09:47:31Z)
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information [43.44508080585033]
We introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset. We minimize the distillation loss while constraining the class-aware complexity of the synthetic dataset.
arXiv Detail & Related papers (2024-12-13T08:10:47Z)
Curriculum Dataset Distillation [22.938976109450877]
We present a curriculum-based dataset distillation framework designed to harmonize scalability with efficiency. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. Our framework sets new benchmarks in large-scale dataset distillation, achieving substantial improvements of 11.1% on Tiny-ImageNet, 9.0% on ImageNet-1K, and 7.3% on ImageNet-21K.
arXiv Detail & Related papers (2024-05-15T07:27:14Z)
ATOM: Attention Mixer for Efficient Dataset Distillation [17.370852204228253]
We propose a module to efficiently distill large datasets using a mixture of channel and spatial-wise attention. By integrating both types of attention, our ATOM module demonstrates superior performance across various computer vision datasets.
arXiv Detail & Related papers (2024-05-02T15:15:01Z)
Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance. PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets. Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets. DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques [7.967995669387532]
Generative models have emerged as a promising solution for generating synthetic datasets that can replace or augment real-world data. We propose three novel post-processing techniques to improve the quality and diversity of the synthetic dataset. Experiments show that Gap Filler (GaFi) effectively reduces the gap with real-accuracy scores to an error of 2.03%, 1.78%, and 3.99% on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, respectively.
arXiv Detail & Related papers (2023-05-17T10:50:38Z)
Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with regularization towards a flat trajectory, the weights trained on synthetic data are robust against accumulated-error perturbations. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a similar state as those trained on real data. Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data. Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)
Dataset Distillation with Infinitely Wide Convolutional Networks [18.837952916998947]
We apply a distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation. We obtain over 64% test accuracy on the CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%. Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.
arXiv Detail & Related papers (2021-07-27T18:31:42Z)
Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, that learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch. We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-06-10T16:30:52Z)
Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training. We experimentally verify that the new dataset can significantly improve the ability of the learned FER model. We also propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.