Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation
- URL: http://arxiv.org/abs/2211.11004v3
- Date: Sun, 26 Mar 2023 03:57:25 GMT
- Title: Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation
- Authors: Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li
- Abstract summary: We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with the regularization towards a flat trajectory, the weights trained on synthetic data are robust against perturbations from the accumulated error.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
- Score: 151.70234052015948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based deep learning has achieved astounding successes due in part to
the availability of large-scale real-world data. However, processing such
massive amounts of data comes at a considerable cost in terms of computation,
storage, training, and the search for good neural architectures. Dataset
distillation has thus recently come to the fore. This paradigm involves
distilling information from large real-world datasets into tiny and compact
synthetic datasets such that processing the latter ideally yields performance
similar to that of the former. State-of-the-art methods primarily rely on learning
the synthetic dataset by matching the gradients obtained during training
between the real and synthetic data. However, these gradient-matching methods
suffer from the so-called accumulated trajectory error caused by the
discrepancy between the distillation and subsequent evaluation. To mitigate the
adverse impact of this accumulated trajectory error, we propose a novel
approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with the regularization towards a flat trajectory, the weights
trained on synthetic data are robust against perturbations from the accumulated
error. Our method, called Flat Trajectory Distillation (FTD), is shown to boost
the performance of gradient-matching methods by up to 4.7% on a subset of the
ImageNet dataset with higher-resolution images. We also validate
the effectiveness and generalizability of our method with datasets of different
resolutions and demonstrate its applicability to neural architecture search.
Code is available at https://github.com/AngusDujw/FTD-distillation.
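To make the gradient-matching objective referred to above concrete, here is a minimal PyTorch sketch of a single gradient-matching step, in which the synthetic images are the learnable quantity. This is an illustrative sketch, not the FTD implementation from the repository: the model, batch shapes, and the cosine-similarity matching distance are assumptions, and FTD's flat-trajectory regularization is not shown.

```python
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, real_x, real_y, syn_x, syn_y):
    """Distance between gradients induced by a real batch and a synthetic batch.

    Minimizing this with respect to (syn_x, syn_y) is the basic
    gradient-matching objective; a flat-trajectory regularizer such as FTD's
    would be applied on top of a pipeline like this (not shown here).
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradients on real data are the fixed matching target.
    g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y), params)
    g_real = [g.detach() for g in g_real]

    # Gradients on synthetic data keep the graph so the loss can
    # back-propagate into the learnable synthetic images.
    g_syn = torch.autograd.grad(F.cross_entropy(model(syn_x), syn_y),
                                params, create_graph=True)

    # Layer-wise negative cosine similarity, summed over parameter tensors.
    loss = syn_x.new_zeros(())
    for gr, gs in zip(g_real, g_syn):
        loss = loss + (1.0 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0))
    return loss

# Toy usage: a tiny linear classifier and ten learnable CIFAR-sized images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
syn_x = torch.randn(10, 3, 32, 32, requires_grad=True)
syn_y = torch.arange(10)
real_x, real_y = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))

opt = torch.optim.SGD([syn_x], lr=0.1)
opt.zero_grad()
gradient_matching_loss(model, real_x, real_y, syn_x, syn_y).backward()
opt.step()
```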
Related papers
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - AST: Effective Dataset Distillation through Alignment with Smooth and
High-Quality Expert Trajectories [18.266786462036553]
We propose an effective DD framework named AST, standing for Alignment with Smooth and high-quality expert Trajectories.
We conduct extensive experiments on datasets of different scales, sizes, and resolutions.
arXiv Detail & Related papers (2023-10-16T16:13:53Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data; a sketch of this objective appears after this list.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Learning to Generate Synthetic Training Data using Gradient Matching and
Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, which learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-06-10T16:30:52Z)
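As referenced in the trajectory-matching entry above (Dataset Distillation by Matching Training Trajectories), that objective can be sketched as a normalized parameter distance measured after a short inner training loop on the distilled data. The following PyTorch sketch uses a bare linear classifier and hypothetical expert checkpoints purely for illustration; the step counts, learning rates, and names are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def trajectory_matching_loss(syn_x, syn_y, w_start, w_target,
                             n_student_steps=5, student_lr=0.01):
    """Normalized parameter-distance objective for trajectory matching.

    A student (here a bare linear classifier) starts from an expert checkpoint
    `w_start`, takes a few differentiable gradient steps on the distilled data
    (syn_x, syn_y), and is penalized for its distance to a later expert
    checkpoint `w_target`, normalized by how far the expert itself moved.
    """
    w = w_start.clone().requires_grad_(True)     # student starts at the expert checkpoint
    for _ in range(n_student_steps):
        logits = syn_x.flatten(1) @ w            # forward pass of the linear student
        inner_loss = F.cross_entropy(logits, syn_y)
        (grad,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - student_lr * grad                # differentiable inner update

    # Distance between the synthetically trained weights and the weights the
    # expert reached by training on real data, normalized by the expert's motion.
    return ((w - w_target) ** 2).sum() / ((w_start - w_target) ** 2).sum()

# Toy usage: learn ten synthetic CIFAR-sized images against two expert checkpoints.
w_start = torch.randn(3 * 32 * 32, 10)           # expert weights at step t (placeholder)
w_target = torch.randn(3 * 32 * 32, 10)          # expert weights at step t + M (placeholder)
syn_x = torch.randn(10, 3, 32, 32, requires_grad=True)
syn_y = torch.arange(10)

opt = torch.optim.SGD([syn_x], lr=0.1)
opt.zero_grad()
trajectory_matching_loss(syn_x, syn_y, w_start, w_target).backward()
opt.step()
```

FTD's contribution, per the abstract above, is to regularize the expert trajectories that such matching targets towards flatness, so that the accumulated trajectory error at evaluation time is less damaging; that regularization is not depicted in this sketch.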