Distilled Pruning: Using Synthetic Data to Win the Lottery
- URL: http://arxiv.org/abs/2307.03364v3
- Date: Tue, 8 Aug 2023 22:32:24 GMT
- Title: Distilled Pruning: Using Synthetic Data to Win the Lottery
- Authors: Luke McDermott, Daniel Cummings
- Abstract summary: This work introduces a novel approach to pruning deep learning models by using distilled data.
Our approach can find sparse, trainable subnetworks up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10.
The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
- Score: 2.4366811507669124
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work introduces a novel approach to pruning deep learning models by
using distilled data. Unlike conventional strategies which primarily focus on
architectural or algorithmic optimization, our method reconsiders the role of
data in these scenarios. Distilled datasets capture essential patterns from
larger datasets, and we demonstrate how to leverage this capability to enable a
computationally efficient pruning process. Our approach can find sparse,
trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative
Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results
highlight the potential of using distilled data for resource-efficient neural
network pruning, model compression, and neural architecture search.
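For orientation, here is a minimal sketch of Iterative Magnitude Pruning with rewinding to the initial weights, the baseline named in the abstract. It is not the authors' implementation: the linear model, the one-hot toy data standing in for a distilled dataset, the function names (`train`, `magnitude_prune`, `iterative_magnitude_pruning`), and all hyperparameters are assumptions made for illustration. The sketch only shows where a small dataset would enter the loop, namely the repeated training step.

```python
import random


def train(init_weights, mask, data, lr=0.1, epochs=50):
    """Fit a masked linear model with per-sample gradient descent on squared error."""
    # Pruned weights start at zero and never receive updates.
    w = [wi * mi for wi, mi in zip(init_weights, mask)]
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi * mi for wi, xi, mi in zip(w, x, mask)]
    return w


def magnitude_prune(weights, mask, fraction):
    """Drop the smallest-magnitude `fraction` of the weights that are still alive."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(alive) * fraction)
    new_mask = list(mask)
    for i in sorted(alive, key=lambda k: abs(weights[k]))[:n_prune]:
        new_mask[i] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, data, rounds=3, fraction=0.5):
    """Train, prune, rewind to the initial weights, and repeat."""
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        trained = train(init_weights, mask, data)  # every round restarts from init
        mask = magnitude_prune(trained, mask, fraction)
    return mask


# Hypothetical stand-in for a distilled dataset: one example per input feature.
true_w = [2.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
tiny_data = [([1.0 if j == i else 0.0 for j in range(8)], true_w[i]) for i in range(8)]

random.seed(0)
init = [random.uniform(-0.5, 0.5) for _ in range(8)]
final_mask = iterative_magnitude_pruning(init, tiny_data)
print(final_mask, "kept", sum(final_mask), "of", len(final_mask), "weights")
```

With three rounds at a 50% pruning rate, the 8 weights shrink to 4, then 2, then 1. Each round pays for a full call to `train`, which is why the size of the dataset passed in dominates the cost of the whole procedure.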
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z) - Learning to Generate Synthetic Training Data using Gradient Matching and
Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z) - Invariance Learning in Deep Neural Networks with Differentiable Laplace
Approximations [76.82124752950148]
We develop a convenient gradient-based method for selecting the data augmentation.
We use a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective.
arXiv Detail & Related papers (2022-02-22T02:51:11Z) - Dataset Distillation with Infinitely Wide Convolutional Networks [18.837952916998947]
We apply a distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation.
We obtain over 64% test accuracy on the CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%.
Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.
arXiv Detail & Related papers (2021-07-27T18:31:42Z) - Deep Structure Learning using Feature Extraction in Trained Projection
Space [0.0]
We introduce a network architecture using a self-adjusting and data dependent version of the Radon-transform (linear data projection), also known as x-ray projection, to enable feature extraction via convolutions in lower-dimensional space.
The resulting framework, named PiNet, can be trained end-to-end and shows promising performance on volumetric segmentation tasks.
arXiv Detail & Related papers (2020-09-01T12:16:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.