Dataset Distillation with Convexified Implicit Gradients
- URL: http://arxiv.org/abs/2302.06755v2
- Date: Thu, 9 Nov 2023 22:26:32 GMT
- Title: Dataset Distillation with Convexified Implicit Gradients
- Authors: Noel Loo, Ramin Hasani, Mathias Lechner, Daniela Rus
- Abstract summary: We show how implicit gradients can be effectively used to compute meta-gradient updates.
We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel.
- Score: 69.16247946639233
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a new dataset distillation algorithm using reparameterization and
convexification of implicit gradients (RCIG), that substantially improves the
state-of-the-art. To this end, we first formulate dataset distillation as a
bi-level optimization problem. Then, we show how implicit gradients can be
effectively used to compute meta-gradient updates. We further equip the
algorithm with a convexified approximation that corresponds to learning on top
of a frozen finite-width neural tangent kernel. Finally, we improve bias in
implicit gradients by parameterizing the neural network to enable analytical
computation of final-layer parameters given the body parameters. RCIG
establishes the new state-of-the-art on a diverse series of dataset
distillation tasks. Notably, with one image per class, on resized ImageNet,
RCIG sees on average a 108% improvement over the previous state-of-the-art
distillation algorithm. Similarly, we observe a 66% gain over SOTA on
Tiny-ImageNet and 37% on CIFAR-100.
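As a rough illustration of the bi-level structure and the implicit meta-gradient described above, the sketch below uses a closed-form ridge-regression inner problem on a fixed linear feature map as a stand-in for RCIG's frozen finite-width kernel, and meta-learns only the distilled labels. The toy data, dimensions, and these simplifications are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of bi-level distillation with an implicit meta-gradient.
# Inner problem: ridge regression on the distilled set (a stand-in for
# learning on a frozen kernel). Outer problem: fit quality on real data.
import numpy as np

rng = np.random.default_rng(0)
d, n_real, n_distill, lam, lr = 20, 500, 10, 1e-3, 0.5

X_real = rng.normal(size=(n_real, d))
w_true = rng.normal(size=d)
y_real = X_real @ w_true + 0.1 * rng.normal(size=n_real)

X_s = rng.normal(size=(n_distill, d))   # distilled inputs (kept fixed in this toy)
y_s = np.zeros(n_distill)               # distilled labels, meta-learned below

for step in range(201):
    # Inner problem: closed-form ridge solution on the distilled set.
    A = X_s.T @ X_s + lam * np.eye(d)
    w_star = np.linalg.solve(A, X_s.T @ y_s)

    # Outer (meta) loss: how well the inner solution fits the real data.
    resid = X_real @ w_star - y_real
    if step % 100 == 0:
        print(f"step {step:3d}  meta-loss {np.mean(resid ** 2):.3f}")

    # Implicit function theorem: dw*/dy_s = A^{-1} X_s^T, so the meta-gradient
    # is dL/dy_s = X_s A^{-1} dL/dw -- no unrolling of the inner solver needed.
    dL_dw = 2 * X_real.T @ resid / n_real
    y_s -= lr * (X_s @ np.linalg.solve(A, dL_dw))
```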
Related papers
- Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments [0.0]
This paper introduces a novel approach to enhance the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$.
The proposed step size integrates a logarithmic step term, leading to the selection of smaller values in the final iteration.
To demonstrate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST and CIFAR datasets.
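For concreteness, the sketch below plugs the baseline $\frac{1}{\sqrt{t}}$ decay into plain SGD on a toy least-squares problem; the paper's actual schedule adds a logarithmic term whose exact form is not reproduced here.

```python
# Hedged sketch: a 1/sqrt(t) decaying step size in plain SGD on least squares.
# The proposed schedule further modifies this baseline with a logarithmic term
# so that the final iterations take even smaller steps (not reproduced here).
import numpy as np

def step_size(t, eta0=0.05):
    # Baseline decay; only the 1/sqrt(t) part is taken from the summary above.
    return eta0 / np.sqrt(t)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
for t in range(1, 5001):
    i = rng.integers(len(X))                 # draw one sample (plain SGD)
    grad = 2 * (X[i] @ w - y[i]) * X[i]
    w -= step_size(t) * grad

print("parameter error:", np.linalg.norm(w - w_true))
```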
arXiv Detail & Related papers (2023-09-03T19:21:59Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
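A toy illustration of optimizing latent vectors through a frozen generator (rather than raw pixels) is sketched below; the linear "generator" and the mean-matching objective are stand-ins for the deep generative prior and the distillation losses used in the paper.

```python
# Hedged sketch: distilled data parameterized as the output of a frozen
# generator, with gradients flowing back to a few latent vectors. The linear
# generator and mean-matching loss are toy assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim, n_latents, lr = 8, 64, 5, 0.1

G = rng.normal(size=(img_dim, latent_dim)) / np.sqrt(latent_dim)  # frozen "generator"
X_real = rng.normal(loc=1.0, size=(2000, img_dim))                # real data (toy)
Z = rng.normal(size=(n_latents, latent_dim))                      # learned latent vectors

target = X_real.mean(axis=0)
for _ in range(500):
    X_syn = Z @ G.T                               # decode latents into synthetic samples
    diff = X_syn.mean(axis=0) - target            # toy mean-matching distillation loss
    grad_Z = 2 * (diff @ G)[None, :] / n_latents  # chain rule through the frozen generator
    Z -= lr * grad_Z                              # every latent receives the same mean-loss gradient

# The residual reflects the part of the target outside the generator's range.
final_mean = (Z @ G.T).mean(axis=0)
print("mean-matching error:", np.linalg.norm(final_mean - target))
```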
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that the weights trained on synthetic data are robust against accumulated error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
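The sketch below illustrates the random-feature idea: a one-hidden-layer random ReLU feature map approximates the corresponding infinite-width (arc-cosine) NNGP kernel, which is then used for KIP-style kernel regression from a small support set. The toy data, widths, and feature map are assumptions, not the paper's architecture or pipeline.

```python
# Hedged sketch: random-feature approximation of an NNGP-style kernel,
# used for kernel regression from a small support ("distilled") set.
import numpy as np

rng = np.random.default_rng(0)
d, n_features, n_support, n_query, lam = 10, 4096, 20, 200, 1e-3

def random_features(X, W):
    # phi(x) = relu(W x) / sqrt(width); inner products of these features
    # approximate the infinite-width (arc-cosine / NNGP) kernel.
    return np.maximum(X @ W.T, 0.0) / np.sqrt(W.shape[0])

W = rng.normal(size=(n_features, d))
X_s, y_s = rng.normal(size=(n_support, d)), rng.normal(size=n_support)  # support set
X_q = rng.normal(size=(n_query, d))                                     # query ("real") set

Phi_s, Phi_q = random_features(X_s, W), random_features(X_q, W)
K_ss, K_qs = Phi_s @ Phi_s.T, Phi_q @ Phi_s.T

# KIP-style kernel regression from the support set to the query set.
preds = K_qs @ np.linalg.solve(K_ss + lam * np.eye(n_support), y_s)

# Sanity check of one entry against the exact arc-cosine (order-1) kernel.
u, v = X_s[0], X_s[1]
theta = np.arccos(np.clip(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)), -1.0, 1.0))
exact = np.linalg.norm(u) * np.linalg.norm(v) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
print("kernel entry  exact:", exact, " random-feature:", K_ss[0, 1])
```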
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
- GradViT: Gradient Inversion of Vision Transformers [83.54779732309653]
We demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
We introduce a method, named GradViT, that optimizes random noise into naturally looking images.
We observe unprecedentedly high fidelity and closeness to the original (hidden) data.
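GradViT itself iteratively optimizes noise to match observed ViT gradients together with image priors; the toy sketch below only shows the underlying leakage principle for a single fully connected layer, where the input can be read off in closed form from the shared weight and bias gradients.

```python
# Hedged sketch: why shared gradients leak data. For a first fully connected
# layer z = W x + b, dL/dW = (dL/dz) x^T and dL/db = dL/dz, so the input x can
# be recovered from one row of the weight gradient. This is only the linear-layer
# special case, not GradViT's iterative optimization on vision transformers.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 12, 6
W, b = rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)
x_secret = rng.normal(size=d_in)             # the "hidden" training example

z = W @ x_secret + b
dL_dz = 2 * (z - rng.normal(size=d_out))     # gradient from some squared loss
dL_dW = np.outer(dL_dz, x_secret)            # what a client would share
dL_db = dL_dz

i = np.argmax(np.abs(dL_db))                 # any row with a nonzero bias gradient
x_recovered = dL_dW[i] / dL_db[i]
print("max reconstruction error:", np.max(np.abs(x_recovered - x_secret)))
```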
arXiv Detail & Related papers (2022-03-22T17:06:07Z)
- Edge Tracing using Gaussian Process Regression [0.0]
We introduce a novel edge tracing algorithm using Gaussian process regression.
Our approach has the ability to efficiently trace edges in image sequences.
Various applications to medical imaging and satellite imaging are used to validate the technique.
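A minimal sketch of the underlying tool, Gaussian process regression fit to noisy edge samples, is given below; the RBF kernel, its hyperparameters, and the synthetic curve are illustrative choices rather than the paper's pipeline.

```python
# Hedged sketch: GP regression as an edge model. Noisy (x, y) samples along a
# curve stand in for image-derived edge observations.
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    # Squared-exponential kernel k(a, b) = var * exp(-(a - b)^2 / (2 length^2)).
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

rng = np.random.default_rng(0)
x_obs = np.linspace(0, 10, 25)                       # columns where the edge was detected
y_obs = np.sin(x_obs) + 0.1 * rng.normal(size=25)    # noisy edge row positions
x_new = np.linspace(0, 10, 200)                      # columns to trace through

noise = 0.1 ** 2
K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
K_star = rbf(x_new, x_obs)
mean = K_star @ np.linalg.solve(K, y_obs)            # posterior mean = traced edge
cov = rbf(x_new, x_new) - K_star @ np.linalg.solve(K, K_star.T)
std = np.sqrt(np.clip(np.diag(cov), 0, None))        # uncertainty along the trace

idx = np.argmin(np.abs(x_new - 5.0))
print("traced edge at x=5:", mean[idx], "+/-", std[idx])
```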
arXiv Detail & Related papers (2021-11-05T16:43:14Z)
- Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that use the difference between the present and the past gradients.
We have tested ensembles of networks and their fusion with a ResNet50 trained with stochastic gradient descent.
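As one concrete example of such a variant, the sketch below modulates an Adam-style step by a sigmoid of the change between the present and past gradient (a diffGrad-style "friction"); treating this particular form as representative of the compared variants is an assumption.

```python
# Hedged sketch: an Adam-style update scaled by a sigmoid of the gradient
# change, in the spirit of the diffGrad-type variants discussed above.
import numpy as np

def diffgrad_step(w, grad, prev_grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    xi = 1.0 / (1.0 + np.exp(-np.abs(prev_grad - grad)))   # friction from gradient change
    w = w - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage on a quadratic: minimize ||w - target||^2.
rng = np.random.default_rng(0)
target = rng.normal(size=5)
w = np.zeros(5)
m = v = prev_grad = np.zeros(5)
for t in range(1, 1001):
    grad = 2 * (w - target)
    w, m, v = diffgrad_step(w, grad, prev_grad, m, v, t)
    prev_grad = grad
print("distance to optimum:", np.linalg.norm(w - target))
```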
arXiv Detail & Related papers (2021-03-26T18:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.