Dataset Condensation with Gradient Matching
- URL: http://arxiv.org/abs/2006.05929v3
- Date: Mon, 8 Mar 2021 13:31:22 GMT
- Title: Dataset Condensation with Gradient Matching
- Authors: Bo Zhao, Konda Reddy Mopuri, Hakan Bilen
- Abstract summary: We propose a training set synthesis technique for data-efficient learning, called Dataset Condensation, which learns to condense a large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
- Score: 36.14340188365505
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: As the state-of-the-art machine learning methods in many fields rely on
larger datasets, storing datasets and training models on them become
significantly more expensive. This paper proposes a training set synthesis
technique for data-efficient learning, called Dataset Condensation, that learns
to condense a large dataset into a small set of informative synthetic samples for
training deep neural networks from scratch. We formulate this goal as a
gradient matching problem between the gradients of deep neural network weights
that are trained on the original and our synthetic data. We rigorously evaluate
its performance in several computer vision benchmarks and demonstrate that it
significantly outperforms the state-of-the-art methods. Finally, we explore the
use of our method in continual learning and neural architecture search and
report promising gains when limited memory and computations are available.
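The gradient-matching objective described in the abstract can be written as a short loss sketch. The code below is a minimal illustration under my own assumptions (the function name gradient_match_loss, the per-parameter flattening, and the cosine distance between gradients are choices made for the example); it is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def gradient_match_loss(model, real_x, real_y, syn_x, syn_y):
    """Distance between gradients induced by a real and a synthetic batch.

    Minimizing this with respect to syn_x (and optionally syn_y) pushes the
    synthetic set to produce weight updates similar to those of the real data.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradients of the training loss on the real batch; treated as the target,
    # so no graph is kept through them.
    g_real = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), params)

    # Gradients on the synthetic batch; create_graph=True lets the distance
    # below be backpropagated into the synthetic images themselves.
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True)

    # Sum of per-parameter cosine distances between the two gradient sets.
    dist = 0.0
    for gr, gs in zip(g_real, g_syn):
        dist = dist + (1.0 - F.cosine_similarity(
            gr.flatten(), gs.flatten(), dim=0))
    return dist
```

Roughly speaking, this distance is minimized over the synthetic images in an outer loop while the network is periodically updated on the synthetic data and re-initialized, so that the condensed set remains useful for training new networks from scratch.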
Related papers
- Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology [0.14980193397844666]
We propose a methodology for improving the performance of a pre-trained object detector when training on synthetic data.
Our approach focuses on extracting the salient information from synthetic data without forgetting useful features learned from pre-training on real images.
arXiv Detail & Related papers (2024-05-30T08:31:01Z) - Koopcon: A new approach towards smarter and less complex learning [13.053285552524052]
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning.
This paper introduces an innovative Autoencoder-based dataset condensation model backed by Koopman operator theory.
Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data.
arXiv Detail & Related papers (2024-05-22T17:47:14Z) - Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method.
We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z) - Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that the weights trained on synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to those trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Dataset Condensation with Distribution Matching [30.571335208276246]
Dataset condensation aims to replace the original large training set with a significantly smaller learned synthetic set.
We propose a simple yet effective dataset condensation technique that requires significantly lower training cost.
Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures. A rough sketch of this distribution-matching loss is given after the list below.
arXiv Detail & Related papers (2021-10-08T15:02:30Z) - Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data.
The proposed approach achieves cutting-edge results without the need to train the models on real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z) - Dataset Meta-Learning from Kernel Ridge-Regression [18.253682891579402]
Kernel Inducing Points (KIP) can compress datasets by one or two orders of magnitude.
KIP-learned datasets are transferable to the training of finite-width neural networks even beyond the lazy-training regime.
arXiv Detail & Related papers (2020-10-30T18:54:04Z)
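As a companion to the distribution-matching entry above, here is a hypothetical sketch of that style of condensation loss: real and synthetic batches are embedded by a randomly re-initialized feature extractor and their mean embeddings are pulled together. The name distribution_match_loss, the plain squared distance, and the omission of class-conditional sampling are my assumptions, not the paper's code.

```python
import torch

def distribution_match_loss(embed_net, real_x, syn_x):
    """Squared distance between mean feature embeddings of the two batches.

    embed_net is assumed to be a randomly initialized feature extractor,
    re-sampled at every outer iteration; only syn_x receives gradients.
    """
    with torch.no_grad():
        mu_real = embed_net(real_x).mean(dim=0)  # target statistics
    mu_syn = embed_net(syn_x).mean(dim=0)        # differentiable w.r.t. syn_x
    return ((mu_real - mu_syn) ** 2).sum()
```

Because no second-order gradients through network updates are needed, a loss of this form is much cheaper per iteration than gradient matching, which is the efficiency advantage the entry refers to.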
This list is automatically generated from the titles and abstracts of the papers on this site.