Remember the Past: Distilling Datasets into Addressable Memories for
Neural Networks
- URL: http://arxiv.org/abs/2206.02916v1
- Date: Mon, 6 Jun 2022 21:32:26 GMT
- Title: Remember the Past: Distilling Datasets into Addressable Memories for
Neural Networks
- Authors: Zhiwei Deng and Olga Russakovsky
- Abstract summary: We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories.
These memories can then be recalled to quickly re-train a neural network and recover the performance.
We demonstrate state-of-the-art results on the dataset distillation task across five benchmarks.
- Score: 27.389093857615876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an algorithm that compresses the critical information of a large
dataset into compact addressable memories. These memories can then be recalled
to quickly re-train a neural network and recover the performance (instead of
storing and re-training on the full original dataset).
Building upon the dataset distillation framework, we make a key observation
that a shared common representation allows for more efficient and effective
distillation. Concretely, we learn a set of bases (aka "memories") which are
shared between classes and combined through learned flexible addressing
functions to generate a diverse set of training examples. This leads to several
benefits: 1) the size of compressed data does not necessarily grow linearly
with the number of classes; 2) an overall higher compression rate with more
effective distillation is achieved; and 3) more generalized queries are allowed
beyond recalling the original classes.
We demonstrate state-of-the-art results on the dataset distillation task
across five benchmarks, including up to 16.5% and 9.7% in retained accuracy
improvement when distilling CIFAR10 and CIFAR100 respectively. We then leverage
our framework to perform continual learning, achieving state-of-the-art results
on four benchmarks, with 23.2% accuracy improvement on MANY.
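
As a minimal sketch of the shared-memory construction described in the abstract: a set of bases ("memories") is shared across classes and combined through learned addressing coefficients to recall synthetic training examples. The sizes, variable names, and the plain cross-entropy re-training step below are illustrative assumptions, not the authors' code; the actual outer-loop objective follows the dataset distillation framework.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not taken from the paper).
K, D, N_PER_CLASS, NUM_CLASSES = 16, 3 * 32 * 32, 10, 10

# Shared bases ("memories") and per-class addressing coefficients, both learned.
memories = nn.Parameter(torch.randn(K, D) * 0.01)
addressing = nn.Parameter(torch.randn(NUM_CLASSES, N_PER_CLASS, K) * 0.01)

def recall_synthetic_set():
    """Recall the distilled training set: every example is a learned
    linear combination of the shared memory bases."""
    images = torch.einsum('cnk,kd->cnd', addressing, memories)   # (C, N, D)
    labels = torch.arange(NUM_CLASSES).repeat_interleave(N_PER_CLASS)
    return images.reshape(-1, D), labels

# Schematic inner step: a student network is re-trained on the recalled set.
# In the actual framework, memories and addressing are optimized in an outer
# loop so that this re-training recovers the performance of the full dataset.
student = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, NUM_CLASSES))
x, y = recall_synthetic_set()
loss = nn.functional.cross_entropy(student(x), y)
loss.backward()   # gradients reach memories and addressing as well as the student
```

Because the bases are shared across classes, the stored footprint is the K x D memory matrix plus the much smaller addressing coefficients, which is why the compressed size need not grow linearly with the number of classes.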
Related papers
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
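
As a rough illustration of that multi-stage procedure, the toy loop below mirrors only the control flow (each subset conditioned on the ones already synthesized, with training on their cumulative union); the per-stage distillation step is replaced by a trivial stand-in and none of the names come from the paper.

```python
import torch

def distill_subset(real_x, real_y, prev_subsets, size=10):
    # Placeholder for the real per-stage distillation; here we just sample,
    # ignoring prev_subsets, to keep the sketch runnable.
    idx = torch.randperm(len(real_x))[:size]
    return real_x[idx].clone(), real_y[idx].clone()

def train_on(model, x, y, steps=50, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        torch.nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return model

real_x, real_y = torch.randn(1000, 20), torch.randint(0, 5, (1000,))
model, subsets = torch.nn.Linear(20, 5), []
for stage in range(3):                                   # several small synthetic subsets
    subsets.append(distill_subset(real_x, real_y, subsets))
    cum_x = torch.cat([s[0] for s in subsets])
    cum_y = torch.cat([s[1] for s in subsets])
    model = train_on(model, cum_x, cum_y)                # train on the cumulative union
```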
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training [110.79400526706081]
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage limit their generalization.
Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference.
This paper proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT.
arXiv Detail & Related papers (2022-11-19T21:15:47Z)
- Data-Efficient Augmentation for Training Neural Networks [15.870155099135538]
We propose a rigorous technique to select subsets of data points that, when augmented, closely capture the training dynamics of full data augmentation.
Our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes.
arXiv Detail & Related papers (2022-10-15T19:32:20Z)
- Dataset Distillation with Infinitely Wide Convolutional Networks [18.837952916998947]
We apply a distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation.
We obtain over 64% test accuracy on the CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%.
Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.
arXiv Detail & Related papers (2021-07-27T18:31:42Z)
- ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression [20.23732233214849]
We propose a four-stage progressive distillation framework, ERNIE-Tiny, to compress pretrained language models (PLMs).
Experiments show that a 4-layer ERNIE-Tiny maintains over 98.0% of the performance of its 12-layer teacher BERT base on the GLUE benchmark.
ERNIE-Tiny achieves a new compression SOTA on five Chinese NLP tasks, outperforming BERT base by 0.4% accuracy with 7.5x fewer parameters and 9.4x faster inference speed.
arXiv Detail & Related papers (2021-06-04T04:00:16Z)
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers [52.85472936277762]
We apply knowledge distillation to improve the recently proposed late-interaction ColBERT model.
We distill the knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product.
We empirically show that our approach improves query latency and greatly reduces the onerous storage requirements of ColBERT.
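
For context, the two scoring functions involved can be sketched as follows; shapes and pooling are illustrative assumptions, not the authors' code. The teacher scores with ColBERT's MaxSim over token embeddings, while the distilled student uses a single dot product.

```python
import torch

def maxsim_score(q_tok, d_tok):
    # q_tok: (num_query_tokens, dim), d_tok: (num_doc_tokens, dim)
    sims = q_tok @ d_tok.T                    # token-level similarities
    return sims.max(dim=1).values.sum()       # best doc token per query token, summed

def dot_score(q_vec, d_vec):
    # Single pooled vectors: one dot product, cheap to store and index.
    return q_vec @ d_vec

q_tok, d_tok = torch.randn(8, 128), torch.randn(180, 128)
teacher = maxsim_score(q_tok, d_tok)
student = dot_score(q_tok.mean(dim=0), d_tok.mean(dim=0))  # mean pooling is a placeholder
# Distillation trains the single-vector encoders so dot_score tracks maxsim_score.
```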
arXiv Detail & Related papers (2020-10-22T02:26:01Z)
- Compression-aware Continual Learning using Singular Value Decomposition [2.4283778735260686]
We propose a compression-based continual task learning method that can dynamically grow a neural network.
Inspired by recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations.
Our method achieves compressed representations with minimal performance degradation without the need for costly fine-tuning.
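
The low-rank ingredient is the standard SVD truncation sketched below; the rank and layer size are arbitrary, and the paper's compression-aware training procedure around it is not reproduced.

```python
import torch

W = torch.randn(256, 512)                          # a dense layer's weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
r = 32                                             # retained rank
W_low = (U[:, :r] * S[:r]) @ Vh[:r, :]             # rank-r approximation of W
# Storage drops from 256*512 values to r*(256 + 512 + 1), at the cost of an
# approximation error ||W - W_low|| that shrinks as r grows.
```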
arXiv Detail & Related papers (2020-09-03T23:29:50Z)
- Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution [17.996541285382463]
We propose extracurricular learning to bridge the gap between a compressed student model and its teacher.
We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%.
This leads to major accuracy improvements compared to empirical risk minimization-based training for various recent neural network architectures.
arXiv Detail & Related papers (2020-06-30T18:21:21Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training with this enlarged dataset, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- A Simple Framework for Contrastive Learning of Visual Representations [116.37752766922407]
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that composition of data augmentations plays a critical role in defining effective predictive tasks.
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
arXiv Detail & Related papers (2020-02-13T18:50:45Z)
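
To make the two ingredients above concrete, here is a minimal sketch of a composed augmentation pipeline and a contrastive (NT-Xent style) loss over two views of each image; the specific transforms, temperature, and sizes are illustrative, not SimCLR's exact configuration.

```python
import torch
import torchvision.transforms as T

# A composition of stochastic augmentations; applying it twice to the same image
# yields the two "views" that the contrastive objective pulls together.
augment = T.Compose([
    T.RandomResizedCrop(32),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature cross entropy over a batch of paired views."""
    z = torch.nn.functional.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.T / tau
    n = z1.shape[0]
    sim.fill_diagonal_(float('-inf'))                               # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # positive = other view
    return torch.nn.functional.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)   # projections of two augmented views
loss = nt_xent(z1, z2)
```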