Remember the Past: Distilling Datasets into Addressable Memories for
Neural Networks
- URL: http://arxiv.org/abs/2206.02916v1
- Date: Mon, 6 Jun 2022 21:32:26 GMT
- Title: Remember the Past: Distilling Datasets into Addressable Memories for
Neural Networks
- Authors: Zhiwei Deng and Olga Russakovsky
- Abstract summary: We propose an algorithm that compresses the critical information of a large dataset into compact addressable memories.
These memories can then be recalled to quickly re-train a neural network and recover the performance.
We demonstrate state-of-the-art results on the dataset distillation task across five benchmarks.
- Score: 27.389093857615876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an algorithm that compresses the critical information of a large
dataset into compact addressable memories. These memories can then be recalled
to quickly re-train a neural network and recover the performance (instead of
storing and re-training on the full original dataset).
Building upon the dataset distillation framework, we make a key observation
that a shared common representation allows for more efficient and effective
distillation. Concretely, we learn a set of bases (aka "memories") which are
shared between classes and combined through learned flexible addressing
functions to generate a diverse set of training examples. This leads to several
benefits: 1) the size of compressed data does not necessarily grow linearly
with the number of classes; 2) an overall higher compression rate with more
effective distillation is achieved; and 3) more generalized queries are allowed
beyond recalling the original classes.
We demonstrate state-of-the-art results on the dataset distillation task
across five benchmarks, including up to 16.5% and 9.7% in retained accuracy
improvement when distilling CIFAR10 and CIFAR100 respectively. We then leverage
our framework to perform continual learning, achieving state-of-the-art results
on four benchmarks, with 23.2% accuracy improvement on MANY.
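
As a minimal sketch of the shared-memory construction described in the abstract: a set of bases ("memories") is shared across classes and combined through learned addressing coefficients to recall synthetic training examples. The sizes, variable names, and the plain cross-entropy re-training step below are illustrative assumptions, not the authors' code; the actual outer-loop objective follows the dataset distillation framework.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not taken from the paper).
K, D, N_PER_CLASS, NUM_CLASSES = 16, 3 * 32 * 32, 10, 10

# Shared bases ("memories") and per-class addressing coefficients, both learned.
memories = nn.Parameter(torch.randn(K, D) * 0.01)
addressing = nn.Parameter(torch.randn(NUM_CLASSES, N_PER_CLASS, K) * 0.01)

def recall_synthetic_set():
    """Recall the distilled training set: every example is a learned
    linear combination of the shared memory bases."""
    images = torch.einsum('cnk,kd->cnd', addressing, memories)   # (C, N, D)
    labels = torch.arange(NUM_CLASSES).repeat_interleave(N_PER_CLASS)
    return images.reshape(-1, D), labels

# Schematic inner step: a student network is re-trained on the recalled set.
# In the actual framework, memories and addressing are optimized in an outer
# loop so that this re-training recovers the performance of the full dataset.
student = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, NUM_CLASSES))
x, y = recall_synthetic_set()
loss = nn.functional.cross_entropy(student(x), y)
loss.backward()   # gradients reach memories and addressing as well as the student
```

Because the bases are shared across classes, the stored footprint is the K x D memory matrix plus the much smaller addressing coefficients, which is why the compressed size need not grow linearly with the number of classes.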
Related papers
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
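
As a rough illustration of that multi-stage procedure, the toy loop below mirrors only the control flow (each subset conditioned on the ones already synthesized, with training on their cumulative union); the per-stage distillation step is replaced by a trivial stand-in and none of the names come from the paper.

```python
import torch

def distill_subset(real_x, real_y, prev_subsets, size=10):
    # Placeholder for the real per-stage distillation; here we just sample,
    # ignoring prev_subsets, to keep the sketch runnable.
    idx = torch.randperm(len(real_x))[:size]
    return real_x[idx].clone(), real_y[idx].clone()

def train_on(model, x, y, steps=50, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        torch.nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return model

real_x, real_y = torch.randn(1000, 20), torch.randint(0, 5, (1000,))
model, subsets = torch.nn.Linear(20, 5), []
for stage in range(3):                                   # several small synthetic subsets
    subsets.append(distill_subset(real_x, real_y, subsets))
    cum_x = torch.cat([s[0] for s in subsets])
    cum_y = torch.cat([s[1] for s in subsets])
    model = train_on(model, cum_x, cum_y)                # train on the cumulative union
```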
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training [110.79400526706081]
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage limit their generalization.
Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference.
This paper proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT.
arXiv Detail & Related papers (2022-11-19T21:15:47Z)
- Data-Efficient Augmentation for Training Neural Networks [15.870155099135538]
We propose a rigorous technique to select subsets of data points that, when augmented, closely capture the training dynamics of full data augmentation.
Our method achieves 6.3x speedup on CIFAR10 and 2.2x speedup on SVHN, and outperforms the baselines by up to 10% across various subset sizes.
arXiv Detail & Related papers (2022-10-15T19:32:20Z)
- Dataset Distillation with Infinitely Wide Convolutional Networks [18.837952916998947]
We apply a distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation.
We obtain over 64% test accuracy on the CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of 40%.
Our state-of-the-art results extend across many other settings for MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and SVHN.
arXiv Detail & Related papers (2021-07-27T18:31:42Z)
- ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression [20.23732233214849]
We propose a four-stage progressive distillation framework, ERNIE-Tiny, to compress pretrained language models (PLMs).
Experiments show that a 4-layer ERNIE-Tiny maintains over 98.0% of the performance of its 12-layer teacher BERT base on the GLUE benchmark.
ERNIE-Tiny achieves a new compression SOTA on five Chinese NLP tasks, outperforming BERT base by 0.4% accuracy with 7.5x fewer parameters and 9.4x faster inference speed.
arXiv Detail & Related papers (2021-06-04T04:00:16Z)
- Distilling Dense Representations for Ranking using Tightly-Coupled Teachers [52.85472936277762]
We apply knowledge distillation to improve the recently proposed late-interaction ColBERT model.
We distill the knowledge from ColBERT's expressive MaxSim operator for computing relevance scores into a simple dot product.
We empirically show that our approach improves query latency and greatly reduces the onerous storage requirements of ColBERT.
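
For context, the two scoring functions involved can be sketched as follows; shapes and pooling are illustrative assumptions, not the authors' code. The teacher scores with ColBERT's MaxSim over token embeddings, while the distilled student uses a single dot product.

```python
import torch

def maxsim_score(q_tok, d_tok):
    # q_tok: (num_query_tokens, dim), d_tok: (num_doc_tokens, dim)
    sims = q_tok @ d_tok.T                    # token-level similarities
    return sims.max(dim=1).values.sum()       # best doc token per query token, summed

def dot_score(q_vec, d_vec):
    # Single pooled vectors: one dot product, cheap to store and index.
    return q_vec @ d_vec

q_tok, d_tok = torch.randn(8, 128), torch.randn(180, 128)
teacher = maxsim_score(q_tok, d_tok)
student = dot_score(q_tok.mean(dim=0), d_tok.mean(dim=0))  # mean pooling is a placeholder
# Distillation trains the single-vector encoders so dot_score tracks maxsim_score.
```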
arXiv Detail & Related papers (2020-10-22T02:26:01Z)
- Compression-aware Continual Learning using Singular Value Decomposition [2.4283778735260686]
We propose a compression-based continual task learning method that can dynamically grow a neural network.
Inspired by recent model compression techniques, we employ compression-aware training and perform low-rank weight approximations.
Our method achieves compressed representations with minimal performance degradation without the need for costly fine-tuning.
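
The low-rank ingredient is the standard SVD truncation sketched below; the rank and layer size are arbitrary, and the paper's compression-aware training procedure around it is not reproduced.

```python
import torch

W = torch.randn(256, 512)                          # a dense layer's weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)
r = 32                                             # retained rank
W_low = (U[:, :r] * S[:r]) @ Vh[:r, :]             # rank-r approximation of W
# Storage drops from 256*512 values to r*(256 + 512 + 1), at the cost of an
# approximation error ||W - W_low|| that shrinks as r grows.
```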
arXiv Detail & Related papers (2020-09-03T23:29:50Z)
- Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution [17.996541285382463]
We propose extracurricular learning to bridge the gap between a compressed student model and its teacher.
We conduct rigorous evaluations on regression and classification tasks and show that compared to the standard knowledge distillation, extracurricular learning reduces the gap by 46% to 68%.
This leads to major accuracy improvements compared to empirical risk minimization-based training for various recent neural network architectures.
arXiv Detail & Related papers (2020-06-30T18:21:21Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training with this enlarged dataset, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- A Simple Framework for Contrastive Learning of Visual Representations [116.37752766922407]
This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that composition of data augmentations plays a critical role in defining effective predictive tasks.
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
arXiv Detail & Related papers (2020-02-13T18:50:45Z)
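
To make the two ingredients above concrete, here is a minimal sketch of a composed augmentation pipeline and a contrastive (NT-Xent style) loss over two views of each image; the specific transforms, temperature, and sizes are illustrative, not SimCLR's exact configuration.

```python
import torch
import torchvision.transforms as T

# A composition of stochastic augmentations; applying it twice to the same image
# yields the two "views" that the contrastive objective pulls together.
augment = T.Compose([
    T.RandomResizedCrop(32),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def nt_xent(z1, z2, tau=0.5):
    """Normalized-temperature cross entropy over a batch of paired views."""
    z = torch.nn.functional.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.T / tau
    n = z1.shape[0]
    sim.fill_diagonal_(float('-inf'))                               # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # positive = other view
    return torch.nn.functional.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)   # projections of two augmented views
loss = nt_xent(z1, z2)
```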