AST: Effective Dataset Distillation through Alignment with Smooth and
High-Quality Expert Trajectories
- URL: http://arxiv.org/abs/2310.10541v2
- Date: Mon, 27 Nov 2023 16:45:18 GMT
- Title: AST: Effective Dataset Distillation through Alignment with Smooth and
High-Quality Expert Trajectories
- Authors: Jiyuan Shen, Wenzhuo Yang, Kwok-Yan Lam
- Abstract summary: We propose an effective DD framework named AST, standing for Alignment with Smooth and high-quality expert Trajectories.
We conduct extensive experiments on datasets of different scales, sizes, and resolutions.
- Score: 18.266786462036553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large AI models typically requires large-scale datasets,
which makes both training and parameter tuning time-consuming and costly. Some
researchers address this problem by carefully synthesizing a very small number
of highly representative and informative samples from real-world datasets.
This approach, known as Dataset Distillation (DD), offers a perspective on
data-efficient learning. Despite recent
progress in this field, the performance of existing methods still cannot meet
expectations, and distilled datasets cannot effectively replace original
datasets. In this paper, unlike previous methods that focus solely on improving
the effectiveness of student distillation, we recognize and leverage the
important mutual influence between expert and student models. We observed that
the smoothness of expert trajectories has a significant impact on subsequent
student parameter alignment. Based on this, we propose an effective DD
framework named AST, standing for Alignment with Smooth and high-quality expert
Trajectories. We integrate a clipping loss and a gradient penalty to regulate
the rate of parameter change during expert trajectory generation. To further
refine the student's parameter alignment with the expert trajectory, we put
forward a representative initialization for the synthetic dataset and a
balanced inner-loop loss to address the sensitivity to randomly initialized
variables during distillation. We also propose two enhancement strategies,
namely an intermediate matching loss and weight perturbation, to mitigate the
accumulation of errors. We conduct extensive
experiments on datasets of different scales, sizes, and resolutions. The
results demonstrate that the proposed method significantly outperforms prior
methods.
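The abstract names two concrete mechanisms: regulating the rate of parameter
change while generating expert trajectories (via a clipping loss and a
gradient penalty), and aligning student parameters with those smoother
trajectories. The sketch below illustrates how such pieces could look in
PyTorch; it is not the authors' released code, and the function names,
hyperparameter values, and the MTT-style normalized matching loss are
assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def smooth_expert_step(model, optimizer, x, y, clip_norm=1.0, gp_weight=0.01):
    """One expert-training step that regulates the rate of parameter change
    (assumed form of the abstract's clipping-loss + gradient-penalty idea)."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    # Gradient penalty: discourage abrupt jumps between consecutive expert
    # checkpoints by penalizing large gradient magnitudes (second-order term).
    grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)
    grad_penalty = sum(g.pow(2).sum() for g in grads)
    (loss + gp_weight * grad_penalty).backward()
    # Clipping: bound the size of each expert update.
    nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    return float(loss.detach())


def trajectory_matching_loss(student_end, expert_start, expert_end):
    """MTT-style alignment objective: squared distance between the student's
    final parameters and the expert's target parameters, normalized by how far
    the expert itself moved over the matched segment."""
    num = sum((s - e).pow(2).sum() for s, e in zip(student_end, expert_end))
    den = sum((a - e).pow(2).sum() for a, e in zip(expert_start, expert_end))
    return num / (den + 1e-8)
```

In a trajectory-matching pipeline of this kind, the first function would be
used to record smoother expert checkpoints, and the second would be minimized
with respect to the synthetic images that drive the student's inner-loop
updates.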
Related papers
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
arXiv Detail & Related papers (2024-06-11T07:32:25Z) - DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation [36.75453713794983]
We introduce a diffusion model with a Transformer architecture (DiffsFormer) to generate stock factors.
When presented with a specific downstream task, we employ DiffsFormer to augment the training procedure by editing existing samples.
The proposed method achieves relative improvements of 7.2% and 27.8% in annualized return ratio for the respective datasets.
arXiv Detail & Related papers (2024-02-05T03:54:36Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust to perturbations from accumulated errors when the optimization is regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We further apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)