AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories
- URL: http://arxiv.org/abs/2310.10541v2
- Date: Mon, 27 Nov 2023 16:45:18 GMT
- Title: AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories
- Authors: Jiyuan Shen, Wenzhuo Yang, Kwok-Yan Lam
- Abstract summary: We propose an effective DD framework named AST, standing for Alignment with Smooth and high-quality expert Trajectories.
We conduct extensive experiments on datasets of different scales, sizes, and resolutions.
- Score: 18.266786462036553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training large AI models typically requires large-scale datasets,
which makes both training and parameter tuning time-consuming and costly. Some
researchers address this problem by carefully synthesizing a very small number
of highly representative and informative samples from real-world datasets.
This approach, known as Dataset Distillation (DD), offers a path toward
data-efficient learning. Despite recent progress in this field, the performance
of existing methods still falls short of expectations, and distilled datasets
cannot yet effectively replace the original ones. In this paper, unlike previous
methods that focus solely on improving the effectiveness of student
distillation, we recognize and leverage the important mutual influence between
expert and student models. We observe that the smoothness of expert
trajectories has a significant impact on subsequent student parameter
alignment. Based on this, we propose an effective DD framework named AST,
standing for Alignment with Smooth and high-quality expert Trajectories. We
integrate a clipping loss and a gradient penalty to regulate the rate of
parameter change during expert trajectory generation. To further refine the
alignment of student parameters with the expert trajectory, we introduce
representative initialization of the synthetic dataset and a balanced
inner-loop loss, addressing the sensitivity to randomly initialized variables
during distillation. We also propose two enhancement strategies, intermediate
matching loss and weight perturbation, to mitigate the accumulation of errors.
We conduct extensive experiments on datasets of different scales, sizes, and
resolutions. The results demonstrate that the proposed method significantly
outperforms prior methods.
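The clipping-plus-gradient-penalty idea in the abstract can be pictured with a short sketch. The code below is not the authors' implementation; it is a minimal PyTorch illustration of how one might regulate the rate of parameter change while recording an expert trajectory. Names such as `train_expert_trajectory`, `penalty_weight`, and `clip_norm` are illustrative assumptions, and standard gradient-norm clipping stands in for whatever specific clipping loss the paper defines.

```python
# Minimal sketch (not the authors' code): train an expert network while
# regulating how fast its parameters change, using a gradient-norm penalty
# plus gradient clipping, and record the resulting (smoother) trajectory of
# checkpoints for later student alignment.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def train_expert_trajectory(model: nn.Module,
                            loader,
                            epochs: int = 50,
                            lr: float = 0.01,
                            clip_norm: float = 1.0,
                            penalty_weight: float = 0.01):
    """Train an expert and return a list of parameter snapshots (the trajectory)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    trajectory = [copy.deepcopy(model.state_dict())]

    for _ in range(epochs):
        for x, y in loader:
            logits = model(x)
            ce_loss = F.cross_entropy(logits, y)

            # Gradient penalty: discourage large gradient norms so that
            # successive updates stay small and the trajectory stays smooth.
            grads = torch.autograd.grad(ce_loss, model.parameters(),
                                        create_graph=True)
            grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
            loss = ce_loss + penalty_weight * grad_norm

            optimizer.zero_grad()
            loss.backward()
            # Hard cap on the update magnitude, a second brake on the rate of
            # parameter change along the expert trajectory.
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip_norm)
            optimizer.step()

        trajectory.append(copy.deepcopy(model.state_dict()))

    return trajectory
```

In the full method, the saved checkpoints would serve as the targets that student parameters trained on the synthetic set are aligned with.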
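For context on the alignment step itself, here is a minimal sketch of standard trajectory matching between a student trained on synthetic data and expert checkpoints, with an optional term suggesting how an intermediate matching loss could penalize drift at every inner step rather than only at the end. This is a hedged illustration, not the AST implementation; `student_fn` (a functional forward pass taking explicit parameters), `inner_steps`, `inner_lr`, and `intermediate_weight` are assumptions.

```python
# Minimal sketch (not the authors' code) of trajectory matching: a student is
# initialized from an expert checkpoint, trained for a few inner steps on the
# synthetic data, and the synthetic data is later updated so the student's
# parameters land close to a later expert checkpoint.

import torch
import torch.nn.functional as F


def flatten(params):
    return torch.cat([p.reshape(-1) for p in params])


def trajectory_matching_loss(student_fn, expert_start, expert_target,
                             syn_x, syn_y, inner_steps=10, inner_lr=0.01,
                             intermediate_weight=0.0):
    """Alignment loss that is differentiable w.r.t. the synthetic data."""
    params = [p.clone().requires_grad_(True) for p in expert_start]
    target_flat = flatten(expert_target).detach()
    start_flat = flatten(expert_start).detach()

    intermediate = 0.0
    for _ in range(inner_steps):
        logits = student_fn(syn_x, params)  # functional forward pass (assumed)
        loss = F.cross_entropy(logits, syn_y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
        if intermediate_weight > 0:
            # Optional: penalize distance to the target at every inner step.
            intermediate = intermediate + (flatten(params) - target_flat).pow(2).sum()

    final = (flatten(params) - target_flat).pow(2).sum()
    normalizer = (start_flat - target_flat).pow(2).sum() + 1e-12
    return (final + intermediate_weight * intermediate / inner_steps) / normalizer
```

In practice this loss would be backpropagated into `syn_x` and `syn_y` (or into a learnable parameterization of them) to update the synthetic dataset.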
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
- Capturing the Temporal Dependence of Training Data Influence [100.91355498124527]
We formalize the concept of trajectory-specific leave-one-out influence, which quantifies the impact of removing a data point during training.
We propose data value embedding, a novel technique enabling efficient approximation of trajectory-specific LOO.
As data value embedding captures training data ordering, it offers valuable insights into model training dynamics.
arXiv Detail & Related papers (2024-12-12T18:28:55Z)
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
- DiffsFormer: A Diffusion Transformer on Stock Factor Augmentation [36.75453713794983]
We introduce the Diffusion Model to generate stock factors with a Transformer architecture (DiffsFormer).
When presented with a specific downstream task, we employ DiffsFormer to augment the training procedure by editing existing samples.
The proposed method achieves relative improvements of 7.2% and 27.8% in annualized return ratio for the respective datasets.
arXiv Detail & Related papers (2024-02-05T03:54:36Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information of the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization towards a flat trajectory, the weights trained on synthetic data are robust against perturbations from accumulated errors.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods encounter through empirical experimental results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)