Delving into Effective Gradient Matching for Dataset Condensation
- URL: http://arxiv.org/abs/2208.00311v1
- Date: Sat, 30 Jul 2022 21:31:10 GMT
- Title: Delving into Effective Gradient Matching for Dataset Condensation
- Authors: Zixuan Jiang, Jiaqi Gu, Mingjie Liu, David Z. Pan
- Abstract summary: The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and synthetic datasets.
We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information.
An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps and improve algorithmic efficiency.
- Score: 13.75957901381024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep learning models and datasets rapidly scale up, network training is
extremely time-consuming and resource-costly. Instead of training on the entire
dataset, learning with a small synthetic dataset has become an efficient alternative.
Extensive research has been conducted on dataset condensation, among which
gradient matching achieves state-of-the-art performance. The gradient matching
method directly targets the training dynamics by matching the gradients obtained
when training on the original and synthetic datasets. However, there have been
few in-depth investigations into the principles and effectiveness of this method.
In this work, we delve into the gradient matching method from a comprehensive
perspective and answer the critical questions of what, how, and where to match.
We propose to match multi-level gradients so that both intra-class and
inter-class gradient information is involved. We demonstrate that the distance
function should focus on the angle between gradients while also considering
their magnitudes to delay overfitting. We further propose an overfitting-aware
adaptive learning step strategy that trims unnecessary optimization steps to
improve algorithmic efficiency. Ablation and comparison experiments demonstrate
that our methodology achieves superior accuracy, efficiency, and generalization
compared to prior work.
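The abstract names the core ingredients of gradient matching: comparing gradients computed on real and synthetic batches, using a distance that focuses on the angle while still considering the magnitude. The following is a minimal PyTorch sketch of one such matching step. The names (`gradient_match_distance`, `matching_step`), the `magnitude_weight` term, and the exact combination of cosine and norm terms are illustrative assumptions, not the paper's definitive formulation.

```python
import torch
import torch.nn.functional as F

def gradient_match_distance(grads_real, grads_syn, magnitude_weight=0.01):
    """Layer-wise distance between two gradient lists.

    The cosine (angle) term drives the matching; a small magnitude term is
    added as well, reflecting the abstract's point that the distance should
    focus on the angle while still considering the magnitude. The weighting
    and exact form here are illustrative, not the paper's formulation.
    """
    dist = 0.0
    for g_r, g_s in zip(grads_real, grads_syn):
        g_r, g_s = g_r.flatten(), g_s.flatten()
        dist = dist + (1.0 - F.cosine_similarity(g_r, g_s, dim=0))        # angle mismatch
        dist = dist + magnitude_weight * (g_r.norm() - g_s.norm()).abs()  # magnitude gap
    return dist

def matching_step(model, loss_fn, x_real, y_real, x_syn, y_syn, syn_opt):
    """One update of the synthetic batch x_syn by gradient matching (sketch).

    x_syn must be a leaf tensor with requires_grad=True, and syn_opt an
    optimizer over [x_syn] (e.g. torch.optim.SGD([x_syn], lr=0.1)).
    """
    # Gradients on real data: treated as constants.
    grads_real = torch.autograd.grad(loss_fn(model(x_real), y_real),
                                     model.parameters())
    grads_real = [g.detach() for g in grads_real]

    # Gradients on synthetic data: kept differentiable w.r.t. x_syn.
    grads_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn),
                                    model.parameters(), create_graph=True)

    syn_opt.zero_grad()
    gradient_match_distance(grads_real, grads_syn).backward()
    syn_opt.step()
```

A typical driver alternates this synthetic-data update with ordinary training steps of the network on the synthetic data. The paper additionally matches multi-level gradients to capture intra-class and inter-class information and adapts the number of learning steps; that bookkeeping is not shown in this sketch.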
Related papers
- Efficient Gradient Estimation via Adaptive Sampling and Importance
Sampling [34.50693643119071]
Adaptive or importance sampling reduces noise in gradient estimation.
We present an algorithm that can incorporate existing importance functions into our framework.
We observe improved convergence in classification and regression tasks with minimal computational overhead.
arXiv Detail & Related papers (2023-11-24T13:21:35Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Angle based dynamic learning rate for gradient descent [2.5077510176642805]
We propose a novel yet simple approach to obtain an adaptive learning rate for gradient-based descent methods on classification tasks.
Instead of the traditional approach of selecting adaptive learning rates via the expectation of gradient-based terms, we use the angle between the current gradient and the new gradient.
We find that our method leads to the highest accuracy on most of the datasets. (A generic sketch of this angle-based rule is given after the related papers list below.)
arXiv Detail & Related papers (2023-04-20T16:55:56Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset
Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization towards a flat trajectory, the weights trained on synthetic data are robust against accumulated-error perturbations.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only one single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z) - Dataset Condensation with Contrastive Signals [41.195453119305746]
Gradient matching-based dataset synthesis (DC) methods can achieve state-of-the-art performance when applied to data-efficient learning tasks.
In this study, we prove that the existing DC methods can perform worse than the random selection method when task-irrelevant information forms a significant part of the training dataset.
We propose dataset condensation with Contrastive signals (DCC) by modifying the loss function to enable the DC methods to effectively capture the differences between classes.
arXiv Detail & Related papers (2022-02-07T03:05:32Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Layerwise Optimization by Gradient Decomposition for Continual Learning [78.58714373218118]
Deep neural networks achieve state-of-the-art and sometimes super-human performance across various domains.
When learning tasks sequentially, the networks easily forget the knowledge of previous tasks, a phenomenon known as "catastrophic forgetting".
arXiv Detail & Related papers (2021-05-17T01:15:57Z) - Disentangling Adaptive Gradient Methods from Learning Rates [65.0397050979662]
We take a deeper look at how adaptive gradient methods interact with the learning rate schedule.
We introduce a "grafting" experiment which decouples an update's magnitude from its direction.
We present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods.
arXiv Detail & Related papers (2020-02-26T21:42:49Z) - Adaptive Gradient Sparsification for Efficient Federated Learning: An
Online Learning Approach [11.986523531539165]
Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data.
Gradient sparsification (GS) can be applied, where only a small subset of important gradient elements is communicated instead of the full gradient.
We propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off.
arXiv Detail & Related papers (2020-01-14T13:09:23Z)
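The angle-based dynamic learning rate entry above adapts the step size from the angle between the current gradient and the new gradient. Below is a minimal, generic PyTorch sketch of that idea; the function name `angle_adaptive_lr`, the `floor` parameter, and the specific scaling rule are assumptions for illustration and do not reproduce the cited paper's exact formula.

```python
import torch
import torch.nn.functional as F

def angle_adaptive_lr(params, prev_grad, base_lr=0.1, floor=1e-4):
    """Return a step size scaled by how well consecutive gradients align.

    params:    parameters whose .grad fields are already populated.
    prev_grad: flattened gradient from the previous step, or None at step 1.
    The scaling rule (larger steps when consecutive gradients point the same
    way, smaller when they disagree) is an illustrative choice, not the
    cited paper's exact rule.
    """
    g_new = torch.cat([p.grad.detach().flatten() for p in params])
    if prev_grad is None:
        return base_lr, g_new
    cos = F.cosine_similarity(g_new, prev_grad, dim=0).item()  # cosine of the angle, in [-1, 1]
    lr = floor + base_lr * (1.0 + cos) / 2.0                   # maps alignment to [floor, floor + base_lr]
    return lr, g_new
```

In a training loop, one would call this after `loss.backward()`, keep the returned flattened gradient for the next iteration, and apply the returned step size in a plain SGD update.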
This list is automatically generated from the titles and abstracts of the papers on this site.