Related papers: Generative Dataset Distillation: Balancing Global Structure and Local Details

Generative Dataset Distillation: Balancing Global Structure and Local Details

URL: http://arxiv.org/abs/2404.17732v1
Date: Fri, 26 Apr 2024 23:46:10 GMT
Title: Generative Dataset Distillation: Balancing Global Structure and Local Details
Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama,
Abstract summary: We propose a new dataset distillation method that considers balancing global structure and local details. Our method involves using a conditional generative adversarial network to generate the distilled dataset.
Score: 49.20086587208214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we propose a new dataset distillation method that considers balancing global structure and local details when distilling the information from a large dataset into a generative model. Dataset distillation has been proposed to reduce the size of the required dataset when training models. The conventional dataset distillation methods face the problem of long redeployment time and poor cross-architecture performance. Moreover, previous methods focused too much on the high-level semantic attributes between the synthetic dataset and the original dataset while ignoring the local features such as texture and shape. Based on the above understanding, we propose a new method for distilling the original image dataset into a generative model. Our method involves using a conditional generative adversarial network to generate the distilled dataset. Subsequently, we ensure balancing global structure and local details in the distillation process, continuously optimizing the generator for more information-dense dataset generation.

Related papers

Taming Diffusion for Dataset Distillation with High Representativeness [49.3818035378669]
D3HR is a novel diffusion-based framework to generate distilled datasets with high representativeness.<n>Our experiments demonstrate that D3HR can achieve higher accuracy across different model architectures.
arXiv Detail & Related papers (2025-05-23T22:05:59Z)
Dataset Distillation of 3D Point Clouds via Distribution Matching [10.294595110092407]
We propose a distribution matching-based distillation framework for 3D point clouds.<n>To address the semantic misalignment caused by unordered indexing of points, we introduce a Semantically Aligned Distribution Matching loss.<n>To address the rotation variation, we jointly learn the optimal rotation angles while updating the synthetic dataset to better align with the original feature distribution.
arXiv Detail & Related papers (2025-03-28T05:15:22Z)
Personalized Federated Learning via Active Sampling [50.456464838807115]
This paper proposes a novel method for sequentially identifying similar (or relevant) data generators. Our method evaluates the relevance of a data generator by evaluating the effect of a gradient step using its local dataset. We extend this method to non-parametric models by a suitable generalization of the gradient step to update a hypothesis using the local dataset provided by a data generator.
arXiv Detail & Related papers (2024-09-03T17:12:21Z)
Prioritize Alignment in Dataset Distillation [27.71563788300818]
Existing methods use the agent model to extract information from the target dataset and embed it into the distilled dataset. We find that existing methods introduce misaligned information in both information extraction and embedding stages. We propose Prioritize Alignment in dataset Distillation (PAD), which aligns information from the following two perspectives.
arXiv Detail & Related papers (2024-08-06T17:07:28Z)
Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD) This method systematically explores hierarchical layers within the generative adversarial networks (GANs) In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
Main goal is to push further the performance of prototype-based soft-labels distillation in terms of classification accuracy. Experimental studies trace the capability of the method to distill the data, but also the opportunity to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z)
Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets. dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset. We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently. Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
Exploring Multilingual Text Data Distillation [0.0]
We propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength, and cross-architecture generalization. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.
arXiv Detail & Related papers (2023-08-09T14:31:57Z)
Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task. By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset. We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images. The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data. We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimize our distilled data to guide networks to a similar state as those trained on real data. Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data. Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
arXiv Detail & Related papers (2022-03-22T17:58:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.