Distilling Dataset into Neural Field
- URL: http://arxiv.org/abs/2503.04835v1
- Date: Wed, 05 Mar 2025 14:33:29 GMT
- Title: Distilling Dataset into Neural Field
- Authors: Donghyeok Shin, HeeSun Bae, Gyuwon Sim, Wanmo Kang, Il-Chul Moon
- Abstract summary: This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF). Due to the unique nature of the neural field, DDiF effectively preserves the information and easily generates various shapes of data. We demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel.
- Score: 12.551430414723086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs the corresponding quantity, DDiF effectively preserves the information and easily generates data of various shapes. We theoretically confirm that DDiF exhibits greater expressiveness than several previous parameterizations when the budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel. We release the code at https://github.com/aailab-kaist/DDiF.
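The parameterization is easy to picture in code. Below is a minimal, hedged sketch of a coordinate-based neural field used as a single synthetic datum: a small MLP maps (x, y) coordinates to pixel values, so one network stores one synthetic image, and decoding at a new shape is just querying a denser coordinate grid. The names (`CoordinateField`, `grid_coords`) are illustrative stand-ins, and the MSE fit to a fixed target only demonstrates the parameterization; DDiF's actual distillation objective optimizes the fields against the original dataset (see the linked repository for the authors' code).

```python
# Sketch: a neural field as a synthetic datum. The field maps (x, y)
# coordinates to pixel values, so one small MLP stores one synthetic
# image and can be decoded at any resolution.
import torch
import torch.nn as nn

class CoordinateField(nn.Module):
    def __init__(self, hidden=64, out_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_channels),
        )

    def forward(self, coords):          # coords: (N, 2) in [-1, 1]
        return self.net(coords)         # (N, out_channels)

def grid_coords(h, w):
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).reshape(-1, 2)

# Fit one field to one target signal (a stand-in for the distilled content).
target = torch.rand(32, 32, 3)                  # toy target image
field = CoordinateField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
coords = grid_coords(32, 32)
for _ in range(500):
    opt.zero_grad()
    loss = ((field(coords) - target.reshape(-1, 3)) ** 2).mean()
    loss.backward()
    opt.step()

# Decoding a different shape is just querying a denser coordinate grid.
decoded_64 = field(grid_coords(64, 64)).reshape(64, 64, 3)
```

Because the input is a coordinate set rather than a fixed pixel array, the same mechanism extends to video (t, x, y), audio (t), and 3D voxel (x, y, z) coordinates, which is what the abstract means by generating various shapes of data.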
Related papers
- DDFAD: Dataset Distillation Framework for Audio Data [16.55650741388241]
Deep neural networks (DNNs) have achieved significant success in numerous applications.
arXiv Detail & Related papers (2024-07-15T05:23:35Z)
- One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
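As a rough illustration of the mechanism (not D3M's actual pipeline), the sketch below mirrors textual inversion: a pretrained text-to-image model stays frozen and only one new token embedding per category is learned. `FrozenT2I` and the reconstruction loss are toy assumptions standing in for a real diffusion model and its denoising objective.

```python
# Toy sketch of the "one category, one prompt" idea: freeze the generative
# model and optimize only a per-category token embedding.
import torch
import torch.nn as nn

class FrozenT2I(nn.Module):            # stand-in for a pretrained model
    def __init__(self, emb_dim=32, img_dim=3 * 16 * 16):
        super().__init__()
        self.decode = nn.Linear(emb_dim, img_dim)
        for p in self.parameters():
            p.requires_grad_(False)    # the foundation model stays frozen

    def forward(self, token_emb):
        return self.decode(token_emb)

model = FrozenT2I()
class_token = nn.Parameter(torch.randn(32))     # the only trainable tensor
opt = torch.optim.Adam([class_token], lr=1e-2)

class_images = torch.rand(8, 3 * 16 * 16)       # toy images of one category
for _ in range(200):
    opt.zero_grad()
    recon = model(class_token)
    loss = ((recon - class_images) ** 2).mean() # surrogate for diffusion loss
    loss.backward()
    opt.step()
# Storing one learned prompt embedding per class is far cheaper than
# storing synthetic images, which is the appeal of this parameterization.
```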
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
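DQ's exact bin-and-sample procedure differs, but the flavor of compressing a dataset into a representative subset can be sketched with a generic k-center-greedy selection in feature space; everything below (feature shapes, the selection rule) is an illustrative assumption, not DQ's algorithm.

```python
# Generic coreset selection sketch: greedily pick points that maximize
# coverage of the feature space (k-center greedy).
import torch

def k_center_greedy(features, k):
    """Pick k indices so the selected points cover the feature space."""
    n = features.shape[0]
    selected = [torch.randint(n, (1,)).item()]
    # distance from every point to its nearest selected point
    dist = torch.cdist(features, features[selected]).squeeze(1)
    for _ in range(k - 1):
        idx = torch.argmax(dist).item()          # least-covered point
        selected.append(idx)
        new_dist = torch.cdist(features, features[[idx]]).squeeze(1)
        dist = torch.minimum(dist, new_dist)
    return selected

feats = torch.randn(1000, 128)   # e.g., features from a pretrained encoder
subset = k_center_greedy(feats, k=50)
```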
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
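A hedged sketch of this idea: keep a generator frozen, make the latent codes the only learnable tensors, and optimize them against the real data. The toy generator, encoder, and first-moment feature-matching loss below are stand-ins for the paper's pretrained generative prior and its actual distillation objective.

```python
# Sketch: distill a dataset into latent vectors of a frozen generator.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32))
enc = nn.Sequential(nn.Linear(3 * 32 * 32, 128))
for p in list(gen.parameters()) + list(enc.parameters()):
    p.requires_grad_(False)                     # prior and encoder stay frozen

latents = nn.Parameter(torch.randn(10, 64))    # 10 synthetic instances
opt = torch.optim.Adam([latents], lr=1e-2)

real = torch.rand(256, 3 * 32 * 32)            # toy "original dataset" batch
for _ in range(300):
    opt.zero_grad()
    synth_feats = enc(gen(latents))
    # match first moments of features (a simple distribution-matching loss)
    loss = ((synth_feats.mean(0) - enc(real).mean(0)) ** 2).mean()
    loss.backward()
    opt.step()
# Only `latents` need to be stored; images are regenerated via gen(latents).
```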
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset containing synthetic samples, based on which the trained models yield performance comparable with those trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- Expanding Small-Scale Datasets with Guided Imagination [92.5276783917845]
Dataset expansion is a new task aimed at expanding a ready-to-use small dataset by automatically creating new labeled samples.
GIF (Guided Imagination Framework) conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model.
GIF-SD obtains 13.5% higher model accuracy on natural image datasets than unguided expansion with Stable Diffusion (SD).
arXiv Detail & Related papers (2022-11-25T09:38:22Z)
- Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives [44.03149443379618]
We propose a cost-effective method for automatically generating a large number of 3D objects with annotations.
These objects are auto-annotated with part labels originating from primitives.
Considering the large overhead of learning on the generated dataset, we propose a dataset distillation strategy.
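The assembly idea can be sketched directly: rasterize a few randomly placed primitives into a voxel grid and let each occupied voxel inherit the label of the primitive that produced it. This toy NumPy version (grid resolution, primitive types, and sampling ranges are all illustrative choices) is not the paper's mesh-based pipeline.

```python
# Sketch: assemble random primitives into a voxel object; part labels
# come for free from which primitive produced each voxel.
import numpy as np

def random_object(grid=32, n_primitives=3, seed=0):
    rng = np.random.default_rng(seed)
    coords = np.stack(np.meshgrid(*([np.arange(grid)] * 3), indexing="ij"), -1)
    occ = np.zeros((grid,) * 3, dtype=np.int64)        # 0 = empty
    for part_id in range(1, n_primitives + 1):
        center = rng.uniform(grid * 0.25, grid * 0.75, size=3)
        size = rng.uniform(grid * 0.1, grid * 0.2)
        if rng.random() < 0.5:                         # sphere
            mask = np.linalg.norm(coords - center, axis=-1) < size
        else:                                          # axis-aligned box
            mask = np.all(np.abs(coords - center) < size, axis=-1)
        occ[mask] = part_id                            # voxel keeps part label
    return occ

obj = random_object()
print("occupied voxels per part:", np.bincount(obj.ravel())[1:])
```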
arXiv Detail & Related papers (2022-05-25T10:07:07Z)
- Data Distillation for Text Classification [7.473576666437028]
Data distillation aims to distill the knowledge from a large training dataset down to a smaller and synthetic one.
We develop a novel data distillation method for text classification.
Notably, distilled data amounting to only 0.1% of the original text corpus achieves approximately 90% of the performance obtained with the full dataset.
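While this paper's method may differ in detail, classic gradient-based data distillation conveys the mechanism: learn a handful of synthetic examples such that a single differentiable SGD step on them produces a classifier that fits real data. The sketch below treats documents as bag-of-words vectors; the sizes and the bi-level setup follow Wang et al. (2018) rather than this paper.

```python
# Sketch: bi-level data distillation for text as bag-of-words vectors.
import torch
import torch.nn.functional as F

vocab, n_classes, lr_inner = 300, 2, 0.1
real_x = torch.rand(64, vocab)                       # toy real documents
real_y = torch.randint(n_classes, (64,))

syn_x = torch.randn(n_classes, vocab, requires_grad=True)   # 1 doc per class
syn_y = torch.arange(n_classes)
opt = torch.optim.Adam([syn_x], lr=1e-2)

for _ in range(200):
    w = torch.zeros(vocab, n_classes, requires_grad=True)   # fresh classifier
    inner_loss = F.cross_entropy(syn_x @ w, syn_y)
    (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w1 = w - lr_inner * g                 # one differentiable SGD step
    outer_loss = F.cross_entropy(real_x @ w1, real_y)       # evaluate on real
    opt.zero_grad()
    outer_loss.backward()                 # gradients flow back into syn_x
    opt.step()
```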
arXiv Detail & Related papers (2021-04-17T04:54:54Z)
- Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the transfer learning data most useful for a target domain.
NDS consists of a dataserver that indexes several large, popular image datasets and recommends relevant data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
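A simplified sketch of the recommendation step: NDS itself trains expert models that act as gates, but a cheaper proxy with the same shape ranks the indexed datasets by feature-space similarity to the client's data. The encoder features and scoring rule below are illustrative assumptions, not NDS's actual mechanism.

```python
# Sketch: recommend source datasets by feature-space similarity.
import torch

def recommend(server_feats: dict, client_feats: torch.Tensor, top_k=2):
    """server_feats maps dataset name -> (n_i, d) features; client is (m, d)."""
    client_mean = client_feats.mean(0)
    scores = {
        name: -torch.norm(f.mean(0) - client_mean).item()   # closer = better
        for name, f in server_feats.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

server = {name: torch.randn(500, 64) for name in ["COCO", "OpenImages", "ADE20K"]}
client = torch.randn(100, 64)                # target-domain features
print(recommend(server, client))
```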
arXiv Detail & Related papers (2020-01-09T01:21:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.