Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
- URL: http://arxiv.org/abs/2310.07506v2
- Date: Fri, 19 Jul 2024 00:14:59 GMT
- Title: Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
- Authors: Haizhong Zheng, Jiachen Sun, Shutong Wu, Bhavya Kailkhura, Zhuoqing Mao, Chaowei Xiao, Atul Prakash
- Abstract summary: We propose a novel data parameterization architecture, Hierarchical Memory Network (HMN)
HMN stores condensed data in a three-tier structure, representing the dataset-level, class-level, and instance-level features.
We evaluate HMN on five public datasets and show that our proposed method outperforms all baselines.
- Score: 38.59750970617013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a real-world dataset, data condensation (DC) aims to synthesize a small synthetic dataset that captures the knowledge of the natural dataset while being usable for training models with comparable accuracy. Recent works propose to enhance DC with data parameterization, which condenses data into very compact parameterized data containers instead of images. The intuition behind data parameterization is to encode shared features of images to avoid additional storage costs. In this paper, we recognize that images share common features in a hierarchical way due to the inherent hierarchical structure of the classification system, which is overlooked by current data parameterization methods. To better align DC with this hierarchical nature and encourage more efficient information sharing inside data containers, we propose a novel data parameterization architecture, Hierarchical Memory Network (HMN). HMN stores condensed data in a three-tier structure, representing dataset-level, class-level, and instance-level features. Another helpful property of the hierarchical architecture is that HMN naturally ensures good independence among images despite achieving information sharing, which enables instance-level pruning to remove redundant information and further improve performance. We evaluate HMN on five public datasets and show that our proposed method outperforms all baselines.
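To make the three-tier container concrete, here is a minimal PyTorch sketch of the idea described in the abstract. The tier dimensions, the additive combination of tiers, and the decoder are our own illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class HierarchicalMemoryNetwork(nn.Module):
    """Illustrative three-tier condensed-data container: a dataset-level
    memory shared by all images, a class-level memory shared within a class,
    and a small instance-level code per synthetic image. Each image decodes
    from the sum of its three tiers, so instance codes stay independent and
    can be pruned individually."""

    def __init__(self, num_classes=10, per_class=5, feat_dim=128,
                 image_shape=(3, 32, 32)):
        super().__init__()
        self.image_shape = image_shape
        num_pixels = image_shape[0] * image_shape[1] * image_shape[2]
        # Tier 1: one feature vector for the whole dataset.
        self.dataset_mem = nn.Parameter(torch.randn(feat_dim))
        # Tier 2: one feature vector per class.
        self.class_mem = nn.Parameter(torch.randn(num_classes, feat_dim))
        # Tier 3: one small code per synthetic instance.
        self.instance_mem = nn.Parameter(
            0.1 * torch.randn(num_classes, per_class, feat_dim))
        # Shared decoder from combined features to pixel space.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(), nn.Linear(512, num_pixels))

    def forward(self, class_idx, inst_idx):
        feat = (self.dataset_mem
                + self.class_mem[class_idx]
                + self.instance_mem[class_idx, inst_idx])
        return self.decoder(feat).view(-1, *self.image_shape)

hmn = HierarchicalMemoryNetwork()
images = hmn(torch.tensor([0, 3]), torch.tensor([1, 4]))  # decode two images
print(images.shape)  # torch.Size([2, 3, 32, 32])
```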
Related papers
- SCORE: Soft Label Compression-Centric Dataset Condensation via Coding Rate Optimization [29.93981107658258]
This paper proposes a soft-label-compression-centric dataset condensation framework (SCORE).
It balances informativeness, discriminativeness, and compressibility of the condensed data.
Experiments on large-scale datasets, including ImageNet-1K and Tiny-ImageNet, demonstrate that SCORE outperforms existing methods in most cases.
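The summary does not define the coding rate; the sketch below shows the standard rate-distortion estimate from the coding-rate-reduction literature, which we assume SCORE optimizes some variant of.

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z) for n feature
    rows of dimension d: roughly, the bits needed to code Z up to
    distortion eps. Higher values mean more spread-out (informative)
    features."""
    n, d = Z.shape
    return 0.5 * torch.logdet(
        torch.eye(d) + (d / (n * eps ** 2)) * Z.T @ Z)

print(coding_rate(torch.randn(100, 16)))
```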
arXiv Detail & Related papers (2025-03-18T06:04:44Z)
- Incorporating Attributes and Multi-Scale Structures for Heterogeneous Graph Contrastive Learning [8.889313669713918]
We propose a novel contrastive learning framework for heterogeneous graphs (ASHGCL)
ASHGCL incorporates three distinct views, each focusing on node attributes, high-order and low-order structural information, respectively.
We introduce an attribute-enhanced positive sample selection strategy that combines both structural information and attribute information.
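As a stand-in for the paper's multi-view objective, here is a generic InfoNCE contrastive loss between two views' node embeddings; ASHGCL's actual three views and positive-sample selection are more elaborate than this sketch.

```python
import torch
import torch.nn.functional as F

def infonce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Contrast matching nodes across two views: the i-th rows form a
    positive pair, every other row in the batch is a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau             # cosine similarities / temperature
    labels = torch.arange(z1.shape[0])   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# e.g., attribute-view vs. structure-view embeddings of the same 64 nodes
loss = infonce(torch.randn(64, 128), torch.randn(64, 128))
```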
arXiv Detail & Related papers (2025-03-18T05:15:21Z)
- Efficient Dataset Distillation through Low-Rank Space Sampling [34.29086540681496]
This paper proposes a dataset distillation method based on Matching Training Trajectories with Low-rank Space Sampling.
The synthetic data is represented by basis vectors and shared dimension mappers drawn from these low-rank subspaces.
The proposed method is tested on CIFAR-10, CIFAR-100, and SVHN datasets, and outperforms the baseline methods by an average of 9.9%.
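Read literally, each synthetic image is a coefficient vector in a shared low-rank subspace, lifted back to pixels by shared mappers. Below is a rough sketch of one such parameterization; the dimensions and exact structure are our guesses from the summary, not the paper's code.

```python
import torch
import torch.nn as nn

class LowRankSyntheticSet(nn.Module):
    """Each synthetic image is a learned coefficient vector over shared
    basis vectors; a shared "dimension mapper" lifts subspace codes to
    pixel space, so only coeffs + basis + mapper are stored."""

    def __init__(self, num_images=100, rank=16, subspace_dim=64,
                 image_shape=(3, 32, 32)):
        super().__init__()
        self.image_shape = image_shape
        num_pixels = image_shape[0] * image_shape[1] * image_shape[2]
        self.coeffs = nn.Parameter(torch.randn(num_images, rank))   # per image
        self.basis = nn.Parameter(torch.randn(rank, subspace_dim))  # shared
        self.mapper = nn.Parameter(0.01 * torch.randn(subspace_dim, num_pixels))

    def forward(self):
        codes = self.coeffs @ self.basis       # points in the low-rank space
        return (codes @ self.mapper).view(-1, *self.image_shape)

print(LowRankSyntheticSet()().shape)  # torch.Size([100, 3, 32, 32])
```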
arXiv Detail & Related papers (2025-03-11T02:59:17Z)
- Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-aware Conditional Mutual Information [43.44508080585033]
We introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset.
We minimize the distillation loss while constraining the class-aware complexity of the synthetic dataset.
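The summary leaves the estimator unspecified. A common empirical form of I(Z; X | Y) averages, per class, the KL divergence between each sample's output distribution and the class-mean output distribution; whether the paper uses exactly this form is our assumption.

```python
import torch
import torch.nn.functional as F

def class_aware_cmi(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Empirical I(Z; X | Y): mean KL(p(z|x) || p(z|y)), where p(z|y) is
    the average predictive distribution over x's class."""
    probs = F.softmax(logits, dim=1)
    total = logits.new_zeros(())
    for y in labels.unique():
        p = probs[labels == y]                  # p(z | x) for x in class y
        p_bar = p.mean(dim=0, keepdim=True)     # p(z | y)
        total = total + (p * (p.clamp_min(1e-12).log()
                              - p_bar.clamp_min(1e-12).log())).sum()
    return total / logits.shape[0]

cmi = class_aware_cmi(torch.randn(32, 10), torch.randint(0, 10, (32,)))
```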
arXiv Detail & Related papers (2024-12-13T08:10:47Z)
- Topology-aware Reinforcement Feature Space Reconstruction for Graph Data [22.5530178427691]
Reconstructing a good feature space is essential for unlocking the value of data for AI, improving model generalization, and increasing the availability of downstream ML models.
We use topology-aware reinforcement learning to automate and optimize feature space reconstruction for graph data.
Our approach combines the extraction of core subgraphs to capture essential structural information with a graph neural network (GNN) to encode topological features and reduce computing complexity.
arXiv Detail & Related papers (2024-11-08T18:01:05Z)
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD)
This method systematically explores hierarchical layers within the generative adversarial networks (GANs)
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
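The base distribution-matching objective (the paper's improvements on top of it are not described in the summary) pushes the mean embedding of a synthetic batch toward that of a real batch from the same class, under a typically randomly initialized feature extractor:

```python
import torch
import torch.nn as nn

def distribution_matching_loss(embed: nn.Module,
                               real: torch.Tensor,
                               synthetic: torch.Tensor) -> torch.Tensor:
    """Match mean embeddings of real and synthetic batches (one class)."""
    return (embed(real).mean(dim=0)
            - embed(synthetic).mean(dim=0)).pow(2).sum()

# Usage sketch: a random conv embedder and per-class batches.
embed = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(), nn.Flatten())
real = torch.randn(256, 3, 32, 32)                     # real images, one class
syn = torch.randn(10, 3, 32, 32, requires_grad=True)   # learnable synthetic
loss = distribution_matching_loss(embed, real, syn)
loss.backward()  # gradients flow back to the synthetic images
```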
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR)
The task is to train a model that fuses multi-modal information, e.g., text and images, to accurately retrieve images matching the query, giving users a more expressive way to specify what they seek.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
- Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation [19.780410411548935]
We present Dynamic Data Augmentation (DDAug), which is efficient and has negligible cost.
DDAug builds a hierarchical tree structure to represent various augmentations.
Our method outperforms the current state-of-the-art data augmentation strategies.
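A minimal sketch of the idea: represent augmentation choices as a tree and use Monte-Carlo tree search (classic UCT selection) to pick which augmentations to apply. The tree contents below are placeholders, and the reward, which in the paper would come from segmentation quality after training, is stubbed out.

```python
import math
import random

class Node:
    """One augmentation choice in a hierarchical search tree."""
    def __init__(self, name, children=()):
        self.name, self.children = name, list(children)
        self.visits, self.value = 0, 0.0

def uct_select(node, c=1.4):
    # Pick the child maximizing the UCT score: exploitation + exploration.
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

root = Node("aug", [Node("spatial", [Node("flip"), Node("rotate")]),
                    Node("intensity", [Node("gamma"), Node("noise")])])

# One search iteration: walk to a leaf, "evaluate", backpropagate the reward.
path, node = [root], root
while node.children:
    node = uct_select(node)
    path.append(node)
reward = random.random()   # stand-in for validation Dice after training
for n in path:
    n.visits += 1
    n.value += reward
print("chosen augmentation:", path[-1].name)
```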
arXiv Detail & Related papers (2023-05-25T06:44:43Z)
- Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE)
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
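In spirit, the alignment matches layer-wise feature statistics of real and synthetic batches at several network depths. Below is a rough sketch using per-layer mean matching; the paper's full objective has additional components not shown here.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Tiny conv net that returns intermediate features at every stage."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU()),
        ])
    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x.flatten(1))
        return feats

def alignment_loss(real_feats, syn_feats):
    # Match per-layer mean features between real and synthetic batches.
    return sum((fr.mean(0) - fs.mean(0)).pow(2).sum()
               for fr, fs in zip(real_feats, syn_feats))

net = FeatureNet()
real = torch.randn(128, 3, 32, 32)
syn = torch.randn(10, 3, 32, 32, requires_grad=True)
loss = alignment_loss(net(real), net(syn))
loss.backward()
```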
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Deep Recursive Embedding for High-Dimensional Data [9.611123249318126]
We propose to combine deep neural networks (DNN) with mathematics-guided embedding rules for high-dimensional data embedding.
We introduce a generic deep embedding network (DEN) framework, which is able to learn a parametric mapping from high-dimensional space to low-dimensional space.
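Such a parametric mapping can be as simple as an MLP from the input dimension to a low-dimensional embedding; the layer sizes below and the t-SNE-style loss it would be trained with are our own choices, shown only to make "parametric mapping" concrete.

```python
import torch
import torch.nn as nn

# An illustrative parametric embedding network: high-dimensional inputs in,
# 2-D coordinates out, reusable on unseen data once trained.
den = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),                 # 2-D embedding for visualization
)
x = torch.randn(32, 784)              # e.g., flattened 28x28 images
print(den(x).shape)                   # torch.Size([32, 2])
```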
arXiv Detail & Related papers (2021-10-31T23:22:33Z)
- Index $t$-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings [1.7188280334580195]
This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved.
The proposed algorithm has the same complexity as the original $t$-SNE to embed new items, and a lower one when considering the embedding of a dataset sliced into sub-pieces.
arXiv Detail & Related papers (2021-09-22T06:45:37Z)
- New advances in enumerative biclustering algorithms with online partitioning [80.22629846165306]
This paper further extends RIn-Close_CVC, a biclustering algorithm capable of performing an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets.
The improved algorithm, called RIn-Close_CVC3, keeps those attractive properties of RIn-Close_CVC and achieves a drastic reduction in memory usage along with a consistent gain in runtime.
arXiv Detail & Related papers (2020-03-07T14:54:26Z)