Related papers: Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study

URL: http://arxiv.org/abs/2312.05598v2
Date: Wed, 26 Jun 2024 08:43:43 GMT
Title: Boosting the Cross-Architecture Generalization of Dataset Distillation through an Empirical Study
Authors: Lirui Zhao, Yuxin Zhang, Fei Chao, Rongrong Ji
Abstract summary: The poor cross-architecture generalization of dataset distillation weakens its practical significance. We propose a novel method, EvaLuation with distillation Feature (ELF). Through extensive experiments, we show that ELF can enhance the cross-architecture generalization of current DD methods.
Score: 52.83643622795387
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The poor cross-architecture generalization of dataset distillation greatly weakens its practical significance. This paper attempts to mitigate this issue through an empirical study, which suggests that the synthetic datasets undergo an inductive bias towards the distillation model. Therefore, the evaluation model is strictly confined to having an architecture similar to that of the distillation model. We propose a novel method of EvaLuation with distillation Feature (ELF), which utilizes features from intermediate layers of the distillation model for the cross-architecture evaluation. In this manner, the evaluation model learns from bias-free knowledge, so its architecture becomes unfettered while retaining performance. By performing extensive experiments, we successfully prove that ELF can well enhance the cross-architecture generalization of current DD methods. Code of this project is at https://github.com/Lirui-Zhao/ELF.

Related papers

Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs) [63.681263056053666]
We present RealUID, a universal distillation framework for all matching models that seamlessly incorporates real data into the distillation procedure without GANs. Our RealUID approach offers a simple theoretical foundation that covers previous distillation methods for Flow Matching and Diffusion models, and is also extended to their modifications, such as Bridge Matching and Interpolants.
arXiv Detail & Related papers (2025-09-26T15:12:02Z)
Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning [8.04716022048554]
Model-heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data local. To better aggregate knowledge from clients, ensemble distillation, a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. We propose a new feature-based ensemble federated knowledge distillation paradigm.
arXiv Detail & Related papers (2025-07-14T14:51:18Z)
Data-to-Model Distillation: Data-Efficient Learning Framework [14.44010988811002]
We propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model. Our method effectively scales up to high-resolution 128x128 ImageNet-1K.
arXiv Detail & Related papers (2024-11-19T20:10:28Z)
D$^4$M: Dataset Distillation via Disentangled Diffusion Model [4.568710926635445]
We propose an efficient framework for dataset distillation via Disentangled Diffusion Model (D$^4$M). Compared to architecture-dependent methods, D$^4$M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes. D$^4$M demonstrates superior performance and robust generalization, surpassing the SOTA methods across most aspects.
arXiv Detail & Related papers (2024-07-21T12:16:20Z)
Improve Cross-Architecture Generalization on Dataset Distillation [1.688134675717698]
"Model pool" is a novel approach to creating a synthetic dataset from a larger existing dataset. Our results validate the effectiveness of the model pool approach across a range of existing models while testing, demonstrating superior performance compared to existing methodologies.
arXiv Detail & Related papers (2024-02-20T13:42:36Z)
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family. We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We make empirical studies of state-of-the-art UniDA methods using foundation models. We introduce CLIP distillation, a parameter-free method specifically designed to distill target knowledge from CLIP models. Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with regularization towards the flat trajectory, the weights trained on synthetic data are robust against perturbations from accumulated errors. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE). At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.