Fugu-MT Paper Translation (Abstract): Diffusion Models as Dataset Distillation Priors

Paper overview: Diffusion Models as Dataset Distillation Priors

arxiv url: http://arxiv.org/abs/2510.17421v1
Date: Mon, 20 Oct 2025 11:04:09 GMT
Status: Translation complete
Last updated in system: 2025-10-25 00:56:39.424538
Title: Diffusion Models as Dataset Distillation Priors
Title (reference translation): Diffusion Models as Priors for Dataset Distillation
Authors: Duo Su, Huyu Wu, Huanran Chen, Yiming Shi, Yuzhu Wang, Xi Ye, Jun Zhu
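Abstract summary: This paper proposes Diffusion As Priors (DAP), which formalizes representativeness by quantifying the similarity between synthetic and real data in feature space. DAP outperforms state-of-the-art methods in generating high-fidelity datasets. The work establishes a theoretical connection between diffusion priors and the objectives of dataset distillation.
Reference score (independently computed attention score): 39.4727398182562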
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dataset distillation aims to synthesize compact yet informative datasets from large ones. A significant challenge in this field is achieving a trifecta of diversity, generalization, and representativeness in a single distilled dataset. Although recent generative dataset distillation methods adopt powerful diffusion models as their foundation models, the inherent representativeness prior in diffusion models is overlooked. Consequently, these approaches often necessitate the integration of external constraints to enhance data quality. To address this, we propose Diffusion As Priors (DAP), which formalizes representativeness by quantifying the similarity between synthetic and real data in feature space using a Mercer kernel. We then introduce this prior as guidance to steer the reverse diffusion process, enhancing the representativeness of distilled samples without any retraining. Extensive experiments on large-scale datasets, such as ImageNet-1K and its subsets, demonstrate that DAP outperforms state-of-the-art methods in generating high-fidelity datasets while achieving superior cross-architecture generalization. Our work not only establishes a theoretical connection between diffusion priors and the objectives of dataset distillation but also provides a practical, training-free framework for improving the quality of the distilled dataset.
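Abstract (reference translation): Dataset distillation aims to synthesize compact yet informative datasets from large-scale ones. A key challenge in this field is achieving the trifecta of diversity, generalization, and representativeness within a single distilled dataset. Recent generative dataset distillation methods adopt powerful diffusion models as their foundation models, but the representativeness prior inherent in diffusion models has been overlooked. As a result, these approaches often require the integration of external constraints to improve data quality. This paper therefore proposes Diffusion As Priors (DAP), which formalizes representativeness by using a Mercer kernel to quantify the similarity between synthetic and real data in feature space. This prior is then introduced as guidance that steers the reverse diffusion process, raising the representativeness of the distilled samples without any retraining. Extensive experiments on large-scale datasets such as ImageNet-1K and its subsets demonstrate that DAP outperforms state-of-the-art methods in generating high-fidelity datasets while achieving superior cross-architecture generalization. The work not only establishes a theoretical connection between diffusion priors and the objectives of dataset distillation, but also provides a practical, training-free framework for improving the quality of distilled datasets.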
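Illustrative sketch (not from the paper): the abstract describes scoring representativeness with a Mercer kernel in feature space and using that score as guidance. The abstract does not give the kernel, the feature extractor, or the guidance rule, so everything below is an assumption chosen for illustration: a Gaussian (RBF) kernel, hand-written 2-D feature vectors, and a single plain gradient-ascent step applied directly to a feature vector instead of to a diffusion sampler. The names `rbf_kernel`, `representativeness`, `representativeness_grad`, and `guided_step` are hypothetical.

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, a standard example of a Mercer kernel (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def representativeness(z, real_feats, sigma=1.0):
    """Mean kernel similarity between one synthetic feature z and the real features."""
    return sum(rbf_kernel(z, r, sigma) for r in real_feats) / len(real_feats)

def representativeness_grad(z, real_feats, sigma=1.0):
    """Analytic gradient of representativeness() with respect to z."""
    grad = [0.0] * len(z)
    for r in real_feats:
        k = rbf_kernel(z, r, sigma)
        for d in range(len(z)):
            # d/dz exp(-||z - r||^2 / (2 sigma^2)) = k * (r - z) / sigma^2
            grad[d] += k * (r[d] - z[d]) / sigma ** 2
    return [g / len(real_feats) for g in grad]

def guided_step(z, real_feats, scale=0.5, sigma=1.0):
    """One gradient-ascent step that nudges z toward higher kernel similarity."""
    grad = representativeness_grad(z, real_feats, sigma)
    return [zi + scale * gi for zi, gi in zip(z, grad)]

# Toy data: three "real" feature vectors and one "synthetic" starting point.
real_feats = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
z0 = [0.0, 0.0]
z1 = guided_step(z0, real_feats)

print("score before:", representativeness(z0, real_feats))
print("score after: ", representativeness(z1, real_feats))
```

In this toy setting the step moves the synthetic feature toward the cluster of real features, so the kernel score rises. A real guided sampler would instead add a gradient term of this kind to each reverse diffusion update; how DAP does that is specified in the paper, not here.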


Related papers