Fugu-MT Paper Translation (Abstract): Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation

Paper overview: Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation

arxiv url: http://arxiv.org/abs/2509.19743v1
Date: Wed, 24 Sep 2025 03:47:04 GMT
Status: Translation complete
Last updated in system: 2025-09-25 20:53:19.682611
Title: Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation
Authors: Xinhao Zhong, Shuoyang Sun, Xulin Gu, Chenyang Zhu, Bin Chen, Yaowei Wang
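Abstract summary: This paper proposes Rectified Decoupled Dataset Distillation (RD$^3$) for dataset distillation, the task of generating compact synthetic datasets. RD$^3$ provides a foundation for fair and reproducible comparisons in future dataset distillation research.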
Reference score (independently computed attention score): 36.444254126901065
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dataset distillation aims to generate compact synthetic datasets that enable models trained on them to achieve performance comparable to those trained on full real datasets, while substantially reducing storage and computational costs. Early bi-level optimization methods (e.g., MTT) have shown promising results on small-scale datasets, but their scalability is limited by high computational overhead. To address this limitation, recent decoupled dataset distillation methods (e.g., SRe$^2$L) separate the teacher model pre-training from the synthetic data generation process. These methods also introduce random data augmentation and epoch-wise soft labels during the post-evaluation phase to improve performance and generalization. However, existing decoupled distillation methods suffer from inconsistent post-evaluation protocols, which hinders progress in the field. In this work, we propose Rectified Decoupled Dataset Distillation (RD$^3$), and systematically investigate how different post-evaluation settings affect test accuracy. We further examine whether the reported performance differences across existing methods reflect true methodological advances or stem from discrepancies in evaluation procedures. Our analysis reveals that much of the performance variation can be attributed to inconsistent evaluation rather than differences in the intrinsic quality of the synthetic data. In addition, we identify general strategies that improve the effectiveness of distilled datasets across settings. By establishing a standardized benchmark and rigorous evaluation protocol, RD$^3$ provides a foundation for fair and reproducible comparisons in future dataset distillation research.

Related papers