Fugu-MT Paper Translation (Summary): Dataset Distillation via Committee Voting

Paper summary: Dataset Distillation via Committee Voting

arxiv url: http://arxiv.org/abs/2501.07575v1
Date: Mon, 13 Jan 2025 18:59:48 GMT
Status: Translation complete
Last updated in system: 2025-01-14 19:20:14.368767
Title: Dataset Distillation via Committee Voting
Title (reference translation): Dataset Distillation by Committee Voting
Authors: Jiacheng Cui, Zhaoyi Li, Xiaochen Ma, Xinyue Bi, Yaxin Luo, Zhiqiang Shen
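Abstract summary: We introduce ${\bf C}$ommittee ${\bf V}$oting for ${\bf D}$ataset ${\bf D}$istillation (CV-DD). CV-DD is a novel approach that leverages the collective wisdom of multiple models or experts to create high-quality distilled datasets.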
Reference score (independently computed attention measure): 21.018818924580877
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dataset distillation aims to synthesize a smaller, representative dataset that preserves the essential properties of the original data, enabling efficient model training with reduced computational resources. Prior work has primarily focused on improving the alignment or matching process between original and synthetic data, or on enhancing the efficiency of distilling large datasets. In this work, we introduce ${\bf C}$ommittee ${\bf V}$oting for ${\bf D}$ataset ${\bf D}$istillation (CV-DD), a novel and orthogonal approach that leverages the collective wisdom of multiple models or experts to create high-quality distilled datasets. We start by showing how to establish a strong baseline that already achieves state-of-the-art accuracy through leveraging recent advancements and thoughtful adjustments in model design and optimization processes. By integrating distributions and predictions from a committee of models while generating high-quality soft labels, our method captures a wider spectrum of data features, reduces model-specific biases and the adverse effects of distribution shifts, leading to significant improvements in generalization. This voting-based strategy not only promotes diversity and robustness within the distilled dataset but also significantly reduces overfitting, resulting in improved performance on post-eval tasks. Extensive experiments across various datasets and IPCs (images per class) demonstrate that Committee Voting leads to more reliable and adaptable distilled data compared to single/multi-model distillation methods, demonstrating its potential for efficient and accurate dataset distillation. Code is available at: https://github.com/Jiacheng8/CV-DD.
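Abstract (reference translation): Dataset distillation aims to synthesize a smaller, representative dataset that preserves the essential properties of the original data and enables efficient model training with reduced computational resources. Previous work has focused mainly on improving the alignment or matching process between original and synthetic data, or on making the distillation of large datasets more efficient. This paper introduces ${\bf C}$ommittee ${\bf V}$oting for ${\bf D}$ataset ${\bf D}$istillation (CV-DD), a novel and orthogonal approach that uses the collective wisdom of multiple models or experts to create high-quality distilled datasets. We first show how to establish a strong baseline that already achieves state-of-the-art accuracy by leveraging recent advances and thoughtful adjustments in model design and optimization processes. By integrating distributions and predictions from a committee of models while generating high-quality soft labels, the method captures a wider spectrum of data features and reduces model-specific biases and the adverse effects of distribution shifts, which leads to significant improvements in generalization. This voting-based strategy not only promotes diversity and robustness within the distilled dataset but also significantly reduces overfitting, improving performance on post-evaluation tasks. Extensive experiments across various datasets and IPCs (images per class) show that Committee Voting yields more reliable and adaptable distilled data than single- and multi-model distillation methods, indicating its potential for efficient and accurate dataset distillation. The code is available at https://github.com/Jiacheng8/CV-DD.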
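
Illustrative sketch (not from the paper): the abstract says that predictions from a committee of models are integrated while soft labels are generated, but it does not state the voting rule. The Python below shows one simple possibility, a weighted average of temperature-softened class probabilities. The function names, the example logits, the weights, and the temperature are all hypothetical. This is not the CV-DD implementation; see the authors' repository for that.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def committee_soft_label(member_logits, member_weights, temperature=1.0):
    """Weighted average of each committee member's class probabilities.

    Hypothetical voting rule: weights are normalized to sum to 1, so the
    result is itself a valid probability distribution over classes.
    """
    if len(member_logits) != len(member_weights):
        raise ValueError("need exactly one weight per committee member")
    weight_sum = sum(member_weights)
    if weight_sum <= 0:
        raise ValueError("weights must have a positive sum")

    num_classes = len(member_logits[0])
    label = [0.0] * num_classes
    for logits, weight in zip(member_logits, member_weights):
        probs = softmax(logits, temperature)
        for cls, p in enumerate(probs):
            label[cls] += (weight / weight_sum) * p
    return label


# Assumed example: three models score one synthetic image over three classes.
logits = [[2.0, 0.5, 0.1], [1.5, 1.2, 0.2], [0.3, 2.2, 0.4]]
weights = [0.9, 0.8, 0.6]  # e.g. each model's validation accuracy (assumption)
label = committee_soft_label(logits, weights, temperature=2.0)
print(label)
```

Averaging probabilities rather than logits keeps each member's vote bounded, so a single overconfident model cannot dominate the soft label. Other aggregation rules are equally plausible under the abstract's description.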


Related papers