Fugu-MT Paper Translation (Abstract): Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation

Paper overview: Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation

arxiv url: http://arxiv.org/abs/2510.27253v1
Date: Fri, 31 Oct 2025 07:41:41 GMT
Status: Translation complete
Last updated in system: 2025-11-03 17:52:16.024338
Title: Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation
Title (reference translation): Not All Instances Can Be Valued Equally: Toward Influence-Weighted Dataset Distillation
Authors: Qiyan Deng, Changqian Zheng, Lianpeng Qiao, Yuping Wang, Chengliang Chai, Lei Cao,
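Abstract summary: We propose Influence-Weighted Distillation (IWD), a framework that uses influence functions to account for data quality in the distillation process. IWD assigns an adaptive weight to each instance based on its estimated influence on the distillation objective, prioritizing beneficial data while downweighting less useful or harmful instances. Experimental results suggest that integrating IWD improves the quality of distilled datasets and model performance, with accuracy gains of up to 7.8%.
Reference score (independently computed attention score): 10.625826589163252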
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dataset distillation condenses large datasets into synthetic subsets, achieving performance comparable to training on the full dataset while substantially reducing storage and computation costs. Most existing dataset distillation methods assume that all real instances contribute equally to the process. In practice, real-world datasets contain informative instances alongside redundant or even harmful ones, and directly distilling the full dataset without considering data quality can degrade model performance. In this work, we present Influence-Weighted Distillation (IWD), a principled framework that leverages influence functions to explicitly account for data quality in the distillation process. IWD assigns adaptive weights to each instance based on its estimated impact on the distillation objective, prioritizing beneficial data while downweighting less useful or harmful ones. Owing to its modular design, IWD can be seamlessly integrated into diverse dataset distillation frameworks. Our empirical results suggest that integrating IWD tends to improve the quality of distilled datasets and enhance model performance, with accuracy gains of up to 7.8%.
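Abstract (reference translation): Dataset distillation condenses a large dataset into a synthetic subset, achieving performance comparable to training on the full dataset while greatly reducing storage and computation costs. Many existing dataset distillation methods assume that every real instance contributes equally to the process. In practice, real-world datasets contain informative instances as well as redundant or even harmful ones, and distilling the entire dataset directly without considering data quality can degrade model performance. This work proposes Influence-Weighted Distillation (IWD), a principled framework that uses influence functions to explicitly account for data quality in the distillation process. IWD assigns an adaptive weight to each instance based on its estimated influence on the distillation objective, prioritizing beneficial data while downweighting less useful or harmful instances. Because of its modular design, IWD can be integrated seamlessly into a variety of dataset distillation frameworks. Experimental results suggest that integrating IWD improves the quality of distilled datasets and model performance, with accuracy gains of up to 7.8%.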


Related papers