Fugu-MT Paper Translation (Abstract): TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

Paper abstract: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution

arxiv url: http://arxiv.org/abs/2512.02469v1
Date: Tue, 02 Dec 2025 07:00:07 GMT
Status: Translation complete
System update date: 2025-12-03 21:04:45.753482
Title: TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
Title (reference translation): TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution
Authors: Fengli Ran, Xiao Pu, Bo Liu, Xiuli Bi, Bin Xiao
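Abstract summary: We propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process. At each training stage, TGDD captures evolving semantics by aligning the feature distributions of the synthetic and original datasets. In experiments on ten datasets, TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.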
Reference score (independently computed attention level): 22.720901808326122
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dataset distillation compresses large datasets into compact synthetic ones to reduce storage and computational costs. Among various approaches, distribution matching (DM)-based methods have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of synthetic data and weakens downstream performance. To address this issue, we propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model's training trajectory. At each training stage, TGDD captures evolving semantics by aligning the feature distribution between the synthetic and original dataset. Meanwhile, it introduces a distribution constraint regularization to reduce class overlap. This design helps synthetic data preserve both semantic diversity and representativeness, improving performance in downstream tasks. Without additional optimization overhead, TGDD achieves a favorable balance between performance and efficiency. Experiments on ten datasets demonstrate that TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.
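Abstract (reference translation): Dataset distillation compresses large datasets into compact synthetic datasets, reducing storage and computational costs. Among the various approaches, methods based on distribution matching (DM) have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of the synthetic data and degrades downstream performance. To address this problem, we propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model's training trajectory. At each training stage, TGDD captures evolving semantics by aligning the feature distributions of the synthetic and original datasets. In addition, it introduces a distribution constraint regularization to reduce class overlap. This design helps the synthetic data preserve both semantic diversity and representativeness, improving performance on downstream tasks. With no additional optimization overhead, TGDD strikes a good balance between performance and efficiency. In experiments on ten datasets, TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.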
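The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions and not the authors' implementation. It illustrates the general idea of matching class-wise feature statistics of real and synthetic data at several points along a training trajectory, plus a penalty that discourages class overlap. Everything specific here is an assumption made for illustration: the function names (`features`, `class_means`, `trajectory_dm_loss`), the use of random linear-plus-ReLU maps as stand-ins for model checkpoints, mean-feature matching as the distribution distance, and the hinge-style overlap penalty with its `margin` and `reg_weight` parameters.

```python
import numpy as np

def features(x, weight):
    """Stand-in feature extractor (assumption): one linear layer plus ReLU."""
    return np.maximum(x @ weight, 0.0)

def class_means(feats, labels, num_classes):
    """Mean feature vector per class, shape (num_classes, feat_dim)."""
    return np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])

def trajectory_dm_loss(real_x, real_y, syn_x, syn_y, checkpoints, num_classes,
                       margin=1.0, reg_weight=0.1):
    """Average a matching loss over several checkpoints of a training trajectory."""
    total = 0.0
    for weight in checkpoints:
        real_mu = class_means(features(real_x, weight), real_y, num_classes)
        syn_mu = class_means(features(syn_x, weight), syn_y, num_classes)
        # Matching term: squared distance between real and synthetic class means.
        match = np.sum((real_mu - syn_mu) ** 2)
        # Hypothetical overlap penalty: hinge on pairwise distances between
        # synthetic class means, so classes closer than `margin` are penalized.
        diffs = syn_mu[:, None, :] - syn_mu[None, :, :]
        dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
        off_diag = ~np.eye(num_classes, dtype=bool)
        overlap = np.sum(np.maximum(margin - dists[off_diag], 0.0))
        total += match + reg_weight * overlap
    return total / len(checkpoints)

# Toy data: 3 classes, 20 real and 2 synthetic samples per class.
rng = np.random.default_rng(0)
num_classes, dim, feat_dim = 3, 8, 16
real_x = rng.normal(size=(60, dim))
real_y = np.repeat(np.arange(num_classes), 20)
syn_x = rng.normal(size=(6, dim))
syn_y = np.repeat(np.arange(num_classes), 2)
# Three random weight matrices stand in for checkpoints saved during training.
checkpoints = [rng.normal(size=(dim, feat_dim)) for _ in range(3)]

loss = trajectory_dm_loss(real_x, real_y, syn_x, syn_y, checkpoints, num_classes)
print(f"trajectory-averaged loss: {loss:.4f}")
```

In an actual distillation loop the synthetic samples would be updated by gradient descent on a loss of this kind, which would call for an automatic differentiation framework rather than plain NumPy.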

Related papers