Fugu-MT Paper Translation (Abstract): Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training

Paper overview: Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training

arxiv url: http://arxiv.org/abs/2603.25527v2
Date: Wed, 01 Apr 2026 13:18:33 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.589862
Title: Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep Selective Training
Title (reference translation): Beyond the Golden Data: Resolving the Motion-Vision Quality Dilemma via Timestep-Selective Training
Authors: Xiangyang Luo, Qingyu Li, Yuming Li, Guanbo Huang, Yongjie Zhu, Wenyu Qin, Meng Wang, Pengfei Wan, Shao-Lun Huang
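Abstract summary: A key challenge in video data curation is the Motion-Vision Quality Dilemma: visual quality and motion intensity are inherently negatively correlated, which makes it hard to obtain golden data that excels in both. This paper proposes Timestep-aware Quality Decoupling (TQD), which modifies the data sampling distribution to better match the model's learning process.
Reference score (independently computed attention score): 36.5956174035203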
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in video generation models have achieved impressive results. However, these models heavily rely on the use of high-quality data that combines both high visual quality and high motion quality. In this paper, we identify a key challenge in video data curation: the Motion-Vision Quality Dilemma. We discovered that visual quality and motion intensity inherently exhibit a negative correlation, making it hard to obtain golden data that excels in both aspects. To address this challenge, we first examine the hierarchical learning dynamics of video diffusion models and conduct gradient-based analysis on quality-degraded samples. We discover that quality-imbalanced data can produce gradients similar to golden data at appropriate timesteps. Based on this, we introduce the novel concept of Timestep selection in Training Process. We propose Timestep-aware Quality Decoupling (TQD), which modifies the data sampling distribution to better match the model's learning process. For certain types of data, the sampling distribution is skewed toward higher timesteps for motion-rich data, while high visual quality data is more likely to be sampled during lower timesteps. Through extensive experiments, we demonstrate that TQD enables training exclusively on separated imbalanced data to achieve performance surpassing conventional training with better data, challenging the necessity of perfect data in video generation. Moreover, our method also boosts model performance when trained on high-quality data, showcasing its effectiveness across different data scenarios.
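Abstract (reference translation): Recent advances in video generation models have achieved impressive results. However, these models rely heavily on high-quality data that combines high visual quality with high motion quality. This paper identifies a key challenge in video data curation, the Motion-Vision Quality Dilemma. Visual quality and motion intensity turn out to be inherently negatively correlated, which makes it hard to obtain golden data that excels in both respects. To address this challenge, we first examine the hierarchical learning dynamics of video diffusion models and perform a gradient-based analysis of quality-degraded samples. Quality-imbalanced data can produce gradients similar to those of golden data at appropriate timesteps. Based on this, we introduce the new concept of timestep selection in the training process. We propose Timestep-aware Quality Decoupling (TQD), which adapts the data sampling distribution to the model's learning process. The sampling distribution is skewed toward higher timesteps for motion-rich data, while data of high visual quality is more likely to be sampled at lower timesteps. Extensive experiments demonstrate that TQD, trained only on separated imbalanced data, surpasses conventional training on better data, challenging the need for perfect data in video generation. Moreover, the method also improves model performance when trained on high-quality data, showing its effectiveness across different data scenarios.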
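Illustrative sketch (not from the paper): the abstract says only that TQD skews timestep sampling toward higher timesteps for motion-rich data and toward lower timesteps for high-visual-quality data. The Python snippet below shows one way such a quality-dependent sampler could look. The Beta-distribution shape, the `skew` parameter, the 1000-step schedule, the category names, and the function names are all assumptions made for illustration; the authors' actual distribution is not given in the abstract.

```python
import random

NUM_TIMESTEPS = 1000  # assumed schedule length; not stated in the abstract


def sample_timestep(kind, rng, skew=3.0, num_timesteps=NUM_TIMESTEPS):
    """Draw a training timestep whose distribution depends on the data type.

    Hypothetical scheme: a Beta distribution stands in for whatever
    distribution the paper actually uses.
    """
    if kind == "motion_rich":
        # Beta(skew, 1) puts most mass near 1, i.e. high timesteps.
        u = rng.betavariate(skew, 1.0)
    elif kind == "high_visual":
        # Beta(1, skew) puts most mass near 0, i.e. low timesteps.
        u = rng.betavariate(1.0, skew)
    elif kind == "golden":
        # Balanced data: plain uniform sampling over all timesteps.
        u = rng.random()
    else:
        raise ValueError(f"unknown data kind: {kind!r}")
    # Map u in [0, 1] to an integer timestep in [0, num_timesteps - 1].
    return min(int(u * num_timesteps), num_timesteps - 1)


def mean_timestep(kind, n=10000, seed=0):
    """Average sampled timestep, to check the direction of the skew."""
    rng = random.Random(seed)
    return sum(sample_timestep(kind, rng) for _ in range(n)) / n


if __name__ == "__main__":
    for kind in ("motion_rich", "golden", "high_visual"):
        print(kind, round(mean_timestep(kind)))
```

In a training loop, the timestep drawn this way would replace the usual uniform draw for each sample, so that a motion-rich clip contributes mostly at high timesteps and a visually sharp clip mostly at low ones.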

Related papers