Fugu-MT paper translation (abstract): PARE: Pruning and Adaptive Routing for Efficient Video Generation

Paper overview: PARE: Pruning and Adaptive Routing for Efficient Video Generation

arxiv url: http://arxiv.org/abs/2605.27336v1
Date: Tue, 26 May 2026 17:43:25 GMT
Status: Translation complete
System update date: 2026-05-27 17:51:42.573409
Title: PARE: Pruning and Adaptive Routing for Efficient Video Generation
Title (reference translation): PARE: Pruning and Adaptive Routing for Efficient Video Generation
Authors: Yutong Wang, Yunke Wang, Tianfan Xue, Yu Qiao, Yaohui Wang, Xinyuan Chen, Chang Xu
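Abstract summary: Video Diffusion Transformers (DiTs) generate high-quality videos but demand substantial compute due to wide blocks, deep architectures, and iterative sampling. Recent methods reduce cost by compressing width, depth, or sampling steps, but typically commit to a fixed architecture that cannot adapt to individual inputs. This paper proposes PARE, which jointly compresses width and depth by combining structure-aware pruning with input-adaptive routing.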
Reference score (independently computed attention score): 71.54959622788608
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video Diffusion Transformers (DiTs) generate high-quality videos but demand substantial compute due to wide blocks, deep architectures, and iterative sampling. Recent methods reduce cost by compressing width, depth, or sampling steps, but typically commit to a fixed architecture that cannot adapt to individual inputs or denoising stages. We propose PARE (Pruning and Adaptive Routing for Efficient video generation), which jointly compresses width and depth with structure-aware pruning and input-adaptive routing. For width, we observe that attention heads specialize into spatial and temporal roles, and design importance scoring that accounts for this distinction to prevent motion-critical temporal heads from being pruned prematurely. For depth, we train a lightweight router conditioned on denoising timestep and visual content to dynamically select which blocks to execute at each step, enabling per-input compute adaptation rather than static block removal. A progressive pipeline first recovers width-pruned quality via distillation, then jointly optimizes the student and router to decouple the two learning objectives. Experiments on Wan2.1-14B for both image-to-video and text-to-video generation show that PARE substantially reduces per-step computation while preserving quality across VBench dimensions, and composes with step distillation for further acceleration.
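Abstract (reference translation): Video Diffusion Transformers (DiTs) generate high-quality videos but require substantial computation because of wide blocks, deep architectures, and iterative sampling. Recent methods cut cost by compressing width, depth, or sampling steps, but they usually commit to a fixed architecture that cannot adapt to individual inputs. The proposed PARE (Pruning and Adaptive Routing for Efficient Video Generation) compresses width and depth by combining structure-aware pruning with input-adaptive routing. Observing that attention heads specialize into spatial and temporal roles, the authors design importance scoring that accounts for this distinction so that motion-critical temporal heads are not pruned prematurely. A lightweight router is trained to dynamically select which blocks to execute at each step, enabling per-input compute adaptation rather than static block removal. A progressive pipeline first recovers width-pruned quality through distillation, then jointly optimizes the student and the router to decouple the two learning objectives. Experiments on Wan2.1-14B for image-to-video and text-to-video generation show that PARE substantially reduces per-step computation while preserving quality across VBench dimensions, and that it composes with step distillation for further acceleration.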
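
The abstract does not give implementation details, so the following is only a rough Python sketch of the two ideas it describes: role-aware head pruning (width) and timestep- and content-conditioned block routing (depth). Every name here (`prune_heads_by_role`, `route_blocks`, `run_step`), the per-role ranking rule, and the logistic gate are assumptions made for illustration, not the authors' method.

```python
import math


def prune_heads_by_role(scores, roles, keep_ratio):
    """Keep the top `keep_ratio` fraction of heads within each role group.

    Assumed rule (not from the paper): ranking heads separately per role
    ensures temporal heads are not all discarded just because spatial
    heads happen to score higher on a shared importance metric.
    """
    kept = []
    for role in sorted(set(roles)):
        idx = [i for i, r in enumerate(roles) if r == role]
        n_keep = max(1, math.ceil(keep_ratio * len(idx)))
        idx.sort(key=lambda i: scores[i], reverse=True)
        kept.extend(idx[:n_keep])
    return sorted(kept)


def route_blocks(timestep, content_feat, weights, threshold=0.5):
    """Return the indices of blocks to execute for this input and timestep.

    weights[b] = (w_t, w_c, bias) is a hypothetical per-block logistic gate
    over a normalized timestep and a scalar content feature.
    """
    active = []
    for b, (w_t, w_c, bias) in enumerate(weights):
        logit = w_t * timestep + w_c * content_feat + bias
        prob = 1.0 / (1.0 + math.exp(-logit))
        if prob > threshold:
            active.append(b)
    return active


def run_step(x, blocks, active):
    """Apply only the routed blocks; skipped blocks act as the identity."""
    for b in active:
        x = blocks[b](x)
    return x


# Toy usage: six heads (three per role) and three stand-in "blocks".
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
roles = ["spatial"] * 3 + ["temporal"] * 3
kept_heads = prune_heads_by_role(scores, roles, keep_ratio=0.5)

gates = [(0.0, 0.0, 5.0), (4.0, 0.0, -2.0), (0.0, 0.0, -5.0)]
blocks = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 100]
early = route_blocks(0.9, 0.0, gates)  # high-noise step
late = route_blocks(0.1, 0.0, gates)   # low-noise step
```

In this toy setup a global top-4 ranking would keep only one temporal head, whereas the per-role ranking keeps two from each group. The second gate fires only at the high-noise step, so the set of executed blocks changes across denoising steps instead of being fixed once.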

Related papers