Fugu-MT Paper Translation (Summary): Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Paper summary: Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

arxiv url: http://arxiv.org/abs/2507.02860v1
Date: Thu, 03 Jul 2025 17:59:54 GMT
Status: Translation complete
Last updated in system: 2025-07-04 15:37:16.879232
Title: Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
Title (reference translation): Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching
Authors: Xin Zhou, Dingkang Liang, Kaijin Chen, Tianrui Feng, Xiwu Chen, Hongkai Lin, Yikang Ding, Feiyang Tan, Hengshuang Zhao, Xiang Bai
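Abstract summary: EasyCache is a training-free acceleration framework for video diffusion models. The authors conduct comprehensive studies on large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. The proposed method reduces inference time by up to 2.1-3.3$\times$ compared to the original baselines.
Reference score (independently computed attention score): 57.7533917467934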
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video generation models have demonstrated remarkable performance, yet their broader adoption remains constrained by slow inference speeds and substantial computational costs, primarily due to the iterative nature of the denoising process. Addressing this bottleneck is essential for democratizing advanced video synthesis technologies and enabling their integration into real-world applications. This work proposes EasyCache, a training-free acceleration framework for video diffusion models. EasyCache introduces a lightweight, runtime-adaptive caching mechanism that dynamically reuses previously computed transformation vectors, avoiding redundant computations during inference. Unlike prior approaches, EasyCache requires no offline profiling, pre-computation, or extensive parameter tuning. We conduct comprehensive studies on various large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. Our method achieves leading acceleration performance, reducing inference time by up to 2.1-3.3$\times$ compared to the original baselines while maintaining high visual fidelity, with a PSNR improvement of up to 36% over the previous SOTA method. This improvement makes EasyCache an efficient and highly accessible solution for high-quality video generation in both research and practical applications. The code is available at https://github.com/H-EmbodVis/EasyCache.
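Abstract (reference translation): Video generation models have shown remarkable performance, but their wider adoption is constrained by slow inference and high computational cost. Addressing this bottleneck is essential for democratizing advanced video synthesis technology and enabling its integration into real-world applications. We propose EasyCache, a training-free acceleration framework for video diffusion models. EasyCache introduces a lightweight, runtime-adaptive caching mechanism that dynamically reuses previously computed transformation vectors and avoids redundant computation during inference. Unlike prior approaches, EasyCache requires no offline profiling, pre-computation, or extensive parameter tuning. We conduct comprehensive studies on large-scale video generation models, including OpenSora, Wan2.1, and HunyuanVideo. The proposed method reduces inference time by up to 2.1-3.3$\times$ compared to the original baselines while maintaining high visual fidelity, with a PSNR improvement of up to 36% over the previous SOTA method. This improvement makes EasyCache an efficient and highly accessible solution for high-quality video generation in both research and practical applications. The code is available at https://github.com/H-EmbodVis/EasyCache.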
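
Illustrative sketch: the abstract describes reusing previously computed transformation vectors at runtime instead of recomputing them at every denoising step. The Python below is a toy sketch of that general idea only, not the EasyCache implementation. The stand-in `expensive_model`, the definition of the transformation vector as output minus input, the accumulated relative-change test, and the `threshold` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def expensive_model(x: Vector, step: int) -> Vector:
    """Hypothetical stand-in for a costly denoiser forward pass."""
    return [0.9 * v for v in x]


def relative_change(current: Vector, previous: Vector) -> float:
    """L1 change between two inputs, relative to the previous input."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / max(norm, 1e-12)  # guard against division by zero


def denoise_with_cache(
    x: Vector,
    num_steps: int,
    threshold: float,
    model: Callable[[Vector, int], Vector] = expensive_model,
) -> Tuple[Vector, int]:
    """Run a toy iterative loop, reusing a cached delta while inputs drift little.

    Returns the final state and the number of full model calls made.
    """
    cached_delta: Optional[Vector] = None  # assumed "transformation vector"
    prev_input: Optional[Vector] = None
    accumulated = 0.0  # input drift accumulated since the last full call
    full_calls = 0

    for step in range(num_steps):
        if cached_delta is not None and prev_input is not None:
            accumulated += relative_change(x, prev_input)

        if cached_delta is not None and accumulated < threshold:
            # Cheap path: apply the cached delta instead of calling the model.
            out = [a + d for a, d in zip(x, cached_delta)]
        else:
            # Full path: call the model and refresh the cache.
            out = model(x, step)
            cached_delta = [o - a for o, a in zip(out, x)]
            accumulated = 0.0
            full_calls += 1

        prev_input = x
        x = out

    return x, full_calls


if __name__ == "__main__":
    for thr in (0.0, 0.25, 1e9):
        final, calls = denoise_with_cache([1.0, 1.0], num_steps=10, threshold=thr)
        print(f"threshold={thr}: full model calls={calls}, final={final}")
```

In this toy, a threshold of 0 disables reuse entirely and a very large threshold reuses the first delta for every later step. Values in between trade accuracy for fewer full model calls, which is the kind of trade-off a runtime-adaptive criterion is meant to manage.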

Related papers