Fugu-MT Paper Translation (Summary): PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

Paper overview: PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference

arxiv url: http://arxiv.org/abs/2601.04792v1
Date: Thu, 08 Jan 2026 10:16:06 GMT
Status: Translation complete
Last updated in system: 2026-01-09 17:01:53.154113
Title: PyramidalWan: On Making Pretrained Video Model Pyramidal for Efficient Inference
Title (reference translation): PyramidalWan: On making a pretrained video model pyramidal for efficient inference
Authors: Denis Korzhenkov, Adil Karjauv, Animesh Karnewar, Mohsen Ghafoorian, Amirhossein Habibian,
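Abstract summary: This paper proposes a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning. It also investigates and compares various strategies for step distillation in pyramidal models, with the aim of further improving inference efficiency.
Reference score (in-house attention score): 16.7959283896177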
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages operating at varying resolutions. These models handle inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach significantly reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform compared to state-of-the-art systems in terms of visual plausibility. In this work, we present a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, achieving this transformation without degradation in quality of output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal models, aiming to further enhance the inference efficiency. Our results are available at https://qualcomm-ai-research.github.io/PyramidalWan.
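Abstract (reference translation): Recently proposed pyramidal models decompose the conventional forward and backward diffusion processes into multiple stages that operate at different resolutions. These models process inputs with higher noise levels at lower resolutions, while less noisy inputs are processed at higher resolutions. This hierarchical approach substantially reduces the computational cost of inference in multi-step denoising models. However, existing open-source pyramidal video models have been trained from scratch and tend to underperform state-of-the-art systems in visual plausibility. In this work, we propose a pipeline that converts a pretrained diffusion model into a pyramidal one through low-cost finetuning, achieving this conversion without degrading the quality of the output videos. Furthermore, we investigate and compare various strategies for step distillation within pyramidal models, with the aim of further improving inference efficiency. Our results are available at https://qualcomm-ai-research.github.io/PyramidalWan.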
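To make the claimed cost reduction concrete, the sketch below compares a pyramidal schedule with a single-resolution schedule that uses the same total number of denoising steps. It is an illustrative model under assumptions not taken from the paper. The three stages at 1/4, 1/2, and full spatial resolution with 10 steps each are hypothetical. Per-step cost is assumed to scale as (spatial token count) raised to an exponent, where 1.0 approximates linear layers and 2.0 approximates full self-attention. Temporal downsampling is ignored. The names `Stage` and `relative_cost` are hypothetical helpers, not part of any PyramidalWan code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One stage of a hypothetical pyramidal denoising schedule."""
    scale: float  # spatial downsampling factor per side (1.0 = full resolution)
    steps: int    # number of denoising steps run at this stage


def relative_cost(stages: List[Stage], full_steps: int, exponent: float = 2.0) -> float:
    """Cost of a pyramidal schedule relative to `full_steps` steps at full resolution.

    Assumption (illustrative only): per-step cost scales as tokens ** exponent,
    and the spatial token count scales as scale ** 2 (height and width both shrink).
    """
    baseline = float(full_steps)  # each full-resolution step costs 1 unit
    pyramid = sum(s.steps * (s.scale ** 2) ** exponent for s in stages)
    return pyramid / baseline


if __name__ == "__main__":
    # Hypothetical 3-stage pyramid: noisiest steps at 1/4 resolution, cleanest at full.
    schedule = [Stage(0.25, 10), Stage(0.5, 10), Stage(1.0, 10)]
    print(round(relative_cost(schedule, 30, exponent=1.0), 6))  # linear-cost layers
    print(round(relative_cost(schedule, 30, exponent=2.0), 6))  # attention-dominated
```

Under these assumptions, the pyramidal schedule costs roughly 44% of the baseline when per-step cost is linear in tokens and about 36% when attention dominates. Real models mix both regimes, so the actual savings depend on the architecture and on how steps are split across stages.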

