Fugu-MT Paper Translation (Abstract): Image-to-Video Diffusion: From Foundations to Open Frontiers

Paper overview: Image-to-Video Diffusion: From Foundations to Open Frontiers

arxiv url: http://arxiv.org/abs/2605.17248v1
Date: Sun, 17 May 2026 04:10:55 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:47.805084
Title: Image-to-Video Diffusion: From Foundations to Open Frontiers
Authors: Xianlong Wang, Wenbo Pan, Shijia Zhou, Ke Li, Yuqi Wang, Zeyu Ye, Hangtao Zhang, Leo Yu Zhang, Xiaohua Jia
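Abstract summary: Diffusion-based image-to-video (I2V) generation has become a central direction in generative models. This work treats diffusion I2V generation as a standalone subject. It first reviews the task formulation, model architectures, datasets, and evaluation metrics, and then organizes existing methods through a taxonomy based on architecture and training paradigm.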
Reference score (independently computed attention measure): 39.6216019326071
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion-based image-to-video (I2V) generation has become a central direction in generative models by turning a reference image, with optional conditions, into a temporally coherent video. Compared with broader video generation settings, this task places stricter demands on content consistency, identity preservation, and motion coherence. Although the literature grows rapidly, existing works mostly discuss I2V generation within broader topics and still lack a dedicated taxonomy together with a systematic analysis centered on this field. This work addresses that gap by treating diffusion I2V generation as a standalone subject. It first reviews the task formulation, model architectures, datasets, and evaluation metrics, and then organizes existing methods through a taxonomy based on architecture and training paradigm. It further distills four core designs, namely condition encoding, temporal modeling, noise prior design, and spatial-temporal upsampling, and discusses representative application scenarios together with major open challenges.
