Fugu-MT Paper Translation (Summary): Training-free Motion Factorization for Compositional Video Generation

Paper overview: Training-free Motion Factorization for Compositional Video Generation

arxiv url: http://arxiv.org/abs/2603.09104v1
Date: Tue, 10 Mar 2026 02:27:48 GMT
Status: Translation complete
Last updated in system: 2026-03-11 15:25:23.951786
Title: Training-free Motion Factorization for Compositional Video Generation
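Title (reference translation): Training-free motion factorization for compositional video generation
Authors: Zixuan Wang, Ziqin Zhou, Feng Chen, Duo Peng, Yixin Hu, Changsheng Li, Yinjie Lei
Abstract summary: We propose a motion factorization framework that decomposes complex motion into three primary categories. The framework achieves impressive performance in motion synthesis on real-world benchmarks.
Reference score (site-computed attention score): 57.819757612370374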
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Compositional video generation aims to synthesize multiple instances with diverse appearance and motion, which is widely applicable in real-world scenarios. However, current approaches mainly focus on binding semantics, neglecting to understand diverse motion categories specified in prompts. In this paper, we propose a motion factorization framework that decomposes complex motion into three primary categories: motionlessness, rigid motion, and non-rigid motion. Specifically, our framework follows a planning before generation paradigm. (1) During planning, we reason about motion laws on the motion graph to obtain frame-wise changes in the shape and position of each instance. This alleviates semantic ambiguities in the user prompt by organizing it into a structured representation of instances and their interactions. (2) During generation, we modulate the synthesis of distinct motion categories in a disentangled manner. Conditioned on the motion cues, guidance branches stabilize appearance in motionless regions, preserve rigid-body geometry, and regularize local non-rigid deformations. Crucially, our two modules are model-agnostic, which can be seamlessly incorporated into various diffusion model architectures. Extensive experiments demonstrate that our framework achieves impressive performance in motion synthesis on real-world benchmarks. Our code will be released soon.
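Abstract (reference translation): Compositional video generation aims to synthesize multiple instances with diverse appearances and motions, and is widely applicable to real-world scenarios. However, current approaches focus mainly on binding semantics and overlook the diverse motion categories specified in prompts. In this paper, we propose a motion factorization framework that decomposes complex motion into three primary categories: motionlessness, rigid motion, and non-rigid motion. Specifically, our framework follows a plan-before-generate paradigm. (1) During planning, we reason about motion laws on a motion graph to obtain frame-wise changes in the shape and position of each instance. Organizing the user prompt into a structured representation of instances and their interactions reduces its semantic ambiguity. (2) During generation, we modulate the synthesis of the different motion categories in a disentangled manner. Conditioned on the motion cues, guidance branches stabilize the appearance of motionless regions, preserve rigid-body geometry, and regularize local non-rigid deformations. Importantly, our two modules are model-agnostic and can be seamlessly incorporated into various diffusion model architectures. Extensive experiments show that our framework achieves remarkable performance in motion synthesis on real-world benchmarks. Our code will be released soon.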
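The planning stage described in the abstract yields frame-wise changes in the shape and position of each instance, and the three motion categories can be read off those changes. The following is a minimal sketch under stated assumptions, not the authors' method (their code has not been released): each instance is reduced to a hypothetical per-frame box `Box(cx, cy, w, h)`, the hypothetical `plan_linear` helper stands in for the motion-graph planner by interpolating between two keyframes, and the hypothetical `classify_motion` rule treats a size change as non-rigid motion, a position-only change as rigid motion, and no change as motionlessness. The box representation is a simplification: it cannot express rotation, and it would label a rigid object approaching the camera (whose box grows) as non-rigid.

```python
from dataclasses import dataclass
from enum import Enum


class MotionCategory(Enum):
    """The three motion categories named in the abstract."""
    MOTIONLESS = "motionless"
    RIGID = "rigid"
    NON_RIGID = "non-rigid"


@dataclass(frozen=True)
class Box:
    """Hypothetical per-frame layout of one instance: center (cx, cy) and size (w, h)."""
    cx: float
    cy: float
    w: float
    h: float


def plan_linear(start: Box, end: Box, num_frames: int) -> list[Box]:
    """Hypothetical stand-in for the planner: linearly interpolate boxes between two keyframes."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    frames = []
    for i in range(num_frames):
        t = i / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(Box(
            cx=start.cx + t * (end.cx - start.cx),
            cy=start.cy + t * (end.cy - start.cy),
            w=start.w + t * (end.w - start.w),
            h=start.h + t * (end.h - start.h),
        ))
    return frames


def classify_motion(frames: list[Box], tol: float = 1e-3) -> MotionCategory:
    """Hypothetical factorization rule based on frame-wise shape and position changes."""
    first = frames[0]
    # Any change in box size is treated as deformation (non-rigid motion).
    if any(abs(f.w - first.w) > tol or abs(f.h - first.h) > tol for f in frames):
        return MotionCategory.NON_RIGID
    # Constant size but a moving center is treated as rigid motion.
    if any(abs(f.cx - first.cx) > tol or abs(f.cy - first.cy) > tol for f in frames):
        return MotionCategory.RIGID
    return MotionCategory.MOTIONLESS


# Usage: a toy three-instance scene over 16 frames (made-up coordinates).
scene = {
    "table": plan_linear(Box(0.5, 0.8, 0.4, 0.2), Box(0.5, 0.8, 0.4, 0.2), 16),
    "car": plan_linear(Box(0.1, 0.6, 0.2, 0.1), Box(0.9, 0.6, 0.2, 0.1), 16),
    "balloon": plan_linear(Box(0.5, 0.3, 0.1, 0.1), Box(0.5, 0.3, 0.2, 0.2), 16),
}
for name, frames in scene.items():
    print(name, classify_motion(frames).value)
# prints: table motionless / car rigid / balloon non-rigid
```

Testing size before position mirrors the abstract's split between shape changes and position changes. A real planner would also carry the interactions between instances (the edges of the motion graph), which this sketch omits.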
