Fugu-MT Paper Translation (Abstract): Layer-Aware Video Composition via Split-then-Merge

Paper overview: Layer-Aware Video Composition via Split-then-Merge

arxiv url: http://arxiv.org/abs/2511.20809v1
Date: Tue, 25 Nov 2025 19:53:15 GMT
Status: Translation complete
Last updated in system: 2025-11-27 18:37:58.83775
Title: Layer-Aware Video Composition via Split-then-Merge
Title (reference translation): Layer-aware video composition by Split-then-Merge
Authors: Ozgur Kara, Yujia Chen, Ming-Hsuan Yang, James M. Rehg, Wen-Sheng Chu, Du Tran
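Abstract summary: Split-then-Merge (StM) is a framework designed to enhance control in generative video composition. StM splits a large corpus of unlabeled videos into dynamic foreground and background layers, then self-composes them to learn how dynamic subjects interact with diverse scenes.
Reference score (attention level computed independently by this site): 55.12521724893102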
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We present Split-then-Merge (StM), a novel framework designed to enhance control in generative video composition and address its data scarcity problem. Unlike conventional methods relying on annotated datasets or handcrafted rules, StM splits a large corpus of unlabeled videos into dynamic foreground and background layers, then self-composes them to learn how dynamic subjects interact with diverse scenes. This process enables the model to learn the complex compositional dynamics required for realistic video generation. StM introduces a novel transformation-aware training pipeline that uses multi-layer fusion and augmentation to achieve affordance-aware composition, alongside an identity-preservation loss that maintains foreground fidelity during blending. Experiments show StM outperforms SoTA methods in both quantitative benchmarks and human- and VLLM-based qualitative evaluations. More details are available at our project page: https://split-then-merge.github.io
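Abstract (reference translation): Split-then-Merge (StM) is a new framework designed to enhance control in generative video composition and to address its data scarcity problem. Unlike conventional methods that rely on annotated datasets or handcrafted rules, StM splits a large corpus of unlabeled videos into dynamic foreground and background layers, then self-composes them to learn how dynamic subjects interact with diverse scenes. This process lets the model learn the complex compositional dynamics needed for realistic video generation. StM introduces a new transformation-aware training pipeline that uses multi-layer fusion and augmentation to achieve affordance-aware composition, together with an identity-preservation loss that maintains foreground fidelity during blending. StM outperforms SoTA methods in both quantitative benchmarks and human/VLLM-based qualitative evaluations. Details are available on the project page.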
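The split and merge steps described in the abstract can be illustrated with plain alpha compositing. The following is only a toy sketch under stated assumptions, not the authors' pipeline, which is a learned generative model: the function names `split_layers` and `merge_layers` are hypothetical, frames are tiny grayscale grids, and the subject masks are assumed to be given rather than predicted.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame, H x W, values in [0, 1]
Video = List[Frame]        # sequence of T frames


def split_layers(video: Video, masks: Video) -> Tuple[Video, Video]:
    """Split every frame into a foreground layer and a background layer.

    masks[t][y][x] is 1.0 on the subject and 0.0 elsewhere. Where the masks
    come from is an assumption here; the sketch simply takes them as input.
    """
    fg = [[[p * m for p, m in zip(row, mrow)]
           for row, mrow in zip(frame, mframe)]
          for frame, mframe in zip(video, masks)]
    bg = [[[p * (1.0 - m) for p, m in zip(row, mrow)]
           for row, mrow in zip(frame, mframe)]
          for frame, mframe in zip(video, masks)]
    return fg, bg


def merge_layers(fg: Video, masks: Video, new_bg: Video) -> Video:
    """Composite the (mask-premultiplied) foreground over a background."""
    return [[[f + (1.0 - m) * b for f, m, b in zip(frow, mrow, brow)]
             for frow, mrow, brow in zip(fframe, mframe, bframe)]
            for fframe, mframe, bframe in zip(fg, masks, new_bg)]


# One 2x2 frame with the subject on the diagonal.
video = [[[0.2, 0.8], [0.4, 0.6]]]
masks = [[[1.0, 0.0], [0.0, 1.0]]]

fg, bg = split_layers(video, masks)
recomposed = merge_layers(fg, masks, bg)       # same scene: recovers the input
gray_bg = [[[0.5, 0.5], [0.5, 0.5]]]
transplanted = merge_layers(fg, masks, gray_bg)  # subject over a new scene
```

Merging the layers of one video back together reproduces the original frames, which is why unlabeled video can serve as its own supervision target in a scheme like this. Hard pixel pasting of this kind ignores lighting, scale, and affordances, which are the aspects the abstract says StM learns.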

Related papers