Fugu-MT Paper Translation (Abstract): OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

Paper overview: OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning

arxiv url: http://arxiv.org/abs/2603.24458v2
Date: Thu, 02 Apr 2026 15:17:15 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:08.974533
Title: OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning
Title (reference translation): OmniWeaving: Toward Unified Video Generation with Free-form Composition and Reasoning
Authors: Kaihang Pan, Qi Tian, Jianwei Zhang, Weijie Kong, Jiangfeng Xiong, Yanxin Long, Shixue Zhang, Haiyi Qiu, Tan Wang, Zheqi Lv, Yue Wu, Liefeng Bo, Siliang Tang, Zhao Zhong
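Abstract summary: OmniWeaving is an omni-level video generation model with powerful multimodal composition and reasoning-informed capabilities. We introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Experiments show that OmniWeaving achieves SoTA performance among open-source unified models.
Reference score (independently computed attention score): 81.93748829204145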
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While proprietary systems such as Seedance-2.0 have achieved remarkable success in omni-capable video generation, open-source alternatives significantly lag behind. Most academic models remain heavily fragmented, and the few existing efforts toward unified video generation still struggle to seamlessly integrate diverse tasks within a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model featuring powerful multimodal composition and reasoning-informed capabilities. By leveraging a massive-scale pretraining dataset that encompasses diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent to infer complex user intentions for sophisticated video creation. Furthermore, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments demonstrate that OmniWeaving achieves SoTA performance among open-source unified models. The code and model are publicly available. Project Page: https://omniweaving.github.io
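Abstract (reference translation): While proprietary systems such as Seedance-2.0 have achieved great success in omni-capable video generation, open-source alternatives lag far behind. Most academic models are fragmented, and existing efforts toward unified video generation struggle to seamlessly integrate diverse tasks into a single framework. To bridge this gap, we propose OmniWeaving, an omni-level video generation model with powerful multimodal composition and reasoning-informed capabilities. By leveraging a large-scale pretraining dataset covering diverse compositional and reasoning-augmented scenarios, OmniWeaving learns to temporally bind interleaved text, multi-image, and video inputs while acting as an intelligent agent that infers complex user intentions for sophisticated video creation. In addition, we introduce IntelligentVBench, the first comprehensive benchmark designed to rigorously assess next-level intelligent unified video generation. Extensive experiments show that OmniWeaving achieves SoTA performance among open-source unified models. The code and model are publicly available. Project Page: https://omniweaving.github.io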
