Fugu-MT paper translation (abstract): OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control

Paper abstract: OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control

arxiv url: http://arxiv.org/abs/2604.06010v1
Date: Tue, 07 Apr 2026 16:06:02 GMT
Status: Translation complete
Last updated in system: 2026-04-08 17:42:09.915165
Title: OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
Title (reference translation): OmniCamera: A Unified Framework for Multi-task Video Generation with Arbitrary Camera Control
Authors: Yukun Wang, Ruihuang Li, Jiale Tao, Shiyuan Yang, Liyi Chen, Zhantao Yang, Handz, Yulan Guo, Shuai Shao, Qinglin Lu
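Abstract summary: Video fundamentally intertwines two crucial axes: the dynamic content of a scene and the camera motion through which it is observed. Existing generation models often entangle these factors, limiting independent control. OmniCamera is a unified framework designed to explicitly disentangle and command these two dimensions.
Reference score (independently calculated attention score): 49.41924736941193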
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video fundamentally intertwines two crucial axes: the dynamic content of a scene and the camera motion through which it is observed. However, existing generation models often entangle these factors, limiting independent control. In this work, we introduce OmniCamera, a unified framework designed to explicitly disentangle and command these two dimensions. This compositional approach enables flexible video generation by allowing arbitrary pairings of camera and content conditions, unlocking unprecedented creative control. To overcome the fundamental challenges of modality conflict and data scarcity inherent in such a system, we present two key innovations. First, we construct OmniCAM, a novel hybrid dataset combining curated real-world videos with synthetic data that provides diverse paired examples for robust multi-task learning. Second, we propose a Dual-level Curriculum Co-Training strategy that mitigates modality interference and synergistically learns from diverse data sources. This strategy operates on two levels: first, it progressively introduces control modalities in order of difficulty (condition-level), and second, it trains for precise control on synthetic data before adapting to real data for photorealism (data-level). As a result, OmniCamera achieves state-of-the-art performance, enabling flexible control for complex camera movements while maintaining superior visual quality.
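Abstract (reference translation): Video fundamentally intertwines two crucial axes: the dynamic content of a scene and the camera motion through which it is observed. However, existing generation models often entangle these factors, which limits independent control. In this work, we introduce OmniCamera, a unified framework designed to explicitly disentangle and command these two dimensions. This compositional approach allows arbitrary pairings of camera and content conditions and unlocks unprecedented creative control, enabling flexible video generation. To overcome the fundamental challenges of modality conflict and data scarcity inherent in such a system, we present two key innovations. First, we construct OmniCAM, a new hybrid dataset that combines curated real-world videos with synthetic data and provides diverse paired examples for robust multi-task learning. Second, we propose a Dual-level Curriculum Co-Training strategy that mitigates modality interference and learns synergistically from diverse data sources. This strategy first introduces control modalities progressively in order of difficulty (condition-level), and second trains for precise control on synthetic data before adapting to real data for photorealism (data-level). As a result, OmniCamera achieves state-of-the-art performance, enabling flexible control of complex camera movements while maintaining superior visual quality.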
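
The two curriculum levels described in the abstract can be illustrated with a small schedule lookup. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the condition-level and data-level curricula are interleaved, so the stage names, step boundaries, and placeholder condition names (`easy_condition`, `medium_condition`, `hard_condition`) below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (all fields are illustrative assumptions)."""
    name: str
    start_step: int               # first training step at which the stage applies
    data_source: str              # "synthetic" or "real"
    modalities: Tuple[str, ...]   # control conditions enabled in this stage


# Hypothetical schedule: conditions are added from easy to hard on synthetic
# data (condition-level), then training moves to real data (data-level).
CURRICULUM = (
    Stage("synthetic_easy", 0, "synthetic", ("easy_condition",)),
    Stage("synthetic_medium", 1000, "synthetic",
          ("easy_condition", "medium_condition")),
    Stage("synthetic_full", 2000, "synthetic",
          ("easy_condition", "medium_condition", "hard_condition")),
    Stage("real_adaptation", 3000, "real",
          ("easy_condition", "medium_condition", "hard_condition")),
)


def stage_for_step(step: int, curriculum: Sequence[Stage] = CURRICULUM) -> Stage:
    """Return the latest stage whose start_step is <= step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    current = curriculum[0]
    for stage in curriculum:
        if step >= stage.start_step:
            current = stage
    return current
```

A training loop would call `stage_for_step(step)` at each step to decide which data source to sample from and which conditions to feed to the model. Whether the real-data phase in OmniCamera keeps all conditions active, or mixes synthetic and real batches, is not stated in the abstract.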

Related papers