Fugu-MT Paper Translation (Abstract): TriMotion: Modality-Agnostic Camera Control for Video Generation

Paper abstract: TriMotion: Modality-Agnostic Camera Control for Video Generation

arxiv url: http://arxiv.org/abs/2606.20774v1
Date: Thu, 18 Jun 2026 16:07:05 GMT
Status: Translation complete
Last updated in system: 2026-06-26 12:59:12.816811
Title: TriMotion: Modality-Agnostic Camera Control for Video Generation
Title (reference translation): TriMotion: Modality-Agnostic Camera Control for Video Generation
Authors: Seunghyun Shin, Jifei Song, Wooseok Jeon, Hae-Gon Jeon, Jiankang Deng,
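Abstract summary: TriMotion is a framework for camera-controlled video generation that maps video, pose, and text inputs describing the same camera trajectory into a shared motion embedding space. It generates high-quality videos that accurately follow the target camera trajectories across all three modalities.
Reference score (independently computed attention level): 65.5913929253637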
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Camera motion control is essential for directing viewpoint changes in generative systems. However, existing methods typically condition the generation process on a single specific modality, such as explicit pose trajectories or reference videos, limiting their ability to support heterogeneous user inputs. To address this limitation, we present TriMotion, a modality-agnostic framework for camera-controlled video generation that maps video, pose, and text inputs, describing the same camera trajectory into a shared motion embedding space. Learning such a space requires synchronized supervision across modalities. Therefore, we build the Motion Triplet Dataset by extending a Multi-Cam Video Dataset with geometry-grounded motion descriptions derived from camera extrinsics. We further introduce a latent motion consistency objective that leverages the motion embedding space to encourage the generated video to follow the target camera trajectory directly in latent space, avoiding the cost of pixel-space decoding. Extensive experiments show that TriMotion generates high-quality videos that accurately follow the target camera trajectories across all three modalities. Beyond standard generation, the shared motion embedding space also enables flexible applications such as sequential motion composition and cross-modal motion interpolation.
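Abstract (reference translation): Camera motion control is essential for guiding viewpoint changes in generative systems. Existing methods, however, typically condition generation on a single specific modality, such as explicit pose trajectories or reference videos, which limits their ability to support heterogeneous user inputs. To address this limitation, TriMotion is a modality-agnostic framework for camera-controlled video generation that maps video, pose, and text inputs describing the same camera trajectory into a shared motion embedding space. Learning such a space requires supervision synchronized across modalities. The authors therefore build the Motion Triplet Dataset by extending a multi-camera video dataset with geometry-grounded motion descriptions derived from camera extrinsics. They further introduce a latent motion consistency objective that uses the motion embedding space to encourage the generated video to follow the target camera trajectory directly in latent space, avoiding the cost of pixel-space decoding. Extensive experiments show that TriMotion generates high-quality videos that accurately follow the target camera trajectories across all three modalities. Beyond standard generation, the shared motion embedding space also enables flexible applications such as sequential motion composition and cross-modal motion interpolation.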
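
The abstract mentions geometry-grounded motion descriptions derived from camera extrinsics but does not say how they are produced. Purely as an illustration, and not the authors' procedure, the sketch below turns two camera poses into a coarse text label. The function name `describe_translation`, the 4x4 camera-to-world pose layout, the OpenCV-style axes (x right, y down, z forward), and the label wording are all assumptions made for this example.

```python
def describe_translation(c2w_start, c2w_end, eps=1e-6):
    """Return a coarse text label for camera translation between two poses.

    Assumptions (not taken from the paper): poses are 4x4 camera-to-world
    matrices given as nested lists, with OpenCV-style camera axes
    (x right, y down, z forward).
    """
    # World-space displacement of the camera centre.
    delta = [c2w_end[i][3] - c2w_start[i][3] for i in range(3)]
    # Express the displacement in the starting camera's frame: R0^T @ delta.
    local = [sum(c2w_start[r][c] * delta[r] for r in range(3)) for c in range(3)]
    # Pick the axis with the largest absolute motion.
    axis = max(range(3), key=lambda i: abs(local[i]))
    if abs(local[axis]) < eps:
        return "the camera stays still"
    # (negative direction, positive direction) for each camera axis.
    labels = [
        ("moves left", "moves right"),
        ("moves up", "moves down"),
        ("moves backward", "moves forward"),
    ]
    return "the camera " + labels[axis][local[axis] > 0]


def make_pose(rotation, translation):
    """Assemble a 4x4 camera-to-world matrix from a 3x3 rotation and a 3-vector."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

start = make_pose(IDENTITY, [0.0, 0.0, 0.0])
end = make_pose(IDENTITY, [0.0, 0.0, 2.0])
print(describe_translation(start, end))
```

A full description would also need to cover rotation (pan, tilt, roll) and motion over more than two frames; this sketch only labels the dominant translation between the first and last pose.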
