Fugu-MT Paper Translation (Abstract): DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

Paper overview: DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data

arxiv url: http://arxiv.org/abs/2604.01666v1
Date: Thu, 02 Apr 2026 06:12:38 GMT
Status: Translation complete
Last updated in system: 2026-04-03 14:21:10.473922
Title: DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data
Title (reference translation): DynaVid: Learning to Generate Highly Dynamic Videos Using Synthetic Motion Data
Authors: Wonjoon Jin, Jiyun Won, Janghyeok Han, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho
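Abstract summary: DynaVid is a video synthesis framework that leverages synthetic motion data in training. The authors show that DynaVid improves realism and controllability in dynamic motion generation and camera motion control.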
Reference score (independently computed attention score): 51.316274891736164
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite recent progress, video diffusion models still struggle to synthesize realistic videos involving highly dynamic motions or requiring fine-grained motion controllability. A central limitation lies in the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that leverages synthetic motion data in training, which is represented as optical flow and rendered using computer graphics pipelines. This approach offers two key advantages. First, synthetic motion offers diverse motion patterns and precise control signals that are difficult to obtain from real data. Second, unlike rendered videos with artificial appearances, rendered optical flow encodes only motion and is decoupled from appearance, thereby preventing models from reproducing the unnatural look of synthetic videos. Building on this idea, DynaVid adopts a two-stage generation framework: a motion generator first synthesizes motion, and then a motion-guided video generator produces video frames conditioned on that motion. This decoupled formulation enables the model to learn dynamic motion patterns from synthetic data while preserving visual realism from real-world videos. We validate our framework on two challenging scenarios, vigorous human motion generation and extreme camera motion control, where existing datasets are particularly limited. Extensive experiments demonstrate that DynaVid improves the realism and controllability in dynamic motion generation and camera motion control.
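Abstract (reference translation): Despite recent progress, video diffusion models still struggle to synthesize realistic videos that involve highly dynamic motion or that require fine-grained motion control. The central limitation is the scarcity of such examples in commonly used training datasets. To address this, we introduce DynaVid, a video synthesis framework that leverages synthetic motion data in training; the motion data is represented as optical flow and rendered with computer graphics pipelines. This approach has two major advantages. First, synthetic motion provides diverse motion patterns and precise control signals that are difficult to obtain from real data. Second, unlike rendered videos with artificial appearances, rendered optical flow encodes only motion and is decoupled from appearance, which prevents the model from reproducing the unnatural look of synthetic videos. Building on this idea, DynaVid adopts a two-stage generation framework: a motion generator first synthesizes motion, and a motion-guided video generator then produces video frames conditioned on that motion. This decoupled formulation lets the model learn dynamic motion patterns from synthetic data while preserving the visual realism of real-world videos. We validate the framework in two challenging scenarios where existing datasets are particularly limited: vigorous human motion generation and extreme camera motion control. Extensive experiments show that DynaVid improves realism and controllability in both dynamic motion generation and camera motion control.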
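Illustrative sketch (not from the paper): the two-stage, decoupled formulation described in the abstract can be pictured with a toy example. This is not the authors' implementation. DynaVid uses learned generators for both stages, whereas here stage 1 is a hand-written uniform flow field and stage 2 is a plain forward warp. All names (`uniform_flow`, `warp`, `generate`) are hypothetical and chosen only for illustration. The point it tries to show is that the flow carries motion only, so the same flow can drive frames with any appearance.

```python
from typing import List, Tuple

Flow = List[List[Tuple[int, int]]]  # per-pixel (dx, dy) displacement, motion only
Frame = List[List[int]]             # grayscale image as nested lists, appearance only


def uniform_flow(height: int, width: int, dx: int, dy: int) -> Flow:
    """Stage 1 stand-in (assumption): a synthetic flow field in which
    every pixel moves by the same (dx, dy) offset."""
    return [[(dx, dy) for _ in range(width)] for _ in range(height)]


def warp(frame: Frame, flow: Flow) -> Frame:
    """Stage 2 stand-in (assumption): forward-warp a frame by the flow.
    Pixels that leave the image are dropped; uncovered pixels stay 0."""
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = frame[y][x]
    return out


def generate(first_frame: Frame, flows: List[Flow]) -> List[Frame]:
    """Produce a frame sequence by applying each flow to the previous frame."""
    frames = [first_frame]
    for flow in flows:
        frames.append(warp(frames[-1], flow))
    return frames


# The same motion applied to two different appearances.
motion = [uniform_flow(3, 3, 1, 0)] * 2  # shift right by one pixel, twice
video_a = generate([[9, 0, 0], [0, 0, 0], [0, 0, 0]], motion)
video_b = generate([[0, 0, 0], [5, 0, 0], [0, 0, 0]], motion)
```

In this sketch the bright pixel of each input ends up two columns to the right in the last frame, while its intensity (the appearance) is untouched by the motion representation.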
