Fugu-MT paper translation (abstract): Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization

Paper overview: Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization

arxiv url: http://arxiv.org/abs/2504.04153v1
Date: Sat, 05 Apr 2025 12:13:05 GMT
Status: translation complete
Last updated in system: 2025-04-16 05:38:08.73973
Title: Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization
Title (reference translation): Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization
Authors: Yikai Wang, Guangce Liu, Xinzhou Wang, Zilong Chen, Jiafang Li, Xin Liang, Fuchun Sun, Jun Zhu
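Abstract summary: Video4DGen is a novel framework that excels at generating 4D representations from single or multiple generated videos. It offers a powerful tool for applications in virtual reality, animation, and beyond.
Reference score (independently computed attention score): 31.956858341885436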
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advancement of 4D (i.e., sequential 3D) generation opens up new possibilities for lifelike experiences in various applications, where users can explore dynamic objects or characters from any viewpoint. Meanwhile, video generative models are receiving particular attention given their ability to produce realistic and imaginative frames. These models are also observed to exhibit strong 3D consistency, indicating the potential to act as world simulators. In this work, we present Video4DGen, a novel framework that excels in generating 4D representations from single or multiple generated videos as well as generating 4D-guided videos. This framework is pivotal for creating high-fidelity virtual contents that maintain both spatial and temporal coherence. The 4D outputs generated by Video4DGen are represented using our proposed Dynamic Gaussian Surfels (DGS), which optimizes time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. We design warped-state geometric regularization and refinements on Gaussian surfels, to preserve the structural integrity and fine-grained appearance details. To perform 4D generation from multiple videos and capture representation across spatial, temporal, and pose dimensions, we design multi-video alignment, root pose optimization, and pose-guided frame sampling strategies. The leveraging of continuous warping fields also enables a precise depiction of pose, motion, and deformation over per-video frames. Further, to improve the overall fidelity from the observation of all camera poses, Video4DGen performs novel-view video generation guided by the 4D content, with the proposed confidence-filtered DGS to enhance the quality of generated sequences. With the ability of 4D and video generation, Video4DGen offers a powerful tool for applications in virtual reality, animation, and beyond.
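Abstract (reference translation): Advances in 4D (i.e., sequential 3D) generation open up new possibilities for lifelike experiences in various applications where users can explore dynamic objects or characters from any viewpoint. Meanwhile, video generative models are attracting particular attention for their ability to produce realistic and imaginative frames. These models also exhibit strong 3D consistency, indicating their potential to act as world simulators. In this work, we introduce Video4DGen, a novel framework that excels at generating 4D representations from single or multiple generated videos. This framework is important for creating high-fidelity virtual content that maintains both spatial and temporal coherence. The 4D outputs generated by Video4DGen are represented using our proposed Dynamic Gaussian Surfels (DGS), which optimize time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. We design warped-state geometric regularization and refinements on Gaussian surfels to preserve structural integrity and fine-grained appearance details. To perform 4D generation from multiple videos and capture representations across the spatial, temporal, and pose dimensions, we design multi-video alignment, root pose optimization, and pose-guided frame sampling strategies. Leveraging continuous warping fields also enables precise depiction of pose, motion, and deformation in each video frame. Furthermore, to improve overall fidelity as observed from all camera poses, Video4DGen performs novel-view video generation guided by the 4D content, using the proposed confidence-filtered DGS to enhance the quality of the generated sequences. With its 4D and video generation capabilities, Video4DGen provides a powerful tool for applications such as virtual reality and animation.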

Related papers