3D-Aware Video Generation
- URL: http://arxiv.org/abs/2206.14797v4
- Date: Wed, 9 Aug 2023 07:34:03 GMT
- Title: 3D-Aware Video Generation
- Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang,
Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte
- Abstract summary: We explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only with monocular videos.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have emerged as an essential building block for many image
synthesis and editing tasks. Recent advances in this field have also enabled
high-quality 3D or video content to be generated that exhibits either
multi-view or temporal consistency. With our work, we explore 4D generative
adversarial networks (GANs) that learn unconditional generation of 3D-aware
videos. By combining neural implicit representations with a time-aware
discriminator, we develop a GAN framework that synthesizes 3D videos supervised
only with monocular videos. We show that our method learns a rich embedding of
decomposable 3D structures and motions that enables new visual effects of
spatio-temporal renderings while producing imagery with quality comparable to
that of existing 3D or video GANs.
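As a concrete (and deliberately simplified) illustration of the combination the abstract describes, below is a minimal PyTorch sketch of a time-conditioned implicit generator paired with a time-aware discriminator. All module names, layer sizes, and the frame-pair input convention are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the paper's core idea: a time-conditioned neural implicit
# generator plus a time-aware discriminator. Architectural details here are
# assumptions for illustration, not the authors' code.
import torch
import torch.nn as nn

class Implicit4DGenerator(nn.Module):
    """MLP mapping (point x, time t, latent z) -> (density, color)."""
    def __init__(self, z_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1 + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (sigma, r, g, b)
        )

    def forward(self, x, t, z):
        # x: (N, 3) sample points, t: (N, 1) timestamps, z: (N, z_dim) latents
        h = self.net(torch.cat([x, t, z], dim=-1))
        sigma, rgb = h[..., :1], torch.sigmoid(h[..., 1:])
        return sigma, rgb  # volume-rendered into video frames elsewhere

class TimeAwareDiscriminator(nn.Module):
    """Scores a pair of rendered frames conditioned on their time offset,
    so the critic can judge motion plausibility, not just frame realism."""
    def __init__(self, ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(6, ch, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(ch * 2 + 1, 1)  # +1 for the time offset

    def forward(self, frame_a, frame_b, dt):
        # frame_*: (B, 3, H, W) rendered frames, dt: (B, 1) time difference
        feat = self.conv(torch.cat([frame_a, frame_b], dim=1))
        return self.head(torch.cat([feat, dt], dim=-1))
```

Conditioning the discriminator on the time offset between two rendered frames is what makes it "time-aware": it can penalize implausible motion even when each individual frame looks realistic.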
Related papers
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
The High-resolution Image-to-3D model (Hi3D) is a new video-diffusion-based paradigm that recasts single-image-to-multi-view generation as 3D-aware sequential image generation.
Hi3D first augments a pre-trained video diffusion model with a 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of a video diffusion model and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion
We investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or whether it is sufficient for a model to generate 3D representations of each timestep independently.
We propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video (see the pipeline sketch after this list).
arXiv Detail & Related papers (2024-06-17T04:09:04Z)
- CC3D: Layout-Conditioned Generation of Compositional 3D Scenes
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on the synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
- PV3D: A 3D Generative Model for Portrait Video Generation
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D supports many downstream applications, such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z)
- Efficient Geometry-aware 3D Generative Adversarial Networks
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that not only synthesizes high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry (a minimal sketch of such a hybrid representation appears after this list).
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
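Vid3D's two-stage factorization described above can be summarized in a few lines. This is a hedged sketch under the assumption of two black-box models; `video_diffusion` and `frame_to_3d` are hypothetical stand-ins, not functions from the Vid3D codebase.

```python
# Hedged sketch of Vid3D's two-stage idea: generate a 2D "seed" video first,
# then lift each frame to a 3D representation independently. The callables
# are hypothetical stand-ins for a 2D video diffusion model and a per-frame
# 3D generator.
from typing import Callable, List

def vid3d_pipeline(
    prompt_image,                  # single conditioning image
    video_diffusion: Callable,     # image -> list of 2D seed frames
    frame_to_3d: Callable,         # one 2D frame -> one 3D representation
) -> List:
    # Stage 1: capture the temporal dynamics purely in 2D.
    seed_frames = video_diffusion(prompt_image)
    # Stage 2: no cross-frame 3D consistency term; each timestep is lifted alone.
    return [frame_to_3d(frame) for frame in seed_frames]
```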
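The hybrid explicit-implicit representation referenced in the EG3D entry above is widely known as the tri-plane: explicit 2D feature planes are sampled per 3D point and decoded by a small implicit MLP. The sketch below illustrates that idea in PyTorch; all shapes and layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a tri-plane field: three explicit axis-aligned feature planes
# (the explicit part) queried per 3D point, decoded by a tiny MLP (the
# implicit part). Sizes are illustrative, not the paper's exact config.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriPlaneField(nn.Module):
    def __init__(self, feat_dim=32, res=64):
        super().__init__()
        # Learned XY, XZ, YZ feature planes.
        self.planes = nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.01)
        # Small MLP decoder mapping summed plane features to (sigma, rgb).
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 4)
        )

    def forward(self, pts):
        # pts: (N, 3) points in [-1, 1]^3
        coords = [pts[:, [0, 1]], pts[:, [0, 2]], pts[:, [1, 2]]]
        feats = 0.0
        for plane, uv in zip(self.planes, coords):
            grid = uv.view(1, -1, 1, 2)               # (1, N, 1, 2) sample grid
            sampled = F.grid_sample(
                plane.unsqueeze(0), grid, align_corners=True
            )                                          # (1, C, N, 1)
            feats = feats + sampled[0, :, :, 0].t()    # sum over planes, (N, C)
        out = self.decoder(feats)
        sigma, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return sigma, rgb
```

Sampling three small 2D planes instead of a dense 3D grid is what buys the efficiency: memory grows with res^2 rather than res^3, while the tiny decoder keeps per-point queries cheap.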
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.