3D-Aware Video Generation
- URL: http://arxiv.org/abs/2206.14797v4
- Date: Wed, 9 Aug 2023 07:34:03 GMT
- Title: 3D-Aware Video Generation
- Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang,
Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte
- Abstract summary: We explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D videos supervised only with monocular videos.
- Score: 149.5230191060692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have emerged as an essential building block for many image
synthesis and editing tasks. Recent advances in this field have also enabled
high-quality 3D or video content to be generated that exhibits either
multi-view or temporal consistency. With our work, we explore 4D generative
adversarial networks (GANs) that learn unconditional generation of 3D-aware
videos. By combining neural implicit representations with a time-aware
discriminator, we develop a GAN framework that synthesizes 3D videos supervised
only with monocular videos. We show that our method learns a rich embedding of
decomposable 3D structures and motions that enables new visual effects of
spatio-temporal renderings while producing imagery with quality comparable to
that of existing 3D or video GANs.
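The abstract describes two components: a generator built on a neural implicit representation that is conditioned on time (so a 3D scene can be queried at any point and moment), and a time-aware discriminator that judges pairs of frames together with their time offset. A minimal NumPy sketch of that structure follows; all shapes, weight matrices, and function names are illustrative stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, x, t, w1, w2):
    """Toy time-conditioned implicit function: maps a latent code z, a 3D
    point x, and a time t to an RGB color plus a volume density. Real models
    use deep MLPs with positional encodings; this linear net is a stand-in."""
    inp = np.concatenate([z, x, [t]])
    h = np.tanh(w1 @ inp)
    return w2 @ h  # 4 outputs: RGB + density at (x, t)

def time_aware_discriminator(frame_a, frame_b, dt, v):
    """Toy time-aware discriminator: scores a pair of rendered frames along
    with their time difference dt, so temporal dynamics are part of the
    real-vs-fake decision rather than judging frames in isolation."""
    inp = np.concatenate([frame_a.ravel(), frame_b.ravel(), [dt]])
    return 1.0 / (1.0 + np.exp(-v @ inp))  # probability the pair is real

# Illustrative sizes (not taken from the paper)
z = rng.normal(size=8)            # latent code controlling content and motion
x = np.array([0.1, -0.2, 0.3])    # a 3D query point
w1 = rng.normal(size=(16, 12)) * 0.1
w2 = rng.normal(size=(4, 16)) * 0.1

rgb_sigma = generator(z, x, t=0.5, w1=w1, w2=w2)
print(rgb_sigma.shape)  # (4,)

frames = rng.normal(size=(2, 4, 4, 3))            # two toy 4x4 RGB frames
v = rng.normal(size=2 * 4 * 4 * 3 + 1) * 0.1
score = time_aware_discriminator(frames[0], frames[1], dt=0.2, v=v)
print(0.0 < score < 1.0)  # True
```

In the actual method, frames would come from volume-rendering the implicit function along camera rays; here random arrays stand in for rendered frames to keep the sketch self-contained.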
Related papers
- Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation [15.215597253086612]
We bridge the quality gap between methods that directly generate 3D representations and ones that reconstruct 3D objects from multi-view images.
We introduce a multi-view to multi-view diffusion model called Sharp-It, which takes a 3D consistent set of multi-view images.
We demonstrate that Sharp-It enables various 3D applications, such as fast synthesis, editing, and controlled generation, while attaining high-quality assets.
arXiv Detail & Related papers (2024-12-03T17:58:07Z)
- Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models [112.2625368640425]
High-resolution Image-to-3D model (Hi3D) is a new video-diffusion-based paradigm that recasts single-image-to-multi-view generation as 3D-aware sequential image generation.
Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior, yielding multi-view images with low-resolution texture details.
arXiv Detail & Related papers (2024-09-11T17:58:57Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by a point-based representation to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z)
- PV3D: A 3D Generative Model for Portrait Video Generation [94.96025739097922]
We propose PV3D, the first generative framework that can synthesize multi-view consistent portrait videos.
PV3D is able to support many downstream applications such as animating static portraits and view-consistent video motion editing.
arXiv Detail & Related papers (2022-12-13T05:42:44Z)
- Efficient Geometry-aware 3D Generative Adversarial Networks [50.68436093869381]
Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent.
In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations.
We introduce an expressive hybrid explicit-implicit network architecture that synthesizes high-resolution multi-view-consistent images in real time and also produces high-quality 3D geometry.
arXiv Detail & Related papers (2021-12-15T08:01:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.