Generative Rendering: Controllable 4D-Guided Video Generation with 2D
Diffusion Models
- URL: http://arxiv.org/abs/2312.01409v1
- Date: Sun, 3 Dec 2023 14:17:11 GMT
- Title: Generative Rendering: Controllable 4D-Guided Video Generation with 2D
Diffusion Models
- Authors: Shengqu Cai and Duygu Ceylan and Matheus Gadelha and Chun-Hao Paul
Huang and Tuanfeng Yang Wang and Gordon Wetzstein
- Abstract summary: We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models.
We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
- Score: 40.71940056121056
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Traditional 3D content creation tools empower users to bring their
imagination to life by giving them direct control over a scene's geometry,
appearance, motion, and camera path. Creating computer-generated videos,
however, is a tedious manual process, which can be automated by emerging
text-to-video diffusion models. Despite great promise, video diffusion models
are difficult to control, hindering users from applying their own creativity rather
than amplifying it. To address this challenge, we present a novel approach that
combines the controllability of dynamic 3D meshes with the expressivity and
editability of emerging diffusion models. For this purpose, our approach takes
an animated, low-fidelity rendered mesh as input and injects the ground truth
correspondence information obtained from the dynamic mesh into various stages
of a pre-trained text-to-image generation model to output high-quality and
temporally consistent frames. We demonstrate our approach on various examples
where motion can be obtained by animating rigged assets or changing the camera
path.
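To make the idea of correspondence injection more concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation: per-frame UV maps rasterized from the animated mesh act as ground-truth correspondences, and intermediate diffusion features of pixels that land on the same texel of a shared UV atlas are averaged to encourage temporal consistency. The function name, tensor shapes, texture resolution, and the simple 50/50 blend are all assumptions for illustration.

```python
# Minimal sketch (not the authors' code): share diffusion features across frames
# using ground-truth UV correspondences rendered from a dynamic mesh.
import torch

def uv_consistent_features(features, uv, texture_res=64):
    """features: (T, C, H, W) intermediate diffusion features, one per frame.
    uv:       (T, H, W, 2) per-pixel UV coordinates in [0, 1] rasterized from the mesh.
    Blends each pixel's feature with the mean feature of all pixels (across all
    frames) that map to the same texel of a shared UV atlas."""
    T, C, H, W = features.shape
    # Discretize UVs into texel indices of a (texture_res x texture_res) atlas.
    texel = (uv.clamp(0, 1) * (texture_res - 1)).long()              # (T, H, W, 2)
    flat_idx = (texel[..., 1] * texture_res + texel[..., 0]).reshape(-1)

    feats = features.permute(0, 2, 3, 1).reshape(-1, C)              # (T*H*W, C)
    atlas_sum = features.new_zeros(texture_res * texture_res, C)
    atlas_cnt = features.new_zeros(texture_res * texture_res, 1)
    atlas_sum.index_add_(0, flat_idx, feats)                          # scatter-sum
    atlas_cnt.index_add_(0, flat_idx, torch.ones_like(feats[:, :1]))  # texel counts
    atlas_mean = atlas_sum / atlas_cnt.clamp(min=1)                   # scatter-mean

    # Sample the shared texel feature back into every frame and blend.
    shared = atlas_mean[flat_idx].reshape(T, H, W, C).permute(0, 3, 1, 2)
    return 0.5 * features + 0.5 * shared

# Toy usage: 8 frames of 32x32 features with 320 channels.
if __name__ == "__main__":
    f = torch.randn(8, 320, 32, 32)
    uv = torch.rand(8, 32, 32, 2)
    print(uv_consistent_features(f, uv).shape)   # torch.Size([8, 320, 32, 32])
```

The paper injects correspondence information at several stages of a pretrained text-to-image model; the scatter-mean over a UV atlas above is just one simple way to tie together features of pixels that depict the same surface point.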
Related papers
- Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos.
Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm.
We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
- Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE [22.072200443502457]
We propose Scene123, a 3D scene generation model that ensures realism and diversity through the video generation framework.
Specifically, we warp the input image (or an image generated from text) to simulate adjacent views, filling the invisible areas with the MAE model.
To further enhance the details and texture fidelity of generated views, we employ a GAN-based loss against images derived from the input image through the video generation model.
arXiv Detail & Related papers (2024-08-10T08:09:57Z)
- VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates (a sketch of such an embedding appears after this list).
Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation.
Our method begins by generating a reference video using the video generation model.
We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
- MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models [10.263762787854862]
We propose a technique for automatic re-animation of arbitrary 3D shapes based on a motion prior extracted from a video diffusion model.
We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines.
Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z)
- Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models [52.28245595257831]
We show that, despite the limitations of current T2V models, cross-attention guidance can be a promising approach for editing videos.
arXiv Detail & Related papers (2024-04-08T13:40:01Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression.
We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion.
We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z)
- Dreamix: Video Diffusion Models are General Video Editors [22.127604561922897]
Text-driven image and video diffusion models have recently achieved unprecedented generation realism.
We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
arXiv Detail & Related papers (2023-02-02T18:58:58Z)
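For the VD3D entry above, which conditions a video diffusion transformer on camera pose via Plücker coordinates, the sketch below shows one common way to construct such an embedding. It is a minimal, assumption-laden example, not VD3D's code: each pixel's viewing ray with origin o and unit direction d is encoded as the 6-vector (d, o × d) and stacked into a 6-channel map that a ControlNet-like conditioning branch could consume. The function name, pixel-center convention, and intrinsics format are assumptions for illustration.

```python
# Minimal sketch (an assumption, not VD3D's code): per-pixel Plücker ray embedding
# for camera conditioning of a diffusion model.
import torch

def plucker_embedding(K, cam_to_world, H, W):
    """K: (3, 3) pinhole intrinsics; cam_to_world: (4, 4) camera-to-world pose.
    Returns a (6, H, W) Plücker coordinate map (direction, origin x direction)."""
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs + 0.5, ys + 0.5, torch.ones_like(xs)], dim=-1)  # (H, W, 3)
    # Unproject pixel centers to camera-space directions, then rotate to world space.
    dirs_cam = pix @ torch.inverse(K).T                                   # (H, W, 3)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    dirs = torch.nn.functional.normalize(dirs_cam @ R.T, dim=-1)          # unit d
    origins = t.expand(H, W, 3)                                           # ray origin o
    moment = torch.cross(origins, dirs, dim=-1)                           # o x d
    return torch.cat([dirs, moment], dim=-1).permute(2, 0, 1)             # (6, H, W)

# Toy usage: identity pose, simple intrinsics for a 64x64 frame.
if __name__ == "__main__":
    K = torch.tensor([[64., 0., 32.], [0., 64., 32.], [0., 0., 1.]])
    pose = torch.eye(4)
    print(plucker_embedding(K, pose, 64, 64).shape)  # torch.Size([6, 64, 64])
```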