Related papers: Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

URL: http://arxiv.org/abs/2312.01409v1
Date: Sun, 3 Dec 2023 14:17:11 GMT
Title: Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Authors: Shengqu Cai and Duygu Ceylan and Matheus Gadelha and Chun-Hao Paul Huang and Tuanfeng Yang Wang and Gordon Wetzstein
Abstract summary: We present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.
Score: 40.71940056121056
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, which hinders users' creativity rather than amplifying it. To address this challenge, we present a novel approach that combines the controllability of dynamic 3D meshes with the expressivity and editability of emerging diffusion models. For this purpose, our approach takes an animated, low-fidelity rendered mesh as input and injects the ground truth correspondence information obtained from the dynamic mesh into various stages of a pre-trained text-to-image generation model to output high-quality and temporally consistent frames. We demonstrate our approach on various examples where motion can be obtained by animating rigged assets or changing the camera path.

Related papers

I2V3D: Controllable image-to-video generation with 3D guidance [42.23117201457898]
I2V3D is a framework for animating static images into dynamic videos with precise 3D control. Our approach combines the precision of a computer graphics pipeline with advanced generative models.
arXiv Detail & Related papers (2025-03-12T18:26:34Z)
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation [65.74312406211213]
This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. By connecting insights from classical computer graphics and contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis.
arXiv Detail & Related papers (2025-02-06T18:41:04Z)
InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z)
Replace Anyone in Videos [39.4019337319795]
We propose the ReplaceAnyone framework, which focuses on localizing and manipulating human motion in videos. Specifically, we formulate this task as an image-conditioned pose-driven video inpainting paradigm. We introduce diverse mask forms involving regular and irregular shapes to avoid shape leakage and allow granular local control.
arXiv Detail & Related papers (2024-09-30T03:27:33Z)
Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE [22.072200443502457]
We propose Scene123, a 3D scene generation model that ensures realism and diversity through the video generation framework. Specifically, we warp the input image (or an image generated from text) to simulate adjacent views, filling the invisible areas with the MAE model. To further enhance the details and texture fidelity of generated views, we employ a GAN-based loss against images derived from the input image through the video generation model.
arXiv Detail & Related papers (2024-08-10T08:09:57Z)
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control [74.5434726968562]
We tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism based on Plücker coordinates. Our work is the first to enable camera control for transformer-based video diffusion models.
arXiv Detail & Related papers (2024-07-17T17:59:05Z)
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions. We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion. Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models [53.89348957053395]
We introduce a novel pipeline designed for text-to-4D scene generation. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video.
arXiv Detail & Related papers (2024-06-11T17:19:26Z)
MotionDreamer: Exploring Semantic Video Diffusion features for Zero-Shot 3D Mesh Animation [10.263762787854862]
We propose a technique for automatic re-animation of various 3D shapes based on a motion prior extracted from a video diffusion model. We leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines. Our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes.
arXiv Detail & Related papers (2024-05-30T15:30:38Z)
Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models [52.28245595257831]
We show that, despite the limitations of current T2V models, cross-attention guidance can be a promising approach for editing videos.
arXiv Detail & Related papers (2024-04-08T13:40:01Z)
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
LaMD: Latent Motion Diffusion for Image-Conditional Video Generation [63.34574080016687]
The latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator. LaMD generates high-quality videos on various benchmark datasets, including BAIR, Landscape, NATOPS, MUG and CATER-GEN.
arXiv Detail & Related papers (2023-04-23T10:32:32Z)
High-Fidelity and Freely Controllable Talking Head Video Generation [31.08828907637289]
We propose a novel model that produces high-fidelity talking head videos with free control over head pose and expression. We introduce a novel motion-aware multi-scale feature alignment module to effectively transfer the motion without face distortion. We evaluate our model on challenging datasets and demonstrate its state-of-the-art performance.
arXiv Detail & Related papers (2023-04-20T09:02:41Z)
Dreamix: Video Diffusion Models are General Video Editors [22.127604561922897]
Text-driven image and video diffusion models have recently achieved unprecedented generation realism. We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
arXiv Detail & Related papers (2023-02-02T18:58:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.