Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
- URL: http://arxiv.org/abs/2403.13570v2
- Date: Thu, 11 Jul 2024 07:59:02 GMT
- Title: Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
- Authors: Yu Deng, Duomin Wang, Baoyuan Wang
- Abstract summary: We propose a novel learning approach for feed-forward one-shot 4D head avatar synthesis.
We employ pseudo multi-view videos to learn a 4D head synthesizer in a data-driven manner.
- Score: 13.969883154405995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel learning approach for feed-forward one-shot 4D head avatar synthesis. Unlike existing methods that often learn from reconstructing monocular videos guided by 3DMM, we employ pseudo multi-view videos to learn a 4D head synthesizer in a data-driven manner, avoiding reliance on inaccurate 3DMM reconstruction that could be detrimental to the synthesis performance. The key idea is to first learn a 3D head synthesizer using synthetic multi-view images to convert monocular real videos into multi-view ones, and then utilize the pseudo multi-view videos to learn a 4D head synthesizer via cross-view self-reenactment. By leveraging a simple vision transformer backbone with motion-aware cross-attentions, our method exhibits superior performance compared to previous methods in terms of reconstruction fidelity, geometry consistency, and motion control accuracy. We hope our method offers novel insights into integrating 3D priors with 2D supervision for improved 4D head avatar creation.
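For intuition, here is a minimal PyTorch sketch of what a motion-aware cross-attention block of this kind might look like; the module, tensor names, and shapes are our own illustrative assumptions, not the authors' code:

```python
# Minimal sketch of a motion-aware cross-attention block, assuming the
# driving motion is provided as a sequence of tokens (e.g. expression/pose
# embeddings). Names and shapes are illustrative, not from the paper.
import torch
import torch.nn as nn

class MotionAwareCrossAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, appearance_tokens, motion_tokens):
        # appearance_tokens: (B, N, dim) ViT features of the source image
        # motion_tokens:     (B, M, dim) embedding of the driving motion
        q = self.norm_q(appearance_tokens)
        kv = self.norm_kv(motion_tokens)
        out, _ = self.attn(q, kv, kv)   # queries attend to motion tokens
        return appearance_tokens + out  # residual update of the features

# Usage: inject driving motion into per-image features before decoding
# them into a (tri)plane-based head representation.
feats = torch.randn(2, 196, 512)
motion = torch.randn(2, 16, 512)
feats = MotionAwareCrossAttention()(feats, motion)
```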
Related papers
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image [88.71287865590273]
We introduce TrajScene-60K, a large-scale dataset of 60,000 video samples with dense point trajectories.
We propose a diffusion-based 4D Scene Trajectory Generator (4D-STraG) to jointly generate geometrically consistent and motion-plausible 4D trajectories.
We then propose a 4D View Synthesis Module (4D-Vi) to render videos with arbitrary camera trajectories from 4D point track representations.
arXiv Detail & Related papers (2025-12-04T17:59:10Z)
- Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.
Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z)
- Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging.
We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z)
- IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos [33.12653115668027]
Our method generates Multiplane Images (MPIs) that ensure geometric consistency.
Our approach directly generates the final output through a single denoising process.
To effectively learn from monocular videos, we introduce a training mechanism that reconstructs the output MPI randomly in either the target or the reference camera space.
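As background on the representation, MPIs are rendered by front-to-back alpha compositing over a stack of fronto-parallel RGBA planes; below is a minimal sketch of that compositing step only (illustrative, not this paper's diffusion generator):

```python
# Minimal sketch of multiplane-image (MPI) compositing: front-to-back
# "over" blending of D fronto-parallel RGBA planes.
import torch

def composite_mpi(rgb: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    # rgb:   (D, 3, H, W) color planes, ordered front (d=0) to back
    # alpha: (D, 1, H, W) opacity planes in [0, 1]
    out = torch.zeros_like(rgb[0])
    transmittance = torch.ones_like(alpha[0])
    for d in range(rgb.shape[0]):
        out = out + transmittance * alpha[d] * rgb[d]
        transmittance = transmittance * (1.0 - alpha[d])
    return out  # (3, H, W) composited image

image = composite_mpi(torch.rand(32, 3, 64, 64), torch.rand(32, 1, 64, 64))
```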
arXiv Detail & Related papers (2025-04-27T08:56:02Z)
- FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image [41.598551483524666]
We present a novel framework for generating a high-quality, animatable 4D avatar from a single image.
Our method achieves superior quality compared to the prior art, while maintaining consistency across different viewpoints and expressions.
arXiv Detail & Related papers (2025-04-21T15:40:14Z)
- Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting [26.54811754399946]
We introduce Vivid4D, a novel approach that enhances 4D monocular video synthesis by augmenting observation views.
This reformulates view augmentation as a video inpainting task, where observed views are warped into new viewpoints.
Experiments demonstrate that our method effectively improves monocular 4D scene reconstruction and completion.
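The warping step this reformulation relies on can be sketched as depth-based forward projection of source pixels into the novel view, with the resulting holes left for the video inpainter to fill. A minimal illustration (nearest-pixel splatting, no z-buffering; all names are assumptions, not the paper's code):

```python
# Minimal sketch of depth-based forward warping of an observed view into a
# novel viewpoint. Real pipelines use soft splatting and occlusion handling;
# here the last write simply wins and holes stay black for the inpainter.
import torch

def forward_warp(img, depth, K, T_src_to_tgt):
    # img: (3,H,W), depth: (H,W), K: (3,3) intrinsics,
    # T_src_to_tgt: (4,4) relative camera pose
    _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    pts = torch.linalg.inv(K) @ pix * depth.reshape(1, -1)   # back-project
    pts = T_src_to_tgt[:3, :3] @ pts + T_src_to_tgt[:3, 3:]  # move to target
    uvw = K @ pts                                            # re-project
    u = (uvw[0] / uvw[2]).round().long()
    v = (uvw[1] / uvw[2]).round().long()
    out = torch.zeros_like(img)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (uvw[2] > 0)
    out[:, v[ok], u[ok]] = img.reshape(3, -1)[:, ok]
    return out  # holes are what the video-inpainting model fills in
```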
arXiv Detail & Related papers (2025-04-15T11:38:14Z)
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image.
Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.
The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
- FaceLift: Single Image to 3D Head with View Generation and GS-LRM [54.24070918942727]
FaceLift is a feed-forward approach for rapid, high-quality, 360-degree head reconstruction from a single image.
We show that FaceLift outperforms state-of-the-art methods in 3D head reconstruction, highlighting its practical applicability and robust performance on real-world images.
arXiv Detail & Related papers (2024-12-23T18:59:49Z)
- CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models [98.03734318657848]
We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video.
We leverage a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis.
We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks.
arXiv Detail & Related papers (2024-11-27T18:57:16Z)
- Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis [60.853577108780414]
Existing 4D generation methods can generate high-quality 4D objects or scenes based on user-friendly conditions.
We propose Trans4D, a novel text-to-4D synthesis framework that enables realistic complex scene transitions.
In experiments, Trans4D consistently outperforms existing state-of-the-art methods in generating 4D scenes with accurate and high-quality transitions.
arXiv Detail & Related papers (2024-10-09T17:56:03Z)
- Coherent 3D Portrait Video Reconstruction via Triplane Fusion [21.381482393260406]
Per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance.
We propose a new fusion-based method that fuses a personalized 3D subject prior with per-frame information.
Our method achieves both state-of-the-art 3D reconstruction accuracy and temporal consistency on in-studio and in-the-wild datasets.
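One simple way to realize such a fusion is to predict per-location blending weights between the personalized prior triplane and the per-frame triplane; the sketch below is our own illustration, not the paper's fusion network:

```python
# Minimal sketch of fusing a personalized prior triplane with a per-frame
# triplane via predicted per-location weights. Purely illustrative.
import torch
import torch.nn as nn

class TriplaneFusion(nn.Module):
    def __init__(self, c: int = 32):
        super().__init__()
        # predicts a blending weight from the concatenated planes
        self.gate = nn.Sequential(nn.Conv2d(2 * c, c, 3, padding=1),
                                  nn.ReLU(),
                                  nn.Conv2d(c, 1, 3, padding=1),
                                  nn.Sigmoid())

    def forward(self, prior, frame):
        # prior, frame: (B*3, C, H, W) stacked xy/xz/yz feature planes
        w = self.gate(torch.cat([prior, frame], dim=1))
        # lean on the prior where the current frame is unreliable
        return w * frame + (1 - w) * prior
```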
arXiv Detail & Related papers (2024-05-01T18:08:51Z)
- The More You See in 2D, the More You Perceive in 3D [32.578628729549145]
SAP3D is a system for 3D reconstruction and novel view synthesis from an arbitrary number of unposed images.
We show that as the number of input images increases, the performance of our approach improves.
arXiv Detail & Related papers (2024-04-04T17:59:40Z)
- Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video [42.10482273572879]
We propose an efficient video-to-4D object generation framework called Efficient4D.
It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data.
Experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed.
arXiv Detail & Related papers (2024-01-16T18:58:36Z)
- Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis [88.17520303867099]
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio.
We present Real3D-Portrait, a framework that improves one-shot 3D reconstruction with a large image-to-plane model.
Experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos.
arXiv Detail & Related papers (2024-01-16T17:04:30Z)
- 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency [118.15258850780417]
This work introduces 4DGen, a novel framework for grounded 4D content creation.
We identify static 3D assets and monocular video sequences as key components in constructing the 4D content.
Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos).
arXiv Detail & Related papers (2023-12-28T18:53:39Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
- Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data [27.109881339132258]
We present a method to learn one-shot 4D head synthesis via large-scale synthetic data.
A novel learning strategy is employed to enhance generalizability to real images by disentangling the learning of 3D reconstruction and reenactment.
arXiv Detail & Related papers (2023-11-30T17:26:33Z)
- 3D-Aware Video Generation [149.5230191060692]
We explore 4D generative adversarial networks (GANs) that learn generation of 3D-aware videos.
By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos.
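A time-aware discriminator of this kind can be sketched as a network that scores a pair of frames together with their time gap, encouraging plausible motion; the layer sizes and names below are illustrative assumptions:

```python
# Minimal sketch of a time-aware video discriminator: it judges two frames
# from the same video jointly with the time difference between them.
import torch
import torch.nn as nn

class TimeAwareDiscriminator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(6, ch, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(2 * ch + 1, 1)  # +1 input for the time gap

    def forward(self, frame_a, frame_b, dt):
        # frame_a/b: (B,3,H,W) frames from one video; dt: (B,1) time gap
        h = self.backbone(torch.cat([frame_a, frame_b], dim=1))
        return self.head(torch.cat([h, dt], dim=1))  # real/fake logit
```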
arXiv Detail & Related papers (2022-06-29T17:56:03Z)
- Learning Compositional Representation for 4D Captures with Neural ODE [72.56606274691033]
We introduce a compositional representation for 4D captures, that disentangles shape, initial state, and motion respectively.
To model the motion, a neural Ordinary Differential Equation (ODE) is trained to update the initial state conditioned on the learned motion code.
A decoder takes the shape code and the updated pose code to reconstruct 4D captures at each time stamp.
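A minimal sketch of this shape/state/motion decomposition, using a fixed-step Euler integrator in place of a full ODE solver (all names and dimensions are our own assumptions, not the paper's implementation):

```python
# Minimal sketch: a latent state is advanced by a neural ODE conditioned on
# a motion code; a decoder would then combine it with a shape code per time
# stamp. Fixed-step Euler keeps the sketch dependency-free.
import torch
import torch.nn as nn

class MotionODE(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))

    def forward(self, state, motion_code, t: float, steps: int = 20):
        dt = t / steps
        for _ in range(steps):  # Euler integration of ds/dt = f(s, motion)
            state = state + dt * self.f(torch.cat([state, motion_code], -1))
        return state

ode = MotionODE()
s0 = torch.randn(1, 128)   # initial state code
m = torch.randn(1, 128)    # learned motion code
s_t = ode(s0, m, t=0.5)    # decoder(shape_code, s_t) reconstructs time t
```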
arXiv Detail & Related papers (2021-03-15T10:55:55Z)