Real-time 3D-aware Portrait Video Relighting
- URL: http://arxiv.org/abs/2410.18355v1
- Date: Thu, 24 Oct 2024 01:34:11 GMT
- Title: Real-time 3D-aware Portrait Video Relighting
- Authors: Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen, Yu-Kun Lai, Hongbo Fu, Boxin Shi, Lin Gao
- Abstract summary: We present the first real-time 3D-aware method for relighting in-the-wild videos of talking faces based on Neural Radiance Fields (NeRF).
We infer an albedo tri-plane, as well as a shading tri-plane based on a desired lighting condition for each video frame with fast dual-encoders.
Our method runs at 32.98 fps on consumer-level hardware and achieves state-of-the-art results in terms of reconstruction quality, lighting error, lighting instability, temporal consistency and inference speed.
- Score: 89.41078798641732
- Abstract: Synthesizing realistic videos of talking faces under custom lighting conditions and viewing angles benefits various downstream applications like video conferencing. However, most existing relighting methods are either time-consuming or unable to adjust the viewpoints. In this paper, we present the first real-time 3D-aware method for relighting in-the-wild videos of talking faces based on Neural Radiance Fields (NeRF). Given an input portrait video, our method can synthesize talking faces under both novel views and novel lighting conditions with a photo-realistic and disentangled 3D representation. Specifically, we infer an albedo tri-plane, as well as a shading tri-plane based on a desired lighting condition for each video frame with fast dual-encoders. We also leverage a temporal consistency network to ensure smooth transitions and reduce flickering artifacts. Our method runs at 32.98 fps on consumer-level hardware and achieves state-of-the-art results in terms of reconstruction quality, lighting error, lighting instability, temporal consistency and inference speed. We demonstrate the effectiveness and interactivity of our method on various portrait videos with diverse lighting and viewing conditions.
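To make the abstract's pipeline concrete, below is a minimal, hypothetical PyTorch sketch of the dual-encoder idea: one encoder infers an albedo tri-plane from the input frame, while a second infers a shading tri-plane conditioned on a target lighting condition (assumed here to be 2nd-order spherical-harmonic coefficients). All module names, layer choices, and tensor shapes are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of the dual-encoder tri-plane relighting described
# in the abstract. Module names, shapes, and the SH lighting input are
# assumptions; the paper's actual architecture may differ.
import torch
import torch.nn as nn

class DualEncoderRelighter(nn.Module):
    def __init__(self, feat_dim=32, plane_res=64, sh_dim=27):
        super().__init__()
        out_ch = 3 * feat_dim  # three axis-aligned planes per tri-plane
        # Albedo encoder: frame -> lighting-independent albedo tri-plane.
        self.albedo_enc = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
            nn.AdaptiveAvgPool2d(plane_res),
        )
        # Shading encoder: frame + target lighting -> shading tri-plane.
        self.shading_enc = nn.Sequential(
            nn.Conv2d(3 + sh_dim, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
            nn.AdaptiveAvgPool2d(plane_res),
        )

    def forward(self, frame, sh_coeffs):
        b, _, h, w = frame.shape
        albedo_planes = self.albedo_enc(frame)
        # Broadcast the SH lighting vector over the image grid.
        light = sh_coeffs.view(b, -1, 1, 1).expand(b, sh_coeffs.shape[1], h, w)
        shading_planes = self.shading_enc(torch.cat([frame, light], dim=1))
        return albedo_planes, shading_planes

frame = torch.rand(1, 3, 512, 512)  # one input video frame
sh = torch.rand(1, 27)              # 2nd-order SH: 9 coefficients x RGB
albedo, shading = DualEncoderRelighter()(frame, sh)
```

A real implementation would sample the two tri-planes per 3D point inside a NeRF-style renderer and combine them (e.g., color as albedo times shading). The temporal consistency network mentioned in the abstract can be pictured as a learned filter over consecutive frames' tri-plane features; a crude stand-in would be an exponential moving average across frames.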
Related papers
- Sun Off, Lights On: Photorealistic Monocular Nighttime Simulation for Robust Semantic Perception [53.631644875171595]
Nighttime scenes are hard for learned models to perceive semantically and hard for humans to annotate.
Our method, named Sun Off, Lights On (SOLO), is the first to perform nighttime simulation on single images in a photorealistic fashion by operating in 3D.
Not only is the visual quality and photorealism of our nighttime images superior to those of competing approaches, including diffusion models, but our images also prove more beneficial for semantic nighttime segmentation in day-to-night adaptation.
arXiv Detail & Related papers (2024-07-29T18:00:09Z) - Lite2Relight: 3D-aware Single Image Portrait Relighting [87.62069509622226]
Lite2Relight is a novel technique that can predict 3D-consistent head poses of portraits.
By utilizing a pre-trained geometry-aware encoder and a feature alignment module, we map input images into a relightable 3D space.
This includes producing 3D-consistent results of the full head, including hair, eyes, and expressions.
arXiv Detail & Related papers (2024-07-15T07:16:11Z) - Personalized Video Relighting With an At-Home Light Stage [0.0]
We develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos in real-time.
We show that by simply capturing recordings of a user watching YouTube videos on a monitor, we can train a personalized algorithm capable of performing high-quality relighting under any condition.
arXiv Detail & Related papers (2023-11-15T10:33:20Z) - ReliTalk: Relightable Talking Portrait Generation from a Single Video [62.47116237654984]
ReliTalk is a novel framework for relightable audio-driven talking portrait generation from monocular videos.
Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.
arXiv Detail & Related papers (2023-09-05T17:59:42Z) - 3D Gaussian Splatting for Real-Time Radiance Field Rendering [4.320393382724066]
We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times.
We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.
arXiv Detail & Related papers (2023-08-08T06:37:06Z) - Physically-Based Editing of Indoor Scene Lighting from a Single Image [106.60252793395104]
We present a method to edit complex indoor lighting from a single image with its predicted depth and light source segmentation masks.
We tackle this problem using two novel components: 1) a holistic scene reconstruction method that estimates scene reflectance and parametric 3D lighting, and 2) a neural rendering framework that re-renders the scene from our predictions.
arXiv Detail & Related papers (2022-05-19T06:44:37Z) - Deep 3D Mask Volume for View Synthesis of Dynamic Scenes [49.45028543279115]
We introduce a multi-view video dataset, captured with a custom 10-camera rig at 120 FPS.
The dataset contains 96 high-quality scenes showing various visual effects and human interactions in outdoor scenes.
We develop a new algorithm, Deep 3D Mask Volume, which enables temporally-stable view extrapolation from binocular videos of dynamic scenes, captured by static cameras.
arXiv Detail & Related papers (2021-08-30T17:55:28Z) - LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from
Video using Pose and Lighting Normalization [4.43316916502814]
We present a video-based learning framework for animating personalized 3D talking faces from audio.
We introduce two training-time data normalizations that significantly improve data sample efficiency.
Our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync and visual quality scores.
arXiv Detail & Related papers (2021-06-08T08:56:40Z) - Relightable 3D Head Portraits from a Smartphone Video [15.639140551193073]
We present a system for creating a relightable 3D portrait of a human head.
Our neural pipeline operates on a sequence of frames captured by a smartphone camera with the flash blinking.
A deep rendering network is trained to regress dense albedo, normals, and environmental lighting maps for arbitrary new viewpoints.
arXiv Detail & Related papers (2020-12-17T22:49:02Z)
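Several entries above (ReliTalk and the smartphone head-portrait system just listed) recover albedo, normals, and environment lighting and then re-render under new illumination. As a self-contained illustration of that final step, here is a minimal NumPy sketch of Lambertian relighting under 2nd-order spherical-harmonic lighting, following the Ramamoorthi-Hanrahan irradiance approximation; the array shapes and random inputs are assumptions for demonstration only, not any paper's actual code.

```python
# Minimal sketch: relight a portrait from per-pixel albedo and normals
# under 2nd-order spherical-harmonic (SH) lighting. SH constants follow
# Ramamoorthi & Hanrahan (2001); shapes/inputs are illustrative assumptions.
import numpy as np

def sh_basis(normals):
    """9 real SH basis values per pixel; normals: (H, W, 3), unit length."""
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z**2 - 1.0),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),
    ], axis=-1)  # (H, W, 9)

def relight(albedo, normals, sh_coeffs):
    """albedo: (H,W,3); normals: (H,W,3); sh_coeffs: (9,3), one column per RGB."""
    # Per-band attenuation of the clamped-cosine (Lambertian) kernel.
    a = np.array([np.pi] + [2 * np.pi / 3] * 3 + [np.pi / 4] * 5)
    shading = sh_basis(normals) @ (sh_coeffs * a[:, None])  # (H, W, 3)
    return albedo * np.clip(shading, 0.0, None)

H, W = 256, 256
albedo = np.random.rand(H, W, 3)
n = np.random.randn(H, W, 3)
normals = n / np.linalg.norm(n, axis=-1, keepdims=True)
light = np.random.rand(9, 3)             # target illumination, SH coefficients
relit = relight(albedo, normals, light)  # (H, W, 3) relit image
```

Swapping `light` for SH coefficients estimated from a different environment changes the illumination while the albedo and geometry stay fixed, which is the disentanglement these relighting papers aim for.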