Learning to Deblur and Rotate Motion-Blurred Faces
- URL: http://arxiv.org/abs/2112.07599v1
- Date: Tue, 14 Dec 2021 17:51:19 GMT
- Title: Learning to Deblur and Rotate Motion-Blurred Faces
- Authors: Givi Meishvili, Attila Szabó, Simon Jenni, Paolo Favaro
- Abstract summary: We train a neural network to reconstruct a 3D video representation from a single image and the corresponding face gaze.
We then provide a camera viewpoint relative to the estimated gaze and the blurry image as input to an encoder-decoder network to generate a video of sharp frames with a novel camera viewpoint.
- Score: 43.673660541417995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a solution to the novel task of rendering sharp videos from new
viewpoints from a single motion-blurred image of a face. Our method handles the
complexity of face blur by implicitly learning the geometry and motion of faces
through the joint training on three large datasets: FFHQ and 300VW, which are
publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we
built. The first two datasets provide a large variety of faces and allow our
model to generalize better. BMFD instead allows us to introduce multi-view
constraints, which are crucial to synthesizing sharp videos from a new camera
view. It consists of high frame rate synchronized videos from multiple views of
several subjects displaying a wide range of facial expressions. We use the high
frame rate videos to simulate realistic motion blur through averaging. Thanks
to this dataset, we train a neural network to reconstruct a 3D video
representation from a single image and the corresponding face gaze. We then
provide a camera viewpoint relative to the estimated gaze and the blurry image
as input to an encoder-decoder network to generate a video of sharp frames with
a novel camera viewpoint. We demonstrate our approach on test subjects of our
multi-view dataset and VIDTIMIT.
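The abstract states that the high-frame-rate BMFD videos are used to simulate realistic motion blur through averaging, and that the resulting blurry image (together with the estimated gaze and a relative camera viewpoint) conditions an encoder-decoder that outputs sharp frames. Below is a minimal NumPy sketch of that blur-synthesis step and the training pair it yields; the function names, window length, and data layout are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def simulate_motion_blur(frames: np.ndarray, start: int, window: int) -> np.ndarray:
    """Synthesize one motion-blurred image by averaging `window` consecutive
    sharp frames from a high-frame-rate video.

    frames: (T, H, W, 3) float array with values in [0, 1].
    Note: averaging is done directly on the stored intensities; whether the
    authors average in linear light or after gamma is not stated in the abstract.
    """
    clip = frames[start:start + window].astype(np.float64)
    return clip.mean(axis=0)

def make_training_example(frames: np.ndarray, start: int, window: int):
    """Hypothetical training example: the synthetic blurry input together with
    the sharp frames it spans, which serve as reconstruction targets."""
    blurry = simulate_motion_blur(frames, start, window)
    sharp_targets = frames[start:start + window].copy()
    return blurry, sharp_targets

if __name__ == "__main__":
    # Toy stand-in for a high-frame-rate clip: 9 random 64x64 frames.
    rng = np.random.default_rng(0)
    video = rng.random((9, 64, 64, 3))
    blurry, targets = make_training_example(video, start=0, window=9)
    print(blurry.shape, targets.shape)  # (64, 64, 3) (9, 64, 64, 3)
```

The window length controls the amount of simulated blur: averaging a longer span of sharp frames mimics a longer exposure and produces a stronger blur.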
Related papers
- Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints.
We train this model on a collection of more than 60 million multi-view samples from publicly available datasets.
We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z)
- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping [43.30061680192465]
We present the first diffusion-based framework specifically designed for video face swapping.
Our approach incorporates a specially designed diffusion model coupled with a VidFaceVAE.
Our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods.
arXiv Detail & Related papers (2024-12-15T18:58:32Z)
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation.
We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints.
Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
In the first stage, we employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- MV2MAE: Multi-View Video Masked Autoencoders [33.61642891911761]
We present a method for self-supervised learning from synchronized multi-view videos.
We use a cross-view reconstruction task to inject geometry information in the model.
Our approach is based on the masked autoencoder (MAE) framework.
arXiv Detail & Related papers (2024-01-29T05:58:23Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)