Learning to Deblur and Rotate Motion-Blurred Faces
- URL: http://arxiv.org/abs/2112.07599v1
- Date: Tue, 14 Dec 2021 17:51:19 GMT
- Title: Learning to Deblur and Rotate Motion-Blurred Faces
- Authors: Givi Meishvili, Attila Szabó, Simon Jenni, Paolo Favaro
- Abstract summary: We train a neural network to reconstruct a 3D video representation from a single image and the corresponding face gaze.
We then provide a camera viewpoint relative to the estimated gaze and the blurry image as input to an encoder-decoder network to generate a video of sharp frames with a novel camera viewpoint.
- Score: 43.673660541417995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a solution to the novel task of rendering sharp videos from new
viewpoints from a single motion-blurred image of a face. Our method handles the
complexity of face blur by implicitly learning the geometry and motion of faces
through the joint training on three large datasets: FFHQ and 300VW, which are
publicly available, and a new Bern Multi-View Face Dataset (BMFD) that we
built. The first two datasets provide a large variety of faces and allow our
model to generalize better. BMFD instead allows us to introduce multi-view
constraints, which are crucial to synthesizing sharp videos from a new camera
view. It consists of high frame rate synchronized videos from multiple views of
several subjects displaying a wide range of facial expressions. We use the high
frame rate videos to simulate realistic motion blur through averaging. Thanks
to this dataset, we train a neural network to reconstruct a 3D video
representation from a single image and the corresponding face gaze. We then
provide a camera viewpoint relative to the estimated gaze and the blurry image
as input to an encoder-decoder network to generate a video of sharp frames with
a novel camera viewpoint. We demonstrate our approach on test subjects of our
multi-view dataset and VIDTIMIT.
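The abstract states that the high-frame-rate BMFD videos are used to simulate realistic motion blur through averaging, and that the resulting blurry image (together with the estimated gaze and a relative camera viewpoint) conditions an encoder-decoder that outputs sharp frames. Below is a minimal NumPy sketch of that blur-synthesis step and the training pair it yields; the function names, window length, and data layout are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def simulate_motion_blur(frames: np.ndarray, start: int, window: int) -> np.ndarray:
    """Synthesize one motion-blurred image by averaging `window` consecutive
    sharp frames from a high-frame-rate video.

    frames: (T, H, W, 3) float array with values in [0, 1].
    Note: averaging is done directly on the stored intensities; whether the
    authors average in linear light or after gamma is not stated in the abstract.
    """
    clip = frames[start:start + window].astype(np.float64)
    return clip.mean(axis=0)

def make_training_example(frames: np.ndarray, start: int, window: int):
    """Hypothetical training example: the synthetic blurry input together with
    the sharp frames it spans, which serve as reconstruction targets."""
    blurry = simulate_motion_blur(frames, start, window)
    sharp_targets = frames[start:start + window].copy()
    return blurry, sharp_targets

if __name__ == "__main__":
    # Toy stand-in for a high-frame-rate clip: 9 random 64x64 frames.
    rng = np.random.default_rng(0)
    video = rng.random((9, 64, 64, 3))
    blurry, targets = make_training_example(video, start=0, window=9)
    print(blurry.shape, targets.shape)  # (64, 64, 3) (9, 64, 64, 3)
```

The window length controls the amount of simulated blur: averaging a longer span of sharp frames mimics a longer exposure and produces a stronger blur.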
Related papers
- Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints.
We train this model on a collection of more than 60 million multi-view samples from publicly available datasets.
We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z)
- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping [43.30061680192465]
We present the first diffusion-based framework specifically designed for video face swapping.
Our approach incorporates a specially designed diffusion model coupled with a VidFaceVAE.
Our framework achieves superior performance in identity preservation, temporal consistency, and visual quality compared to existing methods.
arXiv Detail & Related papers (2024-12-15T18:58:32Z)
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation.
We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints.
Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
- Generating 3D-Consistent Videos from Unposed Internet Photos [68.944029293283]
We train a scalable, 3D-aware video model without any 3D annotations such as camera parameters.
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.
arXiv Detail & Related papers (2024-11-20T18:58:31Z)
- Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation [61.040832373015014]
We propose Flex3D, a novel framework for generating high-quality 3D content from text, single images, or sparse view images.
In the first stage, we employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object.
In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs.
arXiv Detail & Related papers (2024-10-01T17:29:43Z)
- MV2MAE: Multi-View Video Masked Autoencoders [33.61642891911761]
We present a method for self-supervised learning from synchronized multi-view videos.
We use a cross-view reconstruction task to inject geometry information in the model.
Our approach is based on the masked autoencoder (MAE) framework.
arXiv Detail & Related papers (2024-01-29T05:58:23Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z)