VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots
- URL: http://arxiv.org/abs/2105.14954v1
- Date: Mon, 31 May 2021 13:40:41 GMT
- Title: VidFace: A Full-Transformer Solver for Video Face Hallucination with Unaligned Tiny Snapshots
- Authors: Yuan Gan, Yawei Luo, Xin Yu, Bang Zhang, Yi Yang
- Abstract summary: We propose a pure transformer-based model, dubbed VidFace, to fully exploit the full-range spatio-temporal information and facial structure cues among multiple thumbnails.
We also curate a new large-scale video face hallucination dataset from the public Voxceleb2 benchmark.
- Score: 40.24311157634526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the task of hallucinating an authentic
high-resolution (HR) human face from multiple low-resolution (LR) video
snapshots. We propose a pure transformer-based model, dubbed VidFace, to fully
exploit the full-range spatio-temporal information and facial structure cues
among multiple thumbnails. Specifically, VidFace handles multiple snapshots all
at once and harnesses the spatial and temporal information integrally to
explore face alignments across all the frames, thus avoiding accumulating
alignment errors. Moreover, we design a recurrent position embedding module to
equip our transformer with facial priors, which not only effectively
regularises the alignment mechanism but also supplants notorious pre-training.
Finally, we curate a new large-scale video face hallucination dataset from the
public Voxceleb2 benchmark, which challenges prior art in tackling unaligned
and tiny face snapshots. To the best of our knowledge, ours is the first attempt
to develop a unified transformer-based solver tailored for video-based face
hallucination. Extensive experiments on public video face benchmarks show that
the proposed method significantly outperforms the state of the art.
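The abstract describes two concrete mechanisms: attending over all snapshots as a single token sequence (so alignment is learned jointly rather than frame-by-frame), and a recurrent position embedding that injects facial priors. Below is a minimal PyTorch sketch of one plausible reading of those ideas; all names (`VidFaceSketch`, `RecurrentPosEmbed`), sizes, and the GRU/landmark-heatmap choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentPosEmbed(nn.Module):
    """Hypothetical reading of the 'recurrent position embedding': landmark
    heatmaps are projected per pixel and a GRU shares them across frames."""
    def __init__(self, num_landmarks=68, dim=256):
        super().__init__()
        self.proj = nn.Linear(num_landmarks, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, heatmaps):                       # (B, T, L, H, W)
        B, T, L, H, W = heatmaps.shape
        x = heatmaps.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, L)
        x, _ = self.gru(self.proj(x))                  # recurrence over frames
        return x.reshape(B, H, W, T, -1).permute(0, 3, 1, 2, 4)  # (B,T,H,W,dim)

class VidFaceSketch(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8, scale=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 3, padding=1)
        self.pos = RecurrentPosEmbed(dim=dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Sequential(                     # naive upsampler
            nn.Conv2d(dim, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, frames, heatmaps):               # frames: (B, T, 3, h, w)
        B, T, C, h, w = frames.shape
        tok = self.embed(frames.reshape(B * T, C, h, w))
        tok = tok.reshape(B, T, -1, h, w).permute(0, 1, 3, 4, 2)  # (B,T,h,w,dim)
        tok = tok + self.pos(heatmaps)                 # inject facial priors
        # One joint sequence: every token attends to every frame, so alignment
        # is explored globally instead of pairwise (no accumulated error).
        tok = self.encoder(tok.reshape(B, T * h * w, -1))
        mid = tok.reshape(B, T, h, w, -1)[:, T // 2].permute(0, 3, 1, 2)
        return self.head(mid)                          # (B, 3, h*scale, w*scale)

frames = torch.randn(1, 7, 3, 16, 16)      # seven unaligned 16x16 thumbnails
heatmaps = torch.randn(1, 7, 68, 16, 16)   # e.g. from an off-the-shelf detector
print(VidFaceSketch()(frames, heatmaps).shape)  # torch.Size([1, 3, 128, 128])
```

Decoding only the middle frame keeps the sketch short; a full solver would restore every frame from the shared token sequence.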
Related papers
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- Kalman-Inspired Feature Propagation for Video Face Super-Resolution [78.84881180336744]
We introduce a novel framework to maintain a stable face prior over time.
Kalman filtering principles give our method the recurrent ability to use information from previously restored frames to guide and regulate restoration of the current frame (see the sketch after this list).
Experiments demonstrate the effectiveness of our method in capturing facial details consistently across video frames.
arXiv Detail & Related papers (2024-08-09T17:57:12Z)
- VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence [14.010324388059866]
VOODOO XP is a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait.
We demonstrate our solution in a monocular video setting and in an end-to-end VR telepresence system for two-way communication.
arXiv Detail & Related papers (2024-05-25T12:33:40Z)
- Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer [21.323165895036354]
We propose the first blind video face restoration approach with a novel parsing-guided temporal-coherent transformer (PGTFormer) without pre-alignment.
Specifically, we pre-train a temporal-spatial vector quantized auto-encoder on high-quality video face datasets to extract expressive context-rich priors.
This strategy reduces artifacts and mitigates jitter caused by cumulative errors from face pre-alignment (see the quantization sketch after this list).
arXiv Detail & Related papers (2024-04-21T12:33:07Z)
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is well suited to various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Face2Face: Real-time Face Capture and Reenactment of RGB Videos [66.38142459175191]
Face2Face is a novel approach for real-time facial reenactment of a monocular target video sequence.
We track facial expressions of both source and target video using a dense photometric consistency measure.
We convincingly re-render the synthesized target face on top of the corresponding video stream.
arXiv Detail & Related papers (2020-07-29T12:47:16Z)
- Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose [67.31838207805573]
We propose a deep neural network model that takes an audio signal A of a source person and a short video V of a target person as input.
It outputs a synthesized high-quality talking face video with a personalized head pose.
Our method can generate high-quality talking face videos with more distinguishing head movement effects than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-24T10:02:10Z)
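As promised in the Kalman-Inspired Feature Propagation entry above, here is one hedged reading of that recurrence in PyTorch: a learned gate stands in for the Kalman gain and blends the propagated previous state (prediction) with features of the current degraded frame (observation). `KalmanishPropagation` and its layer choices are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

class KalmanishPropagation(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.predict = nn.Conv2d(dim, dim, 3, padding=1)   # state transition
        self.observe = nn.Conv2d(3, dim, 3, padding=1)     # frame encoder
        self.gain = nn.Sequential(                         # data-dependent "Kalman gain"
            nn.Conv2d(dim * 2, dim, 3, padding=1), nn.Sigmoid()
        )

    def forward(self, frames):                 # (B, T, 3, H, W)
        B, T, _, H, W = frames.shape
        state = torch.zeros(B, self.observe.out_channels, H, W,
                            device=frames.device)
        states = []
        for t in range(T):                     # recurrent pass over the video
            pred = self.predict(state)         # propagate previous restoration
            obs = self.observe(frames[:, t])   # encode current frame
            k = self.gain(torch.cat([pred, obs], dim=1))
            state = pred + k * (obs - pred)    # Kalman-style update: x + K(z - x)
            states.append(state)
        return torch.stack(states, dim=1)      # fused features (B, T, dim, H, W)
```

The per-frame fused features would then feed a restoration decoder; the point of the sketch is only the gated predict-then-correct loop.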
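Likewise, for the parsing-guided temporal-coherent transformer entry, this is a minimal sketch of the vector-quantization step that such VQ auto-encoder priors rely on: encoder features are snapped to their nearest codebook entry learned from high-quality faces, and a straight-through estimator keeps the encoder trainable. `CodebookQuantizer` and its sizes are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CodebookQuantizer(nn.Module):
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # learned high-quality codes

    def forward(self, z):                              # (B, N, dim) encoder features
        d = torch.cdist(z, self.codebook.weight.unsqueeze(0))  # (B, N, num_codes)
        idx = d.argmin(dim=-1)                         # nearest code per feature
        zq = self.codebook(idx)                        # (B, N, dim) quantized priors
        # straight-through estimator: forward uses zq, gradients flow to z
        return z + (zq - z).detach(), idx
```

During restoration, degraded features snapped to codebook entries inherit the clean facial detail the codebook was trained on, which is what makes such priors "context-rich".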
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.