RTGaze: Real-Time 3D-Aware Gaze Redirection from a Single Image
- URL: http://arxiv.org/abs/2511.11289v1
- Date: Fri, 14 Nov 2025 13:24:13 GMT
- Title: RTGaze: Real-Time 3D-Aware Gaze Redirection from a Single Image
- Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang
- Abstract summary: RTGaze is a real-time, high-quality gaze redirection method. It learns a gaze-controllable facial representation from face images and gaze prompts, then decodes this representation via neural rendering for gaze redirection. RTGaze is evaluated both qualitatively and quantitatively, demonstrating state-of-the-art performance in efficiency, redirection accuracy, and image quality.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaze redirection methods aim to generate realistic human face images with controllable eye movement. However, recent methods often struggle with 3D consistency, efficiency, or quality, limiting their practical applications. In this work, we propose RTGaze, a real-time and high-quality gaze redirection method. Our approach learns a gaze-controllable facial representation from face images and gaze prompts, then decodes this representation via neural rendering for gaze redirection. Additionally, we distill face geometric priors from a pretrained 3D portrait generator to enhance generation quality. We evaluate RTGaze both qualitatively and quantitatively, demonstrating state-of-the-art performance in efficiency, redirection accuracy, and image quality across multiple datasets. Our system achieves real-time, 3D-aware gaze redirection with a feedforward network (~0.06 sec/image), making it 800x faster than the previous state-of-the-art 3D-aware methods.
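The abstract describes a feedforward pipeline: encode a face image together with a gaze prompt into a gaze-controllable representation, then decode it via neural rendering, with no per-image optimization. The structure can be sketched as below. This is a minimal, hypothetical illustration; all names (`encode_face`, `neural_render`, `redirect_gaze`) and the toy arithmetic are assumptions for exposition, not the authors' actual implementation.

```python
# Structural sketch of a feedforward gaze-redirection pipeline, as described
# in the RTGaze abstract. All names and the toy math are hypothetical
# placeholders, not the paper's implementation.

def encode_face(image, gaze_prompt):
    """Fuse image features with a target gaze direction (pitch, yaw)
    into a single gaze-controllable latent representation."""
    # Toy feature extraction: mean pixel intensity stands in for a CNN encoder.
    feature = sum(image) / len(image)
    pitch, yaw = gaze_prompt
    # Conditioning: bundle the image feature with the gaze prompt.
    return (feature, pitch, yaw)

def neural_render(latent):
    """Decode the latent back to an image; a real system would use a
    3D-aware neural renderer distilled from a pretrained portrait generator."""
    feature, pitch, yaw = latent
    # Toy decoder: shift every pixel by a gaze-dependent offset.
    offset = 0.1 * pitch + 0.1 * yaw
    return [feature + offset for _ in range(4)]

def redirect_gaze(image, gaze_prompt):
    """Single feedforward pass: no per-image optimization loop, which is
    what makes such a method real-time (~0.06 s/image in the paper)."""
    return neural_render(encode_face(image, gaze_prompt))

face = [0.2, 0.4, 0.6, 0.8]             # a flattened 4-pixel "image"
out = redirect_gaze(face, (0.5, -0.5))  # target gaze: pitch 0.5, yaw -0.5
```

The key contrast with prior 3D-aware methods (e.g. the per-subject optimization in NeRF-based approaches) is that everything above is a single forward pass, which is where the reported ~800x speedup comes from.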
Related papers
- Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image [68.55613894952177]
We introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. We propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. Lastly, we introduce a cascaded 3D mesh extraction algorithm that derives high-quality surfaces from the multi-view 2D representations in only about 3 minutes in a coarse-to-fine manner.
arXiv Detail & Related papers (2025-11-03T17:24:18Z) - GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior [25.72805054203982]
We propose a two-stage framework for generating identity-preserving, realistic 3D humans from text and image prompts. Our core insight is to leverage human-centric knowledge to facilitate the generation process. Experiments demonstrate that GaussianIP outperforms existing methods in both visual quality and training efficiency.
arXiv Detail & Related papers (2025-03-14T07:16:43Z) - G3FA: Geometry-guided GAN for Face Animation [14.488117084637631]
We introduce Geometry-guided GAN for Face Animation (G3FA) to tackle this limitation.
Our novel approach empowers the face animation model to incorporate 3D information using only 2D images.
In our face reenactment model, we leverage 2D motion warping to capture motion dynamics.
arXiv Detail & Related papers (2024-08-23T13:13:24Z) - Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z) - Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z) - GazeNeRF: 3D-Aware Gaze Redirection with Neural Radiance Fields [100.53114092627577]
Existing gaze redirection methods operate on 2D images and struggle to generate 3D consistent results.
We build on the intuition that the face region and eyeballs are separate 3D structures that move in a coordinated yet independent fashion.
arXiv Detail & Related papers (2022-12-08T13:19:11Z) - GazeOnce: Real-Time Multi-Person Gaze Estimation [18.16091280655655]
Appearance-based gaze estimation aims to predict the 3D eye gaze direction from a single image.
Recent deep learning-based approaches have demonstrated excellent performance, but cannot output multi-person gaze in real time.
We propose GazeOnce, which is capable of simultaneously predicting gaze directions for multiple faces in an image.
arXiv Detail & Related papers (2022-04-20T14:21:47Z) - Head2Head++: Deep Facial Attributes Re-Targeting [6.230979482947681]
We leverage the 3D geometry of faces and Generative Adversarial Networks (GANs) to design a novel deep learning architecture for the task of facial and head reenactment.
We manage to capture the complex non-rigid facial motion from the driving monocular performances and synthesise temporally consistent videos.
Our system performs end-to-end reenactment at nearly real-time speed (18 fps).
arXiv Detail & Related papers (2020-06-17T23:38:37Z) - DeepFaceFlow: In-the-wild Dense 3D Facial Motion Estimation [56.56575063461169]
DeepFaceFlow is a robust, fast, and highly-accurate framework for the estimation of 3D non-rigid facial flow.
Our framework was trained and tested on two very large-scale facial video datasets.
Given registered pairs of images, our framework generates 3D flow maps at 60 fps.
arXiv Detail & Related papers (2020-05-14T23:56:48Z) - It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation [82.16380486281108]
We propose an appearance-based method that only takes the full face image as input.
Our method encodes the face image using a convolutional neural network with spatial weights applied on the feature maps.
We show that our full-face method significantly outperforms the state of the art for both 2D and 3D gaze estimation.
arXiv Detail & Related papers (2016-11-27T15:00:10Z)