3D Gaussian and Diffusion-Based Gaze Redirection
- URL: http://arxiv.org/abs/2511.11231v1
- Date: Fri, 14 Nov 2025 12:32:22 GMT
- Title: 3D Gaussian and Diffusion-Based Gaze Redirection
- Authors: Abiram Panchalingam, Indu Bodala, Stuart Middleton
- Abstract summary: DiT-Gaze is a framework that enhances 3D gaze redirection models. Experiments show that DiT-Gaze sets a new state-of-the-art in both perceptual quality and redirection accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-fidelity gaze redirection is critical for generating augmented data to improve the generalization of gaze estimators. 3D Gaussian Splatting (3DGS) models like GazeGaussian represent the state-of-the-art but can struggle with rendering subtle, continuous gaze shifts. In this paper, we propose DiT-Gaze, a framework that enhances 3D gaze redirection models using a novel combination of Diffusion Transformer (DiT), weak supervision across gaze angles, and an orthogonality constraint loss. DiT allows higher-fidelity image synthesis, while our weak supervision strategy using synthetically generated intermediate gaze angles provides a smooth manifold of gaze directions during training. The orthogonality constraint loss mathematically enforces the disentanglement of internal representations for gaze, head pose, and expression. Comprehensive experiments show that DiT-Gaze sets a new state-of-the-art in both perceptual quality and redirection accuracy, reducing the state-of-the-art gaze error by 4.1% to 6.353 degrees, providing a superior method for creating synthetic training data. Our code and models will be made available for the research community to benchmark against.
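The abstract does not give the exact form of the orthogonality constraint loss or the intermediate-angle supervision. A common way to realize both ideas, shown purely as an illustrative sketch (the function names, the squared-cosine penalty, and the linear pitch/yaw interpolation are assumptions, not the paper's formulation), is to penalize pairwise cosine similarity between the gaze, head-pose, and expression codes, and to generate intermediate gaze labels by interpolating between two annotated angles:

```python
import numpy as np

def orthogonality_loss(gaze, head, expr):
    """Hypothetical disentanglement penalty: sum of squared cosine
    similarities between each pair of latent codes. Zero when the
    three codes are mutually orthogonal."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    pairs = [(gaze, head), (gaze, expr), (head, expr)]
    return sum(cos(a, b) ** 2 for a, b in pairs)

def intermediate_gaze(angle_a, angle_b, t):
    """Hypothetical weak-supervision target: linear interpolation of
    (pitch, yaw) gaze angles in degrees, t in [0, 1]."""
    a, b = np.asarray(angle_a, float), np.asarray(angle_b, float)
    return (1.0 - t) * a + t * b

# Mutually orthogonal codes incur zero penalty.
g = np.array([1.0, 0.0, 0.0])
h = np.array([0.0, 1.0, 0.0])
e = np.array([0.0, 0.0, 1.0])
print(orthogonality_loss(g, h, e))          # 0.0
print(intermediate_gaze((-10, 0), (10, 20), 0.5))  # [0. 10.]
```

In a training loop, such a penalty would typically be added to the reconstruction loss with a small weight, and the interpolated angles would condition the renderer on a smooth manifold of gaze directions.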
Related papers
- Hybrid-Domain Adaptative Representation Learning for Gaze Estimation [20.422491630669885]
We present a novel Hybrid-domain Adaptative Representation Learning framework to learn robust gaze representation. We propose to disentangle gaze-relevant representation from low-quality facial images by aligning features extracted from high-quality near-eye images. Experiments on EyeDiap, MPIIFaceGaze, and Gaze360 datasets demonstrate that our approach achieves state-of-the-art accuracy.
arXiv Detail & Related papers (2025-11-17T10:38:50Z) - OmniGaze: Reward-inspired Generalizable Gaze Estimation In The Wild [104.57404324262556]
Current 3D gaze estimation methods struggle to generalize across diverse data domains. We present OmniGaze, a semi-supervised framework for 3D gaze estimation. We show that OmniGaze achieves state-of-the-art performance on five datasets.
arXiv Detail & Related papers (2025-10-15T15:19:52Z) - VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment [48.147381011235446]
3D Gaussian Splatting has recently emerged as an efficient solution for real-time novel view synthesis. We propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment. Our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2025-10-13T14:44:50Z) - OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z) - Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address limitations of current methods. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z) - Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels [10.827081942898506]
We introduce a novel Self-Training Weakly-Supervised Gaze Estimation framework (ST-WSGE). We propose the Gaze Transformer (GaT), a modality-agnostic architecture capable of simultaneously learning static and dynamic gaze information from both image and video datasets. By combining 3D video datasets with 2D gaze target labels from gaze following tasks, our approach achieves several key contributions.
arXiv Detail & Related papers (2025-02-27T16:35:25Z) - GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting [15.509324745484141]
We propose GazeGaussian, the first high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately. Comprehensive experiments show that GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and facial synthesis across multiple datasets.
arXiv Detail & Related papers (2024-11-20T02:15:23Z) - Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis [53.702118455883095]
We propose a novel method for synthesizing novel views from sparse views with Gaussian Splatting.
Our key idea lies in exploring the self-supervisions inherent in the binocular stereo consistency between each pair of binocular images.
Our method significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-10-24T15:10:27Z) - Irregular Tensor Low-Rank Representation for Hyperspectral Image Representation [71.69331824668954]
Spectral variations pose a common challenge in analyzing hyperspectral images (HSI). Low-rank tensor representation has emerged as a robust strategy, leveraging inherent correlations within HSI data. We propose a novel model for irregular tensor low-rank representation tailored to efficiently model irregular 3D cubes.
arXiv Detail & Related papers (2024-10-24T02:56:22Z) - Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes [50.92217884840301]
Gaussian Opacity Fields (GOF) is a novel approach for efficient, high-quality, and adaptive surface reconstruction in scenes.
GOF is derived from ray-tracing-based volume rendering of 3D Gaussians.
GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2024-04-16T17:57:19Z) - Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting [55.71424195454963]
Spec-Gaussian is an approach that utilizes an anisotropic spherical Gaussian appearance field instead of spherical harmonics.
Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality.
This improvement extends the applicability of 3D GS to handle intricate scenarios with specular and anisotropic surfaces.
arXiv Detail & Related papers (2024-02-24T17:22:15Z) - NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation [37.977032771941715]
We propose a novel Head-Eye redirection parametric model based on Neural Radiance Field.
Our model can decouple the face and eyes for separate neural rendering.
It can separately control facial attributes such as identity, illumination, and eye gaze direction.
arXiv Detail & Related papers (2022-12-30T13:52:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.