GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering
- URL: http://arxiv.org/abs/2508.13546v2
- Date: Thu, 09 Oct 2025 12:07:51 GMT
- Title: GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering
- Authors: Farhaan Ebadulla, Chiraag Mudlapur, Gaurav BV
- Abstract summary: Foveated rendering significantly reduces computational demands in virtual reality applications. Current approaches require expensive hardware-based eye tracking systems. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foveated rendering significantly reduces computational demands in virtual reality applications by concentrating rendering quality where users focus their gaze. Current approaches require expensive hardware-based eye tracking systems, limiting widespread adoption due to cost, calibration complexity, and hardware compatibility constraints. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments without requiring dedicated eye tracking hardware. The approach combines a Spherical Vision Transformer for processing 360-degree VR scenes with an LSTM-based temporal encoder that captures gaze sequence patterns. A multi-modal fusion network integrates spatial scene features with temporal gaze dynamics to predict future gaze locations with associated confidence estimates. Experimental evaluation on a comprehensive VR dataset demonstrates that GazeProphet achieves a median angular error of 3.83 degrees, outperforming traditional saliency-based baselines by 24% while providing reliable confidence calibration. The approach maintains consistent performance across different spatial regions and scene types, enabling practical deployment in VR systems without additional hardware requirements. Statistical analysis confirms the significance of improvements across all evaluation metrics. These results show that software-only gaze prediction is viable for VR foveated rendering, making its performance benefits accessible across VR platforms and applications.
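The architecture described above lends itself to a compact sketch. The following is a minimal illustration in PyTorch, assuming placeholder dimensions; the pooled scene embedding stands in for the Spherical Vision Transformer output, and `GazeFusionSketch`, its layer sizes, and the sigmoid confidence head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GazeFusionSketch(nn.Module):
    """Illustrative stand-in for GazeProphet's fusion network.

    Scene features (e.g., from a spherical ViT) are fused with an
    LSTM encoding of the recent gaze sequence; the head outputs a
    2D gaze direction plus a confidence estimate. All dimensions
    are assumptions, not the paper's values.
    """
    def __init__(self, scene_dim=768, gaze_dim=2, hidden=256):
        super().__init__()
        self.temporal = nn.LSTM(gaze_dim, hidden, batch_first=True)
        self.fuse = nn.Sequential(
            nn.Linear(scene_dim + hidden, hidden), nn.ReLU(),
        )
        self.gaze_head = nn.Linear(hidden, gaze_dim)  # predicted (yaw, pitch)
        self.conf_head = nn.Linear(hidden, 1)         # confidence logit

    def forward(self, scene_feat, gaze_seq):
        # scene_feat: (B, scene_dim) pooled 360-degree scene embedding
        # gaze_seq:   (B, T, 2) past gaze angles in degrees
        _, (h, _) = self.temporal(gaze_seq)
        fused = self.fuse(torch.cat([scene_feat, h[-1]], dim=-1))
        return self.gaze_head(fused), torch.sigmoid(self.conf_head(fused))

model = GazeFusionSketch()
gaze, conf = model(torch.randn(4, 768), torch.randn(4, 30, 2))
```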
Related papers
- Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues [3.4383905541567583]
We present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze.
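A minimal sketch of this kind of motion-plus-saliency fusion, assuming PyTorch; the GRU, the dimensions, and the `sal_feat` stand-in for a UniSal-style encoder output are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MotionSaliencyGaze(nn.Module):
    """Sketch of fusing HMD motion signals with frame saliency features.

    `sal_feat` stands in for a UniSal-style encoder output; the GRU
    and all dimensions are assumptions for illustration only.
    """
    def __init__(self, sal_dim=128, imu_dim=6, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(sal_dim + imu_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # gaze point on the viewport

    def forward(self, sal_feat, imu):
        # sal_feat: (B, T, sal_dim) per-frame saliency features
        # imu:      (B, T, imu_dim) angular velocity + acceleration
        out, _ = self.rnn(torch.cat([sal_feat, imu], dim=-1))
        return self.head(out[:, -1])  # gaze for the next frame

model = MotionSaliencyGaze()
g = model(torch.randn(2, 16, 128), torch.randn(2, 16, 6))
```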
arXiv Detail & Related papers (2026-01-26T11:26:27Z)
- EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox [0.0]
EyeTheia is a lightweight and open deep learning pipeline for webcam-based gaze estimation. It enables real-time gaze tracking using only a standard laptop webcam. It combines MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker and optional user-specific fine-tuning.
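A rough sketch of the landmark-extraction front end of such a pipeline, assuming the `mediapipe` and `opencv-python` packages; the eye-corner landmark indices and crop padding are assumptions, and the downstream iTracker-style CNN is omitted.

```python
import cv2
import mediapipe as mp

# Sketch of a webcam gaze front end in the spirit of EyeTheia:
# MediaPipe FaceMesh yields facial landmarks, and eye crops would
# then feed an iTracker-style gaze CNN (not shown). The eye-corner
# landmark indices below are assumptions about the mesh topology.
LEFT_EYE, RIGHT_EYE = (33, 133), (362, 263)

mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)

def eye_crops(frame_bgr, pad=16):
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    res = mesh.process(rgb)
    if not res.multi_face_landmarks:
        return None  # no face detected in this frame
    h, w = frame_bgr.shape[:2]
    lm = res.multi_face_landmarks[0].landmark
    crops = []
    for a, b in (LEFT_EYE, RIGHT_EYE):
        xs = [int(lm[i].x * w) for i in (a, b)]
        ys = [int(lm[i].y * h) for i in (a, b)]
        x0, x1 = max(min(xs) - pad, 0), max(xs) + pad
        y0, y1 = max(min(ys) - pad, 0), max(ys) + pad
        crops.append(frame_bgr[y0:y1, x0:x1])
    return crops  # pass these to the gaze CNN
```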
arXiv Detail & Related papers (2026-01-09T19:49:01Z)
- GazeProphetV2: Head-Movement-Based Gaze Prediction Enabling Efficient Foveated Rendering on Mobile VR [0.0]
This paper introduces a multimodal approach to VR gaze prediction that combines temporal gaze patterns, head movement data, and visual scene information. Evaluations using a dataset spanning 22 VR scenes with 5.3M gaze samples show improvements in predictive accuracy when combining modalities. Cross-scene generalization testing shows consistent performance with 93.1% validation accuracy and temporal consistency in predicted gaze trajectories.
arXiv Detail & Related papers (2025-11-25T06:55:39Z)
- ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality [8.437724028285682]
Photo Codec Avatars (PCAs) generate high-fidelity human face renderings for Virtual Reality (VR) environments. We propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. We introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms.
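As a point of reference for readers unfamiliar with PTQ, here is a minimal symmetric int8 weight-quantization sketch in PyTorch; it shows the generic idea only and does not reproduce ESCA's Codec-Avatar-specific method.

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Minimal symmetric post-training quantization sketch.

    Maps float weights to int8 with a single per-tensor scale.
    ESCA's tailored PTQ method is not reproduced here.
    """
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
err = (dequantize(q, s) - w).abs().mean()  # mean quantization error
```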
arXiv Detail & Related papers (2025-10-27T02:31:20Z)
- SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training [82.68200031146299]
We propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR within a single step, we introduce several enhancements to both the model architecture and training procedures.
arXiv Detail & Related papers (2025-06-05T17:51:05Z)
- VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality [47.738522999465864]
We introduce VRSplat: we combine and extend several recent advancements in 3DGS to address the challenges of VR holistically. VRSplat is the first systematically evaluated 3DGS approach capable of supporting modern VR applications, achieving 72+ FPS while eliminating popping and stereo-disrupting floaters.
arXiv Detail & Related papers (2025-05-15T10:17:48Z)
- Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference [3.4667973471411853]
Cybersickness is a major obstacle to the widespread adoption of immersive virtual reality (VR). We propose a scalable, deployable framework for personalized cybersickness prediction. Our framework supports real-time applications, making it ideal for integration into consumer-grade VR platforms.
arXiv Detail & Related papers (2025-01-02T11:41:43Z)
- Extrapolated Urban View Synthesis Benchmark [53.657271730352214]
Photo simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. We will release the data to help advance self-driving and urban robotics simulation technology.
arXiv Detail & Related papers (2024-12-06T18:41:39Z)
- VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points [4.962171160815189]
We propose a novel hybrid approach that combines the strengths of both point-based rendering directions at their respective performance sweet spots. For the fovea only, we use neural points with a convolutional neural network for the small pixel footprint, which provides sharp, detailed output. Our evaluation confirms that our approach increases sharpness and detail compared to a standard VR-ready 3DGS configuration.
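The compositing step implied by such a hybrid can be sketched simply: blend the sharp foveal render into the cheaper peripheral one with a radial mask. The smooth falloff and radius below are illustrative choices, not VR-Splatting's actual blending.

```python
import numpy as np

def composite_foveated(fovea, periphery, center, radius):
    """Blend a sharp foveal render into a cheaper peripheral render.

    Sketch of foveated compositing with a radial alpha mask; the
    transition band width is an arbitrary illustrative choice.
    """
    h, w = periphery.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.hypot(xs - center[0], ys - center[1])
    # alpha is 1 inside the fovea, fading to 0 over a transition band
    alpha = np.clip((radius * 1.5 - d) / (radius * 0.5), 0.0, 1.0)
    return alpha[..., None] * fovea + (1 - alpha[..., None]) * periphery

frame = composite_foveated(np.ones((480, 640, 3)),   # high-quality render
                           np.zeros((480, 640, 3)),  # fast peripheral render
                           center=(320, 240), radius=100)
```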
arXiv Detail & Related papers (2024-10-23T14:54:48Z)
- Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction [13.422686350235615]
We aim to measure the impact of motion signal artifacts on the reconstruction of the articulated self-avatar's full-body pose. We analyze the motion reconstruction errors using ground truth and 3D Cartesian coordinates estimated by YOLOv8 pose estimation.
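A standard way to quantify such reconstruction errors is the mean per-joint position error (MPJPE); whether the paper uses exactly this formulation is an assumption. A minimal NumPy version:

```python
import numpy as np

def mpjpe(gt, pred):
    """Mean per-joint position error between ground-truth and
    estimated 3D joint positions, both shaped (T, J, 3) in the
    same units. A common pose-reconstruction metric."""
    return np.linalg.norm(gt - pred, axis=-1).mean()

gt = np.random.rand(100, 17, 3)                  # e.g., mocap ground truth
pred = gt + 0.01 * np.random.randn(100, 17, 3)   # pose-estimator output
print(f"MPJPE: {mpjpe(gt, pred):.4f} m")
```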
arXiv Detail & Related papers (2024-04-29T12:02:06Z)
- Deep Learning for Real Time Satellite Pose Estimation on Low Power Edge TPU [58.720142291102135]
In this paper, we propose pose estimation software that exploits neural network architectures.
We show how low-power machine learning accelerators can enable the use of Artificial Intelligence in space.
arXiv Detail & Related papers (2022-04-07T08:53:18Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
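The scale-recovery idea can be illustrated with a least-squares alignment between a monocular VO trajectory and a metric reference; this is a sketch of the general principle, not VRVO's actual integration with the direct VO system.

```python
import numpy as np

def align_scale(vo_traj, metric_traj):
    """Least-squares scale factor aligning a monocular VO trajectory
    to a metric reference (e.g., derived from scale-aware disparity).

    Returns the s minimizing ||s * vo_traj - metric_traj||^2.
    """
    num = np.sum(vo_traj * metric_traj)
    den = np.sum(vo_traj * vo_traj)
    return num / den

vo = np.random.rand(50, 3)        # up-to-scale camera positions
s = align_scale(vo, 2.0 * vo)     # recovers 2.0 in this toy case
```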
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Meta-Reinforcement Learning for Reliable Communication in THz/VLC Wireless VR Networks [157.42035777757292]
The problem of enhancing the quality of virtual reality (VR) services is studied for an indoor terahertz (THz)/visible light communication (VLC) wireless network.
Small base stations (SBSs) transmit high-quality VR images to VR users over THz bands and light-emitting diodes (LEDs) provide accurate indoor positioning services.
To control the energy consumption of the studied THz/VLC wireless VR network, VLC access points (VAPs) must be selectively turned on.
arXiv Detail & Related papers (2021-01-29T15:57:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.