GazeProphetV2: Head-Movement-Based Gaze Prediction Enabling Efficient Foveated Rendering on Mobile VR
- URL: http://arxiv.org/abs/2511.19988v1
- Date: Tue, 25 Nov 2025 06:55:39 GMT
- Title: GazeProphetV2: Head-Movement-Based Gaze Prediction Enabling Efficient Foveated Rendering on Mobile VR
- Authors: Farhaan Ebadulla, Chiraag Mudlpaur, Shreya Chaurasia, Gaurav BV
- Abstract summary: This paper introduces a multimodal approach to VR gaze prediction that combines temporal gaze patterns, head movement data, and visual scene information. Evaluations using a dataset spanning 22 VR scenes with 5.3M gaze samples show improvements in predictive accuracy when combining modalities. Cross-scene generalization testing shows consistent performance with 93.1% validation accuracy and temporal consistency in predicted gaze trajectories.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting gaze behavior in virtual reality environments remains a significant challenge with implications for rendering optimization and interface design. This paper introduces a multimodal approach to VR gaze prediction that combines temporal gaze patterns, head movement data, and visual scene information. By leveraging a gated fusion mechanism with cross-modal attention, the approach learns to adaptively weight gaze history, head movement, and scene content based on contextual relevance. Evaluations using a dataset spanning 22 VR scenes with 5.3M gaze samples demonstrate improvements in predictive accuracy when combining modalities compared to using individual data streams alone. The results indicate that integrating past gaze trajectories with head orientation and scene content enhances prediction accuracy across 1-3 future frames. Cross-scene generalization testing shows consistent performance with 93.1% validation accuracy and temporal consistency in predicted gaze trajectories. These findings contribute to understanding attention mechanisms in virtual environments while suggesting potential applications in rendering optimization, interaction design, and user experience evaluation. The approach represents a step toward more efficient virtual reality systems that can anticipate user attention patterns without requiring expensive eye tracking hardware.
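The abstract describes the fusion architecture only at a high level. As a rough illustration of how a gated fusion layer with cross-modal attention over gaze-history, head-movement, and scene features might be structured, here is a minimal PyTorch sketch; all module names, feature dimensions, and the pooling scheme are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of gated multimodal fusion with
# cross-modal attention, assuming each modality is already encoded to a
# common feature dimension. All names and sizes are illustrative.
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        # Cross-modal attention: gaze features query head/scene features.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate that weights each modality by contextual relevance.
        self.gate = nn.Sequential(nn.Linear(3 * dim, 3), nn.Softmax(dim=-1))
        self.out = nn.Linear(dim, 2)  # predicted (x, y) gaze point

    def forward(self, gaze, head, scene):
        # gaze/head/scene: (batch, seq_len, dim) encoded feature sequences.
        ctx = torch.cat([head, scene], dim=1)        # attend over both streams
        attended, _ = self.attn(gaze, ctx, ctx)      # cross-modal attention
        # Pool each stream to a single vector per sample.
        g, h, s = gaze.mean(1), attended.mean(1), scene.mean(1)
        w = self.gate(torch.cat([g, h, s], dim=-1))  # (batch, 3) gate weights
        fused = w[:, 0:1] * g + w[:, 1:2] * h + w[:, 2:3] * s
        return self.out(fused)

# Usage with random stand-in features:
model = GatedCrossModalFusion()
gaze = torch.randn(8, 10, 128)   # 10 past frames of gaze features
head = torch.randn(8, 10, 128)   # head-orientation features
scene = torch.randn(8, 10, 128)  # scene-content features
print(model(gaze, head, scene).shape)  # torch.Size([8, 2])
```

The softmax gate is one plausible reading of the abstract's "adaptively weight" behavior: each sample receives its own convex combination of the three modality streams.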
Related papers
- Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues [3.4383905541567583]
We present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze.
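As a generic illustration of the fuse-then-predict pattern this summary describes (not the paper's UniSal-based pipeline), per-frame saliency and HMD motion features could be concatenated and fed to a recurrent predictor; the dimensions and the GRU choice below are assumptions.

```python
# Hypothetical fuse-then-predict sketch: per-frame saliency features and HMD
# motion are concatenated and a GRU regresses the next gaze point.
import torch
import torch.nn as nn

class MotionSaliencyGazePredictor(nn.Module):
    def __init__(self, sal_dim=64, motion_dim=6, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(sal_dim + motion_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)  # (x, y) gaze on the display plane

    def forward(self, saliency, motion):
        # saliency: (B, T, sal_dim) per-frame saliency features
        # motion:   (B, T, motion_dim) HMD orientation / angular velocity
        x = torch.cat([saliency, motion], dim=-1)
        _, h = self.rnn(x)      # h: (num_layers, B, hidden)
        return self.out(h[-1])  # predict gaze for the next frame
```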
arXiv Detail & Related papers (2026-01-26T11:26:27Z) - EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox [0.0]
EyeTheia is a lightweight and open deep learning pipeline for webcam-based gaze estimation. It enables real-time gaze tracking using only a standard laptop webcam. It combines MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker and optional user-specific fine-tuning.
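For context, MediaPipe's face-mesh API (which the summary says the pipeline uses for landmark extraction) can be called roughly as follows; the eye-corner indices are the commonly used ones, the file name is a placeholder, and the downstream iTracker-style CNN is not shown.

```python
# Extract normalized face landmarks from a webcam frame with MediaPipe and
# pick eye-corner points that could feed a gaze CNN. The image path is a
# placeholder; indices 33/133 and 362/263 are the usual eye corners.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                            refine_landmarks=True)
frame = cv2.imread("webcam_frame.jpg")  # placeholder input image
result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
if result.multi_face_landmarks:
    lm = result.multi_face_landmarks[0].landmark
    eye_corners = [(lm[i].x, lm[i].y) for i in (33, 133, 362, 263)]
    print(eye_corners)  # normalized [0, 1] image coordinates
```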
arXiv Detail & Related papers (2026-01-09T19:49:01Z) - GazeTrack: High-Precision Eye Tracking Based on Regularization and Spatial Computing [2.4294291235324867]
We design a gaze collection framework and utilize high-precision equipment to gather the first precise benchmark dataset, GazeTrack. We propose a novel shape error regularization method to constrain pupil ellipse fitting and train on open-source datasets. We also invent a novel coordinate transformation method, similar to paper unfolding, to accurately predict gaze vectors on the GazeTrack dataset.
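The exact regularizer is not given in the summary; as a sketch of the general idea, a conic least-squares residual can be combined with a penalty that keeps the fitted ellipse close to a plausible pupil shape. The penalty form, weight, and stand-in contour points below are assumptions.

```python
# Illustrative shape-regularized ellipse fit: algebraic residual of points
# against an ellipse (cx, cy, a, b, theta) plus a penalty on elongation.
import numpy as np
from scipy.optimize import minimize

def ellipse_loss(params, pts, lam=0.1):
    cx, cy, a, b, theta = params
    c, s = np.cos(theta), np.sin(theta)
    x, y = pts[:, 0] - cx, pts[:, 1] - cy
    # Rotate points into the ellipse frame; measure the algebraic residual.
    u, v = c * x + s * y, -s * x + c * y
    residual = (u / a) ** 2 + (v / b) ** 2 - 1.0
    # Shape regularizer (assumed form): penalize implausibly elongated pupils.
    return np.mean(residual ** 2) + lam * (a / b - 1.0) ** 2

pts = np.random.randn(50, 2) * [3.0, 2.5] + [10.0, 8.0]  # stand-in contour
fit = minimize(ellipse_loss, x0=[10, 8, 3, 3, 0], args=(pts,),
               bounds=[(None, None), (None, None), (0.5, None), (0.5, None),
                       (None, None)])
print(fit.x)  # [cx, cy, a, b, theta]
```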
arXiv Detail & Related papers (2025-11-27T16:41:32Z) - See, Think, Act: Online Shopper Behavior Simulation with VLM Agents [58.92444959954643]
This paper investigates the integration of visual information, specifically webpage screenshots, into behavior simulation via VLMs. We employ supervised fine-tuning (SFT) for joint action prediction and rationale generation, conditioning on the full interaction context. To further enhance reasoning capabilities, we integrate reinforcement learning (RL) with a hierarchical reward structure, scaled by a difficulty-aware factor.
arXiv Detail & Related papers (2025-10-22T05:07:14Z) - Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality [44.83390932656039]
We introduce a hierarchical, intention-aware framework that models human intentions and predicts detailed situated behaviors. We propose a dynamic Graph Convolutional Network (GCN) to effectively capture human-environment relationships. Experiments on challenging real-world benchmarks and a live VR environment demonstrate the effectiveness of our approach.
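As a generic illustration of the GCN ingredient mentioned above (the paper's dynamic graph construction is not reproduced here), a single mean-aggregation graph-convolution layer over a joint graph of body-joint and scene nodes could look like this; all shapes are assumptions.

```python
# Minimal mean-aggregation GCN layer over a node-feature matrix and an
# adjacency matrix with self-loops. Purely illustrative.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) adjacency w/ self-loops.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj @ x) / deg))  # mean over neighbors

layer = SimpleGCNLayer(16, 32)
x, adj = torch.randn(25, 16), torch.eye(25)  # 25 nodes, self-loops only
print(layer(x, adj).shape)  # torch.Size([25, 32])
```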
arXiv Detail & Related papers (2025-10-12T18:29:01Z) - GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering [0.0]
Foveated rendering significantly reduces computational demands in virtual reality applications. Current approaches require expensive hardware-based eye tracking systems. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments.
arXiv Detail & Related papers (2025-08-19T06:09:23Z) - Predicting User Grasp Intentions in Virtual Reality [0.0]
We evaluate classification and regression approaches across 810 trials with varied object types, sizes, and manipulations. Regression-based approaches demonstrate more robust performance, with timing errors within 0.25 seconds and distance errors around 5-20 cm. Our results underscore the potential of machine learning models to enhance VR interactions.
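The regression formulation described above (predicting when and at what distance a grasp will occur) can be illustrated with a generic multi-output regressor; the features, targets, and random stand-in data below are entirely hypothetical.

```python
# Hypothetical multi-output regression for grasp intention: motion features
# in, [time_to_grasp_s, hand_object_distance_m] out. Data is random stand-in.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(810, 12))        # e.g. hand velocity, gaze angle, range
y = rng.uniform(0, 2, size=(810, 2))  # [time_to_grasp_s, distance_m]

model = RandomForestRegressor(n_estimators=100).fit(X, y)
print(model.predict(X[:1]))  # [[t_hat, d_hat]]
```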
arXiv Detail & Related papers (2025-08-05T15:17:19Z) - V-HOP: Visuo-Haptic 6D Object Pose Tracking [18.25135101142697]
Humans naturally integrate vision and haptics for robust object perception during manipulation. Prior object pose estimation research has attempted to combine visual and haptic/tactile feedback. We introduce a new visuo-haptic transformer-based object pose tracker.
arXiv Detail & Related papers (2025-02-24T18:59:50Z) - GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z) - What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further enriches the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from visual stimuli and eye images can achieve performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)