Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues
- URL: http://arxiv.org/abs/2601.18372v1
- Date: Mon, 26 Jan 2026 11:26:27 GMT
- Title: Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues
- Authors: Christos Petrou, Harris Partaourides, Athanasios Balomenos, Yannis Kopsinis, Sotirios Chatzis
- Abstract summary: We present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze.
- Score: 3.4383905541567583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaze prediction plays a critical role in Virtual Reality (VR) applications by reducing sensor-induced latency and enabling computationally demanding techniques such as foveated rendering, which rely on anticipating user attention. However, direct eye tracking is often unavailable due to hardware limitations or privacy concerns. To address this, we present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. We evaluate two lightweight architectures, TSMixer and LSTM, for forecasting future gaze directions. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze. These results demonstrate the effectiveness of predictive gaze modeling in reducing perceptual lag and enhancing natural interaction in VR environments where direct eye tracking is constrained.
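To make the pipeline above concrete, here is a minimal PyTorch sketch of the fuse-then-forecast idea: per-frame saliency features are concatenated with HMD motion signals, and a time-series module forecasts future gaze directions. The saliency encoder below is a small stand-in CNN (the paper uses UniSal, whose feature interface the abstract does not specify); the layer sizes, six-channel motion signal, and two-angle gaze output are illustrative assumptions.

```python
# A minimal sketch of the fusion-and-forecast pipeline described in the
# abstract. The saliency encoder is a small stand-in CNN, not UniSal, and
# all dimensions and signal conventions are illustrative assumptions.
import torch
import torch.nn as nn

class GazeForecaster(nn.Module):
    def __init__(self, motion_dim=6, vis_dim=32, hidden=64, horizon=10):
        super().__init__()
        # Stand-in for UniSal: maps each video frame to a compact saliency feature.
        self.saliency = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, vis_dim),
        )
        # Time-series module: an LSTM over fused per-frame features
        # (the paper also evaluates TSMixer as an alternative).
        self.rnn = nn.LSTM(motion_dim + vis_dim, hidden, batch_first=True)
        # Predict a yaw/pitch gaze direction for each future step.
        self.head = nn.Linear(hidden, horizon * 2)
        self.horizon = horizon

    def forward(self, frames, motion):
        # frames: (B, T, 3, H, W) video frames; motion: (B, T, 6) HMD signals
        B, T = frames.shape[:2]
        vis = self.saliency(frames.flatten(0, 1)).view(B, T, -1)
        fused = torch.cat([vis, motion], dim=-1)
        _, (h, _) = self.rnn(fused)
        return self.head(h[-1]).view(B, self.horizon, 2)

# Example: 16 observed frames at 64x64, forecasting 10 future gaze angles.
model = GazeForecaster()
gaze = model(torch.randn(2, 16, 3, 64, 64), torch.randn(2, 16, 6))
print(gaze.shape)  # torch.Size([2, 10, 2])
```

Swapping the LSTM for a TSMixer-style block would correspond to the paper's second variant; the fusion interface would stay the same.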
Related papers
- EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox [0.0]
EyeTheia is a lightweight, open deep learning pipeline for webcam-based gaze estimation. It enables real-time gaze tracking using only a standard laptop webcam. It combines MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker, plus optional user-specific fine-tuning (a minimal sketch of this pipeline follows the entry).
arXiv Detail & Related papers (2026-01-09T19:49:01Z)
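The entry above outlines a concrete pipeline (MediaPipe landmarks feeding an iTracker-style CNN), so here is a hedged Python sketch of that idea. The landmark indices, crop geometry, and CNN head are illustrative assumptions, not the toolbox's released code.

```python
# A hedged sketch of an EyeTheia-style pipeline: MediaPipe face landmarks
# locate an eye region whose crop feeds a small CNN. Crop geometry and the
# CNN head are assumptions for illustration.
import cv2
import mediapipe as mp
import torch.nn as nn

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                            refine_landmarks=True)

def eye_crop(frame_bgr, size=64):
    """Crop a box around the left eye using FaceMesh corner landmarks 33/133."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    res = face_mesh.process(rgb)
    if not res.multi_face_landmarks:
        return None
    lm = res.multi_face_landmarks[0].landmark
    h, w = frame_bgr.shape[:2]
    cx = int((lm[33].x + lm[133].x) / 2 * w)
    cy = int((lm[33].y + lm[133].y) / 2 * h)
    r = 32  # assumed half-width of the eye box, in pixels
    box = frame_bgr[max(cy - r, 0):cy + r, max(cx - r, 0):cx + r]
    return cv2.resize(box, (size, size))

# iTracker-inspired eye branch: a small CNN regressing a 2D gaze point.
gaze_net = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),  # (x, y) gaze point, normalized screen coordinates
)
```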
- GazeProphetV2: Head-Movement-Based Gaze Prediction Enabling Efficient Foveated Rendering on Mobile VR [0.0]
This paper introduces a multimodal approach to VR gaze prediction that combines temporal gaze patterns, head movement data, and visual scene information. Evaluations on a dataset spanning 22 VR scenes with 5.3M gaze samples show improvements in predictive accuracy when combining modalities. Cross-scene generalization testing shows consistent performance with 93.1% validation accuracy and temporal consistency in predicted gaze trajectories.
arXiv Detail & Related papers (2025-11-25T06:55:39Z)
- GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering [0.0]
Foveated rendering significantly reduces computational demands in virtual reality applications. Current approaches require expensive hardware-based eye tracking systems. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments.
arXiv Detail & Related papers (2025-08-19T06:09:23Z)
- SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training [82.68200031146299]
We propose a one-step diffusion-based video restoration (VR) model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR setting within a single step, we introduce several enhancements to both the model architecture and training procedures.
arXiv Detail & Related papers (2025-06-05T17:51:05Z)
- Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference [3.4667973471411853]
Cybersickness is a major obstacle to the widespread adoption of immersive virtual reality (VR). We propose a scalable, deployable framework for personalized cybersickness prediction. Our framework supports real-time applications, making it well suited for integration into consumer-grade VR platforms.
arXiv Detail & Related papers (2025-01-02T11:41:43Z)
- Predictive Context-Awareness for Full-Immersive Multiuser Virtual Reality with Redirected Walking [5.393569497095572]
Future VR systems will require supporting wireless networking infrastructures operating at millimeter-wave (mmWave) frequencies.
We propose the use of predictive context-awareness to optimize transmitter- and receiver-side beamforming and beamsteering.
We show that Long Short-Term Memory (LSTM) networks offer promising accuracy in predicting lateral movements (a minimal forecasting sketch follows the entry).
arXiv Detail & Related papers (2023-03-31T09:09:17Z)
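The entry above credits LSTMs with promising accuracy for forecasting lateral movements, which in turn drive proactive mmWave beamsteering. Below is a minimal PyTorch sketch of such a trajectory forecaster; the window length, horizon, and layer sizes are assumptions for illustration.

```python
# A minimal sketch of an LSTM trajectory forecaster: predicting a user's
# next lateral (x, y) positions from recent motion so a mmWave transmitter
# can steer its beam ahead of time. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LateralLSTM(nn.Module):
    def __init__(self, hidden=32, horizon=5):
        super().__init__()
        self.lstm = nn.LSTM(2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, horizon * 2)
        self.horizon = horizon

    def forward(self, xy_history):            # (B, T, 2) past positions
        _, (h, _) = self.lstm(xy_history)
        return self.out(h[-1]).view(-1, self.horizon, 2)  # future (x, y)

model = LateralLSTM()
future = model(torch.randn(4, 20, 2))  # 20 past samples -> 5 future steps
print(future.shape)                    # torch.Size([4, 5, 2])
```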
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking [84.38335117043907]
We propose a purely passive method to track a person walking in an invisible room by observing only a relay wall.
To extract imperceptible changes in videos of the relay wall, we introduce difference frames as an essential carrier of temporal-local motion messages (see the sketch after this entry).
To evaluate the proposed method, we build and publish the first dynamic passive NLOS tracking dataset, NLOS-Track.
arXiv Detail & Related papers (2023-03-21T12:18:57Z)
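Difference frames, as used above, amount to subtracting consecutive relay-wall frames so that faint motion-induced intensity changes stand out. A minimal NumPy sketch, with an assumed per-frame normalization:

```python
# A hedged sketch of the "difference frame" idea: subtracting consecutive
# relay-wall frames to expose the faint temporal-local changes a hidden
# walker induces. The normalization choice is an assumption.
import numpy as np

def difference_frames(video):
    """video: (T, H, W) grayscale relay-wall frames in [0, 1].
    Returns (T-1, H, W) signed difference frames amplifying subtle motion."""
    video = video.astype(np.float32)
    diff = video[1:] - video[:-1]
    # Normalize each frame by its own scale so imperceptible changes survive.
    scale = np.abs(diff).max(axis=(1, 2), keepdims=True) + 1e-8
    return diff / scale

frames = np.random.rand(10, 64, 64)
print(difference_frames(frames).shape)  # (9, 64, 64)
```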
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches (a minimal fusion sketch follows the entry).
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
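The summary above mentions bidirectional communication between gaze and motion branches without specifying the mechanism; one plausible reading is a pair of cross-attention passes, sketched below with assumed feature dimensions. This is not GIMO's actual architecture, only an illustration of the two-way information flow.

```python
# A minimal sketch of "bidirectional communication" between gaze and motion
# branches, realized here as two cross-attention passes. Dimensions and the
# attention layout are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class BidirectionalFusion(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.gaze_from_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_from_gaze = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gaze_feats, motion_feats):
        # Each branch queries the other, so information flows both ways.
        g, _ = self.gaze_from_motion(gaze_feats, motion_feats, motion_feats)
        m, _ = self.motion_from_gaze(motion_feats, gaze_feats, gaze_feats)
        return gaze_feats + g, motion_feats + m

fusion = BidirectionalFusion()
g, m = fusion(torch.randn(2, 30, 64), torch.randn(2, 30, 64))
print(g.shape, m.shape)  # torch.Size([2, 30, 64]) torch.Size([2, 30, 64])
```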
- Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on the Argoverse-HD dataset, improving AP by 4.9% over a strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z)
- MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints [70.76761166614511]
We present a novel self-supervised algorithm named MotionHint for monocular visual odometry (VO).
Our MotionHint algorithm can be easily applied to existing open-source state-of-the-art SSM-VO systems.
arXiv Detail & Related papers (2021-09-14T15:35:08Z)
- Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and an accompanying method that aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from visual stimuli and eye images can achieve performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)
- Gaze-Sensing LEDs for Head Mounted Displays [73.88424800314634]
We exploit the sensing capability of LEDs to create a low-power gaze tracker for virtual reality (VR) applications.
We show that our gaze estimation method does not require complex dimension reduction techniques (a minimal calibration sketch follows the entry).
arXiv Detail & Related papers (2020-03-18T23:03:06Z)
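The LED-based tracker above reports that no complex dimension reduction is needed; a plain linear map from LED readings to gaze angles illustrates why that can suffice. Everything in this sketch (LED count, synthetic data, least-squares calibration) is an assumption for illustration, not the paper's method.

```python
# A hedged sketch: treating HMD LEDs as light sensors and mapping their raw
# intensity readings to gaze angles with a simple linear model, i.e. no
# dimensionality reduction. LED count and data here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_leds, n_samples = 8, 200
readings = rng.normal(size=(n_samples, n_leds))           # LED photocurrents
true_W = rng.normal(size=(n_leds, 2))
gaze = readings @ true_W + 0.01 * rng.normal(size=(n_samples, 2))

# Calibration: ordinary least squares from LED readings to (yaw, pitch).
W, *_ = np.linalg.lstsq(readings, gaze, rcond=None)

# Runtime: one matrix-vector product per gaze estimate.
print(readings[:1] @ W)  # predicted (yaw, pitch) for a new reading
```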
This list is automatically generated from the titles and abstracts of the papers in this site.