Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference
- URL: http://arxiv.org/abs/2501.01212v3
- Date: Mon, 18 Aug 2025 01:24:12 GMT
- Title: Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference
- Authors: Yitong Zhu, Zhuowen Liang, Yiming Wu, Tangyao Li, Yuyang Wang
- Abstract summary: Cybersickness is a major obstacle to the widespread adoption of immersive virtual reality (VR). We propose a scalable, deployable framework for personalized cybersickness prediction. Our framework supports real-time applications, ideal for integration into consumer-grade VR platforms.
- Score: 3.4667973471411853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cybersickness remains a major obstacle to the widespread adoption of immersive virtual reality (VR), particularly in consumer-grade environments. While prior methods rely on invasive signals such as electroencephalography (EEG) for high predictive accuracy, these approaches require specialized hardware and are impractical for real-world applications. In this work, we propose a scalable, deployable framework for personalized cybersickness prediction leveraging only non-invasive signals readily available from commercial VR headsets, including head motion, eye tracking, and physiological responses. Our model employs a modality-specific graph neural network enhanced with a Difference Attention Module to extract temporal-spatial embeddings capturing dynamic changes across modalities. A cross-modal alignment module jointly trains the video encoder to learn personalized traits by aligning video features with sensor-derived representations. Consequently, the model accurately predicts individual cybersickness using only video input during inference. Experimental results show our model achieves 88.4% accuracy, closely matching EEG-based approaches (89.16%), while reducing deployment complexity. With an average inference latency of 90 ms, our framework supports real-time applications, ideal for integration into consumer-grade VR platforms without compromising personalization or performance. The code will be released at https://github.com/U235-Aurora/PTGNN.
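The abstract describes, but does not yet release, the cross-modal alignment that makes vision-only inference possible. Below is a minimal sketch of one plausible reading of that module: an InfoNCE-style contrastive loss pulling each clip's video embedding toward its own sensor-derived embedding. The class name, dimensions, and choice of contrastive objective are illustrative assumptions, not the authors' PTGNN implementation.

```python
# Hypothetical sketch of a cross-modal alignment loss (assumed design;
# the PTGNN repository is not yet released).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAlignment(nn.Module):
    def __init__(self, video_dim=512, sensor_dim=256, embed_dim=128):
        super().__init__()
        # Project video features and sensor-derived features (head motion,
        # eye tracking, physiology) into a shared embedding space.
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.sensor_proj = nn.Linear(sensor_dim, embed_dim)
        self.log_temp = nn.Parameter(torch.zeros(()))  # learnable temperature

    def forward(self, video_feat, sensor_feat):
        # L2-normalize so similarity depends on direction, not magnitude.
        v = F.normalize(self.video_proj(video_feat), dim=-1)
        s = F.normalize(self.sensor_proj(sensor_feat), dim=-1)
        # InfoNCE: each clip's video embedding should match its own
        # sensor embedding (the diagonal) and repel all others in the batch.
        logits = (v @ s.t()) * self.log_temp.exp()
        targets = torch.arange(v.size(0), device=v.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))
```

Under this reading, both projections are trained jointly, but only the video branch is needed at inference, which is consistent with the paper's vision-only deployment claim.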
Related papers
- DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer [62.18680935878919]
We introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings into temporally consistent outputs. At its core is a single-step temporally-conditioned enhancer capable of running in online simulators on a single GPU.
arXiv Detail & Related papers (2026-02-27T15:35:30Z) - Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues [3.4383905541567583]
We present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze.
arXiv Detail & Related papers (2026-01-26T11:26:27Z) - REFA: Real-time Egocentric Facial Animations for Virtual Reality [56.82169742343143]
We present a novel system for real-time tracking of facial expressions using egocentric views captured from a set of infrared cameras embedded in a virtual reality (VR) headset. Our technology enables any user to accurately drive the facial expressions of virtual characters in a non-intrusive manner.
arXiv Detail & Related papers (2026-01-07T01:41:46Z) - Predicting User Grasp Intentions in Virtual Reality [0.0]
We evaluate classification and regression approaches across 810 trials with varied object types, sizes, and manipulations. Regression-based approaches demonstrate more robust performance, with timing errors within 0.25 seconds and distance errors around 5-20 cm. Our results underscore the potential of machine learning models to enhance VR interactions.
arXiv Detail & Related papers (2025-08-05T15:17:19Z) - Securing Virtual Reality Experiences: Unveiling and Tackling Cybersickness Attacks with Explainable AI [2.076342899890871]
We present a new type of VR attack, i.e., a cybersickness attack, which successfully prevents cybersickness mitigation from being triggered.
We propose a novel explainable artificial intelligence (XAI)-guided cybersickness attack detection framework to detect such attacks.
arXiv Detail & Related papers (2025-03-17T17:49:51Z) - ZIA: A Theoretical Framework for Zero-Input AI [0.0]
Zero-Input AI (ZIA) introduces a novel framework for human-computer interaction by enabling proactive intent prediction without explicit user commands. It integrates gaze tracking, bio-signals (EEG, heart rate), and contextual data (time, location, usage history) into a multi-modal model for real-time inference. ZIA provides a scalable, privacy-preserving framework for accessibility, healthcare, and consumer applications, advancing AI toward anticipatory intelligence.
arXiv Detail & Related papers (2025-02-22T07:42:05Z) - Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos [44.50599475213118]
We present a novel approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance.
Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame.
We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets.
arXiv Detail & Related papers (2024-09-12T18:33:13Z) - Mazed and Confused: A Dataset of Cybersickness, Working Memory, Mental Load, Physical Load, and Attention During a Real Walking Task in VR [11.021668923244803]
The relationship between cognitive activities, physical activities, and the familiar feelings of cybersickness is not well understood.
We collected head orientation, head position, eye tracking, images, physiological readings from external sensors, and self-reported cybersickness severity, physical load, and mental load in VR.
arXiv Detail & Related papers (2024-09-10T22:41:14Z) - Cybersickness Detection through Head Movement Patterns: A Promising Approach [1.1562071835482226]
This research investigates head movement patterns as a novel physiological marker for cybersickness detection.
Head movements provide a continuous, non-invasive measure that can be easily captured through the sensors embedded in all commercial VR headsets.
arXiv Detail & Related papers (2024-02-05T04:49:59Z) - Deep Motion Masking for Secure, Usable, and Scalable Real-Time Anonymization of Virtual Reality Motion Data [49.68609500290361]
Recent studies have demonstrated that the motion tracking "telemetry" data used by nearly all VR applications is as uniquely identifiable as a fingerprint scan.
We present in this paper a state-of-the-art VR identification model that can convincingly bypass known defensive countermeasures.
arXiv Detail & Related papers (2023-11-09T01:34:22Z) - LiteVR: Interpretable and Lightweight Cybersickness Detection using Explainable AI [1.1470070927586016]
Cybersickness is a common ailment associated with virtual reality (VR) user experiences.
We present an explainable artificial intelligence (XAI)-based framework, LiteVR, for cybersickness detection.
arXiv Detail & Related papers (2023-02-05T21:51:12Z) - VR-LENS: Super Learning-based Cybersickness Detection and Explainable AI-Guided Deployment in Virtual Reality [1.9642496463491053]
This work presents an explainable artificial intelligence (XAI)-based framework VR-LENS for developing cybersickness detection ML models.
We first develop a novel super learning-based ensemble ML model for cybersickness detection (a generic sketch of this ensembling technique appears after this list).
Our proposed method identified eye tracking, player position, and galvanic skin/heart rate response as the most dominant features for the integrated sensor, gameplay, and bio-physiological datasets.
arXiv Detail & Related papers (2023-02-03T20:15:51Z) - Force-Aware Interface via Electromyography for Natural VR/AR Interaction [69.1332992637271]
We design a learning-based neural interface for natural and intuitive force inputs in VR/AR.
We show that our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration.
We envision our findings to push forward research towards more realistic physicality in future VR/AR.
arXiv Detail & Related papers (2022-10-03T20:51:25Z) - Differentiable Frequency-based Disentanglement for Aerial Video Action Recognition [56.91538445510214]
We present a learning algorithm for human activity recognition in videos.
Our approach is designed for UAV videos, which are mainly acquired from obliquely placed dynamic cameras.
We conduct extensive experiments on the UAV Human dataset and the NEC Drone dataset.
arXiv Detail & Related papers (2022-09-15T22:16:52Z) - TruVR: Trustworthy Cybersickness Detection using Explainable Machine Learning [1.9642496463491053]
Cybersickness can be characterized by nausea, vertigo, headache, eye strain, and other discomforts when using virtual reality (VR) systems.
The previously reported machine learning (ML) and deep learning (DL) algorithms for detecting (classification) and predicting (regression) VR cybersickness use black-box models.
We present three explainable machine learning (xML) models to detect and predict cybersickness.
arXiv Detail & Related papers (2022-09-12T13:55:13Z) - Real-time Object Detection for Streaming Perception [84.2559631820007]
Streaming perception is proposed to jointly evaluate latency and accuracy with a single metric for online video perception.
We build a simple and effective framework for streaming perception.
Our method achieves competitive performance on Argoverse-HD dataset and improves the AP by 4.9% compared to the strong baseline.
arXiv Detail & Related papers (2022-03-23T11:33:27Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z) - Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - AEGIS: A real-time multimodal augmented reality computer vision based system to assist facial expression recognition for individuals with autism spectrum disorder [93.0013343535411]
This paper presents the development of a multimodal augmented reality (AR) system which combines the use of computer vision and deep convolutional neural networks (CNNs).
The proposed system, which we call AEGIS, is an assistive technology deployable on a variety of user devices including tablets, smartphones, video conference systems, or smartglasses.
We leverage both spatial and temporal information in order to provide an accurate expression prediction, which is then converted into its corresponding visualization and drawn on top of the original video frame.
arXiv Detail & Related papers (2020-10-22T17:20:38Z) - ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
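For the VR-LENS entry above, "super learning" denotes a stacked ensemble in which a meta-learner is fit on out-of-fold predictions of heterogeneous base learners. The sketch below shows that generic technique in scikit-learn; the base learners and hyperparameters are assumptions for illustration, not the paper's configuration.

```python
# Generic super-learner (stacked ensemble) sketch; NOT the VR-LENS code.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("svm", SVC(probability=True, random_state=0)),
]

# cv=5 makes the meta-learner train on out-of-fold base predictions,
# which is what distinguishes a super learner from simple averaging.
super_learner = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
# Fit on eye-tracking / gameplay / bio-physiological feature vectors,
# then pair with a post-hoc explainer (e.g., SHAP) for the XAI step:
# super_learner.fit(X_train, y_train)
```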