Understanding Cognitive States from Head & Hand Motion Data
- URL: http://arxiv.org/abs/2509.24255v1
- Date: Mon, 29 Sep 2025 03:59:56 GMT
- Title: Understanding Cognitive States from Head & Hand Motion Data
- Authors: Kaiang Wen, Mark Roman Miller
- Abstract summary: We introduce a novel dataset of head and hand motion with frame-level annotations of cognitive states, collected during structured decision-making tasks. Our findings suggest that deep temporal models can infer subtle cognitive states from motion alone, achieving performance comparable to human observers. This work demonstrates that standard VR telemetry contains strong patterns related to users' internal cognitive processes, opening the door to a new generation of adaptive virtual environments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As virtual reality (VR) and augmented reality (AR) continue to gain popularity, head and hand motion data captured by consumer VR systems have become ubiquitous. Prior work shows that such telemetry can be highly identifying and reflect broad user traits, often aligning with intuitive "folk theories" of body language. However, it remains unclear to what extent motion kinematics encode more nuanced cognitive states, such as confusion, hesitation, and readiness, which lack clear correlates with motion. To investigate this, we introduce a novel dataset of head and hand motion with frame-level annotations of these states collected during structured decision-making tasks. Our findings suggest that deep temporal models can infer subtle cognitive states from motion alone, achieving performance comparable to that of human observers. This work demonstrates that standard VR telemetry contains strong patterns related to users' internal cognitive processes, which opens the door to a new generation of adaptive virtual environments. To enhance reproducibility and support future work, we will make our dataset and modeling framework publicly available.
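The abstract describes deep temporal models over frame-level motion telemetry without naming a specific architecture. As a minimal sketch of that general setup (not the authors' model), the following assumes a bidirectional GRU over per-frame head/hand features; the feature layout, the three-state label set, and all sizes are illustrative assumptions.

```python
# Minimal sketch, NOT the paper's model: a bidirectional GRU that maps
# per-frame head/hand telemetry to frame-level cognitive-state logits.
import torch
import torch.nn as nn

class FrameStateClassifier(nn.Module):
    def __init__(self, n_features=21, n_states=3, hidden=128):
        # Assumed layout: position (3) + quaternion (4) for head and both
        # hands = 21 features; 3 states (confusion/hesitation/readiness).
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_states)

    def forward(self, x):
        # x: (batch, frames, n_features) -> (batch, frames, n_states)
        h, _ = self.gru(x)
        return self.head(h)

model = FrameStateClassifier()
clips = torch.randn(4, 300, 21)           # 4 clips of 300 frames each
logits = model(clips)
labels = torch.randint(0, 3, (4, 300))    # frame-level annotations
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 3), labels.reshape(-1))
```

A bidirectional recurrence is only one plausible choice here; a temporal CNN or transformer encoder would fit the same frames-in, per-frame-labels-out interface.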
Related papers
- SARAH: Spatially Aware Real-time Agentic Humans [58.32612596034656]
We present the first real-time, fully causal method for spatially-aware conversational motion, deployable on a streaming VR headset. Given a user's position and dyadic audio, our approach produces full-body motion that aligns gestures with speech while orienting the agent according to the user. We validate our approach on a live VR system, bringing spatially-aware conversational agents to real-time deployment.
arXiv Detail & Related papers (2026-02-20T18:59:35Z)
- Predicting User Grasp Intentions in Virtual Reality [0.0]
We evaluate classification and regression approaches across 810 trials with varied object types, sizes, and manipulations. Regression-based approaches demonstrate more robust performance, with timing errors within 0.25 seconds and distance errors around 5-20 cm. Our results underscore the potential of machine learning models to enhance VR interactions.
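As a hedged illustration of the regression framing above (not the authors' pipeline), one could regress time-until-grasp from a handful of hand-kinematics features; the features and data below are synthetic placeholders.

```python
# Sketch only: regress time-to-grasp (seconds) from assumed hand
# kinematics features using a standard off-the-shelf regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(810, 6))        # e.g., hand speed, acceleration,
                                     # distance and angle to the object
y = rng.uniform(0.1, 2.0, size=810)  # synthetic time-to-grasp targets

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X[:600], y[:600])
pred = reg.predict(X[600:])
print(f"MAE: {mean_absolute_error(y[600:], pred):.3f} s")
```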
arXiv Detail & Related papers (2025-08-05T15:17:19Z)
- Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset [113.25650486482762]
We introduce the Seamless Interaction dataset, a large-scale collection of over 4,000 hours of face-to-face interaction footage. This dataset enables the development of AI technologies that understand dyadic embodied dynamics. We develop a suite of models that utilize the dataset to generate dyadic motion gestures and facial expressions aligned with human speech.
arXiv Detail & Related papers (2025-06-27T18:09:49Z)
- Exploring Context-aware and LLM-driven Locomotion for Immersive Virtual Reality [8.469329222500726]
We propose a novel locomotion technique powered by large language models (LLMs). We evaluate three locomotion methods: controller-based teleportation, voice-based steering, and our language model-driven approach. Our findings indicate that LLM-driven locomotion achieves usability, presence, and cybersickness scores comparable to established methods.
arXiv Detail & Related papers (2025-04-24T07:48:09Z)
- ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments [0.13654846342364302]
We propose ViRAC, which exploits the common-sense knowledge and reasoning capabilities of large-scale models. ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques.
arXiv Detail & Related papers (2025-02-14T09:46:43Z)
- Tremor Reduction for Accessible Ray Based Interaction in VR Applications [0.0]
Many traditional 2D interface interaction methods have been directly converted to work in a VR space with little alteration to the input mechanism.
In this paper we propose the use of a low-pass filter to normalize user input noise, alleviating fine motor requirements during ray-based interaction.
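A one-pole (exponential moving average) filter is the simplest low-pass of this kind; the sketch below smooths a per-frame ray direction under an assumed smoothing factor, which is not the paper's parameter.

```python
# Sketch of the low-pass idea: exponentially smooth the controller ray
# direction each frame so hand tremor produces less pointer jitter.
import numpy as np

class OnePoleLowPass:
    def __init__(self, alpha=0.15):
        self.alpha = alpha   # 0 < alpha <= 1; smaller = heavier smoothing
        self.state = None

    def __call__(self, sample):
        sample = np.asarray(sample, dtype=float)
        if self.state is None:
            self.state = sample
        else:
            self.state = self.alpha * sample + (1 - self.alpha) * self.state
        # Re-normalize so the smoothed vector stays a unit direction.
        return self.state / np.linalg.norm(self.state)

lpf = OnePoleLowPass(alpha=0.15)
smoothed = lpf(np.array([0.02, -0.01, 0.999]))  # noisy per-frame ray
```

The trade-off is latency: heavier smoothing (smaller alpha) reduces jitter but makes the ray lag behind deliberate movements.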
arXiv Detail & Related papers (2024-05-12T17:07:16Z)
- Deep Motion Masking for Secure, Usable, and Scalable Real-Time Anonymization of Virtual Reality Motion Data [49.68609500290361]
Recent studies have demonstrated that the motion tracking "telemetry" data used by nearly all VR applications is as uniquely identifiable as a fingerprint scan.
We present in this paper a state-of-the-art VR identification model that can convincingly bypass known defensive countermeasures.
arXiv Detail & Related papers (2023-11-09T01:34:22Z)
- Force-Aware Interface via Electromyography for Natural VR/AR Interaction [69.1332992637271]
We design a learning-based neural interface for natural and intuitive force inputs in VR/AR.
We show that our interface can decode finger-wise forces in real-time with 3.3% mean error, and generalize to new users with little calibration.
We envision these findings pushing research toward more realistic physicality in future VR/AR.
arXiv Detail & Related papers (2022-10-03T20:51:25Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
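The abstract does not specify how the bidirectional gaze-motion communication is realized; one speculative reading is a pair of cross-attention passes in which each branch queries the other, as in the sketch below (all module choices and sizes are assumptions, not GIMO's architecture).

```python
# Speculative sketch: two cross-attention passes so gaze and motion
# features each attend to the other branch ("bidirectional" exchange).
import torch
import torch.nn as nn

class BidirectionalBridge(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.gaze_to_motion = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_to_gaze = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, gaze, motion):
        # Motion queries gaze, gaze queries motion; residual on both.
        m2, _ = self.gaze_to_motion(motion, gaze, gaze)
        g2, _ = self.motion_to_gaze(gaze, motion, motion)
        return gaze + g2, motion + m2

bridge = BidirectionalBridge()
gaze = torch.randn(2, 60, 64)     # (batch, frames, dim) gaze features
motion = torch.randn(2, 60, 64)   # pose/motion features
gaze_out, motion_out = bridge(gaze, motion)
```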
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- Dynamic Modeling of Hand-Object Interactions via Tactile Sensing [133.52375730875696]
In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
We build our model on a cross-modal learning framework and generate the labels using a visual processing pipeline to supervise the tactile model.
This work takes a step toward dynamics modeling of hand-object interactions from dense tactile sensing.
arXiv Detail & Related papers (2021-09-09T16:04:14Z)