Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality
- URL: http://arxiv.org/abs/2510.10742v1
- Date: Sun, 12 Oct 2025 18:29:01 GMT
- Title: Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality
- Authors: Yuan Xu, Zimu Zhang, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang
- Abstract summary: We introduce a hierarchical, intention-aware framework that models human intentions and predicts detailed situated behaviors. We propose a dynamic Graph Convolutional Network (GCN) to effectively capture human-environment relationships. Experiments on challenging real-world benchmarks and a live VR environment demonstrate the effectiveness of our approach.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual and augmented reality systems increasingly demand intelligent adaptation to user behaviors for enhanced interaction experiences. Achieving this requires accurately understanding human intentions and predicting future situated behaviors - such as gaze direction and object interactions - which is vital for creating responsive VR/AR environments and applications like personalized assistants. However, accurate behavioral prediction demands modeling the underlying cognitive processes that drive human-environment interactions. In this work, we introduce a hierarchical, intention-aware framework that models human intentions and predicts detailed situated behaviors by leveraging cognitive mechanisms. Given historical human dynamics and the observation of scene contexts, our framework first identifies potential interaction targets and then forecasts fine-grained future behaviors. We propose a dynamic Graph Convolutional Network (GCN) to effectively capture human-environment relationships. Extensive experiments on challenging real-world benchmarks and in a live VR environment demonstrate the effectiveness of our approach, achieving superior performance across all metrics and enabling practical applications for proactive VR systems that anticipate user behaviors and adapt virtual environments accordingly.
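The abstract's central component, a dynamic GCN over human-environment relationships, can be illustrated with a minimal sketch. The idea is that the graph connects human nodes (e.g. body joints) with scene-object nodes and is rebuilt each frame from their current spatial proximity, so message passing adapts as the user moves. The node counts, feature sizes, distance threshold, and the single symmetrically normalized layer below are all illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def dynamic_adjacency(positions, radius=1.5):
    """Connect nodes whose 3D positions lie within `radius` of each other.

    Zero self-distances mean the diagonal (self-loops) is included,
    so every node keeps its own features during message passing.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < radius).astype(float)

def gcn_layer(features, adj, weight):
    """One graph convolution: relu(D^-1/2 A D^-1/2 X W)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # safe: self-loops guarantee deg >= 1
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
positions = rng.uniform(-1, 1, size=(10, 3))   # 10 nodes: joints + scene objects
features = rng.normal(size=(10, 16))           # per-node feature vectors
weight = rng.normal(size=(16, 32))             # learnable projection (random here)

adj = dynamic_adjacency(positions)             # recomputed every frame
out = gcn_layer(features, adj, weight)
print(out.shape)  # (10, 32)
```

Because the adjacency is a function of the current frame's positions rather than a fixed skeleton topology, edges between the hands and nearby interactable objects appear and disappear over time, which is what makes the graph "dynamic" in this sketch.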
Related papers
- Seeing is Believing (and Predicting): Context-Aware Multi-Human Behavior Prediction with Vision Language Models [8.568706722040421]
We present CAMP-VLM (Context-Aware Multi-human behavior Prediction): a Vision Language Model (VLM)-based framework. CAMP-VLM incorporates contextual features from visual input and spatial awareness from scene graphs to enhance prediction of human-scene interactions. It outperforms the best-performing baseline by up to 66.9% in prediction accuracy.
arXiv Detail & Related papers (2025-12-17T20:44:32Z) - PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System [67.2851799763138]
PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs.
arXiv Detail & Related papers (2025-10-13T07:11:37Z) - Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis [51.95817740348585]
Human-X is a novel framework designed to enable immersive and physically plausible human interactions across diverse entities. Our method jointly predicts actions and reactions in real-time using an auto-regressive reaction diffusion planner. Our framework is validated in real-world applications, including a virtual reality interface for human-robot interaction.
arXiv Detail & Related papers (2025-08-04T06:35:48Z) - HUMOF: Human Motion Forecasting in Interactive Social Scenes [29.621970821619424]
Complex scenes present significant challenges for predicting human behaviour due to the abundance of interaction information. We propose an effective method for human motion forecasting in interactive scenes. Our method achieves state-of-the-art performance across four public datasets.
arXiv Detail & Related papers (2025-06-04T09:21:54Z) - ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments [0.13654846342364302]
We propose ViRAC, which exploits the common-sense knowledge and reasoning capabilities of large-scale models. ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques.
arXiv Detail & Related papers (2025-02-14T09:46:43Z) - Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z) - Predicting the Future from First Person (Egocentric) Vision: A Survey [18.07516837332113]
This survey summarises the evolution of studies in the context of future prediction from egocentric vision.
It makes an overview of applications, devices, existing problems, commonly used datasets, models and input modalities.
Our analysis highlights that methods for future prediction from egocentric vision can have a significant impact in a range of applications.
arXiv Detail & Related papers (2021-07-28T14:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.