3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments
- URL: http://arxiv.org/abs/2406.11003v1
- Date: Sun, 16 Jun 2024 16:30:56 GMT
- Title: 3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments
- Authors: Eduardo Davalos, Yike Zhang, Ashwin T. S., Joyce H. Fonteles, Umesh Timalsina, Gautam Biswas
- Abstract summary: This study presents a novel framework for 3D gaze tracking tailored for mixed-reality settings.
Our proposed framework leverages state-of-the-art computer vision and machine learning techniques to overcome these obstacles.
- Score: 3.8075244788223044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study presents a novel framework for 3D gaze tracking tailored for mixed-reality settings, aimed at enhancing joint attention and collaborative efforts in team-based scenarios. Conventional gaze tracking, often limited by monocular cameras and traditional eye-tracking apparatus, struggles with simultaneous data synchronization and analysis from multiple participants in group contexts. Our proposed framework leverages state-of-the-art computer vision and machine learning techniques to overcome these obstacles, enabling precise 3D gaze estimation without dependence on specialized hardware or complex data fusion. Utilizing facial recognition and deep learning, the framework achieves real-time tracking of gaze patterns across several individuals, addressing common depth estimation errors and ensuring spatial and identity consistency within the dataset. Empirical results demonstrate the accuracy and reliability of our method in group environments, opening the way for significant advances in behavior and interaction analysis in educational and professional training applications set in dynamic, unstructured environments.
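The paper does not include code here, but the pipeline the abstract describes can be illustrated. Below is a minimal Python sketch of the identity-consistency bookkeeping and 3D gaze-point computation such a framework needs; the class names, the greedy nearest-neighbour matching, and the 0.5 m threshold are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): per-frame bookkeeping for
# multi-person 3D gaze. Assumes an upstream model already yields, for each
# detected face, a 3D head position and a unit gaze direction in camera space.
from dataclasses import dataclass
import numpy as np

@dataclass
class Track:
    track_id: int
    head_pos: np.ndarray   # (3,) last known head position, metres
    gaze_dir: np.ndarray   # (3,) last known unit gaze direction

class IdentityConsistentTracker:
    """Greedy nearest-neighbour association to keep person IDs stable."""
    def __init__(self, max_jump_m: float = 0.5):  # threshold is an assumption
        self.max_jump_m = max_jump_m
        self.tracks: list[Track] = []
        self._next_id = 0

    def update(self, detections: list[tuple[np.ndarray, np.ndarray]]) -> list[Track]:
        updated, unmatched = [], list(self.tracks)
        for head_pos, gaze_dir in detections:
            best = min(unmatched, default=None,
                       key=lambda t: np.linalg.norm(t.head_pos - head_pos))
            if best is not None and np.linalg.norm(best.head_pos - head_pos) < self.max_jump_m:
                unmatched.remove(best)               # reuse the existing identity
                best.head_pos, best.gaze_dir = head_pos, gaze_dir
                updated.append(best)
            else:                                    # unseen person: new identity
                updated.append(Track(self._next_id, head_pos, gaze_dir))
                self._next_id += 1
        self.tracks = updated
        return updated

def gaze_point_3d(track: Track, depth_m: float) -> np.ndarray:
    """Project the gaze ray out to an estimated fixation depth."""
    return track.head_pos + depth_m * track.gaze_dir
```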
Related papers
- Multi-person eye tracking for real-world scene perception in social settings [34.82692226532414]
We apply mobile eye tracking in a real-world multi-person setup and develop a system to stream, record, and analyse synchronised data.
Our system achieves precise time synchronisation and accurate gaze projection in challenging dynamic scenes.
This advancement enables insight into collaborative behaviour, group dynamics, and social interaction, with high ecological validity.
arXiv Detail & Related papers (2024-07-08T19:33:17Z)
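As a toy illustration of the synchronisation step this system performs (not its actual implementation), the sketch below resamples independently clocked gaze streams onto one shared timeline; the 60 Hz rate and the linear interpolation are assumptions.

```python
# Hedged sketch: align several independently recorded gaze streams onto one
# common clock by linear interpolation over their overlapping time window.
import numpy as np

def synchronise(streams: dict[str, tuple[np.ndarray, np.ndarray]],
                rate_hz: float = 60.0) -> dict[str, np.ndarray]:
    """streams maps participant -> (timestamps_s, gaze_xy of shape (N, 2))."""
    start = max(t[0] for t, _ in streams.values())   # use only the overlap
    end = min(t[-1] for t, _ in streams.values())
    common_t = np.arange(start, end, 1.0 / rate_hz)
    out = {}
    for name, (t, xy) in streams.items():
        # Interpolate each gaze coordinate onto the shared timeline.
        out[name] = np.stack([np.interp(common_t, t, xy[:, i]) for i in range(2)],
                             axis=1)
    return out
```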
- I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
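For readers unfamiliar with message passing, the sketch below shows one generic mean-aggregation round over a graph of detected objects; it illustrates the mechanism I-MPN builds on, not the paper's specific architecture.

```python
# Illustrative only: one round of mean-aggregation message passing over a
# scene graph of detected objects (generic mechanism, not I-MPN itself).
import numpy as np

def message_passing_step(node_feats: np.ndarray, edges: list[tuple[int, int]],
                         w_self: np.ndarray, w_nbr: np.ndarray) -> np.ndarray:
    """node_feats: (N, D); edges: directed (src, dst) pairs; weights: (D, D)."""
    n, _ = node_feats.shape
    agg = np.zeros_like(node_feats)
    deg = np.zeros(n)
    for src, dst in edges:
        agg[dst] += node_feats[src]     # collect messages from neighbours
        deg[dst] += 1
    agg /= np.maximum(deg, 1)[:, None]  # mean aggregation
    # Combine each node's own features with its aggregated neighbourhood.
    return np.maximum(node_feats @ w_self + agg @ w_nbr, 0.0)  # ReLU
```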
- Robust Collaborative Perception without External Localization and Clock Devices [52.32342059286222]
Consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception.
Traditional methods depend on external devices to provide localization and clock signals.
We propose a novel approach: aligning by recognizing the inherent geometric patterns within the perceptual data of various agents.
arXiv Detail & Related papers (2024-05-05T15:20:36Z)
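The classical primitive behind aligning agents through shared geometry, rather than GPS or clocks, is rigid point-set registration. A minimal Kabsch-algorithm sketch follows; it is background material, not the paper's learned alignment method.

```python
# Background sketch: recover the rigid transform between two agents'
# corresponding point observations with the Kabsch algorithm, i.e. align
# via shared geometric structure instead of external localization devices.
import numpy as np

def kabsch_align(src: np.ndarray, dst: np.ndarray):
    """src, dst: (N, 3) corresponding points; returns rotation R, translation t."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    sign = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    t = dst.mean(0) - r @ src.mean(0)
    return r, t
```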
- CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning [18.75039816544345]
We present a novel collaborative stance detection framework called CoSD.
CoSD learns topic-aware semantics and collaborative signals among texts, topics, and stance labels.
Experiments on two benchmark datasets demonstrate the state-of-the-art detection performance of CoSD.
arXiv Detail & Related papers (2024-04-26T02:04:05Z)
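The contrastive ingredient of CoSD can be pictured with a generic InfoNCE-style objective over paired embeddings; the sketch below is illustrative only, as the paper's heterogeneous-graph formulation is more involved.

```python
# Illustrative only: an InfoNCE-style contrastive objective that pulls
# matching embedding pairs together and pushes mismatched pairs apart.
import numpy as np

def info_nce(anchors: np.ndarray, positives: np.ndarray, temp: float = 0.1) -> float:
    """anchors, positives: (B, D) row-aligned embedding pairs."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temp                       # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))  # diagonal = positive pairs
```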
- Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models [56.257840490146]
ConCue is a novel approach for improving visual feature extraction in HOI detection.
We develop a transformer-based feature extraction module with a multi-tower architecture that integrates contextual cues into both instance and interaction detectors.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
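A purely schematic reading of the multi-tower idea: separate towers for instances and interactions, each fusing the same contextual-cue embedding. All shapes and names below are invented for illustration.

```python
# Purely schematic (not ConCue's implementation): two small "towers" that
# each fuse a shared contextual-cue embedding into their own feature stream.
import numpy as np

def tower(features: np.ndarray, cue: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Concatenate features with the broadcast cue, then project and ReLU."""
    tiled = np.broadcast_to(cue, (features.shape[0], cue.shape[0]))
    return np.maximum(np.concatenate([features, tiled], axis=1) @ w, 0.0)

rng = np.random.default_rng(0)
inst_feats, inter_feats = rng.normal(size=(5, 16)), rng.normal(size=(8, 16))
cue = rng.normal(size=4)                      # contextual-cue embedding from a VLM
w_inst, w_inter = rng.normal(size=(20, 16)), rng.normal(size=(20, 16))
inst_out = tower(inst_feats, cue, w_inst)     # instance tower
inter_out = tower(inter_feats, cue, w_inter)  # interaction tower
```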
- Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection [57.13665112065285]
Human-Object Interaction (HOI) detection is a challenging computer vision task.
We present a framework that enhances HOI detection by incorporating structured text knowledge.
arXiv Detail & Related papers (2023-07-25T14:20:52Z)
- Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation [22.63837164001751]
Collaborative localization is an essential capability for a team of robots, such as connected vehicles, to collaboratively estimate object locations.
To enable collaborative localization, four key challenges must be addressed, including modeling complex relationships between observed objects.
We introduce a novel approach that integrates uncertainty-aware graph learning model and state estimation for a team of robots to collaboratively localize objects.
arXiv Detail & Related papers (2021-11-05T22:48:13Z)
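The model-based estimation half of such a system is commonly a Kalman-style correction; the generic update below is a sketch of that building block, not the paper's filter.

```python
# Generic sketch of the model-based half: a linear Kalman update fusing one
# robot's predicted object state with a teammate's noisy observation.
import numpy as np

def kalman_update(x: np.ndarray, p: np.ndarray, z: np.ndarray,
                  h: np.ndarray, r: np.ndarray):
    """x: state mean, p: state covariance, z: observation,
    h: observation model, r: observation noise covariance."""
    s = h @ p @ h.T + r                  # innovation covariance
    k = p @ h.T @ np.linalg.inv(s)       # Kalman gain
    x_new = x + k @ (z - h @ x)          # corrected state mean
    p_new = (np.eye(len(x)) - k @ h) @ p # corrected state covariance
    return x_new, p_new
```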
- A Variational Information Bottleneck Approach to Multi-Omics Data Integration [98.6475134630792]
We propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations.
Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target.
Experiments on real-world datasets show that our method consistently achieves gains from data integration and outperforms state-of-the-art benchmarks.
arXiv Detail & Related papers (2021-02-05T06:05:39Z)
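For reference, the variational IB objective this line of work optimizes has a standard form; the notation below is generic, from the IB literature rather than this paper's exact formulation.

```latex
% Generic variational IB bound: reward predicting target Y from code Z while
% compressing away input-specific detail, traded off by beta.
\mathcal{L}(\theta,\phi) \;=\;
  \mathbb{E}_{q_\theta(z \mid x)}\!\left[\log q_\phi(y \mid z)\right]
  \;-\; \beta\,\mathrm{KL}\!\left(q_\theta(z \mid x)\,\middle\|\,r(z)\right)
```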
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
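The multi-view consistency constraint can be reduced to a toy penalty: predictions warped into a common reference view should agree. The sketch below is illustrative only and omits the geometric warping itself.

```python
# Toy illustration (not the paper's loss): penalise disagreement between
# per-camera detection maps that were already warped into one reference view.
import numpy as np

def multiview_consistency(warped_preds: np.ndarray) -> float:
    """warped_preds: (num_views, H, W) probabilities in a shared frame."""
    return float(warped_preds.var(axis=0).mean())  # per-pixel cross-view variance
```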
- Benchmarking Unsupervised Object Representations for Video Sequences [111.81492107649889]
We compare the perceptual abilities of four object-centric approaches: ViMON, OP3, TBA and SCALOR.
Our results suggest that the architectures with unconstrained latent representations learn more powerful representations in terms of object detection, segmentation and tracking.
Our benchmark may provide fruitful guidance towards learning more robust object-centric video representations.
arXiv Detail & Related papers (2020-06-12T09:37:24Z)
- Taskology: Utilizing Task Relations at Scale [28.09712466727001]
We show that we can leverage the inherent relationships among collections of tasks, as they are trained jointly.
Explicitly utilizing the relationships between tasks improves their performance while dramatically reducing the need for labeled data.
We demonstrate our framework on subsets of the following collection of tasks: depth and normal prediction, semantic segmentation, 3D motion and ego-motion estimation, and object tracking and 3D detection in point clouds.
arXiv Detail & Related papers (2020-05-14T22:53:46Z)
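One concrete instance of such a task relation is depth-normal consistency: normals recomputed from the predicted depth should agree with directly predicted normals. The sketch below assumes an orthographic camera for brevity and is not the paper's exact loss.

```python
# Simplified sketch of a cross-task consistency term in the spirit of
# Taskology: normals derived from predicted depth should match the normals
# the network predicts directly.
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W); returns unit normals (H, W, 3) via finite differences."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def depth_normal_consistency(depth_pred: np.ndarray,
                             normals_pred: np.ndarray) -> float:
    """Penalise misalignment (1 - cosine similarity) between the two estimates."""
    cos = (normals_from_depth(depth_pred) * normals_pred).sum(axis=-1)
    return float((1.0 - cos).mean())
```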