Multi-person eye tracking for real-world scene perception in social settings
- URL: http://arxiv.org/abs/2407.06345v1
- Date: Mon, 8 Jul 2024 19:33:17 GMT
- Title: Multi-person eye tracking for real-world scene perception in social settings
- Authors: Shreshth Saxena, Areez Visram, Neil Lobo, Zahid Mirza, Mehak Rafi Khan, Biranugan Pirabaharan, Alexander Nguyen, Lauren K. Fink,
- Abstract summary: We apply mobile eye tracking in a real-world multi-person setup and develop a system to stream, record, and analyse synchronised data.
Our system achieves precise time synchronisation and accurate gaze projection in challenging dynamic scenes.
This advancement enables insight into collaborative behaviour, group dynamics, and social interaction, with high ecological validity.
- Score: 34.82692226532414
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Eye movements provide a window into human behaviour, attention, and interaction dynamics. Previous research suggests that eye movements are highly influenced by task, setting, and social others; however, most eye tracking research is conducted in single-person, in-lab settings and is yet to be validated in multi-person, naturalistic contexts. One such prevalent real-world context is the collective viewing of a shared scene in social settings, for example, viewing a concert, film, lecture, sports, etc. Here, we apply mobile eye tracking in a real-world multi-person setup and develop a system to stream, record, and analyse synchronised data. We tested our proposed, open-source system while participants (N=60) watched a live concert and a documentary film screening during a public event. We tackled challenges related to networking bandwidth requirements, real-time monitoring, and gaze projection from individual egocentric perspectives to a common coordinate space for shared gaze analysis. Our system achieves precise time synchronisation and accurate gaze projection in challenging dynamic scenes. Further, to illustrate the potential of collective eye-tracking data, we introduce and evaluate novel analysis metrics and visualisations. Overall, our approach contributes to the development and application of versatile multi-person eye tracking systems in real-world social settings. This advancement enables insight into collaborative behaviour, group dynamics, and social interaction, with high ecological validity. Moreover, it paves the path for innovative, interactive tools that promote collaboration and coordination in social contexts.
Related papers
- Temporally Consistent Dynamic Scene Graphs: An End-to-End Approach for Action Tracklet Generation [1.6584112749108326]
TCDSG, Temporally Consistent Dynamic Scene Graphs, is an end-to-end framework that detects, tracks, and links subject-object relationships across time.
Our work sets a new standard in multi-frame video analysis, opening new avenues for high-impact applications in surveillance, autonomous navigation, and beyond.
arXiv Detail & Related papers (2024-12-03T20:19:20Z) - Visual-Geometric Collaborative Guidance for Affordance Learning [63.038406948791454]
We propose a visual-geometric collaborative guided affordance learning network that incorporates visual and geometric cues.
Our method outperforms the representative models regarding objective metrics and visual quality.
arXiv Detail & Related papers (2024-10-15T07:35:51Z) - Spherical World-Locking for Audio-Visual Localization in Egocentric Videos [53.658928180166534]
We propose Spherical World-Locking as a general framework for egocentric scene representation.
Compared to conventional head-locked egocentric representations with a 2D planar field-of-view, SWL effectively offsets challenges posed by self-motion.
We design a unified encoder-decoder transformer architecture that preserves the spherical structure of the scene representation.
arXiv Detail & Related papers (2024-08-09T22:29:04Z) - 3D Gaze Tracking for Studying Collaborative Interactions in Mixed-Reality Environments [3.8075244788223044]
This study presents a novel framework for 3D gaze tracking tailored for mixed-reality settings.
Our proposed framework leverages state-of-the-art computer vision and machine learning techniques to overcome obstacles.
arXiv Detail & Related papers (2024-06-16T16:30:56Z) - I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z) - Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation [6.435984242701043]
Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences.
This innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs.
We present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprised of: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets; and (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye tracking data.
arXiv Detail & Related papers (2024-06-09T20:52:47Z) - HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding [8.10024991952397]
Existing methods focus on complex interactivities while leveraging a simple relationship model.
We propose a new approach named Hierarchical Interlacement Graph (HIG), which leverages a unified layer and graph within a hierarchical structure.
Our approach demonstrates superior performance to other methods through extensive experiments conducted in various scenarios.
arXiv Detail & Related papers (2023-12-05T18:47:19Z) - Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph
Generation [64.85974098314344]
Video scene graph generation (VidSGG) aims to identify objects in visual scenes and infer their relationships for a given video.
Inherently, object pairs and their relationships enjoy spatial co-occurrence correlations within each image and temporal consistency/transition correlations across different images.
We propose a spatial-temporal knowledge-embedded transformer (STKET) that incorporates the prior spatial-temporal knowledge into the multi-head cross-attention mechanism.
arXiv Detail & Related papers (2023-09-23T02:40:28Z) - Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A
Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, and meetings composed of 3-4 persons equipped with microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z) - Do Pedestrians Pay Attention? Eye Contact Detection in the Wild [75.54077277681353]
In urban environments, humans rely on eye contact for fast and efficient communication with nearby people.
In this paper, we focus on eye contact detection in the wild, i.e., real-world scenarios for autonomous vehicles with no control over the environment or the distance of pedestrians.
We introduce a model that leverages semantic keypoints to detect eye contact and show that this high-level representation achieves state-of-the-art results on the publicly-available dataset JAAD.
To study domain adaptation, we create LOOK: a large-scale dataset for eye contact detection in the wild, which focuses on diverse and un
arXiv Detail & Related papers (2021-12-08T10:21:28Z) - Weakly Supervised Human-Object Interaction Detection in Video via
Contrastive Spatiotemporal Regions [81.88294320397826]
A system does not know what human-object interactions are present in a video as or the actual location of the human and object.
We introduce a dataset comprising over 6.5k videos with human-object interaction that have been curated from sentence captions.
We demonstrate improved performance over weakly supervised baselines adapted to our annotations on our video dataset.
arXiv Detail & Related papers (2021-10-07T15:30:18Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - iGibson, a Simulation Environment for Interactive Tasks in Large
Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z) - Non-contact Real time Eye Gaze Mapping System Based on Deep
Convolutional Neural Network [0.0]
We propose a non-contact gaze mapping system applicable in real-world environments.
We introduce the GIST Gaze Mapping dataset, a Gaze mapping dataset created to learn and evaluate gaze mapping.
arXiv Detail & Related papers (2020-09-10T02:37:37Z) - Automated analysis of eye-tracker-based human-human interaction studies [2.433293618209319]
We investigate which state-of-the-art computer vision algorithms may be used to automate the post-analysis of mobile eye-tracking data.
For the case study in this paper, we focus on mobile eye-tracker recordings made during human-human face-to-face interactions.
We show that the use of this single-pipeline framework provides robust results, which are both more accurate and faster than previous work in the field.
arXiv Detail & Related papers (2020-07-09T10:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.