Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation
- URL: http://arxiv.org/abs/2406.18595v1
- Date: Sun, 9 Jun 2024 20:52:47 GMT
- Title: Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation
- Authors: Esmaeil Seraj, Harsh Bhate, Walter Talamonti,
- Abstract summary: Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences.
This innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs.
We present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprised of: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets; and (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye tracking data.
- Score: 6.435984242701043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences. However, this innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs. In this paper, we present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprised of: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets (i.e., moving, size-changing, and overlapping 2D content) projected on a transparent display, in realtime; (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye tracking data, to account for the display's transparency and preventing undesired interactions with the TD. We collected a real-world eye-tracking dataset to train and test our gaze monitoring system. We present extensive results and ablation studies, including inference experiments on System on Chip (SoC) evaluation boards, demonstrating our model's scalability, precision, and realtime feasibility in both static and dynamic contexts. Our solution marks a significant stride in enhancing next-generation user-device interaction and experience, setting a new benchmark for algorithmic gaze monitoring technology in dynamic transparent displays.
Related papers
- Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions [18.294511216241805]
This paper provides empirical evidence on how user mobility and behaviour affect mobile gaze tracking accuracy.
Head distance, head pose, and device orientation are key factors affecting accuracy.
Findings highlight the need for more robust, adaptive eye-tracking systems.
arXiv Detail & Related papers (2025-02-14T21:44:52Z) - Benchmarking Vision-Based Object Tracking for USVs in Complex Maritime Environments [0.8796261172196743]
Vision-based target tracking is crucial for unmanned surface vehicles.
Real-time tracking in maritime environments is challenging due to dynamic camera movement, low visibility, and scale variation.
This study proposes a vision-guided object-tracking framework for USVs.
arXiv Detail & Related papers (2024-12-10T10:35:17Z) - A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z) - SocialEyes: Scaling mobile eye-tracking to multi-person social settings [34.82692226532414]
We developed a system to stream, record, and analyse synchronised data from multiple mobile eye-tracking devices during collective viewing experiences.
We tested the system in a live concert and a film screening with 30 simultaneous viewers during each of two public events (N=60)
Our novel analysis metrics and visualizations illustrate the potential of collective eye-tracking data for understanding collaborative behaviour and social interaction.
arXiv Detail & Related papers (2024-07-08T19:33:17Z) - Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which enables the model against challenges like background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z) - MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z) - Towards Scale Consistent Monocular Visual Odometry by Learning from the
Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced
Passenger Safety [3.4298729855744026]
This paper proposes a flexible analysis scheme to detect and track humans on a ground plane.
We consider multiple combinations within a set of RGB- and depth-based detection and tracking modalities.
Results indicate that the combined use of depth-based spatial information and learned representations yields substantially enhanced detection and tracking accuracies.
arXiv Detail & Related papers (2021-02-23T14:44:34Z) - Training-free Monocular 3D Event Detection System for Traffic
Surveillance [93.65240041833319]
Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available.
In real-world scenarios, collecting sufficient labeled training data is expensive and sometimes impossible.
We propose a training-free monocular 3D event detection system for traffic surveillance.
arXiv Detail & Related papers (2020-02-01T04:42:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.