Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation
- URL: http://arxiv.org/abs/2406.18595v1
- Date: Sun, 9 Jun 2024 20:52:47 GMT
- Title: Realtime Dynamic Gaze Target Tracking and Depth-Level Estimation
- Authors: Esmaeil Seraj, Harsh Bhate, Walter Talamonti
- Abstract summary: The integration of Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences.
This innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs.
We present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprising: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets; and (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye-tracking data.
- Score: 6.435984242701043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Transparent Displays (TD) in various applications, such as Heads-Up Displays (HUDs) in vehicles, is a burgeoning field, poised to revolutionize user experiences. However, this innovation brings forth significant challenges in realtime human-device interaction, particularly in accurately identifying and tracking a user's gaze on dynamically changing TDs. In this paper, we present a two-fold robust and efficient systematic solution for realtime gaze monitoring, comprising: (1) a tree-based algorithm for identifying and dynamically tracking gaze targets (i.e., moving, size-changing, and overlapping 2D content) projected on a transparent display, in realtime; and (2) a multi-stream self-attention architecture to estimate the depth-level of human gaze from eye tracking data, to account for the display's transparency and prevent undesired interactions with the TD. We collected a real-world eye-tracking dataset to train and test our gaze monitoring system. We present extensive results and ablation studies, including inference experiments on System on Chip (SoC) evaluation boards, demonstrating our model's scalability, precision, and realtime feasibility in both static and dynamic contexts. Our solution marks a significant stride in enhancing next-generation user-device interaction and experience, setting a new benchmark for algorithmic gaze monitoring technology in dynamic transparent displays.
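The abstract names two components but gives no implementation detail, so the sketches below are hedged illustrations, not the authors' method. First, the tree-based gaze-target tracking: one common way to hit-test a gaze point against moving, resizing, and overlapping 2D content is a quadtree over the widgets' screen-space rectangles, rebuilt each frame. The widget names and API here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Widget:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def _overlaps(a, b):
    # a, b: (x0, y0, x1, y1) rectangles
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class QuadTree:
    def __init__(self, bounds, depth=0, max_depth=5, max_items=4):
        self.bounds = bounds  # (x0, y0, x1, y1) region this node covers
        self.depth, self.max_depth, self.max_items = depth, max_depth, max_items
        self.items = []       # widgets stored at this node
        self.children = []    # four sub-quadrants once split

    def insert(self, w):
        if self.children:
            rect = (w.x0, w.y0, w.x1, w.y1)
            for c in self.children:
                if _overlaps(c.bounds, rect):
                    c.insert(w)  # a widget may span several quadrants
            return
        self.items.append(w)
        if len(self.items) > self.max_items and self.depth < self.max_depth:
            x0, y0, x1, y1 = self.bounds
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            self.children = [QuadTree(q, self.depth + 1, self.max_depth, self.max_items)
                             for q in ((x0, y0, mx, my), (mx, y0, x1, my),
                                       (x0, my, mx, y1), (mx, my, x1, y1))]
            items, self.items = self.items, []
            for item in items:
                self.insert(item)

    def query(self, x, y):
        # Return every widget whose rectangle contains the gaze point.
        hits = [w for w in self.items if w.contains(x, y)]
        for c in self.children:
            cx0, cy0, cx1, cy1 = c.bounds
            if cx0 <= x <= cx1 and cy0 <= y <= cy1:
                hits.extend(c.query(x, y))
                break  # the point lies in one quadrant (ties broken arbitrarily)
        return hits


# Rebuild per frame as HUD content moves or resizes, then hit-test the gaze point.
tree = QuadTree((0, 0, 1920, 1080))
for w in (Widget("speedometer", 100, 50, 400, 200),
          Widget("nav_arrow", 300, 120, 600, 350)):
    tree.insert(w)
print([w.name for w in tree.query(350, 150)])  # both overlapping widgets hit
```

Because the tree is rebuilt per frame, moving and size-changing content is handled by construction, and overlapping widgets simply yield multiple hits for a downstream policy to disambiguate. Second, the multi-stream self-attention depth-level estimator: a plausible reading is one embedding per eye-tracking stream, self-attention across the pooled tokens, and a classification head over depth levels. The stream choices (per-eye gaze vectors, pupil diameters) and all sizes below are assumptions.

```python
import torch
import torch.nn as nn

class MultiStreamGazeDepth(nn.Module):
    def __init__(self, stream_dims=(3, 3, 2), d_model=64, n_heads=4, n_layers=2, n_levels=2):
        super().__init__()
        # One linear embedding per input stream, so streams of different
        # dimensionality map into a shared token space.
        self.embed = nn.ModuleList([nn.Linear(d, d_model) for d in stream_dims])
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_levels)  # depth levels, e.g. on-display vs. beyond

    def forward(self, streams):
        # streams: list of (batch, T, dim_i) tensors, one per stream
        tokens = torch.cat([e(s) for e, s in zip(self.embed, streams)], dim=1)
        encoded = self.encoder(tokens)           # self-attention across all stream tokens
        return self.head(encoded.mean(dim=1))    # pooled logits over depth levels

model = MultiStreamGazeDepth()
batch = [torch.randn(8, 30, 3), torch.randn(8, 30, 3), torch.randn(8, 30, 2)]
print(model(batch).shape)  # torch.Size([8, 2])
```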
Related papers
- Salient Object Detection in Traffic Scene through the TSOD10K Dataset [22.615252113004402]
Traffic Salient Object Detection (TSOD) aims to segment the objects critical to driving safety by combining semantic (e.g., collision risks) and visual saliency.
Our research establishes the first foundation for safety-aware saliency analysis in intelligent transportation systems.
arXiv Detail & Related papers (2025-03-21T07:21:24Z)
- Ground-level Viewpoint Vision-and-Language Navigation in Continuous Environments [10.953629652228024]
Vision-and-Language Navigation (VLN) agents associate time-sequenced visual observations with corresponding instructions to make decisions.
In this paper, we address the mismatch between human-centric instructions and quadruped robots with a low-height field of view.
We propose a Ground-level Viewpoint Navigation (GVNav) approach to mitigate this issue.
arXiv Detail & Related papers (2025-02-26T10:30:40Z)
- Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions [18.294511216241805]
This paper provides empirical evidence on how user mobility and behaviour affect mobile gaze tracking accuracy.
Head distance, head pose, and device orientation are key factors affecting accuracy.
Findings highlight the need for more robust, adaptive eye-tracking systems.
arXiv Detail & Related papers (2025-02-14T21:44:52Z)
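The entry above identifies head distance, head pose, and device orientation as the key accuracy factors. As a purely illustrative baseline (not the paper's method), one could fit a per-user linear correction of gaze error from those features on calibration samples; the features and data below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Features: [head_distance_cm, head_pitch_deg, head_yaw_deg, device_tilt_deg, bias]
X = np.column_stack([rng.uniform(25, 45, n), rng.normal(0, 10, n),
                     rng.normal(0, 10, n), rng.uniform(-30, 30, n), np.ones(n)])
# Observed gaze error in screen pixels (synthetic: depends on distance and tilt)
err = np.column_stack([0.8 * (X[:, 0] - 35) + 0.5 * X[:, 3] + rng.normal(0, 2, n),
                       1.2 * X[:, 1] + rng.normal(0, 2, n)])

W, *_ = np.linalg.lstsq(X, err, rcond=None)        # least-squares correction weights
residual_rmse = np.sqrt(np.mean((err - X @ W) ** 2))
print(f"residual RMSE after correction: {residual_rmse:.2f} px")
```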
- A Cross-Scene Benchmark for Open-World Drone Active Tracking [54.235808061746525]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose a unified cross-scene cross-domain benchmark for open-world drone active tracking called DAT.
We also propose a reinforcement learning-based drone tracking method called R-VAT.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Comparing Optical Flow and Deep Learning to Enable Computationally Efficient Traffic Event Detection with Space-Filling Curves [0.6322312717516407]
We compare Optical Flow (OF) and Deep Learning (DL) to feed computationally efficient event detection via space-filling curves on video data from a forward-facing, in-vehicle camera.
Our results yield that the OF approach excels in specificity and reduces false positives, while the DL approach demonstrates superior sensitivity.
arXiv Detail & Related papers (2024-07-15T13:44:52Z)
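As a hedged illustration of the space-filling-curve step in the entry above: a 2D grid of per-cell motion magnitudes is linearized into a 1D signal while preserving spatial locality. A Morton (Z-order) curve is used here as a convenient stand-in; the paper's exact curve and features may differ, and the flow grid is synthetic.

```python
import numpy as np

def morton_index(y, x, bits=4):
    """Interleave the bits of (y, x) to get the cell's Z-order position."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx

grid = np.abs(np.random.default_rng(1).normal(size=(16, 16)))  # per-cell flow magnitude
order = sorted(((yy, xx) for yy in range(16) for xx in range(16)),
               key=lambda c: morton_index(*c))
signal = np.array([grid[c] for c in order])  # 1D curve preserving 2D locality
print(signal.shape)                          # (256,) -- one scalar per grid cell
```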
- LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry [52.131996528655094]
We present the Long-term Effective Any Point Tracking (LEAP) module.
LEAP innovatively combines visual, inter-track, and temporal cues with mindfully selected anchors for dynamic track estimation.
Based on these traits, we develop LEAP-VO, a robust visual odometry system adept at handling occlusions and dynamic scenes.
arXiv Detail & Related papers (2024-01-03T18:57:27Z)
- GADY: Unsupervised Anomaly Detection on Dynamic Graphs [18.1896489628884]
We propose a continuous dynamic graph model that captures fine-grained information, overcoming the limitations of existing discrete methods.
For the second challenge, we pioneer the use of Generative Adversarial Networks to generate negative interactions.
Our proposed GADY significantly outperforms the previous state-of-the-art method on three real-world datasets.
arXiv Detail & Related papers (2023-10-25T05:27:45Z)
- Leveraging the Power of Data Augmentation for Transformer-based Tracking [64.46371987827312]
We propose two data augmentation methods customized for tracking.
First, we optimize existing random cropping via a dynamic search radius mechanism and simulation for boundary samples.
Second, we propose a token-level feature mixing augmentation strategy, which strengthens the model against challenges such as background interference.
arXiv Detail & Related papers (2023-09-15T09:18:54Z)
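A hedged sketch of the token-level feature mixing named in the entry above: some target-template tokens are blended with background tokens so the tracker learns to tolerate interference. The mixing rule, ratio, and tensor shapes are assumptions, not the paper's recipe.

```python
import torch

def token_mix(target_tokens, background_tokens, ratio=0.2, alpha=0.5):
    # target_tokens, background_tokens: (batch, n_tokens, dim)
    b, n, _ = target_tokens.shape
    n_mix = max(1, int(ratio * n))
    idx = torch.randperm(n)[:n_mix]                        # template tokens to perturb
    src = torch.randint(0, background_tokens.shape[1], (n_mix,))
    mixed = target_tokens.clone()
    mixed[:, idx] = (alpha * target_tokens[:, idx]
                     + (1 - alpha) * background_tokens[:, src])
    return mixed

tgt = torch.randn(4, 64, 256)
bg = torch.randn(4, 64, 256)
print(token_mix(tgt, bg).shape)  # torch.Size([4, 64, 256])
```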
- Active Sensing with Predictive Coding and Uncertainty Minimization [0.0]
We present an end-to-end procedure for embodied exploration built on two biologically inspired computations: predictive coding and uncertainty minimization.
We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment.
We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes.
arXiv Detail & Related papers (2023-07-02T21:14:49Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
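A hedged sketch in the spirit of the learnable motion predictor described above: a small GRU maps a track's recent boxes to a next-frame box offset. The architecture and sizes are illustrative; the paper's predictor differs in detail.

```python
import torch
import torch.nn as nn

class MotionPredictor(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.gru = nn.GRU(input_size=4, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 4)  # predicted (dx, dy, dw, dh) offset

    def forward(self, boxes):
        # boxes: (batch, T, 4) past boxes as (cx, cy, w, h)
        _, h = self.gru(boxes)
        return boxes[:, -1] + self.out(h[-1])  # last box plus learned offset

pred = MotionPredictor()
history = torch.randn(16, 10, 4)  # 16 tracks, 10 past frames each
print(pred(history).shape)        # torch.Size([16, 4])
```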
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
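A loose, hedged sketch of the ingredients the entry above describes (attention over body joints plus a per-joint visibility indicator): invisible joints are masked out as attention keys so predictions lean on observed joints. The masking scheme and sizes are assumptions, not TRiPOD's architecture.

```python
import torch
import torch.nn as nn

class JointAttention(nn.Module):
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, joints, visible):
        # joints: (batch, n_joints, dim); visible: (batch, n_joints) bool
        # key_padding_mask=True marks positions to ignore, so invert visibility.
        out, _ = self.attn(joints, joints, joints, key_padding_mask=~visible)
        return out

layer = JointAttention()
feats = torch.randn(2, 17, 32)   # 17 COCO-style joints per person
vis = torch.rand(2, 17) > 0.2    # roughly 80% of joints visible
print(layer(feats, vis).shape)   # torch.Size([2, 17, 32])
```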
- RGB-D Railway Platform Monitoring and Scene Understanding for Enhanced Passenger Safety [3.4298729855744026]
This paper proposes a flexible analysis scheme to detect and track humans on a ground plane.
We consider multiple combinations within a set of RGB- and depth-based detection and tracking modalities.
Results indicate that the combined use of depth-based spatial information and learned representations yields substantially enhanced detection and tracking accuracies.
arXiv Detail & Related papers (2021-02-23T14:44:34Z)
- Training-free Monocular 3D Event Detection System for Traffic Surveillance [93.65240041833319]
Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available.
In real-world scenarios, collecting sufficient labeled training data is expensive and sometimes impossible.
We propose a training-free monocular 3D event detection system for traffic surveillance.
arXiv Detail & Related papers (2020-02-01T04:42:57Z)
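A hedged illustration of the training-free idea in the entry above: with a calibrated camera, a 2D detection's ground contact point can be back-projected onto the road plane via a homography, giving metric positions without any learned model. The homography values below are made up for illustration.

```python
import numpy as np

H = np.array([[0.05, 0.0, -20.0],   # image -> ground-plane homography (hypothetical)
              [0.0, 0.12, -40.0],
              [0.0, 0.002, 1.0]])

def image_to_ground(u, v):
    """Map the bottom-center pixel of a detection box to ground coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]             # normalize homogeneous coordinates

# Bottom-center of a detected vehicle's bounding box, in pixels
print(image_to_ground(640.0, 520.0))  # approx. position on the road plane, in meters
```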
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.