Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling
- URL: http://arxiv.org/abs/2504.09960v1
- Date: Mon, 14 Apr 2025 07:57:22 GMT
- Title: Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling
- Authors: Hoang M. Truong, Vinh-Thuan Ly, Huy G. Tran, Thuan-Phat Nguyen, Tram T. Doan,
- Abstract summary: Event-based eye tracking has become a pivotal technology for augmented reality and human-computer interaction.<n>Existing methods struggle with real-world challenges such as abrupt eye movements and environmental noise.<n>We introduce two key advancements. First, a robust data augmentation pipeline incorporating temporal shift, spatial flip, and event deletion improves model resilience.<n>Second, we propose KnightPupil, a hybrid architecture combining an EfficientNet-B3 backbone for spatial feature extraction, a bidirectional GRU for contextual temporal modeling, and a Linear Time-Varying State-Space Module.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event-based eye tracking has become a pivotal technology for augmented reality and human-computer interaction. Yet, existing methods struggle with real-world challenges such as abrupt eye movements and environmental noise. Building on the efficiency of the Lightweight Spatiotemporal Network-a causal architecture optimized for edge devices-we introduce two key advancements. First, a robust data augmentation pipeline incorporating temporal shift, spatial flip, and event deletion improves model resilience, reducing Euclidean distance error by 12% (1.61 vs. 1.70 baseline) on challenging samples. Second, we propose KnightPupil, a hybrid architecture combining an EfficientNet-B3 backbone for spatial feature extraction, a bidirectional GRU for contextual temporal modeling, and a Linear Time-Varying State-Space Module to adapt to sparse inputs and noise dynamically. Evaluated on the 3ET+ benchmark, our framework achieved 1.61 Euclidean distance on the private test set of the Event-based Eye Tracking Challenge at CVPR 2025, demonstrating its effectiveness for practical deployment in AR/VR systems while providing a foundation for future innovations in neuromorphic vision.
Related papers
- S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer applications.
It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams.
We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z) - Spatiotemporal Attention Learning Framework for Event-Driven Object Recognition [1.0445957451908694]
Event-based vision sensors capture local pixel-level intensity changes as a sparse event stream containing position, polarity, and information.<n>This paper presents a novel learning framework for event-based object recognition, utilizing a VARGG network enhanced with Contemporalal Block Attention Module (CBAM)<n>Our approach achieves comparable performance to state-of-the-art ResNet-based methods while reducing parameter count by 2.3% compared to the original VGG model.
arXiv Detail & Related papers (2025-04-01T02:37:54Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.<n>The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.<n>To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - OccLinker: Deflickering Occupancy Networks through Lightweight Spatio-Temporal Correlation [15.726401007342087]
Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving.<n>Recent approaches have incorporated historical data to mitigate the issue, but they often incur high computational costs and introduce noisy information that interferes with object detection.<n>We propose OccLinker, a novel plugin framework designed to seamlessly integrate with existing VONs for boosting performance.
arXiv Detail & Related papers (2025-02-21T13:07:45Z) - You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain [4.339510167603377]
In-situ detection of planetary, lunar, and small-body surface terrain is crucial for autonomous spacecraft applications.<n>Unsupervised Domain Adaptation (UDA) offers a promising solution by facilitating model training with disparate data sources.<n>We propose novel additions to the VSA scheme that enhance terrain detection capabilities under UDA.
arXiv Detail & Related papers (2025-01-23T14:58:49Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera [0.8576354642891824]
Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical.
To interface with such data and leverage their rich temporal temporal, we propose a causal convolutional network.
We apply our model on the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private testset.
arXiv Detail & Related papers (2024-04-13T00:13:20Z) - Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z) - Spatio-temporal Modeling for Large-scale Vehicular Networks Using Graph
Convolutional Networks [110.80088437391379]
A graph-based framework called SMART is proposed to model and keep track of the statistics of vehicle-to-temporal (V2I) communication latency across a large geographical area.
We develop a graph reconstruction-based approach using a graph convolutional network integrated with a deep Q-networks algorithm.
Our results show that the proposed method can significantly improve both the accuracy and efficiency for modeling and the latency performance of large vehicular networks.
arXiv Detail & Related papers (2021-03-13T06:56:29Z) - Risk-Averse MPC via Visual-Inertial Input and Recurrent Networks for
Online Collision Avoidance [95.86944752753564]
We propose an online path planning architecture that extends the model predictive control (MPC) formulation to consider future location uncertainties.
Our algorithm combines an object detection pipeline with a recurrent neural network (RNN) which infers the covariance of state estimates.
The robustness of our methods is validated on complex quadruped robot dynamics and can be generally applied to most robotic platforms.
arXiv Detail & Related papers (2020-07-28T07:34:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.