MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
- URL: http://arxiv.org/abs/2404.12083v2
- Date: Tue, 30 Apr 2024 11:17:55 GMT
- Title: MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking
- Authors: Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-Jun Zha
- Abstract summary: Event-based eye tracking has shown great promise thanks to the high temporal resolution and low redundancy provided by the event camera.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes bidirectional long-term sequence modeling and a time-varying state selection mechanism to fully exploit contextual temporal information.
- Score: 50.26836546224782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event-based eye tracking has shown great promise thanks to the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes bidirectional long-term sequence modeling and a time-varying state selection mechanism to fully exploit contextual temporal information in response to the variability of eye movements. Specifically, the proposed MambaPupil network consists of a multi-layer convolutional encoder that extracts features from the event representations, a bidirectional Gated Recurrent Unit (GRU), and a Linear Time-Varying State Space Module (LTV-SSM) that selectively captures contextual correlation from the forward and backward temporal relationships. Furthermore, Bina-rep is used as a compact event representation, and a tailor-made data augmentation, Event-Cutout, is proposed to enhance the model's robustness by applying spatial random masking to the event image. Evaluation on the ThreeET-plus benchmark shows the superior performance of MambaPupil, which secured 1st place in the CVPR 2024 AIS Event-based Eye Tracking challenge.
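The abstract names the pipeline's stages without implementation detail, so the following is a minimal, hypothetical PyTorch sketch of how such a system could be wired together: an Event-Cutout-style spatial mask, a per-frame convolutional encoder, a bidirectional GRU for forward/backward context, and a toy linear time-varying state-space recurrence with input-dependent parameters. All names (event_cutout, LTVSSM, PupilTracker), layer sizes, the two-channel input standing in for Bina-rep, and the SSM parameterization are assumptions, not the authors' code.

```python
# Hypothetical sketch only: names, layer sizes, and the SSM parameterization
# are assumptions; the authors' Bina-rep and LTV-SSM details are not shown here.
import torch
import torch.nn as nn


def event_cutout(frames: torch.Tensor, max_frac: float = 0.4) -> torch.Tensor:
    """Event-Cutout-style spatial random masking over a clip (B, T, C, H, W)."""
    B, T, C, H, W = frames.shape
    out = frames.clone()
    for b in range(B):
        h = int(torch.randint(1, max(2, int(H * max_frac)), (1,)))
        w = int(torch.randint(1, max(2, int(W * max_frac)), (1,)))
        y = int(torch.randint(0, H - h + 1, (1,)))
        x = int(torch.randint(0, W - w + 1, (1,)))
        out[b, :, :, y:y + h, x:x + w] = 0.0  # zero one random patch per sample
    return out


class LTVSSM(nn.Module):
    """Toy linear time-varying SSM: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t,
    with (a_t, b_t, c_t) predicted from the input (diagonal transition assumed)."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_params = nn.Linear(dim, 3 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        a, b, c = self.to_params(x).chunk(3, dim=-1)
        a = torch.sigmoid(a)                    # keep the recurrence stable
        h = torch.zeros_like(x[:, 0])
        ys = []
        for t in range(x.size(1)):              # sequential selective scan
            h = a[:, t] * h + b[:, t] * x[:, t]
            ys.append(c[:, t] * h)
        return torch.stack(ys, dim=1)


class PupilTracker(nn.Module):
    def __init__(self, in_ch: int = 2, dim: int = 128):  # in_ch stands in for Bina-rep channels
        super().__init__()
        self.encoder = nn.Sequential(           # per-frame convolutional encoder
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        self.bigru = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.ssm = LTVSSM(2 * dim)
        self.head = nn.Linear(2 * dim, 2)       # per-frame pupil (x, y)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:  # (B, T, C, H, W)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        ctx, _ = self.bigru(feats)              # forward + backward context
        return self.head(self.ssm(ctx))         # (B, T, 2)


# Shape check: coords = PupilTracker()(event_cutout(torch.rand(2, 16, 2, 64, 64)))
```

One design implication of the bidirectional GRU: the model needs the whole clip before producing outputs, so a sketch like this matches offline evaluation on recorded sequences rather than strictly causal streaming.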
Related papers
- SpikeMOT: Event-based Multi-Object Tracking with Sparse Motion Features [52.213656737672935]
SpikeMOT is an event-based multi-object tracker.
It uses spiking neural networks to extract sparse spatiotemporal features from event streams associated with objects.
arXiv Detail & Related papers (2023-09-29T05:13:43Z)
- Dual Memory Aggregation Network for Event-Based Object Detection with Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness changes at every pixel asynchronously.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as a 3D tensor representation (a binning sketch follows this entry).
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z)
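The pillar representation in the entry above can be illustrated with a few lines of NumPy. This is a generic binning sketch: the function name, the bin count T_bins, and the use of raw event counts (rather than any learned or normalized representation) are assumptions for illustration only.

```python
# Illustrative NumPy binning of events into polarity-separated x-y-t grids
# ("pillars"); T_bins and raw count accumulation are assumptions.
import numpy as np


def events_to_pillars(xs, ys, ts, ps, H, W, T_bins=10):
    """xs, ys: integer pixel coords; ts: timestamps; ps: polarities in {0, 1}.
    Returns a (2, T_bins, H, W) tensor of per-polarity, per-time-bin event counts."""
    grid = np.zeros((2, T_bins, H, W), dtype=np.float32)
    t0, t1 = ts.min(), ts.max()
    # map each timestamp to a temporal bin index in [0, T_bins - 1]
    tb = np.clip(((ts - t0) / max(t1 - t0, 1e-9) * T_bins).astype(int), 0, T_bins - 1)
    np.add.at(grid, (ps.astype(int), tb, ys, xs), 1.0)  # accumulate counts
    return grid
```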
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank (a minimal memory-bank sketch follows this entry).
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
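As a rough picture of the frame/memory-bank interaction summarized above, the hypothetical sketch below keeps a FIFO of past frame features and lets the current frame attend to them; the capacity, attention form, and update rule are assumptions, not the paper's design.

```python
# Hypothetical FIFO memory bank queried by the current frame via attention;
# capacity, attention form, and the update rule are assumptions.
import torch
import torch.nn as nn


class MemoryBank(nn.Module):
    def __init__(self, dim: int = 64, capacity: int = 8):
        super().__init__()
        self.capacity = capacity
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.register_buffer("bank", torch.zeros(0, dim))  # (M, dim), M <= capacity

    @torch.no_grad()
    def write(self, feat: torch.Tensor):
        """Append the current frame feature (1, dim); drop the oldest beyond capacity."""
        self.bank = torch.cat([self.bank, feat.detach()], dim=0)[-self.capacity:]

    def read(self, feat: torch.Tensor) -> torch.Tensor:
        """Let the current frame (1, dim) attend to the stored history."""
        if self.bank.numel() == 0:
            return feat
        mem = self.bank.unsqueeze(0)                    # (1, M, dim)
        out, _ = self.attn(feat.unsqueeze(0), mem, mem)
        return out.squeeze(0)


# Per timestep: fused = bank.read(feat); bank.write(feat)
```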
- Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z)
- RGB-Event Fusion for Moving Object Detection in Autonomous Driving [3.5397758597664306]
Moving Object Detection (MOD) is a critical vision task for successfully achieving safe autonomous driving.
Recent advances in sensor technologies, especially event cameras, can naturally complement conventional frame-based cameras to better model moving objects.
We propose RENet, a novel RGB-Event fusion Network, that jointly exploits the two complementary modalities to achieve more robust MOD.
arXiv Detail & Related papers (2022-09-17T12:59:08Z)
- Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
- Event-LSTM: An Unsupervised and Asynchronous Learning-based Representation for Event-based Data [8.931153235278831]
Event cameras are activity-driven bio-inspired vision sensors.
We propose Event-LSTM, an unsupervised auto-encoder architecture made up of LSTM layers (a generic sketch follows this entry).
We also push state-of-the-art event de-noising forward by introducing memory into the de-noising process.
arXiv Detail & Related papers (2021-05-10T09:18:52Z)
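The entry above describes an unsupervised LSTM auto-encoder over event data; below is a generic sequence-to-sequence reconstruction sketch, with the per-time-bin feature size and hidden width as assumptions (the paper's exact input encoding and de-noising memory are not specified in the summary).

```python
# Generic LSTM auto-encoder over per-time-bin event features; the feature size,
# hidden width, and reconstruction objective are assumptions for illustration.
import torch
import torch.nn as nn


class EventLSTMAE(nn.Module):
    def __init__(self, feat: int = 4, hidden: int = 32):
        super().__init__()
        self.enc = nn.LSTM(feat, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:  # seq: (B, T, feat)
        _, (h, _) = self.enc(seq)                           # compress to final state
        z = h[-1].unsqueeze(1).repeat(1, seq.size(1), 1)    # broadcast code over T
        y, _ = self.dec(z)
        return self.out(y)                                  # reconstruction of seq
```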
- A Differentiable Recurrent Surface for Asynchronous Event-Based Data [19.605628378366667]
We propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently processes events and learns end-to-end, task-dependent event surfaces (a per-pixel sketch follows this entry).
Compared to existing reconstruction approaches, our learned event surface shows good flexibility and expressiveness on optical flow estimation.
It improves the state of the art in event-based object classification on the N-Cars dataset.
arXiv Detail & Related papers (2020-01-10T14:09:40Z)
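The sketch below shows the general idea of a per-pixel recurrent event surface in the spirit of the Matrix-LSTM entry above: one weight-shared LSTMCell applied at every pixel to that pixel's events in time order. The per-event features (raw timestamp and polarity) and the naive event-by-event loop are illustrative simplifications; the actual method emphasizes efficient processing, which this loop does not attempt.

```python
# Naive per-pixel recurrent event surface: one weight-shared LSTMCell per pixel,
# fed that pixel's events in time order. Features and loop are simplifications.
import torch
import torch.nn as nn


class PixelLSTMSurface(nn.Module):
    def __init__(self, feat_dim: int = 2, hidden: int = 8):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hidden)  # weights shared across pixels
        self.hidden = hidden

    def forward(self, events, H: int, W: int) -> torch.Tensor:
        """events: iterable of (x, y, t, p) in time order -> (hidden, H, W) surface."""
        h = torch.zeros(H * W, self.hidden)
        c = torch.zeros(H * W, self.hidden)
        for x, y, t, p in events:
            idx = y * W + x
            feat = torch.tensor([[float(t), float(p)]])   # per-event features
            hi, ci = self.cell(feat, (h[idx:idx + 1], c[idx:idx + 1]))
            h[idx:idx + 1], c[idx:idx + 1] = hi, ci
        return h.view(H, W, self.hidden).permute(2, 0, 1)  # final state per pixel
```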
This list is automatically generated from the titles and abstracts of the papers on this site.