3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM
Network
- URL: http://arxiv.org/abs/2308.11771v1
- Date: Tue, 22 Aug 2023 20:24:24 GMT
- Title: 3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM
Network
- Authors: Qinyu Chen, Zuowen Wang, Shih-Chii Liu, Chang Gao
- Abstract summary: This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking.
We leverage the benefits of retina-inspired event cameras, their low-latency response and sparse output stream, over traditional frame-based cameras.
- Score: 12.697820427228573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a sparse Change-Based Convolutional Long Short-Term
Memory (CB-ConvLSTM) model for event-based eye tracking, key for
next-generation wearable healthcare technology such as AR/VR headsets. We
leverage the benefits of retina-inspired event cameras, namely their
low-latency response and sparse output event stream, over traditional
frame-based cameras. Our CB-ConvLSTM architecture efficiently extracts
spatio-temporal features for pupil tracking from the event stream,
outperforming conventional CNN structures. Utilizing a delta-encoded recurrent
path enhancing activation sparsity, CB-ConvLSTM reduces arithmetic operations
by approximately 4.7$\times$ without losing accuracy when tested on a
\texttt{v2e}-generated event dataset of labeled pupils. This increase in
efficiency makes it ideal for real-time eye tracking in resource-constrained
devices. The project code and dataset are openly available at
\url{https://github.com/qinche106/cb-convlstm-eyetracking}.
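As a rough illustration of the delta-encoded recurrent path described in the abstract, the sketch below shows a ConvLSTM cell whose recurrent contribution is accumulated from thresholded hidden-state changes. The layer sizes, the 0.05 threshold, and all names are illustrative assumptions, not the released implementation (see the linked repository). Written in dense PyTorch, it demonstrates the encoding and the resulting activation sparsity; the roughly 4.7$\times$ reduction in arithmetic operations additionally requires a sparsity-aware kernel or accelerator that skips the zeroed deltas.

```python
# Hedged sketch (not the authors' code): a ConvLSTM cell whose recurrent path
# is delta-encoded in the spirit of CB-ConvLSTM. Sizes, threshold, and names
# are illustrative assumptions.
import torch
import torch.nn as nn

class ChangeBasedConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3, delta_threshold=0.05):
        super().__init__()
        pad = k // 2
        self.hid_ch = hid_ch
        self.delta_threshold = delta_threshold
        self.conv_x = nn.Conv2d(in_ch, 4 * hid_ch, k, padding=pad)               # input path
        self.conv_h = nn.Conv2d(hid_ch, 4 * hid_ch, k, padding=pad, bias=False)  # recurrent path

    def forward(self, x, state=None):
        b, _, hgt, wid = x.shape
        if state is None:
            zeros = lambda c: x.new_zeros(b, c, hgt, wid)
            # hidden state, cell state, reference hidden state, accumulated recurrent pre-activation
            state = (zeros(self.hid_ch), zeros(self.hid_ch),
                     zeros(self.hid_ch), zeros(4 * self.hid_ch))
        h, c, h_ref, m = state

        # Delta encoding: only hidden-state changes larger than the threshold are
        # propagated, so most recurrent-path activations are exactly zero. A
        # sparsity-aware kernel can skip those MACs; dense PyTorch computes them
        # anyway, so this sketch shows the encoding rather than the speed-up.
        delta_h = h - h_ref
        mask = delta_h.abs() >= self.delta_threshold
        delta_h = delta_h * mask
        h_ref = h_ref + delta_h            # reference tracks what has been propagated
        m = m + self.conv_h(delta_h)       # accumulate the recurrent pre-activation

        gates = self.conv_x(x) + m
        i, f, g, o = torch.chunk(gates, 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c, h_ref, m)
```

Stacking such cells and feeding event frames (or accumulated event counts) frame by frame yields a pupil-tracking backbone whose recurrent activity is sparse whenever the scene changes little between steps.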
Related papers
- FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality [14.120171971211777]
Event cameras offer a promising alternative due to their high temporal resolution and low power consumption.
We present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data.
On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms.
arXiv Detail & Related papers (2024-09-23T22:31:38Z) - Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation? [3.1777394653936937]
This paper investigates the integration of CNNs and Vision Extended Long Short-Term Memory (Vision-xLSTM) models by introducing a novel approach called UVixLSTM.
The Vision-xLSTM blocks capture temporal and global relationships within the patches extracted from the CNN feature maps.
UVixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly-available dataset.
arXiv Detail & Related papers (2024-06-24T08:01:05Z) - MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking [50.26836546224782]
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy.
The diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization.
This paper proposes a bidirectional long-term sequence modeling and time-varying state selection mechanism to fully utilize contextual temporal information.
arXiv Detail & Related papers (2024-04-18T11:09:25Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - RN-Net: Reservoir Nodes-Enabled Neuromorphic Vision Sensing Network [7.112892720740359]
Event-based cameras are inspired by spiking and asynchronous spike representation of the biological visual system.
We propose a neural network architecture, based on simple convolution layers integrated with dynamic temporal encoding for local and global reservoirs.
RN-Net achieves the highest accuracy reported to date of 99.2% for DVS128 Gesture, and one of the highest accuracies of 67.5% for the DVS Lip dataset, at a much smaller network size.
arXiv Detail & Related papers (2023-03-19T21:20:45Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously
Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations of event-based segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework that reduces inference cost by up to $20\times$ versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - Learning Spatio-Appearance Memory Network for High-Performance Visual
Tracking [79.80401607146987]
Existing object tracking usually learns a bounding-box based template to match visual targets across frames, which cannot accurately learn a pixel-wise representation.
This paper presents a novel segmentation-based tracking architecture, which is equipped with a local spatio-temporal memory network to learn accurate spatio-temporal correspondence.
arXiv Detail & Related papers (2020-09-21T08:12:02Z) - Object Tracking through Residual and Dense LSTMs [67.98948222599849]
Deep learning-based trackers based on LSTMs (Long Short-Term Memory) recurrent neural networks have emerged as a powerful alternative.
DenseLSTMs outperform Residual and regular LSTMs, and offer higher resilience to nuisances.
Our case study supports the adoption of residual-based RNNs for enhancing the robustness of other trackers.
arXiv Detail & Related papers (2020-06-22T08:20:17Z) - A Differentiable Recurrent Surface for Asynchronous Event-Based Data [19.605628378366667]
We propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that efficiently process events and learn end-to-end task-dependent event-surfaces.
Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and expressiveness on optical flow estimation.
It improves the state-of-the-art of event-based object classification on the N-Cars dataset.
arXiv Detail & Related papers (2020-01-10T14:09:40Z)
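For intuition about the Matrix-LSTM entry above, here is a hedged, unoptimized PyTorch sketch of the core idea: one shared LSTM cell is applied per pixel to the events that land there, and the final per-pixel hidden states form a learned, task-dependent event surface that a downstream CNN can consume. The feature choice (polarity plus normalized timestamp), the shapes, and the event-at-a-time loop are simplifying assumptions, not the paper's implementation, which batches events per pixel for efficiency.

```python
# Hedged sketch of the Matrix-LSTM idea: a grid of LSTM cells (shared weights)
# turns an asynchronous event stream into a dense, learned event surface.
import torch
import torch.nn as nn

class MatrixLSTMSurface(nn.Module):
    def __init__(self, height, width, hidden_dim=8):
        super().__init__()
        self.height, self.width, self.hidden_dim = height, width, hidden_dim
        # One set of LSTM weights shared by every pixel location.
        self.cell = nn.LSTMCell(input_size=2, hidden_size=hidden_dim)

    def forward(self, events):
        """events: (N, 4) float tensor of (x, y, t, polarity), sorted by t, N >= 1."""
        h = events.new_zeros(self.height * self.width, self.hidden_dim)
        c = events.new_zeros(self.height * self.width, self.hidden_dim)
        t0, t1 = events[0, 2], events[-1, 2]
        for x, y, t, p in events:
            # Per-event feature: polarity and timestamp normalized to [0, 1].
            ts = (t - t0) / (t1 - t0 + 1e-9)
            feat = torch.stack([p, ts]).unsqueeze(0)          # shape (1, 2)
            idx = int(y) * self.width + int(x)
            h_i, c_i = self.cell(feat, (h[idx:idx + 1], c[idx:idx + 1]))
            h = h.clone(); c = c.clone()                      # avoid in-place autograd issues
            h[idx], c[idx] = h_i[0], c_i[0]
        # Per-pixel hidden states become a (hidden_dim, H, W) surface for any CNN head.
        return h.view(self.height, self.width, self.hidden_dim).permute(2, 0, 1)
```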
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.