A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera
- URL: http://arxiv.org/abs/2404.08858v1
- Date: Sat, 13 Apr 2024 00:13:20 GMT
- Title: A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera
- Authors: Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen
- Abstract summary: Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical.
To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network.
We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private test set.
- Score: 0.8576354642891824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately targets a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently via buffering of layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private test set.
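As a rough illustration of the buffering and sparsity ideas in the abstract (a minimal sketch, not the authors' released code; the class and helper names below are invented), a causal temporal convolution can cache its last kernel_size - 1 inputs so each new frame is processed online in constant time, and an L1 penalty on ReLU outputs is one standard way to encourage high activation sparsity during training:

```python
import torch
import torch.nn as nn


class BufferedCausalConv1d(nn.Module):
    """Causal temporal convolution with a rolling input buffer.

    Illustrative sketch: caching the last kernel_size - 1 time steps lets
    a streaming pipeline process one frame at a time without recomputing
    the full sequence, in the spirit of "buffering of layer outputs".
    """

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size
        # No padding: causality comes from the buffer of past inputs.
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)
        self.buffer = None  # shape (batch, in_channels, kernel_size - 1)

    def forward(self, frame):
        # frame: (batch, in_channels, 1), a single new time step.
        if self.buffer is None:
            self.buffer = frame.new_zeros(
                frame.shape[0], frame.shape[1], self.kernel_size - 1)
        window = torch.cat([self.buffer, frame], dim=2)  # past + current
        self.buffer = window[:, :, 1:].detach()          # slide the buffer
        return torch.relu(self.conv(window))             # (batch, out_channels, 1)


def sparsity_penalty(activations, weight=1e-4):
    """L1 regularizer on ReLU outputs: a common way to push activation
    sparsity up during training (the abstract reports over 90% zeros)."""
    return weight * activations.abs().mean()


# Streaming usage: feed one time step at a time.
layer = BufferedCausalConv1d(in_channels=2, out_channels=8, kernel_size=4)
stream = torch.randn(1, 2, 10)                   # 10 steps of 2-channel input
outputs = [layer(stream[:, :, t:t + 1]) for t in range(10)]
```

Training can still run over full sequences with kernel_size - 1 zeros of left padding; with a zero-initialized buffer, the streaming loop above then reproduces the same outputs step by step.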
Related papers
- Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN [8.613703056677457]
Eye-tracking technology is integral to numerous consumer electronics applications, particularly in virtual and augmented reality (VR/AR).
Yet, achieving optimal performance across all these fronts presents a formidable challenge.
We tackle this challenge through a synergistic software/hardware co-design of the system with an event camera.
Our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency, while consuming only 2.29 mJ per inference.
arXiv Detail & Related papers (2024-04-22T15:28:42Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms [10.104371980353973]
Ev-Edge is a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms.
On several state-of-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy.
arXiv Detail & Related papers (2024-03-23T04:44:55Z) - SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition [11.178792888084692]
Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance.
We propose SpikePoint, a novel end-to-end point-based SNN architecture.
SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features.
arXiv Detail & Related papers (2023-10-11T04:38:21Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event-based segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework, to reduce inference cost by up to $20\times$ versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
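A minimal sketch of a confidence-threshold early-exit policy of the kind this entry describes (illustrative only, not the MESS implementation; the function and parameter names are invented):

```python
import torch
import torch.nn as nn


def multi_exit_forward(stages, exit_heads, x, threshold=0.9):
    """Run backbone stages in order; after each stage an attached exit
    head predicts per-pixel class probabilities, and inference stops as
    soon as the mean prediction confidence clears the threshold.

    stages, exit_heads: matching lists of nn.Module; x: input tensor.
    """
    logits = None
    for stage, head in zip(stages, exit_heads):
        x = stage(x)
        logits = head(x)                           # (batch, classes, H, W)
        probs = torch.softmax(logits, dim=1)
        confidence = probs.max(dim=1).values.mean()
        if confidence >= threshold:                # easy sample: exit early
            return logits
    return logits                                  # hard samples use full depth
```

Tuning the threshold (and, as the entry notes, the number and placement of the exit heads) trades accuracy against average inference cost per sample.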
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
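To make the last entry's central idea concrete, here is a toy sketch (assumed names, not the paper's implementation) of why a single event needs only a local update to a convolution output: only output pixels whose receptive field contains the event's location change.

```python
import numpy as np


def update_conv_output(out, frame, kernel, x, y, polarity):
    """Incrementally update a 'valid' 2D cross-correlation after one event.

    Toy sketch of the asynchronous sparse-convolution idea: instead of
    reconvolving the whole frame, add the kernel-weighted change only at
    output positions whose receptive field covers the event pixel (x, y).
    """
    kh, kw = kernel.shape
    delta = 1.0 if polarity else -1.0
    frame[y, x] += delta  # integrate the event into the input frame
    oh, ow = out.shape
    # Output (i, j) reads input (i + u, j + v); solve for the affected (i, j).
    for u in range(kh):
        for v in range(kw):
            i, j = y - u, x - v
            if 0 <= i < oh and 0 <= j < ow:
                out[i, j] += delta * kernel[u, v]
    return out
```

Each event then costs O(kh * kw) work instead of a full O(H * W * kh * kw) reconvolution, which is where the complexity and latency reductions claimed in that abstract come from.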
This list is automatically generated from the titles and abstracts of the papers in this site.