A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera
- URL: http://arxiv.org/abs/2404.08858v1
- Date: Sat, 13 Apr 2024 00:13:20 GMT
- Title: A Lightweight Spatiotemporal Network for Online Eye Tracking with Event Camera
- Authors: Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Olivier Coenen
- Abstract summary: Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical.
To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network.
We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private test set.
- Score: 0.8576354642891824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event-based data are commonly encountered in edge computing environments where efficiency and low latency are critical. To interface with such data and leverage their rich temporal features, we propose a causal spatiotemporal convolutional network. This solution targets efficient implementation on edge-appropriate hardware with limited resources in three ways: 1) it deliberately targets a simple architecture and set of operations (convolutions, ReLU activations); 2) it can be configured to perform online inference efficiently via buffering of layer outputs; 3) it can achieve more than 90% activation sparsity through regularization during training, enabling very significant efficiency gains on event-based processors. In addition, we propose a general affine augmentation strategy acting directly on the events, which alleviates the problem of dataset scarcity for event-based systems. We apply our model to the AIS 2024 event-based eye tracking challenge, reaching a score of 0.9916 p10 accuracy on the Kaggle private test set.
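As a rough illustration of the buffering and sparsity ideas in the abstract (a minimal sketch, not the authors' released code; the class and helper names below are invented), a causal temporal convolution can cache its last kernel_size - 1 inputs so each new frame is processed online in constant time, and an L1 penalty on ReLU outputs is one standard way to encourage high activation sparsity during training:

```python
import torch
import torch.nn as nn


class BufferedCausalConv1d(nn.Module):
    """Causal temporal convolution with a rolling input buffer.

    Illustrative sketch: caching the last kernel_size - 1 time steps lets
    a streaming pipeline process one frame at a time without recomputing
    the full sequence, in the spirit of "buffering of layer outputs".
    """

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.kernel_size = kernel_size
        # No padding: causality comes from the buffer of past inputs.
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)
        self.buffer = None  # shape (batch, in_channels, kernel_size - 1)

    def forward(self, frame):
        # frame: (batch, in_channels, 1), a single new time step.
        if self.buffer is None:
            self.buffer = frame.new_zeros(
                frame.shape[0], frame.shape[1], self.kernel_size - 1)
        window = torch.cat([self.buffer, frame], dim=2)  # past + current
        self.buffer = window[:, :, 1:].detach()          # slide the buffer
        return torch.relu(self.conv(window))             # (batch, out_channels, 1)


def sparsity_penalty(activations, weight=1e-4):
    """L1 regularizer on ReLU outputs: a common way to push activation
    sparsity up during training (the abstract reports over 90% zeros)."""
    return weight * activations.abs().mean()


# Streaming usage: feed one time step at a time.
layer = BufferedCausalConv1d(in_channels=2, out_channels=8, kernel_size=4)
stream = torch.randn(1, 2, 10)                   # 10 steps of 2-channel input
outputs = [layer(stream[:, :, t:t + 1]) for t in range(10)]
```

Training can still run over full sequences with kernel_size - 1 zeros of left padding; with a zero-initialized buffer, the streaming loop above then reproduces the same outputs step by step.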
Related papers
- Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN [8.613703056677457]
Eye-tracking technology is integral to numerous consumer electronics applications, particularly in virtual and augmented reality (VR/AR).
Yet, achieving optimal performance across all these fronts presents a formidable challenge.
We tackle this challenge through a synergistic software/hardware co-design of the system with an event camera.
Our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and 3.71 Mean Euclidean Distance with 0.7 ms latency, while consuming only 2.29 mJ per inference.
arXiv Detail & Related papers (2024-04-22T15:28:42Z) - Exploring Dynamic Transformer for Efficient Object Tracking [58.120191254379854]
We propose DyTrack, a dynamic transformer framework for efficient tracking.
DyTrack automatically learns to configure proper reasoning routes for various inputs, gaining better utilization of the available computational budget.
Experiments on multiple benchmarks demonstrate that DyTrack achieves promising speed-precision trade-offs with only a single model.
arXiv Detail & Related papers (2024-03-26T12:31:58Z) - Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms [10.104371980353973]
Ev-Edge is a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms.
On several state-of-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy.
arXiv Detail & Related papers (2024-03-23T04:44:55Z) - SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition [11.178792888084692]
Spiking Neural Networks (SNNs) have gained significant attention due to their remarkable efficiency and fault tolerance.
We propose SpikePoint, a novel end-to-end point-based SNN architecture.
SpikePoint excels at processing sparse event cloud data, effectively extracting both global and local features.
arXiv Detail & Related papers (2023-10-11T04:38:21Z) - Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - HALSIE: Hybrid Approach to Learning Segmentation by Simultaneously Exploiting Image and Event Modalities [6.543272301133159]
Event cameras detect changes in per-pixel intensity to generate asynchronous event streams.
They offer great potential for accurate semantic map retrieval in real-time autonomous systems.
Existing implementations for event-based segmentation suffer from sub-par performance.
We propose HALSIE, a hybrid end-to-end learning framework, to reduce inference cost by up to $20\times$ versus the state of the art.
arXiv Detail & Related papers (2022-11-19T17:09:50Z) - AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z) - Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
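A minimal sketch of a confidence-threshold early-exit policy of the kind this entry describes (illustrative only, not the MESS implementation; the function and parameter names are invented):

```python
import torch
import torch.nn as nn


def multi_exit_forward(stages, exit_heads, x, threshold=0.9):
    """Run backbone stages in order; after each stage an attached exit
    head predicts per-pixel class probabilities, and inference stops as
    soon as the mean prediction confidence clears the threshold.

    stages, exit_heads: matching lists of nn.Module; x: input tensor.
    """
    logits = None
    for stage, head in zip(stages, exit_heads):
        x = stage(x)
        logits = head(x)                           # (batch, classes, H, W)
        probs = torch.softmax(logits, dim=1)
        confidence = probs.max(dim=1).values.mean()
        if confidence >= threshold:                # easy sample: exit early
            return logits
    return logits                                  # hard samples use full depth
```

Tuning the threshold (and, as the entry notes, the number and placement of the exit heads) trades accuracy against average inference cost per sample.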
arXiv Detail & Related papers (2021-06-07T11:37:03Z) - Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)
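To make the last entry's central idea concrete, here is a toy sketch (assumed names, not the paper's implementation) of why a single event needs only a local update to a convolution output: only output pixels whose receptive field contains the event's location change.

```python
import numpy as np


def update_conv_output(out, frame, kernel, x, y, polarity):
    """Incrementally update a 'valid' 2D cross-correlation after one event.

    Toy sketch of the asynchronous sparse-convolution idea: instead of
    reconvolving the whole frame, add the kernel-weighted change only at
    output positions whose receptive field covers the event pixel (x, y).
    """
    kh, kw = kernel.shape
    delta = 1.0 if polarity else -1.0
    frame[y, x] += delta  # integrate the event into the input frame
    oh, ow = out.shape
    # Output (i, j) reads input (i + u, j + v); solve for the affected (i, j).
    for u in range(kh):
        for v in range(kw):
            i, j = y - u, x - v
            if 0 <= i < oh and 0 <= j < ow:
                out[i, j] += delta * kernel[u, v]
    return out
```

Each event then costs O(kh * kw) work instead of a full O(H * W * kh * kw) reconvolution, which is where the complexity and latency reductions claimed in that abstract come from.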
This list is automatically generated from the titles and abstracts of the papers in this site.