EA: An Event Autoencoder for High-Speed Vision Sensing
- URL: http://arxiv.org/abs/2507.06459v1
- Date: Wed, 09 Jul 2025 00:21:15 GMT
- Title: EA: An Event Autoencoder for High-Speed Vision Sensing
- Authors: Riadul Islam, Joey Mulé, Dhandeep Challagundla, Shahmir Rizvi, Sean Carson,
- Abstract summary: Event cameras offer a promising alternative but pose challenges in object detection due to sparse and noisy event streams.
We propose an event autoencoder architecture that efficiently compresses and reconstructs event data.
We show that our approach achieves comparable accuracy to the YOLO-v4 model while utilizing up to $35.5\times$ fewer parameters.
- Score: 0.9401004127785267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-speed vision sensing is essential for real-time perception in applications such as robotics, autonomous vehicles, and industrial automation. Traditional frame-based vision systems suffer from motion blur, high latency, and redundant data processing, limiting their performance in dynamic environments. Event cameras, which capture asynchronous brightness changes at the pixel level, offer a promising alternative but pose challenges in object detection due to sparse and noisy event streams. To address this, we propose an event autoencoder architecture that efficiently compresses and reconstructs event data while preserving critical spatial and temporal features. The proposed model employs convolutional encoding and incorporates adaptive threshold selection and a lightweight classifier to enhance recognition accuracy while reducing computational complexity. Experimental results on the existing Smart Event Face Dataset (SEFD) demonstrate that our approach achieves comparable accuracy to the YOLO-v4 model while utilizing up to $35.5\times$ fewer parameters. Implementations on embedded platforms, including the Raspberry Pi 4B and NVIDIA Jetson Nano, achieve frame rates ranging from 8 FPS up to 44.8 FPS. The proposed classifier delivers up to $87.84\times$ higher FPS than the state of the art and significantly improves event-based vision performance, making it ideal for low-power, high-speed applications in real-time edge computing.
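The abstract describes the model only at a high level, so below is a minimal sketch of what a convolutional event autoencoder with adaptive thresholding and a lightweight classifier head could look like in PyTorch; the layer sizes, the percentile-based threshold rule, and all names (binarize_event_frame, EventAutoencoder) are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch (assumptions, not the paper's exact design): events are first
# accumulated into a per-pixel count map, adaptively thresholded into a sparse
# binary frame, then compressed, reconstructed, and classified.
import torch
import torch.nn as nn


def binarize_event_frame(accum: torch.Tensor, percentile: float = 90.0) -> torch.Tensor:
    """Adaptive threshold: keep pixels whose accumulated event count exceeds a
    per-frame percentile, yielding a sparse binary event frame."""
    thresh = torch.quantile(accum.flatten(), percentile / 100.0)
    return (accum > thresh).float()


class EventAutoencoder(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Convolutional encoder: compress a 1-channel event frame 4x per side.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: reconstruct the event frame from the latent feature map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )
        # Lightweight classifier operating on the compressed latent map.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)


if __name__ == "__main__":
    accum = torch.rand(1, 1, 128, 128)   # fake event-count map for a 128x128 sensor
    frame = binarize_event_frame(accum)  # sparse binary input frame
    recon, logits = EventAutoencoder()(frame)
    print(recon.shape, logits.shape)     # [1, 1, 128, 128], [1, num_classes]
```

A real training setup would typically combine a reconstruction loss on the decoder output with a cross-entropy loss on the classifier logits.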
Related papers
- Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space [20.537672896807063]
Event cameras promise a paradigm shift in vision sensing with their low latency, high dynamic range, and asynchronous nature of events.
We introduce event quality score (EQS), a quality metric that utilizes activations of the RVT architecture.
arXiv Detail & Related papers (2025-04-16T22:25:57Z)
- TOFFE -- Temporally-binned Object Flow from Events for High-speed and Energy-Efficient Object Detection and Tracking [10.458676835674847]
Event-based cameras offer a biologically-inspired solution to this by capturing only changes in intensity levels at exceptionally high temporal resolution and low power consumption.
We propose TOFFE, a lightweight hybrid framework for performing event-based object motion estimation.
arXiv Detail & Related papers (2025-01-21T20:20:34Z) - Low-Latency Scalable Streaming for Event-Based Vision [0.5242869847419834]
We propose a scalable streaming method for event-based data based on Media Over QUIC.
We show that a state-of-the-art object detection application is resilient to dramatic data loss.
We observe an average reduction in detection mAP as low as 0.36.
arXiv Detail & Related papers (2024-12-10T19:48:57Z) - A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation [3.355813093377501]
Event cameras encode temporal changes in light intensity as asynchronous binary spikes.
Their unconventional spiking output and the scarcity of labelled datasets pose significant challenges to traditional image-based depth estimation methods.
We propose a novel energy-efficient Spike-Driven Transformer Network (SDT) for depth estimation, leveraging the unique properties of spiking data.
arXiv Detail & Related papers (2024-04-26T11:32:53Z) - EventTransAct: A video transformer-based framework for Event-camera
based action recognition [52.537021302246664]
Event cameras offer new opportunities compared to standard action recognition in RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z) - Spatiotemporal Attention-based Semantic Compression for Real-time Video
Recognition [117.98023585449808]
We propose a spatiotemporal attention-based autoencoder (STAE) architecture to evaluate the importance of frames and of pixels within each frame.
We develop a lightweight decoder that leverages a combined 3D-2D CNN to reconstruct missing information.
Experimental results show that ViT_STAE can compress the HMDB51 video dataset by 104x with only 5% accuracy loss.
arXiv Detail & Related papers (2023-05-22T07:47:27Z) - EV-Catcher: High-Speed Object Catching Using Low-latency Event-based
Neural Networks [107.62975594230687]
We demonstrate an application where event cameras excel: accurately estimating the impact location of fast-moving objects.
We introduce a lightweight event representation called Binary Event History Image (BEHI) to encode event data at low latency.
We show that the system is capable of achieving a success rate of 81% in catching balls targeted at different locations, with a velocity of up to 13 m/s even on compute-constrained embedded platforms.
arXiv Detail & Related papers (2023-04-14T15:23:28Z) - Optical flow estimation from event-based cameras and spiking neural
networks [0.4899818550820575]
Event-based sensors are an excellent fit for Spiking Neural Networks (SNNs).
We propose a U-Net-like SNN which, after supervised training, is able to make dense optical flow estimations.
Thanks to separable convolutions, we have been able to develop a light model that can nonetheless yield reasonably accurate optical flow estimates.
arXiv Detail & Related papers (2023-02-13T16:17:54Z) - Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge
Devices [90.30316433184414]
We propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on HD video streams.
Compared to the state-of-the-art MOT baseline, our tri-design approach can achieve 12.5x latency reduction, 20.9x effective frame rate improvement, 5.83x lower power, and 9.78x better energy efficiency, without much accuracy drop.
arXiv Detail & Related papers (2022-10-16T16:21:40Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO).
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z) - Event-based Asynchronous Sparse Convolutional Networks [54.094244806123235]
Event cameras are bio-inspired sensors that respond to per-pixel brightness changes in the form of asynchronous and sparse "events".
We present a general framework for converting models trained on synchronous image-like event representations into asynchronous models with identical output.
We show both theoretically and experimentally that this drastically reduces the computational complexity and latency of high-capacity, synchronous neural networks.
arXiv Detail & Related papers (2020-03-20T08:39:49Z)